Diffstat (limited to 'doc/cli.markdown')
-rw-r--r-- | doc/cli.markdown | 61 |
1 files changed, 19 insertions, 42 deletions
diff --git a/doc/cli.markdown b/doc/cli.markdown
index 843bae9..214c3b7 100644
--- a/doc/cli.markdown
+++ b/doc/cli.markdown
@@ -1,39 +1,36 @@
 # Quikr Command Line Utilities #
-
 Quikr has three command-line utilities that mirror the behavior of the python
 module and the matlab implementation. The advantage of this is ease of scripting
-and job management. These utilities are written in python and wrap the quikr
-module.
+and job management, as well as faster processing and lower memory usage. These
+utilities are written in C and utilize OpenMP for multithreading.
 
 ## Quikr\_train ##
-
 The quikr\_train is a tool to train a database for use with the quikr tool.
-Before running the quikr utility, you need to generate the sensing matrix or
+Before running the quikr utility, you need to generate the sensing matqrix or
 download a pretrained matrix from our database\_download.html.
 
 ### Usage ###
-quikr\_train returns a custom trained matrix that can be used with the quikr
-function. You must supply a kmer.
+quikr\_train returns a custom sensing matrix that can be used with the quikr
+function.
 
 quikr\_train's arguments:
   -i, --input, the database of sequences (fasta format)
-  -o, --output, the trained matrix (text file)
-  -k, --kmer, the kmer size, the default is 6 (integer)
-  -z, --compress compress the output matrix with gzip (flag)
+  -o, --output, the sensing matrix (text file)
+  -k, --kmer, specifiy wha size of kmer to use. (default value is 6)
+  -v, --verbose, verbose mode.
 
 ### Example ###
 Here is an example on how to train a database. This uses the -z flag to compress
 the output matrix since it can be very large. Because of the sparse nature of
 the database, the matrix easily achieves a high compression ratio, even with
-gzip. It takes the gg94\_database.fasta as an input and outputs the trained
-matrix as gg94\_trained\_databse.npy.gz
+gzip. It takes the gg94\_database.fasta as an input and outputs the sensing
+matrix as gg94\_sensing\_databse.npy.gz
 
-    quikr_train -i gg94_database.fasta -o gg94_trained_database.npy.gz -k 6 -z
+    quikr_train -i gg94_database.fasta -o gg94_sensing_database.matrix.gz -k 6
 
 ## Quikr ##
 Quikr returns the estimated frequencies of batcteria present when given a
-input FASTA file. A default trained matrix will be used if none is supplied
-You must supply a kmer and default lambda if using a custom trained matrix.
+input FASTA file. You need to train a matrix or download a new matrix
 
 ### Usage ###
 quikr returns the solution vector as a csv file.
@@ -42,8 +39,8 @@ quikr's arguments:
   -f, --fasta, the sample's fasta file of NGS READS
   -o, --output OTU\_FRACTION\_PRESENT, a vector representing the percentage of
   database sequence's presence in sample (csv output)
-  -t, --trained-matrix, the trained matrix
-  -l, --lamb, the lambda size. (the default lambda value is 10,000)
+  -s, --sensing-matrix the sensing matrix. (generated by quikr\_train)
+  -l, --lambda, the lambda size. (the default lambda value is 10,000)
   -k, --kmer, this specifies the size of the kmer to use (default is 6)
 
 ## Multifasta\_to\_otu ##
@@ -66,14 +63,14 @@ with aspecified number of jobs. Otherwise python with run one job per cpu core.
 
 ### Usage ###
 multifasta\_to\_otu's arguments:
-  -i, --input-directory, the directory containing the samples' fasta files of
+  -i, --input, the directory containing the samples' fasta files of
   reads (note each fasta file should correspond to a separate sample)
   -o, --otu-table, the OTU table, with OTU\_FRACTION\_PRESENT for each
   sample, which is compatible with QIIME's convert\_biom.py (or sequence table if
   not OTU's)
-  -t, --trained-matrix, the trained matrix
-  -f, --trained-fasta, the fasta file database of sequences
-  -l, --lamb, specify what size of lambda to use (the default value is 10,000)
+  -s, --sensing-matrix, the sensing matrix
+  -f, --sensing-fasta, the fasta file database of sequences
+  -l, --lambda, specify what size of lambda to use (the default value is 10,000)
   -k, --kmer, specify what size of kmer to use, (default value is 6)
   -j, --jobs, specifies how many jobs to run at once, (default=number of CPUs)
 
@@ -98,12 +95,6 @@ The QIIME procedue:
 
     principal_coordinates.py -i beta_div/weighted_unifrac_<quikr_otu>.txt -o <quikr_otu_project_name>_weighted.txt
     make_3d_plots.py -i <quikr_otu_project_name>_weighted.txt -o <3d_pcoa_plotdirectory> -m <qiime_metadata_file>
-
-# Python Quikr Troubleshooting #
-
-If you are having trouble, and these solutions don't work. Please contact the
-developers with questions and issues.
-
 #### Broken Pipe Errors ####
 Make sure that you have the count-kmers and probablilties-by-read in your
 $PATH, and that they are executable.
@@ -111,19 +102,5 @@ $PATH, and that they are executable.
 If you have not installed quikr system-wide, you'll need to add the folder
 location of these binaries in the terminal before running the command:
 
+    mv /path/to/quikr/src/nbc/count /path/to/quikr/src/nbc/count-kmers
     PATH = $PATH:/path/to/quikr/src/nbc/
-
-Make sure that the binaries are executable by running:
-
-    chmod +x probabilities-by-read
-    chmod +x count-kmers
-
-#### Python Cannot Find XYZ ####
-
-Ensure that you have Python 2.7, Scipy, Numpy, and BIOpython installed
-and that python is setup correctly. You should be able to do this from a python
-prompt without any errors:
-    >>> import numpy
-    >>> import scipy
-    >>> from Bio import SeqIO
-
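
Note on the final hunk: the context line `PATH = $PATH:/path/to/quikr/src/nbc/` would not work if typed as shown, because a POSIX shell does not accept spaces around `=` in a variable assignment. The following is a minimal sketch of what that step presumably intends, assuming a POSIX-compatible shell. `add_to_path` is a hypothetical helper, not part of quikr, and the directory is the placeholder path from the diff.

```shell
# Hypothetical helper (not part of quikr): append a directory to PATH
# unless it is already listed there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;     # no spaces around '=' in an assignment
    esac
}

# Placeholder directory taken from the diff; substitute the real location.
add_to_path /path/to/quikr/src/nbc
export PATH
```

Calling the helper again with the same directory leaves `PATH` unchanged, so it is safe to put in a shell startup file.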