| -rw-r--r-- | doc/cli.markdown | 63 | 
1 file changed, 50 insertions, 13 deletions
diff --git a/doc/cli.markdown b/doc/cli.markdown
index b065240..87df41b 100644
--- a/doc/cli.markdown
+++ b/doc/cli.markdown
@@ -18,13 +18,15 @@ function. You must supply a kmer.
 quikr\_train's arguments:
   -i, --input, the database of sequences (fasta format)
   -o, --output, the trained matrix (text file)
-  -k, --kmer, the kmer size (integer)
+  -k, --kmer, the kmer size, the default is 6 (integer)
   -z, --compress  compress the output matrix with gzip (flag)
 
 ### Example ###
 Here is an example on how to train a database. This uses the -z flag to compress
-the output matrix since it can be very large. It takes the gg94\_database.fasta
-as an input and outputs the trained matrix as gg94\_trained\_databse.npy.gz
+the output matrix since it can be very large. Because of the sparse nature of
+the database, the matrix easily achieves a high compression ratio, even with
+gzip. It takes the gg94\_database.fasta as an input and outputs the trained
+matrix as gg94\_trained\_databse.npy.gz
 
     quikr_train -i gg94_database.fasta -o gg94_trained_database.npy.gz -k 6 -z
 
@@ -37,11 +39,12 @@ You must supply a kmer and default lambda if using a custom trained matrix.
 quikr returns the solution vector as a csv file.
 
 quikr's arguments:
-  -f, --fasta, the fasta file sample
-  -o, --output OUTPUT, the output path (csv output)
+  -f, --fasta, the sample's fasta file of NGS READS
+  -o, --output OTU\_FRACTION\_PRESENT, a vector representing the percentage of
+  database sequence's presence in sample (csv output)
   -t, --trained-matrix, the trained matrix
   -l, --lamb, the lambda size. (the default lambda value is 10,000)
-  -k, --kmer, this specifies which kmer to use (default is 6)
+  -k, --kmer, this specifies the size of the kmer to use (default is 6)
 
 ## Multifasta\_to\_otu ##
 The Multifasta\_to\_otu tool is a handy wrapper for quikr which lets the user
@@ -52,18 +55,52 @@ Warning: this program will use a large amount of memory, and CPU time. You can
 reduce the number of cores used, and thus memory, by specifying the -j flag
 with aspecified number of jobs. Otherwise python with run one job per cpu core.
 
+# Pre-processing of Multifasta\_to\_otu  #
+
+* Please name fasta files of sample reads with <sample id>.fa<*> and place them
+  into one directory without any other file in that directory (for example, no
+  hidden files that the operating system may generate, are allowed in that
+  directory)
+* Fasta files of reads must have a suffix that starts with .fa (e.g.: .fasta and
+  .fa are valid while .fna is NOT)
+
 ### Usage ###
 multifasta\_to\_otu's arguments:
-  -i, --input-directory, the directory containing fasta files
-  -o, --otu-table, the output OTU table
-  -t, --trained-matrix, the trained database to use
-  -f, --trained-fasta, the fasta file used to train your matrix
+  -i, --input-directory, the directory containing the samples' fasta files of
+  reads (note each fasta file should correspond to a separate sample)
+  -o, --otu-table, the OTU table, with OTU\_FRACTION\_PRESENT for each sample,
+  which is compatible with QIIME's convert\_biom.py (or sequence table if not
+  OTU's)
+  -t, --trained-matrix, the trained matrix
+  -f, --trained-fasta, the fasta file database of sequences
   -d, --output-directory, quikr output directory
-  -l, --lamb, specify what lambda to use (the default value is 10,000)
-  -k, --kmer, specify which kmer to use, (default value is 6)
+  -l, --lamb, specify what size of lambda to use (the default value is 10,000)
+  -k, --kmer, specify what size of kmer to use, (default value is 6)
   -j, --jobs, specifies how many jobs to run at once, (default=number of CPUs)
 
-# Troubleshooting #
+# Post-processing of Multifasta\_to\_otu  #
+
+* Note: When making your QIIME Metadata file, the sample id's must match the
+  sample fasta file prefix names
+
+4-step QIIME procedure after using Quikr to obtain 3D PCoA graphs:
+(Note: Our code works much better with WEIGHTED Unifrac as opposed to
+Unweighted.)
+
+Pre-requisites:
+1. <quikr_otu_table.txt>
+2. the tree of the database sequences that were used (e.g.  dp7\_mafft.fasttree,
+   gg\_94\_otus\_4feb2011.tre, etc.)
+3. your-defined <qiime_metadata_file.txt>
+
+The QIIME procedue:
+    convert_biom.py -i <quikr_otu_table.txt> -o <quikr_otu>.biom --biom_table_type="otu table"
+    beta_diversity.py -i <quikr_otu>.biom -m weighted_unifrac -o beta_div -t <tree file> (example: rdp7_mafft.fasttree)>
+    principal_coordinates.py -i beta_div/weighted_unifrac_<quikr_otu>.txt -o <quikr_otu_project_name>_weighted.txt
+    make_3d_plots.py -i <quikr_otu_project_name>_weighted.txt -o <3d_pcoa_plotdirectory> -m <qiime_metadata_file>
+
+
+# Python Quikr Troubleshooting #
 If you are having trouble, and these solutions don't work. Please contact the
 developers with questions and issues.
 
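Review note: the pre-processing rules added by this patch (every file in the input directory is named `<sample id>.fa<*>`, and no other files, hidden ones included, are present) are easy to get wrong by hand. Below is a minimal Python sketch of a check that could be run before multifasta\_to\_otu. The helper name `check_sample_directory` and its return shape are assumptions made for illustration; nothing like it exists in quikr as far as this patch shows, and taking the text before the last dot as the sample id is one possible reading of "prefix".

```python
from pathlib import Path


def check_sample_directory(directory):
    """Hypothetical helper (not part of quikr): check a directory against the
    documented pre-processing rules.

    Returns (sample_ids, problems): the sample ids of the accepted files and
    one human-readable message per entry that breaks a rule.
    """
    sample_ids = []
    problems = []
    for entry in sorted(Path(directory).iterdir()):
        name = entry.name
        # Hidden files (for example ones generated by the OS) are not allowed.
        if name.startswith("."):
            problems.append(f"hidden file not allowed: {name}")
            continue
        if not entry.is_file():
            problems.append(f"not a regular file: {name}")
            continue
        # Assumption: the sample id is everything before the last dot.
        sample_id, _, suffix = name.rpartition(".")
        # The suffix must start with "fa": .fa and .fasta pass, .fna does not.
        if not sample_id or not suffix.startswith("fa"):
            problems.append(f"suffix does not start with .fa: {name}")
            continue
        sample_ids.append(sample_id)
    return sample_ids, problems
```

The returned sample ids are the file-name prefixes, which by the post-processing note are the ids the QIIME metadata file has to use.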
