From 6208842318a6009308bd9abed921c6158b39f1f0 Mon Sep 17 00:00:00 2001
From: Calvin
Date: Fri, 3 May 2013 12:25:15 -0400
Subject: added updated cli documentation

---
 doc/cli.markdown | 63 ++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 50 insertions(+), 13 deletions(-)

(limited to 'doc')

diff --git a/doc/cli.markdown b/doc/cli.markdown
index b065240..87df41b 100644
--- a/doc/cli.markdown
+++ b/doc/cli.markdown
@@ -18,13 +18,15 @@ function. You must supply a kmer.
 quikr\_train's arguments:
     -i, --input, the database of sequences (fasta format)
     -o, --output, the trained matrix (text file)
-    -k, --kmer, the kmer size (integer)
+    -k, --kmer, the kmer size, the default is 6 (integer)
     -z, --compress compress the output matrix with gzip (flag)
 
 ### Example ###
 Here is an example on how to train a database. This uses the -z flag to compress
-the output matrix since it can be very large. It takes the gg94\_database.fasta
-as an input and outputs the trained matrix as gg94\_trained\_databse.npy.gz
+the output matrix since it can be very large. Because of the sparse nature of
+the database, the matrix easily achieves a high compression ratio, even with
+gzip. It takes the gg94\_database.fasta as an input and outputs the trained
+matrix as gg94\_trained\_databse.npy.gz
 
     quikr_train -i gg94_database.fasta -o gg94_trained_database.npy.gz -k 6 -z
 
@@ -37,11 +39,12 @@ You must supply a kmer and default lambda if using a custom trained matrix.
 quikr returns the solution vector as a csv file.
 
 quikr's arguments:
-    -f, --fasta, the fasta file sample
-    -o, --output OUTPUT, the output path (csv output)
+    -f, --fasta, the sample's fasta file of NGS READS
+    -o, --output OTU\_FRACTION\_PRESENT, a vector representing the percentage of
+    database sequence's presence in sample (csv output)
     -t, --trained-matrix, the trained matrix
     -l, --lamb, the lambda size. (the default lambda value is 10,000)
-    -k, --kmer, this specifies which kmer to use (default is 6)
+    -k, --kmer, this specifies the size of the kmer to use (default is 6)
 
 ## Multifasta\_to\_otu ##
 The Multifasta\_to\_otu tool is a handy wrapper for quikr which lets the user
@@ -52,18 +55,52 @@ Warning: this program will use a large amount of memory, and CPU time. You can
 reduce the number of cores used, and thus memory, by specifying the -j flag with
 aspecified number of jobs. Otherwise python with run one job per cpu core.
 
+# Pre-processing of Multifasta\_to\_otu #
+
+* Please name fasta files of sample reads with .fa<*> and place them
+  into one directory without any other file in that directory (for example, no
+  hidden files that the operating system may generate, are allowed in that
+  directory)
+* Fasta files of reads must have a suffix that starts with .fa (e.g.: .fasta and
+  .fa are valid while .fna is NOT)
+
 ### Usage ###
 multifasta\_to\_otu's arguments:
-    -i, --input-directory, the directory containing fasta files
-    -o, --otu-table, the output OTU table
-    -t, --trained-matrix, the trained database to use
-    -f, --trained-fasta, the fasta file used to train your matrix
+    -i, --input-directory, the directory containing the samples' fasta files of
+    reads (note each fasta file should correspond to a separate sample)
+    -o, --otu-table, the OTU table, with OTU\_FRACTION\_PRESENT for each sample,
+    which is compatible with QIIME's convert\_biom.py (or sequence table if not
+    OTU's)
+    -t, --trained-matrix, the trained matrix
+    -f, --trained-fasta, the fasta file database of sequences
     -d, --output-directory, quikr output directory
-    -l, --lamb, specify what lambda to use (the default value is 10,000)
-    -k, --kmer, specify which kmer to use, (default value is 6)
+    -l, --lamb, specify what size of lambda to use (the default value is 10,000)
+    -k, --kmer, specify what size of kmer to use, (default value is 6)
     -j, --jobs, specifies how many jobs to run at once, (default=number of CPUs)
 
-# Troubleshooting #
+# Post-processing of Multifasta\_to\_otu #
+
+* Note: When making your QIIME Metadata file, the sample id's must match the
+  sample fasta file prefix names
+
+4-step QIIME procedure after using Quikr to obtain 3D PCoA graphs:
+(Note: Our code works much better with WEIGHTED Unifrac as opposed to
+Unweighted.)
+
+Pre-requisites:
+1.
+2. the tree of the database sequences that were used (e.g. dp7\_mafft.fasttree,
+   gg\_94\_otus\_4feb2011.tre, etc.)
+3. your-defined
+
+The QIIME procedue:
+    convert_biom.py -i -o .biom --biom_table_type="otu table"
+    beta_diversity.py -i .biom -m weighted_unifrac -o beta_div -t (example: rdp7_mafft.fasttree)>
+    principal_coordinates.py -i beta_div/weighted_unifrac_.txt -o _weighted.txt
+    make_3d_plots.py -i _weighted.txt -o <3d_pcoa_plotdirectory> -m
+
+
+# Python Quikr Troubleshooting #
 
 If you are having trouble, and these solutions don't work. Please contact the
 developers with questions and issues.
-- 
cgit v1.2.3
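
The pre-processing rules this patch adds for multifasta\_to\_otu (every entry in the input directory must be a reads file whose suffix starts with .fa, and nothing else, hidden files included, may be present) can be checked before a run. The following is a minimal sketch, not part of quikr: the helper name `check_input_directory` is hypothetical, and it applies the rule literally as the documentation states it.

```python
import os


def check_input_directory(path):
    """Return the entries in `path` that break the documented naming rule.

    Hypothetical helper, not part of quikr. An entry is accepted only when it
    is a regular file whose final suffix starts with ".fa" (e.g. .fa, .fasta).
    This mirrors the rule as written; it does not inspect file contents.
    """
    offenders = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        # splitext treats a leading dot as part of the name, so a hidden
        # file such as ".DS_Store" has an empty suffix and is rejected.
        suffix = os.path.splitext(name)[1]
        if not os.path.isfile(full) or not suffix.startswith(".fa"):
            offenders.append(name)
    return offenders
```

An empty list means the directory satisfies the stated rules; any names returned would need to be renamed or moved out before the directory is passed with -i.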