From 6208842318a6009308bd9abed921c6158b39f1f0 Mon Sep 17 00:00:00 2001
From: Calvin <calvin@EESI>
Date: Fri, 3 May 2013 12:25:15 -0400
Subject: added updated cli documentation

---
 doc/cli.markdown | 63 ++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 50 insertions(+), 13 deletions(-)

diff --git a/doc/cli.markdown b/doc/cli.markdown
index b065240..87df41b 100644
--- a/doc/cli.markdown
+++ b/doc/cli.markdown
@@ -18,13 +18,15 @@ function. You must supply a kmer.
 quikr\_train's arguments:
   -i, --input, the database of sequences (fasta format)
   -o, --output, the trained matrix (text file)
-  -k, --kmer, the kmer size (integer)
+  -k, --kmer, the kmer size, the default is 6 (integer)
   -z, --compress  compress the output matrix with gzip (flag)
 
 ### Example ###
 Here is an example on how to train a database. This uses the -z flag to compress
-the output matrix since it can be very large. It takes the gg94\_database.fasta
-as an input and outputs the trained matrix as gg94\_trained\_databse.npy.gz
+the output matrix since it can be very large. Because of the sparse nature of
+the database, the matrix easily achieves a high compression ratio, even with
+gzip. It takes the gg94\_database.fasta as an input and outputs the trained
+matrix as gg94\_trained\_database.npy.gz
 
     quikr_train -i gg94_database.fasta -o gg94_trained_database.npy.gz -k 6 -z 
 
@@ -37,11 +39,12 @@ You must supply a kmer and default lambda if using a custom trained matrix.
 quikr returns the solution vector as a csv file.
 
 quikr's arguments:
-  -f, --fasta, the fasta file sample
-  -o, --output OUTPUT, the output path (csv output)
+  -f, --fasta, the sample's fasta file of NGS reads
+  -o, --output OTU\_FRACTION\_PRESENT, a vector representing the percentage of
+  each database sequence present in the sample (csv output)
   -t, --trained-matrix, the trained matrix
   -l, --lamb, the lambda size. (the default lambda value is 10,000)
-  -k, --kmer, this specifies which kmer to use (default is 6)
+  -k, --kmer, this specifies the size of the kmer to use (default is 6)
 
 ## Multifasta\_to\_otu ##
 The Multifasta\_to\_otu tool is a handy wrapper for quikr which lets the user
@@ -52,18 +55,52 @@ Warning: this program will use a large amount of memory, and CPU time. You can
 reduce the number of cores used, and thus memory, by specifying the -j flag
 with aspecified number of jobs. Otherwise python with run one job per cpu core.
 
+# Pre-processing of Multifasta\_to\_otu #
+
+* Please name each fasta file of sample reads <sample id>.fa<*> and place the
+  files in a single directory that contains no other files (for example,
+  hidden files that the operating system may generate are not allowed in that
+  directory)
+* Fasta files of reads must have a suffix that starts with .fa (e.g. .fasta and
+  .fa are valid, while .fna is NOT)
+
 ### Usage ###
 multifasta\_to\_otu's arguments:
-  -i, --input-directory, the directory containing fasta files
-  -o, --otu-table, the output OTU table
-  -t, --trained-matrix, the trained database to use
-  -f, --trained-fasta, the fasta file used to train your matrix
+  -i, --input-directory, the directory containing the samples' fasta files of
+  reads (note each fasta file should correspond to a separate sample)
+  -o, --otu-table, the OTU table, with OTU\_FRACTION\_PRESENT for each sample,
+  which is compatible with QIIME's convert\_biom.py (or a sequence table, if not
+  OTUs)
+  -t, --trained-matrix, the trained matrix
+  -f, --trained-fasta, the fasta file database of sequences
   -d, --output-directory, quikr output directory
-  -l, --lamb, specify what lambda to use (the default value is 10,000)
-  -k, --kmer, specify which kmer to use, (default value is 6)
+  -l, --lamb, specify what size of lambda to use (the default value is 10,000)
+  -k, --kmer, specify what size of kmer to use, (default value is 6)
   -j, --jobs, specifies how many jobs to run at once, (default=number of CPUs)
 
-# Troubleshooting #
+# Post-processing of Multifasta\_to\_otu #
+
+* Note: When making your QIIME metadata file, the sample ids must match the
+  sample fasta file prefix names
+
+4-step QIIME procedure for obtaining 3D PCoA graphs after running Quikr:
+(Note: Our code works much better with WEIGHTED UniFrac as opposed to
+Unweighted.)
+
+Pre-requisites:
+1. <quikr_otu_table.txt>
+2. the tree of the database sequences that were used (e.g. rdp7\_mafft.fasttree,
+   gg\_94\_otus\_4feb2011.tre, etc.)
+3. your own <qiime_metadata_file.txt>
+
+The QIIME procedure:
+
+    convert_biom.py -i <quikr_otu_table.txt> -o <quikr_otu>.biom --biom_table_type="otu table"
+    beta_diversity.py -i <quikr_otu>.biom -m weighted_unifrac -o beta_div -t <tree file (example: rdp7_mafft.fasttree)>
+    principal_coordinates.py -i beta_div/weighted_unifrac_<quikr_otu>.txt -o <quikr_otu_project_name>_weighted.txt
+    make_3d_plots.py -i <quikr_otu_project_name>_weighted.txt -o <3d_pcoa_plotdirectory> -m <qiime_metadata_file>
+
+# Python Quikr Troubleshooting #
 
 If you are having trouble, and these solutions don't work. Please contact the
 developers with questions and issues.
-- 
cgit v1.2.3