From 1d2becc9af591d37badfe0e77751bbb80932472f Mon Sep 17 00:00:00 2001
From: Calvin
Date: Tue, 14 May 2013 21:12:46 -0400
Subject: updated docs

---
 doc/cli.markdown     | 61 ++++++++++++++++------------------------------------
 doc/index.markdown   |  9 ++++----
 doc/install.markdown | 35 ++++++++++++++++++++++++++++++++---
 doc/python.markdown  | 22 ++++++++++++++++++++++
 4 files changed, 77 insertions(+), 50 deletions(-)
 create mode 100644 doc/python.markdown

(limited to 'doc')

diff --git a/doc/cli.markdown b/doc/cli.markdown
index 843bae9..214c3b7 100644
--- a/doc/cli.markdown
+++ b/doc/cli.markdown
@@ -1,39 +1,36 @@
 # Quikr Command Line Utilities #
-
 Quikr has three command-line utilities that mirror the behavior of the python
 module and the matlab implementation. The advantage of this is ease of scripting
-and job management. These utilities are written in python and wrap the quikr
-module.
+and job management, as well as faster processing and lower memory usage. These
+utilities are written in C and utilize OpenMP for multithreading.

 ## Quikr\_train ##
-
 The quikr\_train is a tool to train a database for use with the quikr tool.
-Before running the quikr utility, you need to generate the sensing matrix or
+Before running the quikr utility, you need to generate the sensing matqrix or
 download a pretrained matrix from our database\_download.html.

 ### Usage ###
-quikr\_train returns a custom trained matrix that can be used with the quikr
-function. You must supply a kmer.
+quikr\_train returns a custom sensing matrix that can be used with the quikr
+function.

 quikr\_train's arguments:
   -i, --input, the database of sequences (fasta format)
-  -o, --output, the trained matrix (text file)
-  -k, --kmer, the kmer size, the default is 6 (integer)
-  -z, --compress compress the output matrix with gzip (flag)
+  -o, --output, the sensing matrix (text file)
+  -k, --kmer, specifiy wha size of kmer to use. (default value is 6)
+  -v, --verbose, verbose mode.

 ### Example ###
 Here is an example on how to train a database. This uses the -z flag to compress
 the output matrix since it can be very large. Because of the sparse nature of
 the database, the matrix easily achieves a high compression ratio, even with
-gzip. It takes the gg94\_database.fasta as an input and outputs the trained
-matrix as gg94\_trained\_databse.npy.gz
+gzip. It takes the gg94\_database.fasta as an input and outputs the sensing
+matrix as gg94\_sensing\_databse.npy.gz

-    quikr_train -i gg94_database.fasta -o gg94_trained_database.npy.gz -k 6 -z
+    quikr_train -i gg94_database.fasta -o gg94_sensing_database.matrix.gz -k 6

 ## Quikr ##
 Quikr returns the estimated frequencies of batcteria present when given a
-input FASTA file. A default trained matrix will be used if none is supplied
-You must supply a kmer and default lambda if using a custom trained matrix.
+input FASTA file. You need to train a matrix or download a new matrix

 ### Usage ###
 quikr returns the solution vector as a csv file.
@@ -42,8 +39,8 @@ quikr's arguments:
   -f, --fasta, the sample's fasta file of NGS READS
   -o, --output OTU\_FRACTION\_PRESENT, a vector representing the percentage of
   database sequence's presence in sample (csv output)
-  -t, --trained-matrix, the trained matrix
-  -l, --lamb, the lambda size. (the default lambda value is 10,000)
+  -s, --sensing-matrix the sensing matrix. (generated by quikr\_train)
+  -l, --lambda, the lambda size. (the default lambda value is 10,000)
   -k, --kmer, this specifies the size of the kmer to use (default is 6)

 ## Multifasta\_to\_otu ##
@@ -66,14 +63,14 @@ with aspecified number of jobs. Otherwise python with run one job per cpu core.

 ### Usage ###
 multifasta\_to\_otu's arguments:
-  -i, --input-directory, the directory containing the samples' fasta files of
+  -i, --input, the directory containing the samples' fasta files of
   reads (note each fasta file should correspond to a separate sample)
   -o, --otu-table, the OTU table, with OTU\_FRACTION\_PRESENT for each sample,
   which is compatible with QIIME's convert\_biom.py (or sequence table if not
   OTU's)
-  -t, --trained-matrix, the trained matrix
-  -f, --trained-fasta, the fasta file database of sequences
-  -l, --lamb, specify what size of lambda to use (the default value is 10,000)
+  -s, --sensing-matrix, the sensing matrix
+  -f, --sensing-fasta, the fasta file database of sequences
+  -l, --lambda, specify what size of lambda to use (the default value is 10,000)
   -k, --kmer, specify what size of kmer to use, (default value is 6)
   -j, --jobs, specifies how many jobs to run at once, (default=number of CPUs)

@@ -98,12 +95,6 @@ The QIIME procedue:
     principal_coordinates.py -i beta_div/weighted_unifrac_.txt -o _weighted.txt
     make_3d_plots.py -i _weighted.txt -o <3d_pcoa_plotdirectory> -m

-
-# Python Quikr Troubleshooting #
-
-If you are having trouble, and these solutions don't work. Please contact the
-developers with questions and issues.
-
 #### Broken Pipe Errors ####
 Make sure that you have the count-kmers and probablilties-by-read in your
 $PATH, and that they are executable.
@@ -111,19 +102,5 @@ $PATH, and that they are executable.
 If you have not installed quikr system-wide, you'll need to add the folder
 location of these binaries in the terminal before running the command:

+    mv /path/to/quikr/src/nbc/count /path/to/quikr/src/nbc/count-kmers
     PATH = $PATH:/path/to/quikr/src/nbc/
-
-Make sure that the binaries are executable by running:
-
-    chmod +x probabilities-by-read
-    chmod +x count-kmers
-
-#### Python Cannot Find XYZ ####
-
-Ensure that you have Python 2.7, Scipy, Numpy, and BIOpython installed
-and that python is setup correctly. You should be able to do this from a python
-prompt without any errors:
-    >>> import numpy
-    >>> import scipy
-    >>> from Bio import SeqIO
-
diff --git a/doc/index.markdown b/doc/index.markdown
index 93688bc..03672cd 100644
--- a/doc/index.markdown
+++ b/doc/index.markdown
@@ -13,17 +13,16 @@ accurate down to the genus level.
 ## How Do I Install Quikr ##
 Please read the directions on the [installation page](install.html).

-
 ## How Do I use Quikr ##
-We have several ways to use quikr. There is a python module, command line
-scripts, and matlab scripts.
+We have several ways to use quikr. Quikr is first and formost a command
+line utility, but we also provide python and matlab scripts.

++ [Command Line Utilities](cli.html)
 + [Matlab documentation](matlab.html)
 + [Python documentation](python.html)
-+ [Command Line Utilities](cli.html)

 ## Contact ##
-For issues with the python implementation, contact gailro@gmail.com
+For issues with the quikr software, contact gailro@gmail.com

 ## Contributors ##
 + David Koslicki
diff --git a/doc/install.markdown b/doc/install.markdown
index cbbbe21..82004aa 100644
--- a/doc/install.markdown
+++ b/doc/install.markdown
@@ -1,14 +1,32 @@
 # How Can I Install The Quikr Utility? #
 To use Quikr there are several prerequisites.

-Base Requirements:
-+ Mac OS X or GNU/Linux or Unix-based operating system+
-+ Python 2.7, Scipy, Numpy, BIOPython modules
+## Requirements ##
++ Mac OS X 10.6.8 or GNU/Linux
 + 4Gb of RAM minimum. Absolutely neccessary.
++ gcc that supports OpenMP
+
+### Python Requirements ###
++ Python 2.7
++ Scipy
++ Numpy
++ BioPython
+
+### Mac Requirements ###
++ Mac OS X 10.6.8 (what we have tested)
++ GCC 4.7 or newer. (gcc 4.2 did not work, and is the default installation)
++ OCaml compiler mlton
++ OpenMP libraries (libgomp, usually comes with gcc)
+
+### Linux Requirements ###
++ GCC 4.7 or newer
++ OCaml compiler mlton
++ OpenMP libraries (libgomp, usually comes with gcc)

 We also have a Quikr implementation in Matlab so that you can easily integrate
 Quikr into your custom programs and scripts.

+### Installation ###
 Our Quikr code is available on our sourceforge download page:

 [http://sourceforge.net/projects/quikr/](sourceforge project page)
@@ -16,3 +34,14 @@ Our Quikr code is available on our sourceforge download page:
 Our development GIT repository is available here:

 [http://rosalind.ece.drexel.edu/git/quikr/](rosalind.ece.drexel.edu/git/quikr/)
+
+To install quikr, download our project and in the folder run:
+
+    make
+    sudo make install
+
+This will install the quikr, quikr\_train and multifasta\_to\_otu utilities.
+To install the python scripts and module systemwide, run
+
+    make python
+    sudo make install_python
diff --git a/doc/python.markdown b/doc/python.markdown
new file mode 100644
index 0000000..df086d8
--- /dev/null
+++ b/doc/python.markdown
@@ -0,0 +1,22 @@
+# Python Documentation #
+The python version comes with scripts that can be used like the regular quikr
+program, and also a module called quikr so integration with python scripts
+is easier.
+
+If you are switching to use the python scripts instead of the regular, you
+will need to regenerate your trained databases with the python version of
+quikr\_train.
+
+## Function documentation ##
+If you want to use our quikr module, run help on the module:
+
+    >>> import quikr
+    >>> help(quikr)
+## Python Cannot Find XYZ ##
+
+Ensure that you have Python 2.7, Scipy, Numpy, and BIOpython installed
+and that python is setup correctly. You should be able to do this from a python
+prompt without any errors:
+    >>> import numpy
+    >>> import scipy
+    >>> from Bio import SeqIO
--
cgit v1.2.3