# Quikr Command Line Utilities #
Quikr has three command-line utilities that mirror the behavior of the Python
module and the MATLAB implementation. Their advantages are ease of scripting
and job management, as well as faster processing and lower memory usage. These
utilities are written in C and use OpenMP for multithreading.

## Quikr\_train ##
quikr\_train is a tool that trains a database for use with the quikr tool.
Before running the quikr utility, you need to generate the sensing matrix.

### Usage ###
quikr\_train returns a custom sensing matrix that can be used with the quikr
function.

quikr\_train's arguments:

* -i, --input, the database of sequences (FASTA format)
* -o, --output, the sensing matrix (text file)
* -k, --kmer, specifies the size of kmer to use (default value is 6)
* -v, --verbose, verbose mode

### Example ###
Here is an example of how to train a database. The output matrix is compressed
since it can be very large. Because of the sparse nature of the database, the
matrix easily achieves a high compression ratio, even with gzip. The command
takes gg94\_database.fasta as input and writes the sensing matrix to
gg94\_sensing\_database.matrix.gz.

    quikr_train -i gg94_database.fasta -o gg94_sensing_database.matrix.gz -k 6

## Quikr ##
Quikr returns the estimated frequencies of bacteria present when given an
input FASTA file. You need to train a matrix or download a new matrix before
running quikr.

### Usage ###
quikr returns the solution vector as a CSV file.

quikr's arguments:

* -f, --fasta, the sample's FASTA file of NGS reads
* -o, --output, OTU\_FRACTION\_PRESENT, a vector representing the percentage of
  each database sequence's presence in the sample (CSV output)
* -s, --sensing-matrix, the sensing matrix (generated by quikr\_train)
* -l, --lambda, the lambda size (the default lambda value is 10,000)
* -k, --kmer, specifies the size of the kmer to use (default is 6)
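
As a rough sketch of how the flags above fit together, the following shell
function wraps a single quikr call. The function name classify\_sample, its
argument order, and the LAMBDA and KMER environment variables are assumptions
made for illustration; only the flags themselves come from the list above.

```shell
# classify_sample SAMPLE_FASTA SENSING_MATRIX OUTPUT_CSV
# Hypothetical wrapper: runs quikr on one sample with the documented flags.
# LAMBDA and KMER are illustrative environment variables, not quikr settings;
# when unset, the documented defaults (10000 and 6) are passed explicitly.
classify_sample() {
    quikr -f "$1" -s "$2" -o "$3" -l "${LAMBDA:-10000}" -k "${KMER:-6}"
}

# Usage (file names are placeholders):
#   classify_sample sample.fa gg94_sensing_database.matrix.gz sample.csv
```

Passing the defaults explicitly keeps the lambda and kmer values visible in
scripts and job logs.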

## Multifasta\_to\_otu ##
The Multifasta\_to\_otu tool is a handy wrapper for quikr that lets the user
input as many FASTA files as they like, and then returns an OTU table of the
number of times a specimen was seen in all of the samples.

Warning: this program will use a large amount of memory and CPU time. You can
reduce the number of cores used, and thus memory, by specifying the -j flag
with a specified number of jobs. Otherwise the program will run one job per CPU
core.

### Pre-processing of Multifasta\_to\_otu ###

* Name the FASTA files of sample reads \<sample id\>.fa\<*\> and place them in
  one directory that contains no other files (for example, hidden files that
  the operating system may generate are not allowed in that directory)
* FASTA files of reads must have a suffix that starts with .fa (e.g., .fasta and
  .fa are valid while .fna is NOT)
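
The naming rules above can be checked before a long run. The function below is
a minimal sketch, not part of Quikr: the name check\_sample\_dir and its
messages are hypothetical, and it assumes that "suffix" means the text after
the last dot in the file name.

```shell
# check_sample_dir DIR
# Hypothetical pre-flight check: succeeds only if every entry in DIR is a
# non-hidden regular file whose extension starts with "fa".
check_sample_dir() {
    dir=$1
    rc=0
    # The three patterns cover ordinary names and hidden names (dotfiles).
    for entry in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
        [ -e "$entry" ] || continue      # skip patterns that matched nothing
        name=${entry##*/}                # strip the directory part
        ext=${name##*.}                  # text after the last dot
        case "$name" in
            .*)  echo "hidden file: $name" >&2; rc=1 ;;
            *.*) case "$ext" in
                     fa*) [ -f "$entry" ] || { echo "not a regular file: $name" >&2; rc=1; } ;;
                     *)   echo "suffix does not start with .fa: $name" >&2; rc=1 ;;
                 esac ;;
            *)   echo "no suffix: $name" >&2; rc=1 ;;
        esac
    done
    return "$rc"
}

# Usage (directory name is a placeholder):
#   check_sample_dir samples/ && echo "directory looks fine"
```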

### Usage ###
multifasta\_to\_otu's arguments:

* -i, --input, the directory containing the samples' FASTA files of reads (note:
  each FASTA file should correspond to a separate sample)
* -o, --otu-table, the OTU table, with OTU\_FRACTION\_PRESENT for each sample,
  which is compatible with QIIME's convert\_biom.py (or a sequence table if not
  OTUs)
* -s, --sensing-matrix, the sensing matrix
* -f, --sensing-fasta, the FASTA file database of sequences
* -l, --lambda, specifies the size of lambda to use (default value is 10,000)
* -k, --kmer, specifies the size of kmer to use (default value is 6)
* -j, --jobs, specifies how many jobs to run at once (default is the number of
  CPUs)
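
For illustration, the flags above could be combined as in the sketch below.
The wrapper name build\_otu\_table and its argument order are hypothetical;
the optional fifth argument maps to -j so the job count (and with it memory
use) can be capped on shared machines.

```shell
# build_otu_table SAMPLE_DIR SENSING_MATRIX DATABASE_FASTA OTU_TABLE [JOBS]
# Hypothetical wrapper around multifasta_to_otu using only the listed flags.
# When JOBS is omitted, -j is not passed and the tool's default applies.
build_otu_table() {
    if [ -n "${5:-}" ]; then
        multifasta_to_otu -i "$1" -s "$2" -f "$3" -o "$4" -j "$5"
    else
        multifasta_to_otu -i "$1" -s "$2" -f "$3" -o "$4"
    fi
}

# Usage (file names are placeholders), limited to 4 jobs:
#   build_otu_table samples/ gg94_sensing_database.matrix.gz \
#       gg94_database.fasta quikr_otu_table.txt 4
```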

### Post-processing of Multifasta\_to\_otu ###

* Note: When making your QIIME metadata file, the sample IDs must match the
  sample FASTA file prefix names

A 4-step QIIME procedure turns the Quikr output into 3D PCoA graphs.
(Note: Our code works much better with WEIGHTED UniFrac than with unweighted
UniFrac.)

Pre-requisites:

1. \<quikr\_otu\_table.txt\>
2. the tree of the database sequences that were used (e.g. rdp7\_mafft.fasttree,
   gg\_94\_otus\_4feb2011.tre, etc.)
3. your own \<qiime\_metadata\_file.txt\>

The QIIME procedure (for the tree file, use e.g. rdp7\_mafft.fasttree):

    convert_biom.py -i <quikr_otu_table.txt> -o <quikr_otu>.biom --biom_table_type="otu table"
    beta_diversity.py -i <quikr_otu>.biom -m weighted_unifrac -o beta_div -t <tree file>
    principal_coordinates.py -i beta_div/weighted_unifrac_<quikr_otu>.txt -o <quikr_otu_project_name>_weighted.txt
    make_3d_plots.py -i <quikr_otu_project_name>_weighted.txt -o <3d_pcoa_plotdirectory> -m <qiime_metadata_file>
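
The four steps can also be chained so that a failure stops the run early. The
function below is only a sketch: the name quikr\_pcoa is hypothetical, and it
assumes a single project name is reused for the .biom file, the PCoA output,
and the plot directory, which the procedure above does not require.

```shell
# quikr_pcoa OTU_TABLE TREE_FILE METADATA_FILE PROJECT
# Hypothetical wrapper: runs the four QIIME steps in order and stops at the
# first one that fails. PROJECT is reused for all intermediate file names.
quikr_pcoa() {
    table=$1
    tree=$2
    metadata=$3
    project=$4
    convert_biom.py -i "$table" -o "${project}.biom" --biom_table_type="otu table" &&
    beta_diversity.py -i "${project}.biom" -m weighted_unifrac -o beta_div -t "$tree" &&
    principal_coordinates.py -i "beta_div/weighted_unifrac_${project}.txt" -o "${project}_weighted.txt" &&
    make_3d_plots.py -i "${project}_weighted.txt" -o "${project}_3d_pcoa" -m "$metadata"
}

# Usage (file names are placeholders):
#   quikr_pcoa quikr_otu_table.txt rdp7_mafft.fasttree qiime_metadata_file.txt myproject
```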

#### Broken Pipe Errors #### 
Make sure that you have count-kmers and probabilities-by-read in your
$PATH, and that they are executable.

If you have not installed quikr system-wide, you'll need to add the folder
containing these binaries to your $PATH in the terminal before running the
command:

    mv /path/to/quikr/src/nbc/count /path/to/quikr/src/nbc/count-kmers
    PATH=$PATH:/path/to/quikr/src/nbc/
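
To confirm the setup before running the tool, a small check like the one below
can help. It is a sketch rather than part of Quikr: the function name
require\_tools is hypothetical, and it relies only on the shell's standard
command -v lookup.

```shell
# require_tools NAME...
# Hypothetical helper: reports every named program that the shell cannot
# resolve as a command, and fails if any is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found in PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   require_tools count-kmers probabilities-by-read || echo "fix your PATH first"
```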