summaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
Diffstat (limited to 'doc')
-rw-r--r--doc/cli.markdown5
-rw-r--r--doc/matlab.markdown112
2 files changed, 116 insertions, 1 deletions
diff --git a/doc/cli.markdown b/doc/cli.markdown
index c7f8e11..8d25337 100644
--- a/doc/cli.markdown
+++ b/doc/cli.markdown
@@ -26,6 +26,9 @@ Quikr returns the estimated frequencies of batcteria present when given a
input FASTA file. A default trained matrix will be used if none is supplied
You must supply a kmer and default lambda if using a custom trained matrix.
+### Usage ###
+quikr returns the solution vector as a csv file.
+
quikr's optional arguments:
-f, --fasta, the fasta file sample
-o, --output OUTPUT, the output path (csv output)
@@ -34,7 +37,7 @@ quikr's optional arguments:
-k, --kmer, this specifies which kmer to use (default is 6)
-### Troubleshooting ###
+# Troubleshooting #
If you are having trouble, and these solutions don't work. Please contact the
developers with questions and issues.
diff --git a/doc/matlab.markdown b/doc/matlab.markdown
new file mode 100644
index 0000000..ca701b9
--- /dev/null
+++ b/doc/matlab.markdown
@@ -0,0 +1,112 @@
+# Quikr's Matlab Implementation #
+
+The Quikr implementation works in Matlab and also works in Octave, but the
+Octave version will run much slower
+
+## Quikr Example ##
+This is an example of how to run Quikr. Before you try the example please make
+make sure that you are in the quikr's matlab directory (src/matlab/):
+
+ cd quikr/src/matlab
+
+
+### Using Quikr with the default databse ###
+This is the full path name to your data file:
+
+ fastafilename='/path/to/quikr-code/testfastafile.fa';
+
+This will give the predicted reconstruction frequencies using the default
+training database trainset7\_112011.fa from RDP version 2.4
+Xstar will be on the same basis as trainset7\_112011.fa, so to get the sequences
+that are predicted to be present in your sample:
+
+ xstar=quikr(fastafilename);
+
+Read in the training database.
+
+_Note fastaread is not by default included in Octave. The fastaread.m file is
+included in the Quikr download directory and is directly compatible with
+Matlab._
+
+ [headers,~]=fastaread('trainset7\_112011.fa');
+
+
+Get the indicies of the sequences quikr predicts are in your sample
+
+ nonzeroentries=find(xstar);
+
+Convert the concentrations into a cell array
+
+ proportionscell=num2cell(xstar(nonzeroentries));
+
+Get the names of the sequences
+
+ namescell=headers(nonzeroentries);
+
+This cell array contains the (unsorted) names of the reconstructed sequences and
+their concentrations (in the first and second columns respectively) so to find
+which sequence is the most abundant in your mixture:
+
+ namesandproportions={namescell{:}; proportionscell{:}};
+
+Get the maximum value and it's position
+
+ [val,ind]=max(xstar(nonzeroentries));
+
+Note that this does not imply this specific strain or species is in your sample,
+just that phylum/class/order/family/genus this species belongs to is in your
+sample.
+
+ namesandproportions{1:2,ind}
+
+### Using Quikr With A Custom Trained Database ###
+If you would like to use a custom training database, follow the following steps:
+
+Full path name to your data file:
+
+ fastafilename='/path/to/your/fastafile.fasta';
+
+Full path to the FASTA file you wish to use as a training database
+
+ trainingdatabasefilename='/path/to/your/trainingdatabase.fasta';
+
+Pick a k-mer size (typically 6 )
+
+ k=6;
+
+This will return the training database then to do the reconstruction
+
+ trainingmatrix=quikrTrain(trainingdatabasefilename,k);
+
+Pick a lambda (larger lambda -> theoretically predicted concentrations are
+closer to actual concentrations), this depends on k-mer size picked, also size
+and condition of the TrainingMatrix
+
+ lambda=10000;
+
+Get the predicted reconstruction frequencies
+Again, xstar is on the same basis as the TrainingMatrix, so to get the sequences
+that are predicted to be present in your sample:
+
+ xstar=quikrCustomTrained(trainingmatrix,fastafilename,k,lambda);
+
+Read in the training database
+
+ [headers,~]=fastaread(trainingdatabasefilename);
+
+Get the indices of the sequences quikr predicts are in your sample
+
+ nonzeroentries=find(xstar);
+
+Convert the concentrations into a cell array
+
+ proportionscell=num2cell(xstar(nonzeroentries));
+
+Get the names of the sequences
+
+ namescell=headers(nonzeroentries);
+
+This cell array contains the (unsorted) names of the reconstructed sequences and
+their concentrations (in the first and second columns respectively)
+
+ namesandproportions={namescell{:}; proportionscell{:}};