-rw-r--r-- README.md 91
1 files changed, 61 insertions, 30 deletions
diff --git a/README.md b/README.md
index 9ec3d47..9039949 100644
--- a/README.md
+++ b/README.md
@@ -3,15 +3,31 @@ SelectiveGenomeAmplification
PI: http://brisson.bio.upenn.edu/
-
-
+## Table of Contents
+
+* [Requirements](#requirements)
+* [Setup](#setup)
+* [Example Usage](#example-usage)
+ * [SGA User Interface](#sga-user-interface)
+ * [Setting Tunable Parameters](#setting-tunable-parameters)
+ * [Running Individual Steps](#running-individual-steps)
+ * [Manually Scoring Specific Mer Combinations From List](#manually-scoring-specific-mer-combinations-from-list)
+ * [Manually Score All Combinations From List](#manually-score-all-combinations-from-list)
+* [Table of Tunable Parameters](#tunable-parameters)
+* [Equations](#equations)
+ * [Mer Selectivity](#mer-selectivity)
+ * [Scoring Equation](#score-function)
+* [Output](#output)
+ * [Select Mers](#select_merspy-output)
+ * [Score Mers](#score_merspy-output)
+
## Requirements
To use this you'll need:
- - A unix environment
- - kmer_total_count, a kmer counter available here: http://github.com/mutantturkey/dna-utils/
+ - A Unix environment
+ - [dna-utils](http://github.com/mutantturkey/dna-utils/)
- bash or compliant shell.
-
+ - Python 2.7.x
## Setup
@@ -20,7 +36,7 @@ To use this you'll need:
make
sudo make install
-## Usage Examples
+## Example Usage
Standard use of SelectiveGenomeAmplification (SGA) is easy. It takes two arguments,
the foreground and background
@@ -28,6 +44,24 @@ the foreground and background
SelectiveGenomeAmplification PfalciparumGenome.fasta HumanGenome.fasta;
less PfalciparumGenome_HumanGenome/final_mers
+### SGA User Interface
+SGA also comes with an easy-to-use interactive prompt called SelectiveGenomeAmplificationUI.
+It lets a less experienced user run SGA without issue. To use it, run
+SelectiveGenomeAmplificationUI and answer a series of prompts about the tunable parameters, like the ones below:
+
+ Where would you like to temporary files to be stored? (Default=$output_directory/.tmp):
+ Where would you like to count files to be stored? (Default=$output_directory/.tmp):
+ maximum mer size you would like to pick? (Default=12): 10
+ minimum mer size you would like to pick? (Default=6): 7
+ eliminate mers that appear less frequently on average than this number ? (Default=50000): 25000
+ .....
+ Input the path to your foreground file:target.fa
+ Input the path to your background file:humangenome.fa
+ Would you like to output your inserted variables to a string you can later paste? (Y/N/Default=y): n
+ Run SelectiveGenomeAmplification? (Y/N/Default=y): y
+
+### Setting Tunable Parameters
+
SGA allows for many tunable parameters, which are all explained in the chart
below. User-customizable variables need to be passed in as
environment variables like so:
@@ -35,9 +69,6 @@ environmental variables like so:
max_mer_distance=5000 max_select=6 min_mer_range=6 max_mer_range=12 \
SelectiveGenomeAmplification.sh PfalciparumGenome.fasta half.fasta
-SGA also comes with a easy to use user prompt called SelectiveGenomeAmplificationUI.
-It allows for a less expereienced user to use
-SGA without issue.
### Running individual steps
@@ -59,7 +90,7 @@ valid steps are these:
This function does not try to be smart, so use it wisely.
-### Manually scoring specific mer combinations from file
+### Manually scoring specific mer combinations from list
Users can manually score combinations of mers they choose using the
score\_mers.py script.
@@ -71,12 +102,13 @@ The combination file should look like this:
ACGATATAT TACATAGA TATATATAT ACGTACCAT ATATTA
AAATTATCAGT ATACATA ATATACAT ATATACATA ACATA
- ATATACATA ATCATGATA CCAGATACATAT
+ ATATACATA ATCATGATA CCAGATACATAT
Each row is a combination to be scored.
-### Manually score all combinations from file
+### Manually score all combinations from list
+
Users can manually score all combinations of mers they choose using the
score\_mers.py script.
@@ -86,18 +118,16 @@ score\_mers.py script.
The mer file should look like this:
ATATAT
- TACATA
- TACATAGCA
- TATAGAATAC
- CGTAGATA
- TAGAAT
-
-each row is a seperate mer. do not put multiple mers on one line.
+ TACATA
+ TACATAGCA
+ TATAGAATAC
+ CGTAGATA
+ TAGAAT
+Each row is a separate mer. Do not put multiple mers on one line.
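
As a rough illustration of what "all combinations" of a mer list means, the sketch below enumerates every subset of the listed mers up to a chosen size. It is not taken from score\_mers.py: the function name `all_combinations` and the `max_size` argument are hypothetical, and the real script may bound or order its combinations differently.

```python
from itertools import combinations


def all_combinations(mers, max_size):
    """Yield every combination of 1..max_size mers (hypothetical helper)."""
    for size in range(1, max_size + 1):
        for combo in combinations(mers, size):
            yield combo


# One mer per row, as in the mer file format described above.
mers = ["ATATAT", "TACATA", "TACATAGCA"]
combos = list(all_combinations(mers, 2))

for combo in combos:
    print(" ".join(combo))
```

The number of combinations grows quickly with the length of the list, which is one reason to keep a hand-written mer file short.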
-## Customizable variables
-range of mers, min and max
+## Tunable Parameters
variable | default | notes
:---- | :---- | :----
@@ -110,29 +140,30 @@ counts\_directory | $output\_directory/.tmp | directory for counts directory
tmp\_directory | $output\_directory/.tmp | temporary files directory
max\_melting\_temp | 30° | maximum melting temp of mers
min\_melting\_temp | 0° | minimum melting temp of mers
-min\_foreground\_binding\_average | 50000 | elminate mers that appear less frequently than the average (length of foreground / # of occurances)
+min\_foreground\_binding\_average | 50000 | eliminate mers that appear less frequently than the average (length of foreground / # of occurrences)
max\_select | 15 | maximum number of mers to pick
max\_check | 35 | maximum number of mers to select (check the top #)
-ignore\_mers | Not Enabled | mers to explicitly ignore, space seperated ex. ignore\_mers="ACAGTA ACCATAA ATATATAT"
-ignore\_all\_mers\_from\_files | Not Enabled | ignore any mers found in these files. space seperated.
+ignore\_mers | Not Enabled | mers to explicitly ignore, space separated ex. ignore\_mers="ACAGTA ACCATAA ATATATAT"
+ignore\_all\_mers\_from\_files | Not Enabled | ignore any mers found in these files. space separated.
foreground | Not Enabled | path of foreground file
background | Not Enabled | path of background file
-max\_consecutive\_binding | 4 | The maxium number of consecutive binding nucleotides in homodimer and heterodimers
+max\_consecutive\_binding | 4 | The maximum number of consecutive binding nucleotides in homodimer and heterodimers
fg\_weight | 0 | How much extra weight to give higher frequency mers in fg. see "equations" (between 0 and 1)
-primer\_weight | 0 | How much extra weight to give to sets with a higher number of priemrs. (between 0 and 1)
+primer\_weight | 0 | How much extra weight to give to sets with a higher number of primers. (between 0 and 1)
output\_top\_nb | 10000 | How many scores do you want to output in your sorted output file?
+=======
## Equations
Here's what we are using to determine our scoring and selectivity
-### Selecivity
+### Mer Selectivity
Our selectivity is what we use to determine what top $max\_check mers are checked later
on in our scoring function. Currently we use this formula:
By default our fg\_weight is zero. This gives no extra weight to more
-frequently occuring mers, but can be set higher with the fg\_weight
+frequently occurring mers, but can be set higher with the fg\_weight
environmental variable if you wish to do so.
hit = abundance of primer X (ex. 'ATGTA') in background
@@ -144,7 +175,7 @@ environmental variable if you wish to do so.
The scoring function is this:
- fg_pts = all the points of each mer in the combination, and sequence ends4
+ fg_pts = all the points of each mer in the combination, and sequence ends
fg_mean_dist = mean distance between each point in fg_pts
fg_stddev = standard deviation of distance between each point in fg_pts
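
The two distance terms above can be sketched in a few lines of Python. This is a minimal sketch, not the project's implementation: it assumes that "distance between each point" means the gap between consecutive sorted positions and that the standard deviation is the population form. The helper name `gap_stats` and the sample positions are made up for illustration.

```python
import math


def gap_stats(points):
    """Return (mean, stddev) of the gaps between consecutive sorted points."""
    pts = sorted(points)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    if not gaps:
        raise ValueError("need at least two points")
    mean = sum(gaps) / float(len(gaps))
    # Population variance (divide by N); an assumption, not confirmed by the README.
    variance = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return mean, math.sqrt(variance)


# Hypothetical binding positions plus the two sequence ends (0 and 100).
fg_pts = [0, 10, 40, 70, 100]
fg_mean_dist, fg_stddev = gap_stats(fg_pts)
print(fg_mean_dist, fg_stddev)
```

Evenly spaced binding sites give a small `fg_stddev`, while clustered sites give a large one.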
@@ -186,6 +217,6 @@ background count, and the mer selectivity value. (higher is better)
### score\_mers.py output
-score medrs outputs a tab delmited file with 6 columns:
+score\_mers.py outputs a tab-delimited file with 6 columns:
nb_primers Combination Score FG_mean_dist FG_stdev_dist BG_ratio
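
A tab-delimited file with these six columns can be read with Python's standard `csv` module. The sketch below is written for Python 3 and rests on a few assumptions: the file may or may not begin with a header row (non-numeric first fields are skipped), the mers inside the Combination column are separated by spaces, and the sample values are invented. `ScoreRow` and `read_scores` are hypothetical names.

```python
import csv
import io
from collections import namedtuple

ScoreRow = namedtuple(
    "ScoreRow",
    "nb_primers combination score fg_mean_dist fg_stdev_dist bg_ratio",
)


def read_scores(handle):
    """Parse six-column, tab-delimited score rows from an open text handle."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != 6 or not fields[0].isdigit():
            continue  # skip blank lines and a possible header row
        rows.append(ScoreRow(
            int(fields[0]),
            fields[1].split(),  # assumes space-separated mers in this column
            float(fields[2]),
            float(fields[3]),
            float(fields[4]),
            float(fields[5]),
        ))
    return rows


# Invented sample data, for illustration only.
sample = io.StringIO(
    "nb_primers\tCombination\tScore\tFG_mean_dist\tFG_stdev_dist\tBG_ratio\n"
    "2\tATATAT TACATA\t1.5\t1200.0\t300.0\t0.25\n"
)
rows = read_scores(sample)
print(rows[0].combination, rows[0].score)
```

For a real run, pass an open file handle for the score\_mers.py output in place of `sample`.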