# dna-utils 


This repository contains general utilities for processing sequences in FASTA files.


## Tools included

All of our tools use an alphabetical indexing scheme, shown here for k = 4:

    AAAA = 0
    AAAC = 1
    AAAG = 2
    AAAT = 3
    AACA = 4
    ...
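
The scheme above is consistent with reading each k-mer as a base-4 number, with A = 0, C = 1, G = 2 and T = 3. Here is a minimal Python sketch of that mapping. The names `BASE_VALUE` and `kmer_index` are hypothetical and are not taken from the dna-utils source.

```python
# Hypothetical helper for illustration; not part of dna-utils.
BASE_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_index(kmer):
    """Return the alphabetical index of a k-mer, read as a base-4 number."""
    index = 0
    for base in kmer:
        # Shift the digits seen so far one place left, then add the new one.
        index = index * 4 + BASE_VALUE[base]
    return index


for kmer in ["AAAA", "AAAC", "AAAG", "AAAT", "AACA"]:
    print(kmer, "=", kmer_index(kmer))
```

Under this reading there are 4^k possible indexes for a given k, so the last 4-mer, TTTT, gets index 255.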
    

### kmer_total_count 

This program counts each k-mer in a FASTA file and prints the counts to standard output.

#### Usage

    usage: kmer_total_count -i input_file -k kmer [-n] [-l] ...

    count mers in size k from a fasta file

      --input    -i  input fasta file to count
      --kmer     -k  size of mers to count
      --nonzero  -n  only print non-zero values
      --label    -l  print mer along with value

    Report all bugs to mutantturkey@gmail.com

    Copyright 2014 Calvin Morrison, Drexel University.

    If you are using any dna-utils tool for a publication
    please cite your usage:

    dna-utils. Drexel University, Philadelphia USA, 2014;
    software available at www.github.com/EESI/dna-utils/

#### Examples

A basic example, where we specify the k-mer size and the input file:

    calvin@barnabas:~/dna-utils$ ./kmer_total_count -i SuperManSequences.fasta -k 8 
    2946
    1161
    14141
    ...
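
To make the output format concrete, here is a rough Python sketch of the same kind of count: one number per line, in index order. It is not the tool's implementation. The function names are hypothetical, and the handling of lowercase letters, sequences wrapped across lines, and symbols other than A, C, G and T are assumptions, since this README does not say how `kmer_total_count` treats them.

```python
# Illustrative sketch only; not the dna-utils implementation.
BASE_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}


def read_sequences(fasta_lines):
    """Yield each sequence in a FASTA stream as one uppercase string."""
    parts = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A header line closes the previous record, if there was one.
            if parts:
                yield "".join(parts)
            parts = []
        elif line:
            parts.append(line.upper())
    if parts:
        yield "".join(parts)


def count_kmers(fasta_lines, k):
    """Return a list where entry i is the total count of the k-mer with index i."""
    counts = [0] * (4 ** k)
    for sequence in read_sequences(fasta_lines):
        for start in range(len(sequence) - k + 1):
            index = 0
            for base in sequence[start:start + k]:
                if base not in BASE_VALUE:
                    break  # assumption: skip windows containing N or other symbols
                index = index * 4 + BASE_VALUE[base]
            else:
                # Reached only when every base in the window was valid.
                counts[index] += 1
    return counts


example = [">seq1", "ACGT", "AC", ">seq2", "AAAA"]
for count in count_kmers(example, 2):
    print(count)
```

In this toy input, AA (index 0) is counted 3 times and AC (index 1) twice, so the first two lines printed are 3 and 2.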

The program also reads from stdin, which is useful for piping in the output of compression programs:

    calvin@barnabas:~/dna-utils$ gzip -dc ~/super_big_fasta_file.fasta.gz | ./kmer_total_count -k 8
    234523
    121612
    123161
    294282
    ...
    
With `--nonzero`, only k-mers with a nonzero count are printed, which is useful for large k. Each line gives the index, then the count:

    calvin@barnabas:~/src/dna-utils$ ./kmer_total_count --nonzero -k 9 < ~/input/sample\=700013596.fa
    no input file specified with -i, reading from stdin
    0	3
    1	2
    3	3
    5	1
    ...
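
The sparse format is easy to picture as a filter over the full count list. A small Python sketch follows, using made-up counts chosen to mirror the first lines of the sample above. `format_nonzero` is a hypothetical name, and the tab separator is taken from the sample output.

```python
# Illustrative sketch only; not the dna-utils implementation.
def format_nonzero(counts):
    """Return 'index<TAB>count' lines for every entry whose count is not zero."""
    return [
        f"{index}\t{count}"
        for index, count in enumerate(counts)
        if count != 0
    ]


counts = [3, 2, 0, 3, 0, 1]  # made-up counts for indexes 0 through 5
for line in format_nonzero(counts):
    print(line)
```

The payoff grows with k: for k = 9 there are already 4^9 = 262144 possible indexes, and most of them may be zero for a small input.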

Lastly, the `--label` option prints each k-mer alongside its count, which makes grepping and searching the output easier:

    calvin@barnabas:~/src/dna/utils$ ./kmer_total_count -i ~/sample.fa -k 6 -l
    AAAAAA 552
    AAAAAC 246
    ...
    TTTTTC 102
    TTTTTG 924
    TTTTTT 4961
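
A label can be recovered from an index by reversing the base-4 reading of the alphabetical indexing scheme. Here is a minimal Python sketch, assuming A = 0, C = 1, G = 2 and T = 3. `index_to_kmer` is a hypothetical name, not a function from dna-utils.

```python
# Hypothetical helper for illustration; not part of dna-utils.
def index_to_kmer(index, k):
    """Return the k-mer of length k whose alphabetical index is `index`."""
    bases = []
    for _ in range(k):
        # Peel off the least significant base-4 digit each round.
        index, digit = divmod(index, 4)
        bases.append("ACGT"[digit])
    # Digits were collected last-base-first, so reverse them.
    return "".join(reversed(bases))


for index in [0, 1, 4093, 4094, 4095]:
    print(index, index_to_kmer(index, 6))
```

For k = 6 the indexes run from 0 (AAAAAA) to 4095 (TTTTTT), which matches the order of the labeled output above.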

### kmer_counts_per_sequence



#### Licensing and Citation
Report all bugs to mutantturkey@gmail.com

Copyright 2014 Calvin Morrison, Drexel University.

If you are using any dna-utils tool for a publication
please cite your usage:

dna-utils. Drexel University, Philadelphia USA, 2014;
software available at www.github.com/EESI/dna-utils/