count_codons tabulates the occurrences of each of the 64 possible codons in every input coding sequence.
This function provides the foundation for most codon usage bias analyses in the cubar package.
Arguments
- seqs
Coding sequences as a DNAStringSet object, or compatible input that can be coerced to DNAStringSet.
- ...
Additional arguments passed to Biostrings::trinucleotideFrequency.
Value
A matrix in which each row is one coding sequence (CDS) and each column is one of the 64 possible codons. Each cell holds the number of times that codon occurs in that sequence.
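The shape of the returned matrix can be illustrated without cubar. The following is a rough sketch, not the package's confirmed implementation: it assumes codons are counted as non-overlapping triplets starting at the first base, which Biostrings::trinucleotideFrequency does when called with step = 3. The toy sequences and the names toy and cf are made up for illustration.

```r
library(Biostrings)

# Two toy coding sequences (hypothetical, for illustration only)
toy <- DNAStringSet(c(seq1 = "ATGAAAAAATAA", seq2 = "ATGGCTGCTTGA"))

# Count non-overlapping trinucleotides in reading frame 1
cf <- trinucleotideFrequency(toy, step = 3)

# Label rows with the sequence names so cells can be looked up by name
rownames(cf) <- names(toy)

dim(cf)            # 2 sequences by 64 codons
cf["seq1", "AAA"]  # 2: seq1 reads ATG AAA AAA TAA
rowSums(cf)        # 4 codons counted per 12-base sequence
```

Each row sums to the number of complete codons in that sequence, which is a quick sanity check on any count matrix of this form.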
Examples
# Count codon frequencies across all yeast CDS sequences
cf_all <- count_codons(yeast_cds)
dim(cf_all)
#> [1] 6600   64
cf_all[1:5, 1:5]
#>         AAA AAC AAG AAT ACA
#> YPL071C  10   4   5  10   2
#> YLL050C   6   3   5   3   0
#> YMR172W  16  37  25  48  21
#> YOR185C   8   4  10   8   1
#> YLL032C  39  26  20  44  17
# Count codons for a single sequence
count_codons(yeast_cds[1])
#>         AAA AAC AAG AAT ACA ACC ACG ACT AGA AGC AGG AGT ATA ATC ATG ATT CAA CAC
#> YPL071C  10   4   5  10   2   1   0   3   5   1   2   3   6   2   5   6   2   1
#>         CAG CAT CCA CCC CCG CCT CGA CGC CGG CGT CTA CTC CTG CTT GAA GAC GAG GAT
#> YPL071C   0   4   2   2   0   0   0   1   1   1   3   0   1   1   4   5   3  15
#>         GCA GCC GCG GCT GGA GGC GGG GGT GTA GTC GTG GTT TAA TAC TAG TAT TCA TCC
#> YPL071C   3   0   1   3   3   2   1   0   2   1   2   1   1   1   0   5   0   3
#>         TCG TCT TGA TGC TGG TGT TTA TTC TTG TTT
#> YPL071C   1   1   0   0   5   1   3   2   2   2
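Because the result is an ordinary matrix, base R matrix functions apply to it directly. A minimal sketch, assuming cubar and its bundled yeast_cds data are available as in the examples above; the names total and prop are illustrative only.

```r
library(cubar)

cf_all <- count_codons(yeast_cds)

# Total occurrences of each codon, summed over all sequences
total <- colSums(cf_all)

# Share of each codon among all counted codons
prop <- total / sum(total)

# Most frequent codons first
head(sort(prop, decreasing = TRUE))
```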