count_codons counts the occurrences of each of the 64 possible codons in every input coding sequence. This function provides the foundation for most codon usage bias analyses in the cubar package.

Usage

count_codons(seqs, ...)

Arguments

seqs

Coding sequences as a DNAStringSet object, or compatible input that can be coerced to DNAStringSet.

...

Additional arguments passed to Biostrings::trinucleotideFrequency.
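Since extra arguments are forwarded to Biostrings::trinucleotideFrequency, options of that function such as as.prob should be usable through `...`. The sketch below is not the cubar source. It defines a hypothetical stand-in, count_codons_sketch, and assumes that codons are counted in frame by calling trinucleotideFrequency with step = 3; the toy sequences are invented for illustration.

```r
library(Biostrings)

# Hypothetical stand-in for count_codons (an assumption, not the package code):
# count non-overlapping trinucleotides starting at position 1 and forward
# any extra arguments to Biostrings::trinucleotideFrequency.
count_codons_sketch <- function(seqs, ...) {
  cf <- trinucleotideFrequency(seqs, step = 3, ...)
  rownames(cf) <- names(seqs)  # label rows by sequence name
  cf
}

# Toy CDS set (made-up sequences, each a multiple of 3 in length)
toy <- DNAStringSet(c(geneA = "ATGAAAAAATAA", geneB = "ATGGCTTGA"))

counts <- count_codons_sketch(toy)                 # raw codon counts
props  <- count_codons_sketch(toy, as.prob = TRUE) # per-sequence proportions

counts[, c("AAA", "ATG", "GCT")]
rowSums(props)
```

Forwarding as.prob = TRUE this way yields per-sequence proportions instead of counts. Whether count_codons itself accepts every trinucleotideFrequency option unchanged is an assumption worth checking against the package source.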

Value

A matrix where rows represent individual CDS sequences and columns represent the 64 possible codons. Each cell contains the number of times the corresponding codon occurs in the respective sequence.
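The layout of such a matrix can be checked on a small example. This is a minimal sketch that calls Biostrings::trinucleotideFrequency directly with step = 3, on the assumption that this reproduces the structure described here; the sequences and the names seqs, cf, and total are made up for illustration.

```r
library(Biostrings)

# Two invented coding sequences without ambiguous bases
seqs <- DNAStringSet(c(cds1 = "ATGGCTGCTTAA", cds2 = "ATGTTTTGA"))

# One row per sequence, one column per codon (assumed equivalent layout)
cf <- trinucleotideFrequency(seqs, step = 3)
rownames(cf) <- names(seqs)

dim(cf)             # number of sequences x 64 codons
head(colnames(cf))  # codon labels in alphabetical order

# Each row sums to the number of codons in that sequence
rowSums(cf) == width(seqs) %/% 3

# Pooled codon counts across all sequences
total <- colSums(cf)
total[total > 0]
```

The row-sum check holds for these toy inputs because every sequence length is a multiple of 3 and only A, C, G, and T occur. Sequences with ambiguous bases or partial trailing codons may behave differently.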

Examples

# Count codon frequencies across all yeast CDS sequences
cf_all <- count_codons(yeast_cds)
dim(cf_all)
#> [1] 6600   64
cf_all[1:5, 1:5]
#>         AAA AAC AAG AAT ACA
#> YPL071C  10   4   5  10   2
#> YLL050C   6   3   5   3   0
#> YMR172W  16  37  25  48  21
#> YOR185C   8   4  10   8   1
#> YLL032C  39  26  20  44  17

# Count codons for a single sequence
count_codons(yeast_cds[1])
#>         AAA AAC AAG AAT ACA ACC ACG ACT AGA AGC AGG AGT ATA ATC ATG ATT CAA CAC
#> YPL071C  10   4   5  10   2   1   0   3   5   1   2   3   6   2   5   6   2   1
#>         CAG CAT CCA CCC CCG CCT CGA CGC CGG CGT CTA CTC CTG CTT GAA GAC GAG GAT
#> YPL071C   0   4   2   2   0   0   0   1   1   1   3   0   1   1   4   5   3  15
#>         GCA GCC GCG GCT GGA GGC GGG GGT GTA GTC GTG GTT TAA TAC TAG TAT TCA TCC
#> YPL071C   3   0   1   3   3   2   1   0   2   1   2   1   1   1   0   5   0   3
#>         TCG TCT TGA TGC TGG TGT TTA TTC TTG TTT
#> YPL071C   1   1   0   0   5   1   3   2   2   2