est_rscu
calculates the Relative Synonymous Codon Usage (RSCU) values
for codons, which quantify the bias in synonymous codon usage. RSCU values
indicate whether a codon is used more (>1) or less (<1) frequently than
expected under uniform usage within its synonymous group.
Usage
est_rscu(
cf,
weight = 1,
pseudo_cnt = 1,
codon_table = get_codon_table(),
level = "subfam",
incl_stop = FALSE
)
Arguments
- cf
A matrix of codon frequencies as calculated by count_codons(). Rows represent sequences and columns represent codons.
- weight
A numeric vector of the same length as the number of sequences in cf, providing different weights for sequences when calculating codon frequencies (for example, gene expression levels). Default is 1 (equal weights).
- pseudo_cnt
Numeric pseudo count added to avoid division by zero when few sequences are available for RSCU calculation (default: 1).
- codon_table
A codon table defining the genetic code, derived from get_codon_table() or create_codon_table().
- level
Character string specifying the analysis level: "subfam" (default, analyzes codon subfamilies) or "amino_acid" (analyzes at amino acid level).
- incl_stop
Logical. Whether to include RSCU values for stop codons in the output (default: FALSE).
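To illustrate what per-sequence weights can mean, the sketch below pools a toy count matrix with and without weights. It assumes that each sequence's counts are multiplied by its weight before being summed per codon; this is an illustration of the idea, not a statement of how est_rscu() is implemented. The matrix, gene names, and weights are made up for the example.

```r
# Toy codon count matrix: rows are sequences, columns are codons
# (same orientation as the cf argument). Values are hypothetical.
cf <- matrix(
  c(10, 2,   # TTT counts for geneA, geneB
    5,  8),  # TTC counts for geneA, geneB
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("TTT", "TTC"))
)

# Hypothetical weights, e.g. relative expression levels (one per sequence).
weight <- c(3, 1)

# Assumption: weights scale each row before pooling. Because R recycles the
# vector down the columns, row i of cf is multiplied by weight[i].
unweighted <- colSums(cf)
weighted   <- colSums(cf * weight)

unweighted
weighted
```

With these numbers, the highly weighted geneA shifts the pooled counts toward TTT, which is the effect one would expect when weighting by expression.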
Value
A data.table containing the codon table with additional columns for RSCU analysis: usage frequency counts (cts), frequency proportions (prop), CAI weights (w_cai), and RSCU values (rscu). The table includes amino acid codes, full amino acid names, codons, and subfamily classifications.
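The relationship between the cts, prop, w_cai, and rscu columns can be sketched in base R for a single subfamily. The formulas below are inferred from the example output for Phe_TT (with the default pseudo_cnt = 1) and reproduce those numbers; they are a minimal sketch under that assumption, not the package's source code.

```r
# Pooled counts for the Phe_TT subfamily, copied from the first example output.
cts <- c(TTT = 79149, TTC = 53945)
pseudo_cnt <- 1

# Add the pseudo count to every codon in the subfamily.
adj <- cts + pseudo_cnt

# Proportion of each codon within its synonymous group.
prop <- adj / sum(adj)

# RSCU: observed proportion relative to uniform usage (1 / number of codons).
rscu <- prop * length(adj)

# CAI weight: usage relative to the most frequent codon in the group.
w_cai <- prop / max(prop)

round(cbind(cts, prop, w_cai, rscu), 7)
```

Under uniform usage every codon in the group would have prop = 1 / length(adj) and therefore rscu = 1, which is why values above or below 1 indicate over- or under-use.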
References
Sharp PM, Tuohy TM, Mosurski KR. 1986. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res 14:5125-5143.
Examples
# Calculate RSCU for all yeast genes
cf_all <- count_codons(yeast_cds)
rscu_all <- est_rscu(cf_all)
head(rscu_all)
#>    aa_code amino_acid  codon subfam   cts      prop     w_cai      rscu
#>     <char>     <char> <char> <char> <num>     <num>     <num>     <num>
#> 1:       F        Phe    TTT Phe_TT 79149 0.5946835 1.0000000 1.1893671
#> 2:       F        Phe    TTC Phe_TT 53945 0.4053165 0.6815666 0.8106329
#> 3:       L        Leu    TTA Leu_TT 77584 0.4968747 0.9875765 0.9937494
#> 4:       L        Leu    TTG Leu_TT 78560 0.5031253 1.0000000 1.0062506
#> 5:       S        Ser    TCT Ser_TC 68480 0.3590299 1.0000000 1.4361195
#> 6:       S        Ser    TCC Ser_TC 41295 0.2165053 0.6030286 0.8660211
# Calculate RSCU for highly expressed genes (top 500)
heg <- head(yeast_exp[order(-yeast_exp$fpkm), ], n = 500)
cf_heg <- count_codons(yeast_cds[heg$gene_id])
rscu_heg <- est_rscu(cf_heg)
head(rscu_heg)
#>    aa_code amino_acid  codon subfam   cts      prop     w_cai      rscu
#>     <char>     <char> <char> <char> <num>     <num>     <num>     <num>
#> 1:       F        Phe    TTT Phe_TT  2681 0.4000597 0.6668324 0.8001193
#> 2:       F        Phe    TTC Phe_TT  4021 0.5999403 1.0000000 1.1998807
#> 3:       L        Leu    TTA Leu_TT  3178 0.3213383 0.4734882 0.6426766
#> 4:       L        Leu    TTG Leu_TT  6713 0.6786617 1.0000000 1.3573234
#> 5:       S        Ser    TCT Ser_TC  4602 0.4891605 1.0000000 1.9566419
#> 6:       S        Ser    TCC Ser_TC  2885 0.3066950 0.6269824 1.2267800