
est_rscu calculates the Relative Synonymous Codon Usage (RSCU) values for codons, which quantify the bias in synonymous codon usage. RSCU values indicate whether a codon is used more (>1) or less (<1) frequently than expected under uniform usage within its synonymous group.
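For reference, the description above corresponds to the usual textbook definition of RSCU (Sharp et al. 1986), written out below. The symbols are introduced here for illustration: x_ij is the observed count of the j-th codon in synonymous group i, and n_i is the number of codons in that group. The formula does not include the pseudo_cnt adjustment described under Arguments.

```latex
\mathrm{RSCU}_{ij} \;=\; \frac{x_{ij}}{\dfrac{1}{n_i}\sum_{k=1}^{n_i} x_{ik}}
```

Under uniform usage every codon in the group has the same count, so the ratio equals 1 for each of them.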

Usage

est_rscu(
  cf,
  weight = 1,
  pseudo_cnt = 1,
  codon_table = get_codon_table(),
  level = "subfam",
  incl_stop = FALSE
)

Arguments

cf

A matrix of codon frequencies as calculated by count_codons(). Rows represent sequences and columns represent codons.

weight

A numeric vector with one value per sequence in cf, giving the weight of each sequence when codon frequencies are calculated (for example, gene expression levels). Default is 1 (equal weights).
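As a rough illustration of what sequence weights do, the sketch below pools a toy count matrix with and without weights. It assumes that each row of cf is multiplied by its weight before counts are summed over sequences; that is an assumption about the internals, not something this page states. The matrix cf_toy and the weights w_toy are made-up values.

```r
# Toy codon count matrix: rows are sequences, columns are codons (made-up values)
cf_toy <- matrix(
  c(10, 2,   # TTT counts for geneA, geneB
     5, 8),  # TTC counts for geneA, geneB
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("TTT", "TTC"))
)

# Hypothetical weights (e.g. expression levels), one value per sequence
w_toy <- c(geneA = 3, geneB = 1)

# Equal weights: plain column sums
pooled_equal <- colSums(cf_toy)

# Assumed weighting scheme: scale each row by its weight, then sum.
# R recycles w_toy down the columns, so row i is multiplied by w_toy[i].
pooled_weighted <- colSums(cf_toy * w_toy)

rbind(pooled_equal, pooled_weighted)
```

With these toy numbers the weighted pool leans toward geneA, which prefers TTT, while the unweighted pool slightly favors TTC.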

pseudo_cnt

Numeric pseudocount added to codon counts to avoid division by zero when few sequences are available for RSCU calculation (default: 1).

codon_table

A codon table defining the genetic code, derived from get_codon_table() or create_codon_table().

level

Character string specifying the analysis level: "subfam" (default, analyzes codon subfamilies) or "amino_acid" (analyzes at amino acid level).
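To make the difference between the two levels concrete, the following base R sketch computes RSCU for the six leucine codons twice: once over the whole amino acid and once within two-letter-prefix subfamilies. The counts are made up, the subfamily label "Leu_CT" is assumed by analogy with "Leu_TT" in the example output, and the pseudocount is left out for simplicity, so this is only an approximation of what est_rscu may do internally.

```r
# Made-up counts for the six leucine codons of the standard genetic code
cts <- c(TTA = 30, TTG = 70, CTT = 20, CTC = 10, CTA = 10, CTG = 60)

# Subfamily labels: "Leu_TT" appears in the example output; "Leu_CT" is assumed
subfam <- c("Leu_TT", "Leu_TT", "Leu_CT", "Leu_CT", "Leu_CT", "Leu_CT")

# RSCU: observed share within the group times the number of codons in the group
rscu_in_group <- function(x) x / sum(x) * length(x)

# Analogue of level = "amino_acid": all six codons form one group
rscu_aa <- rscu_in_group(cts)

# Analogue of level = "subfam": normalize within each subfamily separately
rscu_subfam <- ave(cts, subfam, FUN = rscu_in_group)

round(rbind(amino_acid = rscu_aa, subfam = rscu_subfam), 3)
```

The same codon can therefore get a different RSCU value depending on the level, because the set of codons it is compared against changes.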

incl_stop

Logical. Whether to include RSCU values for stop codons in the output (default: FALSE).

Value

A data.table containing the codon table with additional columns for RSCU analysis: codon counts (cts), proportions within each synonymous group (prop), CAI weights relative to the most used codon in the group (w_cai), and RSCU values (rscu). The table includes amino acid codes, full amino acid names, codons, and subfamily classifications.
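The relationship between these columns can be sketched by hand in base R. The counts below are the two phenylalanine rows from the first example in the Examples section. The sketch assumes the pseudocount is added to every codon count before normalizing; this is an inference from the printed numbers, not a documented guarantee.

```r
# Counts for the two Phe codons, copied from the yeast example output
cts <- c(TTT = 79149, TTC = 53945)
pseudo_cnt <- 1  # documented default

# Assumption: the pseudocount is added to each codon count before normalizing
adj <- cts + pseudo_cnt

prop  <- adj / sum(adj)      # share of each codon within its synonymous group
w_cai <- adj / max(adj)      # usage relative to the most used codon in the group
rscu  <- prop * length(adj)  # observed share divided by the uniform share 1/n

round(cbind(cts, prop, w_cai, rscu), 7)
```

Under that assumption the result should agree with the first two rows of the example output to the printed precision.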

References

Sharp PM, Tuohy TM, Mosurski KR. 1986. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res 14:5125-5143.

Examples

# Calculate RSCU for all yeast genes
cf_all <- count_codons(yeast_cds)
rscu_all <- est_rscu(cf_all)
head(rscu_all)
#>    aa_code amino_acid  codon subfam   cts      prop     w_cai      rscu
#>     <char>     <char> <char> <char> <num>     <num>     <num>     <num>
#> 1:       F        Phe    TTT Phe_TT 79149 0.5946835 1.0000000 1.1893671
#> 2:       F        Phe    TTC Phe_TT 53945 0.4053165 0.6815666 0.8106329
#> 3:       L        Leu    TTA Leu_TT 77584 0.4968747 0.9875765 0.9937494
#> 4:       L        Leu    TTG Leu_TT 78560 0.5031253 1.0000000 1.0062506
#> 5:       S        Ser    TCT Ser_TC 68480 0.3590299 1.0000000 1.4361195
#> 6:       S        Ser    TCC Ser_TC 41295 0.2165053 0.6030286 0.8660211

# Calculate RSCU for highly expressed genes (top 500)
heg <- head(yeast_exp[order(-yeast_exp$fpkm), ], n = 500)
cf_heg <- count_codons(yeast_cds[heg$gene_id])
rscu_heg <- est_rscu(cf_heg)
head(rscu_heg)
#>    aa_code amino_acid  codon subfam   cts      prop     w_cai      rscu
#>     <char>     <char> <char> <char> <num>     <num>     <num>     <num>
#> 1:       F        Phe    TTT Phe_TT  2681 0.4000597 0.6668324 0.8001193
#> 2:       F        Phe    TTC Phe_TT  4021 0.5999403 1.0000000 1.1998807
#> 3:       L        Leu    TTA Leu_TT  3178 0.3213383 0.4734882 0.6426766
#> 4:       L        Leu    TTG Leu_TT  6713 0.6786617 1.0000000 1.3573234
#> 5:       S        Ser    TCT Ser_TC  4602 0.4891605 1.0000000 1.9566419
#> 6:       S        Ser    TCC Ser_TC  2885 0.3066950 0.6269824 1.2267800
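
One possible way to compare the two results is a per-codon log2 ratio of RSCU values. The sketch below is self-contained and uses only the six rows printed above; the data frame cmp and its column names are made up for this illustration.

```r
# RSCU values copied from the two example outputs above
cmp <- data.frame(
  codon    = c("TTT", "TTC", "TTA", "TTG", "TCT", "TCC"),
  rscu_all = c(1.1893671, 0.8106329, 0.9937494, 1.0062506, 1.4361195, 0.8660211),
  rscu_heg = c(0.8001193, 1.1998807, 0.6426766, 1.3573234, 1.9566419, 1.2267800)
)

# log2 ratio > 0: the codon is relatively more used among highly expressed genes
cmp$log2_ratio <- log2(cmp$rscu_heg / cmp$rscu_all)

# Sort from most enriched to most depleted in the highly expressed set
cmp[order(-cmp$log2_ratio), ]
```

In these rows, TTC and TTG gain usage among the highly expressed genes while TTT and TTA lose it.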