est_optimal_codons
identifies optimal codons within each codon family
or amino acid group using binomial regression. Optimal codons are those whose
usage correlates positively with gene expression or negatively with the
effective number of codons (ENC, where lower values indicate stronger codon
usage bias), suggesting they are preferred for efficient translation.
Usage
est_optimal_codons(
cf,
codon_table = get_codon_table(),
level = "subfam",
gene_score = NULL,
fdr = 0.001
)
Arguments
- cf
A matrix of codon frequencies as calculated by count_codons(). Rows represent sequences and columns represent codons.
- codon_table
A codon table defining the genetic code, derived from get_codon_table() or create_codon_table().
- level
Character string specifying the analysis level: "subfam" (default, analyzes codon subfamilies) or "amino_acid" (analyzes at the amino acid level).
- gene_score
A numeric vector of gene-level scores used to identify optimal codons. Its length must equal the number of rows in cf. Common choices include:
  - Gene expression levels (RPKM, TPM, FPKM), optionally log-transformed
  - Protein abundance measurements
  - Custom gene importance scores
If not provided, the negative of the ENC values is used (lower ENC = higher bias).
- fdr
Numeric value specifying the false discovery rate threshold for determining statistical significance of codon optimality (default: 0.001).
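The gene_score argument can be illustrated with a short sketch. It assumes the functions on this page come from the cubar package and uses a simulated vector, tpm, as a stand-in for real expression data. The name tpm and its values are hypothetical, so the resulting calls are only meaningful as a demonstration of the interface.

```r
library(cubar)  # assumption: the package that provides these functions

cf_all <- count_codons(yeast_cds)

# Hypothetical expression values, simulated only so the snippet runs.
# Replace with real measurements ordered like the rows of cf_all.
set.seed(1)
tpm <- rlnorm(nrow(cf_all), meanlog = 3, sdlog = 1)

# Log-transform the scores and confirm there is one score per row of cf_all
gene_score <- log2(tpm + 1)
stopifnot(length(gene_score) == nrow(cf_all))

codons_expr <- est_optimal_codons(
  cf_all,
  level = "subfam",
  gene_score = gene_score,
  fdr = 0.001
)
```

With real data, the scores must be reordered to match the rows of cf before the call, since they are matched to sequences by position.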
Value
A data.table containing the input codon table with additional columns reporting the regression results. The columns are: aa_code (single-letter abbreviation of the amino acid), amino_acid (three-letter abbreviation), codon, subfam (codon subfamily), coef (regression coefficient), pvalue (regression P-value), qvalue (Benjamini-Hochberg corrected Q-value), and optimal (logical indicating whether the codon is optimal).
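To make the coef, pvalue, and qvalue columns more concrete, here is a minimal base R sketch of a binomial regression on a simulated two-codon family. It is not the function's actual implementation: the data are simulated, the helper fit_one is hypothetical, and the decision rule (positive coefficient and Q-value below fdr) is an assumption made for illustration.

```r
set.seed(42)
n_genes <- 500
score <- rnorm(n_genes)                    # hypothetical gene-level score
total <- rpois(n_genes, lambda = 30) + 1   # codons observed per gene in the family

# Codon A becomes more frequent as the score rises; codon B is its complement
count_a <- rbinom(n_genes, size = total, prob = plogis(0.3 * score))
count_b <- total - count_a

# Regress the usage of one codon (versus the rest of its family) on the score
fit_one <- function(count, other, score) {
  fit <- glm(cbind(count, other) ~ score, family = binomial)
  est <- summary(fit)$coefficients["score", ]
  c(coef = unname(est["Estimate"]), pvalue = unname(est["Pr(>|z|)"]))
}

res <- as.data.frame(rbind(
  A = fit_one(count_a, count_b, score),
  B = fit_one(count_b, count_a, score)
))

# Benjamini-Hochberg correction across all tested codons
res$qvalue <- p.adjust(res$pvalue, method = "BH")

# Assumed decision rule: positive association that passes the FDR threshold
res$optimal <- res$coef > 0 & res$qvalue < 0.001
res
```

In this two-codon setting the coefficients mirror each other, so at most one codon of the pair can be flagged as optimal under the assumed rule.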
References
Presnyak V, Alhusaini N, Chen YH, Martin S, Morris N, Kline N, Olson S, Weinberg D, Baker KE, Graveley BR, et al. 2015. Codon optimality is a major determinant of mRNA stability. Cell 160:1111-1124.
Examples
# perform binomial regression for optimal codon estimation
cf_all <- count_codons(yeast_cds)
codons_opt <- est_optimal_codons(cf_all)
codons_opt <- codons_opt[optimal == TRUE]
codons_opt
#> aa_code amino_acid codon subfam coef pvalue qvalue
#> <char> <char> <char> <char> <num> <num> <num>
#> 1: A Ala GCT Ala_GC 0.08454964 0.000000e+00 0.000000e+00
#> 2: A Ala GCC Ala_GC 0.01621930 2.127082e-32 2.359128e-32
#> 3: R Arg AGA Arg_AG 0.12902657 0.000000e+00 0.000000e+00
#> 4: R Arg CGT Arg_CG 0.20090361 0.000000e+00 0.000000e+00
#> 5: N Asn AAC Asn_AA 0.04208269 8.024342e-185 1.223712e-184
#> 6: D Asp GAC Asp_GA 0.01574961 3.398292e-28 3.636768e-28
#> 7: C Cys TGT Cys_TG 0.09889375 4.697718e-150 6.512746e-150
#> 8: Q Gln CAA Gln_CA 0.11196536 0.000000e+00 0.000000e+00
#> 9: E Glu GAA Glu_GA 0.08458541 0.000000e+00 0.000000e+00
#> 10: G Gly GGT Gly_GG 0.16530194 0.000000e+00 0.000000e+00
#> 11: H His CAC His_CA 0.03127977 7.294628e-42 8.240228e-42
#> 12: I Ile ATT Ile_AT 0.03956734 1.625599e-208 2.754487e-208
#> 13: I Ile ATC Ile_AT 0.03975891 1.099697e-188 1.765303e-188
#> 14: L Leu CTT Leu_CT 0.02178829 6.897132e-23 7.253880e-23
#> 15: L Leu CTA Leu_CT 0.05101078 7.732994e-124 1.025462e-123
#> 16: L Leu TTG Leu_TT 0.03514392 7.751784e-158 1.125854e-157
#> 17: K Lys AAG Lys_AA 0.05853116 0.000000e+00 0.000000e+00
#> 18: F Phe TTC Phe_TT 0.05451940 3.720900e-254 7.092965e-254
#> 19: P Pro CCA Pro_CC 0.10328272 0.000000e+00 0.000000e+00
#> 20: S Ser AGT Ser_AG 0.02452355 2.109510e-19 2.144669e-19
#> 21: S Ser TCT Ser_TC 0.06070916 0.000000e+00 0.000000e+00
#> 22: S Ser TCC Ser_TC 0.02605206 1.324126e-70 1.583759e-70
#> 23: T Thr ACT Thr_AC 0.04838553 2.506592e-292 5.272486e-292
#> 24: T Thr ACC Thr_AC 0.04684950 2.157821e-230 3.760774e-230
#> 25: Y Tyr TAC Tyr_TA 0.04206093 1.244976e-121 1.582157e-121
#> 26: V Val GTT Val_GT 0.05787243 0.000000e+00 0.000000e+00
#> 27: V Val GTC Val_GT 0.04995247 1.700719e-281 3.458128e-281
#> aa_code amino_acid codon subfam coef pvalue qvalue
#> optimal
#> <lgcl>
#> 1: TRUE
#> 2: TRUE
#> 3: TRUE
#> 4: TRUE
#> 5: TRUE
#> 6: TRUE
#> 7: TRUE
#> 8: TRUE
#> 9: TRUE
#> 10: TRUE
#> 11: TRUE
#> 12: TRUE
#> 13: TRUE
#> 14: TRUE
#> 15: TRUE
#> 16: TRUE
#> 17: TRUE
#> 18: TRUE
#> 19: TRUE
#> 20: TRUE
#> 21: TRUE
#> 22: TRUE
#> 23: TRUE
#> 24: TRUE
#> 25: TRUE
#> 26: TRUE
#> 27: TRUE
#> optimal