est_optimal_codons
Determines the optimal codons of each codon family using binomial regression.
Usage of optimal codons should correlate negatively with ENC (the effective number of codons).
Usage
est_optimal_codons(
cf,
codon_table = get_codon_table(),
level = "subfam",
gene_score = NULL,
fdr = 0.001
)
Arguments
- cf
matrix of codon frequencies as calculated by count_codons().
- codon_table
a table of genetic code derived from get_codon_table or create_codon_table.
- level
"subfam" (default) or "amino_acid". The level at which optimal codons are determined.
- gene_score
a numeric vector of scores for genes. The order of values should match the order of genes in the codon frequency matrix, and the length of the vector should equal the number of rows in the matrix. The scores could be gene expression levels (RPKM or TPM) that are optionally log-transformed (for example, with log1p). The negative of ENC is used by default if gene_score is not provided.
- fdr
false discovery rate used to determine optimal codons.
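To make the idea behind the coef, pvalue and qvalue columns concrete, the following is a minimal base R sketch of a per-codon binomial regression on simulated data. It is not the package's implementation: the simulated counts, the helper fit_codon, the Benjamini-Hochberg adjustment, and the rule "positive coefficient and q-value below fdr" are all assumptions made for illustration.

```r
set.seed(1)

# Simulated data (hypothetical): 200 genes, one two-codon subfamily.
n_genes <- 200
score   <- rnorm(n_genes)                    # stand-in for gene_score
total   <- rpois(n_genes, lambda = 30) + 1   # subfamily codon count per gene
p1      <- plogis(0.2 + 0.5 * score)         # codon1 is favoured at high scores
count1  <- rbinom(n_genes, size = total, prob = p1)
counts  <- cbind(codon1 = count1, codon2 = total - count1)

# Hypothetical helper: regress the usage of one codon (k out of n codons
# in the subfamily) on the gene score with a binomial GLM.
fit_codon <- function(k, n, x) {
  fit <- glm(cbind(k, n - k) ~ x, family = binomial())
  est <- summary(fit)$coefficients["x", ]
  c(coef = unname(est["Estimate"]), pvalue = unname(est["Pr(>|z|)"]))
}

# One regression per codon (column of counts).
res <- as.data.frame(t(apply(counts, 2, fit_codon, n = total, x = score)))

# Assumed decision rule: adjust p-values (Benjamini-Hochberg here) and call
# a codon optimal if its usage rises with the score at the chosen FDR.
fdr <- 0.001
res$qvalue  <- p.adjust(res$pvalue, method = "BH")
res$optimal <- res$coef > 0 & res$qvalue < fdr
res
```

In this toy setup only codon1 should be flagged, because its usage was simulated to increase with the score. In a two-codon subfamily the two coefficients are mirror images of each other, so at most one codon can have a positive slope.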
Examples
# perform binomial regression for optimal codon estimation
cf_all <- count_codons(yeast_cds)
codons_opt <- est_optimal_codons(cf_all)
codons_opt <- codons_opt[optimal == TRUE]
codons_opt
#> aa_code amino_acid codon subfam coef pvalue qvalue
#> <char> <char> <char> <char> <num> <num> <num>
#> 1: A Ala GCT Ala_GC 0.08586353 0.000000e+00 0.000000e+00
#> 2: A Ala GCC Ala_GC 0.01660752 1.373963e-32 1.523850e-32
#> 3: R Arg AGA Arg_AG 0.13131610 0.000000e+00 0.000000e+00
#> 4: R Arg CGT Arg_CG 0.20942840 0.000000e+00 0.000000e+00
#> 5: N Asn AAC Asn_AA 0.04243915 9.095455e-182 1.387057e-181
#> 6: D Asp GAC Asp_GA 0.01548768 1.543207e-26 1.651503e-26
#> 7: C Cys TGT Cys_TG 0.10173880 8.440487e-153 1.144155e-152
#> 8: Q Gln CAA Gln_CA 0.11358343 0.000000e+00 0.000000e+00
#> 9: E Glu GAA Glu_GA 0.08514744 0.000000e+00 0.000000e+00
#> 10: G Gly GGT Gly_GG 0.16885153 0.000000e+00 0.000000e+00
#> 11: H His CAC His_CA 0.03106030 1.117772e-39 1.262668e-39
#> 12: I Ile ATT Ile_AT 0.04103142 2.627944e-216 4.452905e-216
#> 13: I Ile ATC Ile_AT 0.04053426 2.660338e-189 4.270543e-189
#> 14: L Leu CTT Leu_CT 0.02079531 3.122680e-20 3.174725e-20
#> 15: L Leu CTA Leu_CT 0.05336602 1.790310e-130 2.374107e-130
#> 16: L Leu TTG Leu_TT 0.03574780 9.730992e-158 1.413311e-157
#> 17: K Lys AAG Lys_AA 0.05928741 0.000000e+00 0.000000e+00
#> 18: F Phe TTC Phe_TT 0.05586432 1.505846e-257 2.870519e-257
#> 19: P Pro CCA Pro_CC 0.10530399 0.000000e+00 0.000000e+00
#> 20: S Ser AGT Ser_AG 0.02607544 6.281898e-21 6.494844e-21
#> 21: S Ser TCT Ser_TC 0.06174128 0.000000e+00 0.000000e+00
#> 22: S Ser TCC Ser_TC 0.02651792 1.955211e-70 2.338586e-70
#> 23: T Thr ACT Thr_AC 0.04947163 2.352299e-294 4.947939e-294
#> 24: T Thr ACC Thr_AC 0.04730168 8.546350e-226 1.489507e-225
#> 25: Y Tyr TAC Tyr_TA 0.04275237 2.098384e-121 2.666697e-121
#> 26: V Val GTT Val_GT 0.05920949 0.000000e+00 0.000000e+00
#> 27: V Val GTC Val_GT 0.05086915 2.480125e-280 5.042921e-280
#> aa_code amino_acid codon subfam coef pvalue qvalue
#> optimal
#> <lgcl>
#> 1: TRUE
#> 2: TRUE
#> 3: TRUE
#> 4: TRUE
#> 5: TRUE
#> 6: TRUE
#> 7: TRUE
#> 8: TRUE
#> 9: TRUE
#> 10: TRUE
#> 11: TRUE
#> 12: TRUE
#> 13: TRUE
#> 14: TRUE
#> 15: TRUE
#> 16: TRUE
#> 17: TRUE
#> 18: TRUE
#> 19: TRUE
#> 20: TRUE
#> 21: TRUE
#> 22: TRUE
#> 23: TRUE
#> 24: TRUE
#> 25: TRUE
#> 26: TRUE
#> 27: TRUE
#> optimal
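When a custom gene_score is supplied, its values must be in the same order as the rows of the codon frequency matrix. The sketch below shows one way to align and log-transform an expression vector in base R. The toy matrix cf_toy, the named vector tpm, and the assumption that the matrix rows are named by gene ID are all hypothetical.

```r
# Toy codon frequency matrix: 3 genes (rows) x 2 codons (columns).
cf_toy <- matrix(
  c(10, 5, 8, 2, 7, 3), nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("GCT", "GCC"))
)

# Hypothetical expression values (TPM), in a different order and with
# one gene that is absent from the matrix.
tpm <- c(geneC = 120.5, geneA = 0, geneD = 3.2, geneB = 15.8)

# Reorder the expression values to follow the matrix rows and fail early
# if any gene in the matrix has no expression value.
idx <- match(rownames(cf_toy), names(tpm))
stopifnot(!anyNA(idx))

# log1p keeps zero-expression genes finite (log1p(0) is 0).
gene_score <- log1p(tpm[idx])

# The vector now has one score per row, in row order.
stopifnot(length(gene_score) == nrow(cf_toy))
```

A vector prepared this way could then be passed as the gene_score argument together with the matching codon frequency matrix.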