est_optimal_codons determines the optimal codons of each codon family using binomial regression. Usage of optimal codons should correlate negatively with ENC (the effective number of codons).
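
The idea of a per-codon binomial regression can be sketched in base R as follows. This is only an illustration on simulated counts, not the package's implementation: the two-codon family, the variable names (score, total, codon1, other) and the simulated effect size are all assumptions made for the example.

```r
# Simulated data for one hypothetical two-codon family (not package code)
set.seed(42)
n_genes <- 200
score <- rnorm(n_genes)                    # hypothetical gene-level score
total <- rpois(n_genes, lambda = 30) + 1   # codons of this family per gene
p_true <- plogis(-0.2 + 0.5 * score)       # usage of codon 1 rises with score
codon1 <- rbinom(n_genes, size = total, prob = p_true)
other <- total - codon1                    # counts of the synonymous codon

# Binomial regression: successes = this codon, failures = its synonyms
fit <- glm(cbind(codon1, other) ~ score, family = binomial)

# Slope and p-value for the score term
est <- coef(summary(fit))["score", ]
est[c("Estimate", "Pr(>|z|)")]
```

A positive, significant slope means the codon is used more often in high-scoring genes, which is the pattern expected of an optimal codon.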

Usage

est_optimal_codons(
  cf,
  codon_table = get_codon_table(),
  level = "subfam",
  gene_score = NULL,
  fdr = 0.001
)

Arguments

cf

matrix of codon frequencies as calculated by count_codons().

codon_table

a table of the genetic code, as returned by get_codon_table() or create_codon_table().

level

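"subfam" (default) or "amino_acid". The level at which optimal codons are determined: within each codon subfamily (for example, Arg_AG and Arg_CG are treated separately) or within each amino acid.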

gene_score

a numeric vector of gene scores. The values must be in the same order as the genes (rows) of the codon frequency matrix, and the length of the vector must equal the number of rows in the matrix. The scores can be gene expression levels (RPKM or TPM), optionally log-transformed (for example, with log1p). If gene_score is not provided, the opposite of ENC is used by default.
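
One way to build a gene_score vector in the required order is sketched below with base R. The toy matrix, gene identifiers and TPM values are hypothetical, and the sketch assumes the codon frequency matrix carries gene identifiers as row names; the call to est_optimal_codons() is left commented out because the toy matrix is not a real codon frequency matrix.

```r
# Toy stand-in for a codon frequency matrix (rows = genes); hypothetical data
cf <- matrix(1:6, nrow = 3,
             dimnames = list(c("g1", "g2", "g3"), c("AAA", "AAG")))

# Hypothetical expression values, deliberately in a different gene order
tpm <- c(g3 = 120, g1 = 0, g2 = 15.5)

# Reorder by row name so scores line up with the matrix, then log-transform
gene_score <- log1p(tpm[rownames(cf)])

# Sanity checks: one score per row, no genes missing
stopifnot(length(gene_score) == nrow(cf), !anyNA(gene_score))

# With a real matrix from count_codons():
# est_optimal_codons(cf, gene_score = gene_score)
```

Using log1p rather than log keeps genes with zero expression finite.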

fdr

false discovery rate threshold used to determine optimal codons (default 0.001).
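
How an FDR threshold could turn per-codon p-values into an optimal flag is sketched below. The Benjamini-Hochberg adjustment and the rule "positive coefficient and q-value below fdr" are assumptions for illustration; this page does not state which method the function uses. The res data frame holds made-up numbers, not real output.

```r
# Hypothetical per-codon regression results (made-up numbers)
res <- data.frame(
  codon  = c("GCT", "GCC", "GCA", "GCG"),
  coef   = c(0.086, 0.017, -0.040, -0.060),
  pvalue = c(1e-50, 0.01, 1e-30, 0.2)
)
fdr <- 0.001

# Assumed: Benjamini-Hochberg correction across all tested codons
res$qvalue <- p.adjust(res$pvalue, method = "BH")

# Assumed rule: usage increases with the score and the effect passes the FDR cut
res$optimal <- res$coef > 0 & res$qvalue < fdr
res
```

Under these assumptions only GCT is flagged: GCC has a positive coefficient but its adjusted p-value exceeds the threshold, while GCA is significant in the wrong direction.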

Value

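a data.table with one row per codon, giving the regression coefficient (coef), p-value (pvalue), q-value (qvalue) and a logical column (optimal) that marks optimal codons.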

Examples

# perform binomial regression for optimal codon estimation
cf_all <- count_codons(yeast_cds)
codons_opt <- est_optimal_codons(cf_all)
codons_opt <- codons_opt[optimal == TRUE]
codons_opt
#>     aa_code amino_acid  codon subfam       coef        pvalue        qvalue optimal
#>      <char>     <char> <char> <char>      <num>         <num>         <num>  <lgcl>
#>  1:       A        Ala    GCT Ala_GC 0.08586353  0.000000e+00  0.000000e+00    TRUE
#>  2:       A        Ala    GCC Ala_GC 0.01660752  1.373963e-32  1.523850e-32    TRUE
#>  3:       R        Arg    AGA Arg_AG 0.13131610  0.000000e+00  0.000000e+00    TRUE
#>  4:       R        Arg    CGT Arg_CG 0.20942840  0.000000e+00  0.000000e+00    TRUE
#>  5:       N        Asn    AAC Asn_AA 0.04243915 9.095455e-182 1.387057e-181    TRUE
#>  6:       D        Asp    GAC Asp_GA 0.01548768  1.543207e-26  1.651503e-26    TRUE
#>  7:       C        Cys    TGT Cys_TG 0.10173880 8.440487e-153 1.144155e-152    TRUE
#>  8:       Q        Gln    CAA Gln_CA 0.11358343  0.000000e+00  0.000000e+00    TRUE
#>  9:       E        Glu    GAA Glu_GA 0.08514744  0.000000e+00  0.000000e+00    TRUE
#> 10:       G        Gly    GGT Gly_GG 0.16885153  0.000000e+00  0.000000e+00    TRUE
#> 11:       H        His    CAC His_CA 0.03106030  1.117772e-39  1.262668e-39    TRUE
#> 12:       I        Ile    ATT Ile_AT 0.04103142 2.627944e-216 4.452905e-216    TRUE
#> 13:       I        Ile    ATC Ile_AT 0.04053426 2.660338e-189 4.270543e-189    TRUE
#> 14:       L        Leu    CTT Leu_CT 0.02079531  3.122680e-20  3.174725e-20    TRUE
#> 15:       L        Leu    CTA Leu_CT 0.05336602 1.790310e-130 2.374107e-130    TRUE
#> 16:       L        Leu    TTG Leu_TT 0.03574780 9.730992e-158 1.413311e-157    TRUE
#> 17:       K        Lys    AAG Lys_AA 0.05928741  0.000000e+00  0.000000e+00    TRUE
#> 18:       F        Phe    TTC Phe_TT 0.05586432 1.505846e-257 2.870519e-257    TRUE
#> 19:       P        Pro    CCA Pro_CC 0.10530399  0.000000e+00  0.000000e+00    TRUE
#> 20:       S        Ser    AGT Ser_AG 0.02607544  6.281898e-21  6.494844e-21    TRUE
#> 21:       S        Ser    TCT Ser_TC 0.06174128  0.000000e+00  0.000000e+00    TRUE
#> 22:       S        Ser    TCC Ser_TC 0.02651792  1.955211e-70  2.338586e-70    TRUE
#> 23:       T        Thr    ACT Thr_AC 0.04947163 2.352299e-294 4.947939e-294    TRUE
#> 24:       T        Thr    ACC Thr_AC 0.04730168 8.546350e-226 1.489507e-225    TRUE
#> 25:       Y        Tyr    TAC Tyr_TA 0.04275237 2.098384e-121 2.666697e-121    TRUE
#> 26:       V        Val    GTT Val_GT 0.05920949  0.000000e+00  0.000000e+00    TRUE
#> 27:       V        Val    GTC Val_GT 0.05086915 2.480125e-280 5.042921e-280    TRUE
#>     aa_code amino_acid  codon subfam       coef        pvalue        qvalue optimal