codon_optimize
takes a coding sequence (without a stop codon) and replaces
each codon with the corresponding synonymous optimal codon.
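The per-codon substitution described above can be sketched in base R. This is a minimal illustration, not the package's implementation: `naive_optimize` and `optimal_lookup` are hypothetical names, and the lookup values are hand-written for this example rather than estimated from real codon usage data.

```r
# Hypothetical lookup: each codon maps to the preferred synonymous codon.
# Values are hand-written for illustration, not estimated from any genome.
optimal_lookup <- c(
  ATG = "ATG",               # Met has a single codon
  CTA = "CTA",               # Leu: kept unchanged in this toy table
  CGA = "CGT", CGT = "CGT"   # Arg: CGT treated as the preferred codon
)

naive_optimize <- function(cds, lookup) {
  n <- nchar(cds)
  stopifnot(n > 0, n %% 3 == 0)            # need whole codons
  starts <- seq.int(1, n, by = 3)
  codons <- substring(cds, starts, starts + 2)
  unknown <- setdiff(codons, names(lookup))
  if (length(unknown) > 0) {
    stop("codons missing from lookup: ", paste(unknown, collapse = ", "))
  }
  # Replace every codon by its preferred synonym and rejoin
  paste(unname(lookup[codons]), collapse = "")
}

naive_optimize("ATGCTACGA", optimal_lookup)
#> [1] "ATGCTACGT"
```

In the package itself, the table of preferred codons comes from `est_optimal_codons()` rather than a hand-written vector.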
Usage
codon_optimize(
  seq,
  optimal_codons = optimal_codons,
  cf = NULL,
  codon_table = get_codon_table(),
  level = "subfam",
  method = "naive",
  num_sequences = 1,
  organism = NULL,
  envname = "cubar_env",
  attention_type = "original_full",
  deterministic = TRUE,
  temperature = 0.2,
  top_p = 0.95,
  match_protein = FALSE,
  spliceai = FALSE
)
Arguments
- seq
DNAString, or an object that can be coerced to a DNAString.
- optimal_codons
table of optimal codons as generated by est_optimal_codons.
- cf
matrix of codon frequencies as calculated by count_codons(). Required when method is set to "IDT".
- codon_table
a table of genetic code derived from get_codon_table or create_codon_table.
- level
"subfam" (default) or "amino_acid". The level at which codon usage is optimized. Required when method is set to "naive" or "IDT".
- method
"naive" (default), "IDT" or "CodonTransformer". The method used to optimize codons. The "IDT" method derives from the Codon Optimization Tool of Integrated DNA Technologies. The "CodonTransformer" method derives from the tool CodonTransformer.
- num_sequences
number of different DNA sequences to generate. Default is 1. Required when method is set to "IDT" or "CodonTransformer". When greater than 1, identical duplicate sequences are retained as a single copy, so the final number of sequences may be lower than the specified value. With the method "CodonTransformer", values greater than 1 only work when deterministic = FALSE; each sequence is then sampled based on the temperature and top_p parameters.
- organism
organism ID (integer) or name (string) (e.g., "Escherichia coli general", must be from ORGANISM2ID in CodonUtils). Required when method is set to "CodonTransformer".
- envname
the name of the conda environment to use when method is "CodonTransformer" or when spliceai is TRUE. Must match the name of the user-defined conda environment (default: "cubar_env").
- attention_type
type of attention mechanism to use in the model: "block_sparse" for memory-efficient attention or "original_full" (default) for standard attention. Required when method is set to "CodonTransformer".
- deterministic
if TRUE (default), uses deterministic decoding (picks the most likely tokens). If FALSE, samples tokens based on probabilities adjusted by temperature. Required when method is set to "CodonTransformer".
- temperature
controls randomness in non-deterministic mode. Lower values (0.2) are conservative and pick high probability tokens, while higher values (0.8) allow more diversity. Must be positive. Required when method is set to "CodonTransformer". Default is 0.2.
- top_p
nucleus sampling threshold - only tokens with cumulative probability up to this value are considered. Balances diversity and quality of predictions. Must be between 0 and 1. Required when method is set to "CodonTransformer". Default is 0.95.
- match_protein
constrains predictions to only use codons that translate back to the exact input protein sequence. Only recommended when using high temperatures or error prone input proteins (e.g. not starting with methionine or having numerous repetitions). Default is FALSE.
- spliceai
TRUE or FALSE (default). Whether to run SpliceAI to predict possible splice junction sites. This option derives from the tool SpliceAI.
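How temperature and top_p interact can be illustrated with a small base R sketch. It follows the common formulation of temperature scaling and nucleus (top-p) filtering, and it is an assumption that CodonTransformer behaves the same way in detail. `nucleus_probs` and the `logits` values are hypothetical and exist only for this illustration.

```r
# Temperature scaling followed by nucleus (top-p) filtering.
nucleus_probs <- function(logits, temperature = 0.2, top_p = 0.95) {
  stopifnot(temperature > 0, top_p > 0, top_p <= 1)
  scaled <- logits / temperature
  probs <- exp(scaled - max(scaled))        # numerically stable softmax
  probs <- probs / sum(probs)
  ord <- order(probs, decreasing = TRUE)
  cum <- cumsum(probs[ord])
  # Keep the smallest set of top tokens whose cumulative probability reaches top_p
  n_keep <- min(sum(cum < top_p) + 1, length(probs))
  kept <- ord[seq_len(n_keep)]
  out <- numeric(length(probs))
  out[kept] <- probs[kept] / sum(probs[kept])  # renormalize the kept tokens
  names(out) <- names(logits)
  out
}

# Hypothetical model scores for the four alanine codons
logits <- c(GCT = 2.0, GCC = 1.0, GCA = 0.5, GCG = 0.1)

low  <- nucleus_probs(logits, temperature = 0.2)  # only GCT survives
high <- nucleus_probs(logits, temperature = 0.8)  # all four codons survive

# Repeated sampling can yield duplicates; unique() collapses them, which is
# why fewer than num_sequences distinct results may come back.
set.seed(1)
draws <- sample(names(high), size = 10, replace = TRUE, prob = high)
unique(draws)
```

At a low temperature the distribution is so peaked that the top codon alone exceeds top_p, so sampling becomes effectively deterministic. Raising the temperature flattens the distribution and lets more codons into the nucleus.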
Value
a DNAString of the optimized coding sequence when num_sequences is 1 and spliceai is FALSE; a DNAStringSet of the optimized coding sequences when num_sequences is greater than 1 and spliceai is FALSE; or, when spliceai is TRUE, a data.table with columns for the candidate optimized sequences and columns indicating the possibility of splice sites.
References
Fallahpour A, Gureghian V, Filion GJ, Lindner AB, Pandi A. CodonTransformer: a multispecies codon optimizer using context-aware neural networks. Nat Commun. 2025;16(1):3205.
Jaganathan K, Panagiotopoulou SK, McRae JF, et al. Predicting splicing from primary sequence with deep learning. Cell. 2019;176(3):535-548.e24.
Examples
cf_all <- count_codons(yeast_cds)
optimal_codons <- est_optimal_codons(cf_all)
seq <- 'ATGCTACGA'
# method "naive":
codon_optimize(seq, optimal_codons)
#> 9-letter DNAString object
#> seq: ATGCTACGT
# method "IDT":
codon_optimize(seq, cf = cf_all, method = "IDT")
#> 9-letter DNAString object
#> seq: ATGCTACGT
codon_optimize(seq, cf = cf_all, method = "IDT", num_sequences = 10)
#> DNAStringSet object of length 8:
#> width seq
#> [1] 9 ATGCTTCGT
#> [2] 9 ATGCTACGT
#> [3] 9 ATGCTACGC
#> [4] 9 ATGCTACGG
#> [5] 9 ATGCTACGA
#> [6] 9 ATGCTGCGT
#> [7] 9 ATGCTTCGC
#> [8] 9 ATGCTTCGA
# method "CodonTransformer":
seq_opt <- codon_optimize(seq, method = "CodonTransformer", organism = "Saccharomyces cerevisiae")
#> Error in reticulate::use_condaenv(envname): Unable to locate conda environment 'cubar_env'.
print(seq_opt)
#> Error: object 'seq_opt' not found
seqs_opt <- codon_optimize(seq, method = "CodonTransformer", organism = "Saccharomyces cerevisiae",
                           num_sequences = 10, deterministic = FALSE, temperature = 0.4)
#> Error in reticulate::use_condaenv(envname): Unable to locate conda environment 'cubar_env'.
print(seqs_opt)
#> Error: object 'seqs_opt' not found
seqs_opt <- codon_optimize(seq, cf = cf_all, method = "IDT", num_sequences = 10, spliceai = TRUE)
#> Error in reticulate::use_condaenv(envname): Unable to locate conda environment 'cubar_env'.
print(seqs_opt)
#> Error: object 'seqs_opt' not found
seq_opt <- codon_optimize(seq, method = "CodonTransformer", organism = "Saccharomyces cerevisiae",
                          spliceai = TRUE)
#> Error in reticulate::use_condaenv(envname): Unable to locate conda environment 'cubar_env'.
print(seq_opt)
#> Error: object 'seq_opt' not found
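The errors above occur because the conda environment named by envname is not available on the machine that rendered these examples. One way to guard the Python-backed calls is sketched below. `can_run_python_methods` is a hypothetical helper, not part of cubar, and the sketch assumes that `reticulate::condaenv_exists()` is available in the installed reticulate version.

```r
# Hypothetical helper: TRUE only if reticulate is installed and the named
# conda environment can be found; any lookup failure is treated as FALSE.
can_run_python_methods <- function(envname = "cubar_env") {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(FALSE)
  }
  isTRUE(tryCatch(
    reticulate::condaenv_exists(envname),
    error = function(e) FALSE   # e.g. no conda installation found
  ))
}

if (can_run_python_methods("cubar_env") && requireNamespace("cubar", quietly = TRUE)) {
  seq_opt <- cubar::codon_optimize("ATGCTACGA", method = "CodonTransformer",
                                   organism = "Saccharomyces cerevisiae")
  print(seq_opt)
} else {
  message("Conda environment 'cubar_env' not found; skipping CodonTransformer example.")
}
```

Guarding the call this way avoids the follow-on "object not found" errors, because the result is only printed when the optimization actually ran.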