codon_optimize
redesigns a coding sequence by replacing each codon
with its optimal synonymous alternative, while maintaining the same amino
acid sequence. This function supports multiple optimization methods and is
useful for improving protein expression in heterologous systems.
Usage
codon_optimize(
seq,
optimal_codons = optimal_codons,
cf = NULL,
codon_table = get_codon_table(),
level = "subfam",
method = "naive",
num_sequences = 1,
organism = NULL,
envname = "cubar_env",
attention_type = "original_full",
deterministic = TRUE,
temperature = 0.2,
top_p = 0.95,
match_protein = FALSE,
spliceai = FALSE
)
Arguments
- seq
A coding sequence as a DNAString object or any object that can be coerced to DNAString. The sequence should not include stop codons.
- optimal_codons
A table of optimal codons as generated by
est_optimal_codons()
, containing optimality information for each codon.- cf
Matrix of codon frequencies from
count_codons()
. Required for "IDT" method to determine codon frequency distributions.- codon_table
A codon table defining the genetic code, derived from
get_codon_table()
orcreate_codon_table()
.- level
Character string specifying optimization level: "subfam" (default, within codon subfamilies) or "amino_acid" (within amino acid groups). Required for "naive" and "IDT" methods.
- method
Character string specifying the optimization algorithm:
"naive"
(default): Simple replacement with optimal codons"IDT"
: Method from Integrated DNA Technologies tool"CodonTransformer"
: Neural network-based optimization
- num_sequences
Integer. Number of different optimized sequences to generate (default: 1). For "CodonTransformer" with deterministic=FALSE, each sequence is independently sampled.
- organism
Organism identifier (integer ID or string name) for "CodonTransformer" method. Must be from ORGANISM2ID in CodonUtils (e.g., "Escherichia coli general").
- envname
Environment name for "CodonTransformer" method. Should match your conda environment name (default: "cubar_env").
- attention_type
Attention mechanism type for "CodonTransformer":
"original_full"
(default): Standard attention"block_sparse"
: Memory-efficient sparse attention
- deterministic
Logical. For "CodonTransformer" method:
TRUE
(default): Deterministic decoding (most likely tokens)FALSE
: Probabilistic sampling based on temperature
- temperature
Numeric. Controls randomness in non-deterministic mode for "CodonTransformer". Lower values (0.2, default) are conservative; higher values (0.8) increase diversity. Must be positive.
- top_p
Numeric. Nucleus sampling threshold (0-1) for "CodonTransformer". Only tokens with cumulative probability up to this value are considered. Default: 0.95.
- match_protein
Logical. For "CodonTransformer", constrains predictions to maintain exact amino acid sequence. Recommended for unusual proteins or high temperature settings (default: FALSE).
- spliceai
Logical. Whether to predict splice sites using SpliceAI (default: FALSE). Requires appropriate environment setup.
Value
The return type depends on parameters:
Single DNAString: When num_sequences=1 and spliceai=FALSE
DNAStringSet: When num_sequences>1 and spliceai=FALSE
data.table: When spliceai=TRUE (includes sequences and splice predictions)
References
Fallahpour A, Gureghian V, Filion GJ, Lindner AB, Pandi A. CodonTransformer: a multispecies codon optimizer using context-aware neural networks. Nat Commun. 2025 Apr 3;16(1):3205.
Jaganathan K, Panagiotopoulou S K, McRae J F, et al. Predicting splicing from primary sequence with deep learning. Cell, 2019, 176(3): 535-548.e24.
Examples
cf_all <- count_codons(yeast_cds)
optimal_codons <- est_optimal_codons(cf_all)
seq <- 'ATGCTACGA'
# method "naive":
codon_optimize(seq, optimal_codons)
#> 9-letter DNAString object
#> seq: ATGCTACGT
# method "IDT":
codon_optimize(seq, cf = cf_all, method = "IDT")
#> 9-letter DNAString object
#> seq: ATGCTACGT
codon_optimize(seq, cf = cf_all, method = "IDT", num_sequences = 10)
#> DNAStringSet object of length 8:
#> width seq
#> [1] 9 ATGCTTCGT
#> [2] 9 ATGCTACGT
#> [3] 9 ATGCTACGC
#> [4] 9 ATGCTACGG
#> [5] 9 ATGCTACGA
#> [6] 9 ATGCTGCGT
#> [7] 9 ATGCTTCGC
#> [8] 9 ATGCTTCGA
# # The following examples requires pre-installation of python package SpliceAI or Codon
# # Transformer. see the codon optimization vignette for further details.
# seq_opt <- codon_optimize(seq, method = "CodonTransformer",
# organism = "Saccharomyces cerevisiae")
# seqs_opt <- codon_optimize(seq, method = "CodonTransformer",
# organism = "Saccharomyces cerevisiae", num_sequences = 10,
# deterministic =FALSE, temperature = 0.4)
# seqs_opt <- codon_optimize(seq, cf = cf_all, method = "IDT",
# num_sequences = 10, spliceai = TRUE)
# seq_opt <- codon_optimize(seq, method = "CodonTransformer",
# organism = "Saccharomyces cerevisiae", spliceai = TRUE)