Skip to contents

codon_optimize redesigns a coding sequence by replacing each codon with its optimal synonymous alternative, while maintaining the same amino acid sequence. This function supports multiple optimization methods and is useful for improving protein expression in heterologous systems.

Usage

codon_optimize(
  seq,
  optimal_codons = optimal_codons,
  cf = NULL,
  codon_table = get_codon_table(),
  level = "subfam",
  method = "naive",
  num_sequences = 1,
  organism = NULL,
  envname = "cubar_env",
  attention_type = "original_full",
  deterministic = TRUE,
  temperature = 0.2,
  top_p = 0.95,
  match_protein = FALSE,
  spliceai = FALSE
)

Arguments

seq

A coding sequence as a DNAString object or any object that can be coerced to DNAString. The sequence should not include stop codons.

optimal_codons

A table of optimal codons as generated by est_optimal_codons(), containing optimality information for each codon.

cf

Matrix of codon frequencies from count_codons(). Required for "IDT" method to determine codon frequency distributions.

codon_table

A codon table defining the genetic code, derived from get_codon_table() or create_codon_table().

level

Character string specifying optimization level: "subfam" (default, within codon subfamilies) or "amino_acid" (within amino acid groups). Required for "naive" and "IDT" methods.

method

Character string specifying the optimization algorithm:

  • "naive" (default): Simple replacement with optimal codons

  • "IDT": Method from Integrated DNA Technologies tool

  • "CodonTransformer": Neural network-based optimization

num_sequences

Integer. Number of different optimized sequences to generate (default: 1). For "CodonTransformer" with deterministic=FALSE, each sequence is independently sampled.

organism

Organism identifier (integer ID or string name) for "CodonTransformer" method. Must be from ORGANISM2ID in CodonUtils (e.g., "Escherichia coli general").

envname

Environment name for "CodonTransformer" method. Should match your conda environment name (default: "cubar_env").

attention_type

Attention mechanism type for "CodonTransformer":

  • "original_full" (default): Standard attention

  • "block_sparse": Memory-efficient sparse attention

deterministic

Logical. For "CodonTransformer" method:

  • TRUE (default): Deterministic decoding (most likely tokens)

  • FALSE: Probabilistic sampling based on temperature

temperature

Numeric. Controls randomness in non-deterministic mode for "CodonTransformer". Lower values (0.2, default) are conservative; higher values (0.8) increase diversity. Must be positive.

top_p

Numeric. Nucleus sampling threshold (0-1) for "CodonTransformer". Only tokens with cumulative probability up to this value are considered. Default: 0.95.

match_protein

Logical. For "CodonTransformer", constrains predictions to maintain exact amino acid sequence. Recommended for unusual proteins or high temperature settings (default: FALSE).

spliceai

Logical. Whether to predict splice sites using SpliceAI (default: FALSE). Requires appropriate environment setup.

Value

The return type depends on parameters:

  • Single DNAString: When num_sequences=1 and spliceai=FALSE

  • DNAStringSet: When num_sequences>1 and spliceai=FALSE

  • data.table: When spliceai=TRUE (includes sequences and splice predictions)

References

Fallahpour A, Gureghian V, Filion GJ, Lindner AB, Pandi A. CodonTransformer: a multispecies codon optimizer using context-aware neural networks. Nat Commun. 2025 Apr 3;16(1):3205.

Jaganathan K, Panagiotopoulou S K, McRae J F, et al. Predicting splicing from primary sequence with deep learning. Cell, 2019, 176(3): 535-548.e24.

Examples

cf_all <- count_codons(yeast_cds)
optimal_codons <- est_optimal_codons(cf_all)
seq <- 'ATGCTACGA'
# method "naive":
codon_optimize(seq, optimal_codons)
#> 9-letter DNAString object
#> seq: ATGCTACGT
# method "IDT":
codon_optimize(seq, cf = cf_all, method = "IDT")
#> 9-letter DNAString object
#> seq: ATGCTACGT
codon_optimize(seq, cf = cf_all, method = "IDT", num_sequences = 10)
#> DNAStringSet object of length 8:
#>     width seq
#> [1]     9 ATGCTTCGT
#> [2]     9 ATGCTACGT
#> [3]     9 ATGCTACGC
#> [4]     9 ATGCTACGG
#> [5]     9 ATGCTACGA
#> [6]     9 ATGCTGCGT
#> [7]     9 ATGCTTCGC
#> [8]     9 ATGCTTCGA

# # The following examples requires pre-installation of python package SpliceAI or Codon
#  # Transformer. see the codon optimization vignette for further details.
# seq_opt <- codon_optimize(seq, method = "CodonTransformer",
#     organism = "Saccharomyces cerevisiae")
# seqs_opt <- codon_optimize(seq, method = "CodonTransformer",
#     organism = "Saccharomyces cerevisiae", num_sequences = 10,
#     deterministic =FALSE, temperature = 0.4)
# seqs_opt <- codon_optimize(seq, cf = cf_all, method = "IDT",
#     num_sequences = 10, spliceai = TRUE)
# seq_opt <- codon_optimize(seq, method = "CodonTransformer",
#     organism = "Saccharomyces cerevisiae", spliceai = TRUE)