The codon_optimize
function optimizes sequences
according to the synonymous optimal codon while also integrating IDT’s
sequence optimization approach. Additionally, it employs the
CodonTransformer method for sequence optimization and utilizes SpliceAI
to identify potential splice sites within the optimized sequences. It is
essential for users to set up a command-line environment using
conda
or mamba
prior to applying the
CodonTransformer method and SpliceAI in cubar:
conda create -n cubar_env python=3.12 r-base blas=*=netlib r-reticulate
conda activate cubar_env
# install CodonTransformer and SpliceAI
pip install CodonTransformer tensorflow spliceai
The default “naive” method involves optimizing the sequence through the use of synonymous codons. Any codon that is not currently optimized will be substituted with an appropriate optimized codon from its respective family or subfamily, if available.
library(cubar)
seq <- 'ATGCTACGA'
cf_all <- count_codons(yeast_cds)
#> Loading required namespace: Biostrings
optimal_codons <- est_optimal_codons(cf_all)
seq_opt <- codon_optimize(seq, optimal_codons)
print(seq_opt)
#> 9-letter DNAString object
#> seq: ATGCTACGT
The “IDT” method originates from the principle of the codon optimization tool of Integrated DNA Technologies. It randomly selects codons based on the frequency of codons at the family or subfamily level, but excludes rare codons below 10%.
seq_opt <- codon_optimize(seq, cf = cf_all, method = "IDT")
print(seq_opt)
#> 9-letter DNAString object
#> seq: ATGCTGCGA
The “codontransformer” method originates from the machine learning tool CodonTransformer, which integrates its predict_rna_Sequence function and uses its pre trained model to optimize sequences.
seq_opt <- codon_optimize(seq, method = "CodonTransformer", organism = "Saccharomyces cerevisiae")
print(seq_opt)
Cubar can generate several optimized sequences at the same time using
the argument num_sequences
with the method “IDT” and
“CodonTransformer”. When num_sequences
is greater than 1,
identical duplicate sequences will be retained as a single copy,
potentially resulting in a final sequence count less than the specified
value.
seqs_opt <- codon_optimize(seq, cf = cf_all, method = "IDT", num_sequences = 10)
print(seqs_opt)
seqs_opt <- codon_optimize(seq, method = "CodonTransformer", organism = "Saccharomyces cerevisiae",
num_sequences = 10, deterministic =FALSE, temperature = 0.4)
print(seqs_opt)
In addition, cubar integrated the deep learning tool SpliceAI to
identify potential splice sites with the argument spliceai
.
When the probability scores of non-splice site for each base are greater
than 0.5, it is considered that there are no potential splice junction
sites, and the Possible_splice_junction
in the output is
marked as FALSE, otherwise it is marked as TRUE.
seqs_opt <- codon_optimize(seq, cf = cf_all, method = "IDT", num_sequences = 10, spliceai = TRUE)
print(seqs_opt)
seq_opt <- codon_optimize(seq, method = "CodonTransformer", organism = "Saccharomyces cerevisiae", spliceai = TRUE)
print(seq_opt)