Overview
cubar is a package for codon usage bias analysis in R. Main features are as follows:
- Codon level analyses
  - Calculate codon weights based on gene expression, tRNA availability, and mRNA stability;
  - Calculate relative synonymous codon usage (RSCU);
  - Machine learning-based inference of optimal codons;
  - Visualization of codon-anticodon pairing relationships;
- Gene level analyses
  - Tabulate codon frequency of each coding sequence;
  - Measure codon usage similarity to highly expressed genes with Codon Adaptation Index (CAI);
  - Quantify the influence of codon usage on mRNA stability with Mean Codon Stabilization Coefficients (CSCg);
  - Measure codon usage bias with the nonparametric index Effective number of codons (ENC);
  - Measure the fraction of pre-determined optimal codons (Fop) in each sequence;
  - Calculate overall GC content (GC), or GC content at 3rd synonymous positions (GC3s) or 4-fold degenerate sites (GC4d);
  - Quantify whether codon usage matches tRNA availability using tRNA Adaptation Index (tAI);
  - Measure the deviation from proportionality (Dp) of viral synonymous codon usage from host tRNA supply;
- Utilities
  - Sliding window analysis of codon usage within a coding sequence;
  - Optimize codon usage based on optimal codons for heterologous expression;
  - Test differential usage of codons between two sets of sequences;
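As a rough illustration of how the gene-level features listed above fit together, the sketch below runs a few of them on example data. It assumes the function names `check_cds`, `count_codons`, `get_enc`, and `get_gc3s` and the bundled dataset `yeast_cds` as found in recent cubar releases; none of these are spelled out in this overview, so check the reference manual if your installed version differs.

```r
library(cubar)

# Assumption: `yeast_cds` is the example set of yeast coding sequences
# (a Biostrings DNAStringSet) shipped with cubar.
# Quality-control the sequences; with default settings this is expected to
# filter out problematic CDSs and trim start and stop codons.
yeast_cds_qc <- check_cds(yeast_cds)

# Tabulate codon frequencies: one row per sequence, one column per codon.
yeast_cf <- count_codons(yeast_cds_qc)

# Effective number of codons (ENC) for each sequence.
enc <- get_enc(yeast_cf)

# GC content at third synonymous codon positions (GC3s) for each sequence.
gc3s <- get_gc3s(yeast_cf)

# Peek at the first few values of each index.
head(enc)
head(gc3s)
```

The codon frequency matrix is computed once and reused, so additional indices can be derived from `yeast_cf` without re-reading the sequences.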
Main advantages of cubar are as follows:
- Process large datasets (>100,000 sequences) efficiently using the Biostrings and data.table backends;
- Support genetic codes cataloged by NCBI as well as custom ones;
- Integrate with other data analysis or bioinformatic packages in the R ecosystem;
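To make the genetic code support more concrete, here is a minimal sketch of retrieving two NCBI genetic codes and comparing how they decode one codon. It assumes that `get_codon_table()` takes the NCBI genetic code ID as a string via `gcid`, and that the returned table has `codon` and `aa_code` columns; both are assumptions about recent cubar releases rather than details stated in this overview.

```r
library(cubar)

# Assumption: genetic codes are selected by their NCBI ID, passed as a string.
ctab_std <- get_codon_table(gcid = "1")  # standard genetic code
ctab_mt  <- get_codon_table(gcid = "2")  # vertebrate mitochondrial code

# Inspect the first rows of the mitochondrial table.
head(ctab_mt)

# Assumption: columns `codon` and `aa_code` hold the codon and its
# one-letter amino acid. TGA is a stop codon in the standard code but
# encodes tryptophan in the vertebrate mitochondrial code.
ctab_std[ctab_std$codon == "TGA", ]
ctab_mt[ctab_mt$codon == "TGA", ]
```

A table obtained this way would typically be passed on through the `codon_table` argument of the analysis functions, so that synonymous codon families are defined by the appropriate code.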
Dependencies
Depends
- R (>= 4.1.0)
Imports
- Biostrings (>= 2.60.0),
- IRanges (>= 2.34.0),
- data.table (>= 1.14.0),
- ggplot2 (>= 3.3.5),
- rlang (>= 0.4.11)
Installation
The latest release of cubar can be installed with:
install.packages("cubar")
The latest development version of cubar can be installed with:
devtools::install_github("mt1022/cubar", dependencies = TRUE)
Usage
Documentation can be found within R (by typing ?function_name). The following tutorials are available from our website:
- Get Started: A brief introduction demonstrating the basic usage of cubar;
- Non-standard Genetic Code: How to use cubar with non-standard genetic codes;
- Theories behind cubar: The mathematical details behind the core functions in cubar;
Getting help
Please use GitHub issues for bug reports, questions, and feature requests.
Suggests
- Biostrings for sequence input/output and manipulation;
- Peptides for peptide- or protein-related indices;
Acknowledgements
GitHub Copilot was used to suggest code snippets during the development of this package. Thanks to the GitHub Education teacher program for providing free access to GitHub Copilot.