Overview

cubar is a package for codon usage bias analysis in R. Main features are as follows:

  • Codon level analyses
    • Calculate tRNA weights;
    • Calculate relative synonymous codon usage (RSCU);
    • Machine learning-based inference of optimal codons;
    • Visualize codon-anticodon pairing relationships;
  • Gene level analyses
    • Tabulate codon frequency of each coding sequence;
    • Measure codon usage similarity to highly expressed genes with Codon Adaptation Index (CAI);
    • Quantify the influence of codon usage on mRNA stability with Mean Codon Stabilization Coefficients (CSCg);
    • Measure codon usage bias with the nonparametric index Effective number of codons (ENC);
    • Measure the fraction of pre-determined optimal codons (Fop) in each sequence;
    • Calculate overall GC content (GC), or GC content at 3rd synonymous positions (GC3s) or 4-fold degenerate sites (GC4d);
    • Quantify whether codon usage matches tRNA availability using tRNA Adaptation Index (tAI);
  • Utilities
    • Sliding window analysis of codon usage within a coding sequence;
    • Optimize codon usage based on optimal codons for heterologous expression;
    • Test differential usage of codons between two sets of sequences;
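
To make the codon-level quantities above concrete, here is a minimal base-R sketch of what tabulating codons and computing RSCU involves for a single amino acid. It is an illustration only and not cubar's implementation: the sequence and the variable names are hypothetical, and no pseudocounts or weighting are applied.

```r
# Illustrative sketch (base R only); NOT cubar's implementation.
# Hypothetical coding sequence: Met, five Ala codons, stop.
cds <- "ATGGCTGCTGCCGCAGCTTAA"

# Split the sequence into non-overlapping triplets.
starts <- seq(1, nchar(cds) - 2, by = 3)
codons <- substring(cds, starts, starts + 2)

# Count each of the four synonymous codons for alanine,
# keeping codons that never occur (count 0).
ala <- c("GCT", "GCC", "GCA", "GCG")
counts <- as.vector(table(factor(codons, levels = ala)))
names(counts) <- ala

# RSCU = observed count / mean count over the synonymous codons.
# With uniform usage every codon would have RSCU = 1.
rscu <- counts / mean(counts)
rscu
# GCT = 2.4, GCC = 0.8, GCA = 0.8, GCG = 0.0
```

The RSCU values of a family always sum to the number of synonymous codons (here 4), so values above 1 mark codons used more often than expected under uniform usage.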

Main advantages of cubar are as follows:

  • Process large datasets (>100,000 sequences) efficiently using the Biostrings and data.table backends;
  • Support genetic codes cataloged by NCBI as well as custom ones;
  • Integrate with other data analysis or bioinformatic packages in the R ecosystem;
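
As a rough illustration of what an NCBI-cataloged or custom genetic code looks like in R, the sketch below uses `getGeneticCode()` from Biostrings to compare two NCBI tables and then derives a modified code by hand. It only shows the underlying data; how cubar itself accepts a genetic code is not shown here, and the variable names are hypothetical.

```r
# Sketch using Biostrings only; does not call cubar.
library(Biostrings)

std  <- getGeneticCode("1")  # NCBI table 1: Standard
mito <- getGeneticCode("2")  # NCBI table 2: Vertebrate Mitochondrial

# Codons translated differently by the two tables.
diff_codons <- names(std)[std != mito]
data.frame(
  codon     = diff_codons,
  standard  = unname(std[diff_codons]),
  vert_mito = unname(mito[diff_codons])
)

# A custom code is just a modified mapping, e.g. reassigning a stop codon.
# (Hypothetical example; not tied to any particular organism.)
custom <- std
custom["TGA"] <- "W"
```

Because synonymous codon families differ between genetic codes, indices such as RSCU or ENC depend on which table is used, which is why the choice matters for mitochondrial or other non-standard genomes.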

Dependencies

Depends

  • R (>= 4.1.0)

Imports

  • Biostrings (>= 2.60.0),
  • IRanges (>= 2.34.0),
  • data.table (>= 1.14.0),
  • ggplot2 (>= 3.3.5),
  • rlang (>= 0.4.11)

Installation

The latest release of cubar can be installed from CRAN with:

install.packages("cubar")

The latest development version of cubar can be installed from GitHub with:

devtools::install_github("mt1022/cubar", dependencies = TRUE)

Usage

Documentation for each function can be found within R (by typing ?function_name). Tutorials are available from our website.
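
A short session might look like the sketch below. It assumes the exported functions `check_cds()`, `count_codons()`, and `get_enc()` and the bundled dataset `yeast_cds`; these names are taken from memory of the package documentation rather than from this page, so please confirm them with `help(package = "cubar")` before relying on them.

```r
# Hedged sketch of a typical workflow; function and dataset names are
# assumptions to be checked against the installed version of cubar.
library(cubar)

# Quality-control the coding sequences (assumed bundled yeast dataset).
yeast_cds_qc <- check_cds(yeast_cds)

# Tabulate codon frequencies for each coding sequence.
yeast_cf <- count_codons(yeast_cds_qc)

# Effective number of codons (ENC) for each sequence.
enc <- get_enc(yeast_cf)
head(enc)
```

Each step has its own help page (for example `?count_codons`), which documents the arguments for non-standard genetic codes.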

Getting help

Please use GitHub issues for bug reports, questions, and feature requests.

Suggests

  • Biostrings for sequence input/output and manipulation;
  • Peptides for peptide- or protein-related indices;

Acknowledgements

GitHub Copilot was used to suggest code snippets during the development of this package. Thanks to the GitHub Education teacher program for providing free access to GitHub Copilot.