check_cds
performs comprehensive quality control on coding sequences (CDS)
by filtering sequences on length, start codon, terminal stop codon, and
internal stop codons, and optionally removing start or stop codons. This
function ensures that sequences meet the requirements for downstream codon
usage analysis.
Usage
check_cds(
seqs,
codon_table = get_codon_table(),
min_len = 6,
check_len = TRUE,
check_start = TRUE,
check_stop = TRUE,
check_istop = TRUE,
rm_start = TRUE,
rm_stop = TRUE,
start_codons = c("ATG")
)
Arguments
- seqs
Input CDS sequences as a DNAStringSet or compatible object.
- codon_table
Codon table matching the genetic code of the input sequences. Generated using
get_codon_table() or create_codon_table().
- min_len
Minimum CDS length in nucleotides (default: 6).
- check_len
Logical. Check whether CDS length is divisible by 3 (default: TRUE).
- check_start
Logical. Check whether CDSs begin with valid start codons (default: TRUE).
- check_stop
Logical. Check whether CDSs end with valid stop codons (default: TRUE).
- check_istop
Logical. Check for internal stop codons (default: TRUE).
- rm_start
Logical. Remove start codons from the sequences (default: TRUE).
- rm_stop
Logical. Remove stop codons from the sequences (default: TRUE).
- start_codons
Character vector specifying valid start codons (default: "ATG").
Value
A DNAStringSet containing filtered and optionally trimmed CDS sequences that pass all quality control checks.
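The checks listed under Arguments can be illustrated with a minimal base-R sketch that works on plain character strings. This is not the package's implementation: `sketch_check_cds` and its `stop_codons` argument are hypothetical names introduced here, the stop codons assume the standard genetic code rather than a codon table, and the sketch always applies every check instead of exposing the `check_*` switches.

```r
# Illustrative sketch only: approximates the documented checks on character
# strings instead of a DNAStringSet. `sketch_check_cds` and `stop_codons`
# are hypothetical; stop codons assume the standard genetic code.
sketch_check_cds <- function(seqs, min_len = 6,
                             start_codons = c("ATG"),
                             stop_codons = c("TAA", "TAG", "TGA"),
                             rm_start = TRUE, rm_stop = TRUE) {
  seqs <- toupper(seqs)
  n <- nchar(seqs)

  # Length: at least min_len nucleotides and divisible by 3
  ok_len <- n >= min_len & n %% 3 == 0
  # First codon must be a valid start codon
  ok_start <- substr(seqs, 1, 3) %in% start_codons
  # Last codon must be a stop codon
  ok_stop <- substr(seqs, n - 2, n) %in% stop_codons
  # Internal stop: any in-frame codon other than the last one is a stop codon
  has_istop <- vapply(seqs, function(s) {
    len <- nchar(s)
    if (len < 6) return(FALSE)
    starts <- seq(1, len - 5, by = 3)
    any(substring(s, starts, starts + 2) %in% stop_codons)
  }, logical(1), USE.NAMES = FALSE)

  out <- seqs[ok_len & ok_start & ok_stop & !has_istop]
  # Optional trimming of the first and last codon
  if (rm_start) out <- substr(out, 4, nchar(out))
  if (rm_stop) out <- substr(out, 1, nchar(out) - 3)
  out
}

# Toy input: only the first sequence passes every check
cds <- c(good     = "ATGAAATTTTAA",
         no_start = "CTGAAATTTTAA",
         istop    = "ATGTAAAAATAA",
         bad_len  = "ATGAAATTTAA",
         no_stop  = "ATGAAATTT")
res <- sketch_check_cds(cds)
print(res)
```

In this toy input only `good` survives, and trimming its start and stop codons shortens it by 6 nucleotides, the same width reduction seen when `check_cds()` is called with `rm_start = TRUE` and `rm_stop = TRUE`.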
Examples
# Perform CDS sequence quality control for a sample of yeast genes
s <- head(yeast_cds, 10)
print(s)
#> DNAStringSet object of length 10:
#> width seq names
#> [1] 471 ATGAGTTCCCGGTTTGCAAGAAG...TGATGTGGATATGGATGCGTAA YPL071C
#> [2] 432 ATGTCTAGATCTGGTGTTGCTGT...CAGAGGCGCTGGTTCTCATTAA YLL050C
#> [3] 2160 ATGTCTGGAATGGGTATTGCGAT...AGAGAGCCTTGCTGGAATATAG YMR172W
#> [4] 663 ATGTCAGCACCTGCTCAAAACAA...TGAAGACGATGCTGATTTATAA YOR185C
#> [5] 2478 ATGGATAACTTCAAAATTTACAG...ATATCAAAATGGCAGAAAATGA YLL032C
#> [6] 2703 ATGGGCTCCAATAAGGAAGCAAA...AAAGCTGCCATATACCAAATAA YBR225W
#> [7] 1488 ATGAAAACTGATAGATTACTGAT...TCAGGCTCATTTTGCAATCTAA YEL041W
#> [8] 1305 ATGTCTCAACACGCAAGCTCATC...GGAGAACGAAATTACTATATAA YOR237W
#> [9] 1413 ATGACTATCCCTGGAAGATTTAT...CTGCTCTGGTATACATAAATAA YMR027W
#> [10] 195 ATGAAGATTTTCACGCTGTATAC...TGGCACTCACACTACGCACTAG YBR182C-A
check_cds(s)
#> DNAStringSet object of length 10:
#> width seq names
#> [1] 465 AGTTCCCGGTTTGCAAGAAGTAA...TACTGATGTGGATATGGATGCG YPL071C
#> [2] 426 TCTAGATCTGGTGTTGCTGTTGC...CAGCAGAGGCGCTGGTTCTCAT YLL050C
#> [3] 2154 TCTGGAATGGGTATTGCGATTCT...GCAAGAGAGCCTTGCTGGAATA YMR172W
#> [4] 657 TCAGCACCTGCTCAAAACAATGC...TGATGAAGACGATGCTGATTTA YOR185C
#> [5] 2472 GATAACTTCAAAATTTACAGTAC...TAAATATCAAAATGGCAGAAAA YLL032C
#> [6] 2697 GGCTCCAATAAGGAAGCAAAAAA...GCCAAAGCTGCCATATACCAAA YBR225W
#> [7] 1482 AAAACTGATAGATTACTGATTAA...TCGTCAGGCTCATTTTGCAATC YEL041W
#> [8] 1299 TCTCAACACGCAAGCTCATCTTC...GAGGGAGAACGAAATTACTATA YOR237W
#> [9] 1407 ACTATCCCTGGAAGATTTATGAC...TTTCTGCTCTGGTATACATAAA YMR027W
#> [10] 189 AAGATTTTCACGCTGTATACCAT...TAGTGGCACTCACACTACGCAC YBR182C-A