check_cds
performs quality control of CDS sequences by filtering out sequences that
fail the selected checks and optionally removing start or stop codons.
Usage
check_cds(
seqs,
codon_table = get_codon_table(),
min_len = 6,
check_len = TRUE,
check_start = TRUE,
check_stop = TRUE,
check_istop = TRUE,
rm_start = TRUE,
rm_stop = TRUE,
start_codons = c("ATG")
)
Arguments
- seqs
input CDS sequences
- codon_table
codon table matching the genetic code of seqs
- min_len
minimum CDS length in nt
- check_len
check whether CDS length is divisible by 3
- check_start
check whether CDSs have start codons
- check_stop
check whether CDSs have stop codons
- check_istop
check whether CDSs have internal stop codons
- rm_start
whether to remove start codons
- rm_stop
whether to remove stop codons
- start_codons
vector of start codons
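To make the meaning of the individual checks concrete, the following base R sketch applies them to plain character strings. It is only an illustration under stated assumptions, not the package's implementation: the helper name sketch_check_cds and its stop_codons argument are hypothetical, stop codons are taken from the standard genetic code instead of a codon_table, and all checks are always on.

```r
# Hypothetical helper (not part of the documented API): mimics the documented
# checks on a character vector, assuming the standard genetic code.
sketch_check_cds <- function(seqs, min_len = 6, start_codons = c("ATG"),
                             stop_codons = c("TAA", "TAG", "TGA"),
                             rm_start = TRUE, rm_stop = TRUE) {
  keep <- vapply(seqs, function(s) {
    n <- nchar(s)
    # length checks: at least min_len nt, at least one codon, divisible by 3
    if (n < min_len || n < 3 || n %% 3 != 0) return(FALSE)
    # split the sequence into consecutive codons
    codons <- substring(s, seq(1, n - 2, by = 3), seq(3, n, by = 3))
    k <- length(codons)
    has_start <- codons[1] %in% start_codons
    has_stop <- codons[k] %in% stop_codons
    # every codon except the last must not be a stop codon
    no_istop <- !any(codons[-k] %in% stop_codons)
    has_start && has_stop && no_istop
  }, logical(1))
  out <- seqs[keep]
  if (length(out) == 0) return(out)
  if (rm_start) out <- substring(out, 4)                  # drop first codon
  if (rm_stop) out <- substring(out, 1, nchar(out) - 3)   # drop last codon
  out
}

demo <- c(ok = "ATGGCTTAA",
          no_start = "CTGGCTTAA",
          internal_stop = "ATGTAAGCTTAA",
          bad_len = "ATGGCTTA")
# Only "ok" passes all checks; with start and stop removed it becomes "GCT"
sketch_check_cds(demo)
```

Setting rm_start = FALSE and rm_stop = FALSE in the sketch returns the passing sequence unchanged, which mirrors the way the documented rm_start and rm_stop arguments are described.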
Examples
# CDS sequence QC for a sample of yeast genes
s <- head(yeast_cds, 10)
print(s)
#> DNAStringSet object of length 10:
#> width seq names
#> [1] 471 ATGAGTTCCCGGTTTGCAAGAAG...TGATGTGGATATGGATGCGTAA YPL071C
#> [2] 432 ATGTCTAGATCTGGTGTTGCTGT...CAGAGGCGCTGGTTCTCATTAA YLL050C
#> [3] 2160 ATGTCTGGAATGGGTATTGCGAT...AGAGAGCCTTGCTGGAATATAG YMR172W
#> [4] 663 ATGTCAGCACCTGCTCAAAACAA...TGAAGACGATGCTGATTTATAA YOR185C
#> [5] 2478 ATGGATAACTTCAAAATTTACAG...ATATCAAAATGGCAGAAAATGA YLL032C
#> [6] 2703 ATGGGCTCCAATAAGGAAGCAAA...AAAGCTGCCATATACCAAATAA YBR225W
#> [7] 1488 ATGAAAACTGATAGATTACTGAT...TCAGGCTCATTTTGCAATCTAA YEL041W
#> [8] 1305 ATGTCTCAACACGCAAGCTCATC...GGAGAACGAAATTACTATATAA YOR237W
#> [9] 1413 ATGACTATCCCTGGAAGATTTAT...CTGCTCTGGTATACATAAATAA YMR027W
#> [10] 195 ATGAAGATTTTCACGCTGTATAC...TGGCACTCACACTACGCACTAG YBR182C-A
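# With the defaults, all 10 sequences pass and are returned without their
# start and stop codons, so each is 6 nt shorter (e.g. 471 -> 465)
check_cds(s)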
#> DNAStringSet object of length 10:
#> width seq names
#> [1] 465 AGTTCCCGGTTTGCAAGAAGTAA...TACTGATGTGGATATGGATGCG YPL071C
#> [2] 426 TCTAGATCTGGTGTTGCTGTTGC...CAGCAGAGGCGCTGGTTCTCAT YLL050C
#> [3] 2154 TCTGGAATGGGTATTGCGATTCT...GCAAGAGAGCCTTGCTGGAATA YMR172W
#> [4] 657 TCAGCACCTGCTCAAAACAATGC...TGATGAAGACGATGCTGATTTA YOR185C
#> [5] 2472 GATAACTTCAAAATTTACAGTAC...TAAATATCAAAATGGCAGAAAA YLL032C
#> [6] 2697 GGCTCCAATAAGGAAGCAAAAAA...GCCAAAGCTGCCATATACCAAA YBR225W
#> [7] 1482 AAAACTGATAGATTACTGATTAA...TCGTCAGGCTCATTTTGCAATC YEL041W
#> [8] 1299 TCTCAACACGCAAGCTCATCTTC...GAGGGAGAACGAAATTACTATA YOR237W
#> [9] 1407 ACTATCCCTGGAAGATTTATGAC...TTTCTGCTCTGGTATACATAAA YMR027W
#> [10] 189 AAGATTTTCACGCTGTATACCAT...TAGTGGCACTCACACTACGCAC YBR182C-A