check_cds
performs quality control of CDS sequences by filtering out irregular
sequences and optionally removing start or stop codons.
Usage
check_cds(
  seqs,
  codon_table = get_codon_table(),
  min_len = 6,
  check_len = TRUE,
  check_start = TRUE,
  check_stop = TRUE,
  check_istop = TRUE,
  rm_start = TRUE,
  rm_stop = TRUE,
  start_codons = c("ATG")
)
Arguments
- seqs
input CDS sequences
- codon_table
codon table matching the genetic code of seqs
- min_len
minimum CDS length in nt
- check_len
check whether CDS length is divisible by 3
- check_start
check whether CDSs have start codons
- check_stop
check whether CDSs have stop codons
- check_istop
check whether CDSs contain internal stop codons
- rm_start
whether to remove start codons
- rm_stop
whether to remove stop codons
- start_codons
vector of start codons
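To make the arguments above concrete, here is a minimal base-R sketch of the kind of filtering they describe. It is not the package's implementation: the function name `check_cds_sketch`, the `stop_codons` argument (standing in for `codon_table`, and assuming the standard genetic code), the plain character-vector input, and the order of the checks are all assumptions made for illustration.

```r
# Hypothetical illustration only; check_cds() itself may work differently.
check_cds_sketch <- function(seqs,
                             stop_codons = c("TAA", "TAG", "TGA"), # assumed: standard code
                             min_len = 6,
                             check_len = TRUE, check_start = TRUE,
                             check_stop = TRUE, check_istop = TRUE,
                             rm_start = TRUE, rm_stop = TRUE,
                             start_codons = c("ATG")) {
  seqs <- toupper(seqs)

  # Length filters: minimum length, then divisibility by 3
  seqs <- seqs[nchar(seqs) >= min_len]
  if (check_len) seqs <- seqs[nchar(seqs) %% 3 == 0]

  # Terminal codon filters
  if (check_start) seqs <- seqs[substr(seqs, 1, 3) %in% start_codons]
  if (check_stop) {
    n <- nchar(seqs)
    seqs <- seqs[substr(seqs, n - 2, n) %in% stop_codons]
  }

  # Internal stop codons: every in-frame codon except the last one
  if (check_istop) {
    has_istop <- vapply(seqs, function(x) {
      n <- nchar(x)
      if (n < 6) return(FALSE)
      starts <- seq(1, n - 2, by = 3)
      starts <- starts[-length(starts)]
      any(substring(x, starts, starts + 2) %in% stop_codons)
    }, logical(1), USE.NAMES = FALSE)
    seqs <- seqs[!has_istop]
  }

  # Optional trimming of the terminal codons
  if (rm_start) seqs <- substr(seqs, 4, nchar(seqs))
  if (rm_stop) seqs <- substr(seqs, 1, nchar(seqs) - 3)
  seqs
}

toy <- c(ok = "ATGGCTTAA", no_start = "CTGGCTTAA", bad_len = "ATGGCTTA",
         istop = "ATGTAAGCTTAA", no_stop = "ATGGCTGCT")
res <- check_cds_sketch(toy)  # only "ok" passes and is trimmed to "GCT"
```

In this toy input each rejected sequence fails exactly one check, which makes it easy to see the effect of switching an individual `check_*` argument off.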
Examples
# CDS sequence QC for a sample of yeast genes
s <- head(yeast_cds, 10)
print(s)
#> DNAStringSet object of length 10:
#> width seq names
#> [1] 471 ATGAGTTCCCGGTTTGCAAGAAG...TGATGTGGATATGGATGCGTAA YPL071C
#> [2] 432 ATGTCTAGATCTGGTGTTGCTGT...CAGAGGCGCTGGTTCTCATTAA YLL050C
#> [3] 2160 ATGTCTGGAATGGGTATTGCGAT...AGAGAGCCTTGCTGGAATATAG YMR172W
#> [4] 663 ATGTCAGCACCTGCTCAAAACAA...TGAAGACGATGCTGATTTATAA YOR185C
#> [5] 2478 ATGGATAACTTCAAAATTTACAG...ATATCAAAATGGCAGAAAATGA YLL032C
#> [6] 2703 ATGGGCTCCAATAAGGAAGCAAA...AAAGCTGCCATATACCAAATAA YBR225W
#> [7] 1488 ATGAAAACTGATAGATTACTGAT...TCAGGCTCATTTTGCAATCTAA YEL041W
#> [8] 1305 ATGTCTCAACACGCAAGCTCATC...GGAGAACGAAATTACTATATAA YOR237W
#> [9] 1413 ATGACTATCCCTGGAAGATTTAT...CTGCTCTGGTATACATAAATAA YMR027W
#> [10] 195 ATGAAGATTTTCACGCTGTATAC...TGGCACTCACACTACGCACTAG YBR182C-A
check_cds(s)
#> DNAStringSet object of length 10:
#> width seq names
#> [1] 465 AGTTCCCGGTTTGCAAGAAGTAA...TACTGATGTGGATATGGATGCG YPL071C
#> [2] 426 TCTAGATCTGGTGTTGCTGTTGC...CAGCAGAGGCGCTGGTTCTCAT YLL050C
#> [3] 2154 TCTGGAATGGGTATTGCGATTCT...GCAAGAGAGCCTTGCTGGAATA YMR172W
#> [4] 657 TCAGCACCTGCTCAAAACAATGC...TGATGAAGACGATGCTGATTTA YOR185C
#> [5] 2472 GATAACTTCAAAATTTACAGTAC...TAAATATCAAAATGGCAGAAAA YLL032C
#> [6] 2697 GGCTCCAATAAGGAAGCAAAAAA...GCCAAAGCTGCCATATACCAAA YBR225W
#> [7] 1482 AAAACTGATAGATTACTGATTAA...TCGTCAGGCTCATTTTGCAATC YEL041W
#> [8] 1299 TCTCAACACGCAAGCTCATCTTC...GAGGGAGAACGAAATTACTATA YOR237W
#> [9] 1407 ACTATCCCTGGAAGATTTATGAC...TTTCTGCTCTGGTATACATAAA YMR027W
#> [10] 189 AAGATTTTCACGCTGTATACCAT...TAGTGGCACTCACACTACGCAC YBR182C-A
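As a quick sanity check on the output above, the widths printed before and after `check_cds(s)` can be compared directly. The snippet below simply copies the two width columns from the printed results; the variable names are illustrative. With the default `rm_start = TRUE` and `rm_stop = TRUE`, each retained sequence should be 6 nt shorter (one start codon plus one stop codon).

```r
# Widths copied from the printed DNAStringSet objects above
width_before <- c(471, 432, 2160, 663, 2478, 2703, 1488, 1305, 1413, 195)
width_after  <- c(465, 426, 2154, 657, 2472, 2697, 1482, 1299, 1407, 189)

# All ten sequences were kept, and each lost exactly two codons (6 nt)
width_diff <- width_before - width_after
all(width_diff == 6)
```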