R/quality.filter.R
quality.filter.Rd
This function takes an LTRpred output table as input and eliminates false positive predictions.
quality.filter(pred, sim, n.orfs, strategy = "default")
pred | LTRpred output table, e.g. a prediction data sheet imported with read.ltrpred. |
---|---|
sim | LTR similarity threshold in percent. Only putative LTR transposons that meet this threshold are retained. |
n.orfs | Minimum number of ORFs that must be detected in the putative LTR transposon. |
strategy | Quality filter strategy; defaults to "default". |
A quality-filtered LTRpred.tbl.
Quality Control
ltr.similarity: Minimum similarity between LTRs. All TEs not matching this criterion are discarded.
n.orfs: Minimum number of Open Reading Frames that must be found between the LTRs. All TEs not matching this criterion are discarded.
PBS or Protein Match: Elements must have either a predicted Primer Binding Site or a match to at least one protein (Gag, Pol, Rve, ...) between their LTRs. All TEs not matching this criterion are discarded.
The relative number of N's (= nucleotide not known) in the TE must be <= 0.1. The relative number of N's is computed as: absolute number of N's in TE / width of TE.
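The four criteria above can be illustrated with a minimal base R sketch on a toy table. This is not the LTRpred implementation: the function `sketch_filter`, the argument `max.n.frac`, and all column names (`ltr_sim`, `n_orfs`, `has_pbs`, `has_prot`, `n_count`, `width`) are hypothetical and chosen only for illustration, and treating the thresholds as inclusive (`>=`, `<=`) is an assumption.

```r
# Toy table; column names are hypothetical and do not reflect the LTRpred schema.
toy <- data.frame(
  id       = c("te1", "te2", "te3", "te4", "te5"),
  ltr_sim  = c(95, 60, 88, 91, 99),             # LTR similarity in percent
  n_orfs   = c(2, 3, 0, 1, 1),                  # ORFs found between the LTRs
  has_pbs  = c(TRUE, TRUE, TRUE, FALSE, FALSE), # predicted Primer Binding Site
  has_prot = c(FALSE, TRUE, TRUE, FALSE, TRUE), # at least one protein match
  n_count  = c(10, 0, 0, 0, 900),               # absolute number of N's in TE
  width    = c(5000, 6000, 4000, 7000, 6000),   # width of TE
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only rows that pass all four criteria.
sketch_filter <- function(tbl, sim, n.orfs, max.n.frac = 0.1) {
  # relative number of N's = absolute number of N's in TE / width of TE
  n_frac <- tbl$n_count / tbl$width
  keep <- tbl$ltr_sim >= sim &              # LTR similarity threshold
    tbl$n_orfs >= n.orfs &                  # minimum number of ORFs
    (tbl$has_pbs | tbl$has_prot) &          # PBS or protein match
    n_frac <= max.n.frac                    # at most 10% N's by default
  tbl[keep, , drop = FALSE]
}

filtered <- sketch_filter(toy, sim = 70, n.orfs = 1)
print(filtered$id)
```

In this toy table only `te1` passes: `te2` fails the similarity threshold, `te3` has no ORF, `te4` has neither a PBS nor a protein match, and `te5` has a relative N content of 900 / 6000 = 0.15.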
Hajk-Georg Drost
# example prediction file generated by LTRpred
pred.file <- system.file("Athaliana_TAIR10_chr_all_LTRpred_DataSheet.csv", package = "LTRpred")
# read LTRpred generated prediction file (data sheet)
pred <- read.ltrpred(pred.file)
# apply quality filter
pred <- quality.filter(pred, sim = 70, n.orfs = 1)
#> The LTRpred prediction table has been filtered (default) to remove potential false positives. Predicted LTRs must have an PBS or Protein Domain and must fulfill thresholds: sim = 70%; #orfs = 1. Furthermore, TEs having more than 10% of N's in their sequence have also been removed.
#> Input #TEs: 458
#> Output #TEs: 202