This function takes an LTRpred output table as input and removes predictions that are likely false positives.

quality.filter(pred, sim, n.orfs, strategy = "default")

Arguments

pred

LTRpred.tbl generated with LTRpred

sim

LTR similarity threshold in percent (e.g. sim = 70). Only putative LTR transposons that meet this threshold are retained.

n.orfs

minimum number of ORFs that must be detected in the putative LTR transposon.

strategy

quality filter strategy. Options are

  • strategy = "default" : see section Quality Control

  • strategy = "stringent" : in addition to the filter criteria specified in section Quality Control, the filter criterion (!is.na(protein_domain)) | (dfam_target_name != "unknown") is applied
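
The additional "stringent" criterion can be illustrated on a toy table. This is only a sketch in base R, not the package's internal code: the table, its ID column, and the use of which() are assumptions made for illustration; only the column names protein_domain and dfam_target_name are taken from the criterion above.

```r
# Toy table (hypothetical data); only the two column names below come
# from the documented criterion.
toy <- data.frame(
  ID               = c("A", "B", "C", "D"),
  protein_domain   = c("RVT_1", NA, NA, NA),
  dfam_target_name = c("unknown", "Copia-1", "unknown", NA),
  stringsAsFactors = FALSE
)

# Keep elements that have a protein domain OR a known Dfam target.
keep <- (!is.na(toy$protein_domain)) | (toy$dfam_target_name != "unknown")

# Row D evaluates to NA (no domain, missing Dfam name);
# which() drops it together with the FALSE rows.
stringent <- toy[which(keep), ]
print(stringent$ID)
```

In this sketch, elements A (protein domain present) and B (known Dfam target) are retained, while C and D are dropped.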

Value

A quality filtered LTRpred.tbl.

Details

Quality Control

  • ltr.similarity: minimum similarity between the two LTRs (set via sim). All TEs not matching this criterion are discarded.

  • n.orfs: minimum number of Open Reading Frames that must be found between the LTRs. All TEs not matching this criterion are discarded.

  • PBS or Protein Match: elements must have either a predicted Primer Binding Site or a match to at least one protein (Gag, Pol, Rve, ...) between their LTRs. All TEs not matching this criterion are discarded.

  • The relative number of N's (= unknown nucleotides) in the TE must be <= 0.1. The relative number of N's is computed as follows: absolute number of N's in TE / width of TE.
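
The four criteria above can be sketched on a toy table as follows. This is a minimal base R illustration, not the implementation used by quality.filter(): the data are made up, the column names ltr_similarity, orfs, PBS_start, n_count and width are hypothetical, and treating the thresholds as inclusive (>=) is an assumption.

```r
# Toy table; all values and most column names are hypothetical.
toy <- data.frame(
  ID             = c("TE1", "TE2", "TE3", "TE4", "TE5"),
  ltr_similarity = c(95, 65, 88, 90, 99),        # percent
  orfs           = c(2, 3, 0, 1, 1),             # ORFs between the LTRs
  PBS_start      = c(120, 98, 77, NA, NA),       # NA = no PBS predicted
  protein_domain = c(NA, "RVT_1", "rve", NA, "RVT_1"),
  n_count        = c(10, 0, 0, 0, 300),          # absolute number of N's
  width          = c(5000, 6000, 4000, 4500, 2000),
  stringsAsFactors = FALSE
)

sim    <- 70
n.orfs <- 1

# Relative number of N's = absolute number of N's / width of TE.
n_fraction <- toy$n_count / toy$width

keep <- toy$ltr_similarity >= sim &                               # LTR similarity
  toy$orfs >= n.orfs &                                            # ORF count
  (!is.na(toy$PBS_start) | !is.na(toy$protein_domain)) &          # PBS or protein match
  n_fraction <= 0.1                                               # at most 10% N's

filtered <- toy[keep, ]
print(filtered$ID)
```

Here only TE1 passes: TE2 fails the similarity threshold, TE3 has no ORF, TE4 has neither a PBS nor a protein match, and TE5 consists of 15% N's.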

See also

Author

Hajk-Georg Drost

Examples

# example prediction file generated by LTRpred 
pred.file <- system.file("Athaliana_TAIR10_chr_all_LTRpred_DataSheet.csv", package = "LTRpred")
# read LTRpred generated prediction file (data sheet)
pred <- read.ltrpred(pred.file)
# apply quality filter
pred <- quality.filter(pred, sim = 70, n.orfs = 1)
#> The LTRpred prediction table has been filtered (default) to remove potential false positives. Predicted LTRs must have an PBS or Protein Domain and must fulfill thresholds: sim = 70%; #orfs = 1. Furthermore, TEs having more than 10% of N's in their sequence have also been removed.
#> Input #TEs: 458
#> Output #TEs: 202