LTRpred.meta run
R/meta.summarize.R
meta.summarize.Rd
Crawl through all genome predictions performed with LTRpred.meta and concatenate the prediction files for each species in the meta result folder generated by LTRpred.meta into a meta-species data.frame.
meta.summarize(result.folder, ltr.similarity = 70, quality.filter = TRUE, n.orfs = 0, strategy = "default")
Argument | Description |
---|---|
result.folder | path to the meta result folder generated by LTRpred.meta. |
ltr.similarity | only count elements that have an LTR similarity >= this threshold (default: 70). |
quality.filter | optimize the search to remove potential false positives (e.g. duplicated genes). See Quality Filtering below. |
n.orfs | minimum number of Open Reading Frames that must be found between the LTRs (applied only if quality.filter = TRUE). |
strategy | quality filter strategy. The default is "default". |
An LTRpred.tbl storing the LTRpred prediction data.frames for all species in the meta result folder generated by LTRpred.meta.
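A minimal usage sketch, built only from the arguments in the signature above. The folder path is a hypothetical placeholder, and the call is guarded so that the snippet does nothing when the LTRpred package or the folder is not available.

```r
# Hypothetical path to a folder produced by an earlier LTRpred.meta run.
result_folder <- "path/to/meta_results"

if (requireNamespace("LTRpred", quietly = TRUE) && dir.exists(result_folder)) {
  meta_tbl <- LTRpred::meta.summarize(
    result.folder  = result_folder,
    ltr.similarity = 70,        # keep elements with LTR similarity >= 70
    quality.filter = TRUE,      # apply the quality filtering step
    n.orfs         = 0,         # no minimum number of ORFs required
    strategy       = "default"
  )
  # Inspect the combined table covering all species in the folder.
  str(meta_tbl)
}
```

The values shown are simply the defaults from the signature, written out so each argument is visible.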
This function crawls through each genome stored in the meta result folder generated by LTRpred.meta and performs the following procedures:
Step 1: For each genome: Read the *._LTRpred_DataSheet.csv file generated by LTRpred.
Step 2: For each genome: Perform quality filtering and selection of elements having at least ltr.similarity sequence similarity between their LTRs (if quality.filter = TRUE). Otherwise no quality filtering is performed.
Step 3: Summarize all genome predictions in the meta-folder to one meta-species data.frame.
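The three steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the toy files, the ";" separator, the column names species and ltr_similarity, and the helper summarize_sketch are all hypothetical, and Step 2 is reduced to the similarity threshold alone.

```r
# Toy stand-in for a meta result folder: two per-genome data sheets.
# File contents, separator, and column names are assumptions for illustration.
meta_dir <- file.path(tempdir(), "meta_sketch")
dir.create(meta_dir, showWarnings = FALSE)
write.table(data.frame(species = "A", ltr_similarity = c(95, 60)),
            file.path(meta_dir, "A_LTRpred_DataSheet.csv"),
            sep = ";", row.names = FALSE, quote = FALSE)
write.table(data.frame(species = "B", ltr_similarity = c(80, 72, 50)),
            file.path(meta_dir, "B_LTRpred_DataSheet.csv"),
            sep = ";", row.names = FALSE, quote = FALSE)

summarize_sketch <- function(folder, ltr.similarity = 70, quality.filter = TRUE) {
  # Step 1: locate and read one data sheet per genome.
  files <- list.files(folder, pattern = "_LTRpred_DataSheet\\.csv$",
                      full.names = TRUE)
  sheets <- lapply(files, read.table, header = TRUE, sep = ";",
                   stringsAsFactors = FALSE)
  # Step 2: optionally keep only elements at or above the similarity threshold.
  if (quality.filter) {
    sheets <- lapply(sheets, function(x) {
      x[x$ltr_similarity >= ltr.similarity, , drop = FALSE]
    })
  }
  # Step 3: concatenate all genomes into one data.frame.
  do.call(rbind, sheets)
}

meta_tbl <- summarize_sketch(meta_dir)
```

Reading every sheet into a list and binding once at the end avoids growing a data.frame inside a loop.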
Quality Filtering
The aim of the quality filtering step is to reduce the number of potential false positive LTR transposons predicted by LTRpred. These false positives can be duplicated genes or other homologous repetitive elements that fulfill the LTR similarity criterion but lack a Primer Binding Site, Open Reading Frames, Gag and Pol proteins, etc. To reduce the number of false positives, the following filters are applied:
ltr.similarity: minimum similarity between LTRs. All TEs not matching this criterion are discarded.
n.orfs: minimum number of Open Reading Frames that must be found between the LTRs. All TEs not matching this criterion are discarded.
PBS or Protein Match: elements must have either a predicted Primer Binding Site or a match to at least one protein (Gag, Pol, Rve, ...) between their LTRs. All TEs not matching this criterion are discarded.
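The three filters listed above combine into a single keep/discard rule per element, which can be sketched in base R. The column names (ltr_similarity, orfs, has_pbs, has_protein_match), the toy table, and the helper quality_filter_sketch are hypothetical and chosen for illustration only. The actual LTRpred output columns may be named and encoded differently, and missing values are not handled here.

```r
# Hypothetical helper: keep an element only if it passes all three filters.
quality_filter_sketch <- function(pred, ltr.similarity = 70, n.orfs = 0) {
  keep <- pred$ltr_similarity >= ltr.similarity &     # LTR similarity filter
    pred$orfs >= n.orfs &                             # minimum ORF count
    (pred$has_pbs | pred$has_protein_match)           # PBS or protein match
  pred[keep, , drop = FALSE]
}

# Toy predictions (illustrative values, not real LTRpred output).
toy <- data.frame(
  id                = c("te1", "te2", "te3", "te4"),
  ltr_similarity    = c(98, 65, 85, 90),
  orfs              = c(2, 3, 0, 1),
  has_pbs           = c(TRUE, TRUE, FALSE, FALSE),
  has_protein_match = c(FALSE, TRUE, FALSE, TRUE),
  stringsAsFactors  = FALSE
)

filtered <- quality_filter_sketch(toy, ltr.similarity = 70, n.orfs = 1)
```

With these toy values, te2 fails the similarity threshold and te3 has neither enough ORFs nor a PBS or protein match, so only te1 and te4 remain.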
Hajk-Georg Drost