Summarize (concatenate) all predictions of an LTRpred.meta run

Crawl through all genome predictions performed with LTRpred.meta and concatenate the per-species prediction files stored in the meta result folder into a single meta-species data.frame.

meta.summarize(
  result.folder,
  ltr.similarity = 70,
  quality.filter = TRUE,
  n.orfs = 0,
  strategy = "default"
)

Arguments

result.folder	path to meta result folder generated by `LTRpred.meta`.
ltr.similarity	only count elements that have an LTR similarity >= this threshold.
quality.filter	optimize search to remove potential false positives (e.g. duplicated genes, etc.). See `Details` for further information on the filter criteria.
n.orfs	minimum number of Open Reading Frames that must be found between the LTRs (if `quality.filter = TRUE`). See `Details` for further information on quality control.
strategy	quality filter strategy. Options are: `strategy = "default"`: apply the filter criteria described in section `Quality Filtering`; `strategy = "stringent"`: in addition to the criteria described in section `Quality Filtering`, apply the filter `(!is.na(protein_domain)) | (dfam_target_name != "unknown")`.
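
As a hedged illustration of how the arguments above combine, the following sketch calls meta.summarize once with the documented defaults and once with stricter settings. The folder path is a hypothetical placeholder, and the guard simply skips the calls when the LTRpred package or the folder is not available.

```r
# Hypothetical path to a meta result folder produced by LTRpred.meta;
# replace with a real folder before running.
result_folder <- file.path("path", "to", "LTRpred_meta_results")

if (requireNamespace("LTRpred", quietly = TRUE) && dir.exists(result_folder)) {
  # Documented defaults, written out explicitly.
  meta_default <- LTRpred::meta.summarize(
    result.folder  = result_folder,
    ltr.similarity = 70,
    quality.filter = TRUE,
    n.orfs         = 0,
    strategy       = "default"
  )

  # Stricter settings: higher LTR similarity, at least one ORF,
  # and the additional "stringent" criterion.
  meta_stringent <- LTRpred::meta.summarize(
    result.folder  = result_folder,
    ltr.similarity = 90,
    n.orfs         = 1,
    strategy       = "stringent"
  )
}
```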

Value

An LTRpred.tbl storing the concatenated LTRpred prediction data.frames of all species in the meta result folder generated by LTRpred.meta.

Details

This function crawls through each genome stored in the meta result folder generated by LTRpred.meta and performs the following procedures:

Step 1: For each genome: Read the *._LTRpred_DataSheet.csv file generated by LTRpred.
Step 2: For each genome: If quality.filter = TRUE, perform quality filtering and select the elements having at least ltr.similarity sequence similarity between their LTRs. Otherwise, no quality filtering is performed.
Step 3: Summarize all genome predictions in the meta-folder to one meta-species data.frame.
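
The three steps above can be sketched in base R as follows. This is a minimal illustration and not the package's implementation: the folder layout (one sub-folder per genome), the comma-separated file format, the column name ltr_similarity, and the helper summarize_sketch are all assumptions made for the example, and Step 2 is reduced to the similarity threshold only.

```r
# Build a toy stand-in for a meta result folder (assumed layout:
# one sub-folder per genome, each with a *_LTRpred_DataSheet.csv file).
meta_dir <- file.path(tempdir(), "meta_result_sketch")
unlink(meta_dir, recursive = TRUE)
for (sp in c("speciesA", "speciesB")) {
  dir.create(file.path(meta_dir, sp), recursive = TRUE)
  toy <- data.frame(
    ID             = paste0(sp, "_", 1:3),
    ltr_similarity = c(65, 80, 95)  # hypothetical column name
  )
  write.csv(toy,
            file.path(meta_dir, sp, paste0(sp, "_LTRpred_DataSheet.csv")),
            row.names = FALSE)
}

# Hypothetical helper mirroring Steps 1 to 3.
summarize_sketch <- function(result_folder, ltr_similarity = 70,
                             quality_filter = TRUE) {
  files <- list.files(result_folder, pattern = "_LTRpred_DataSheet\\.csv$",
                      recursive = TRUE, full.names = TRUE)
  per_genome <- lapply(files, function(f) {
    # Step 1: read the prediction file of one genome.
    pred <- read.csv(f, stringsAsFactors = FALSE)
    # Step 2: keep elements at or above the similarity threshold
    # (only when filtering is requested; other criteria omitted here).
    if (quality_filter) {
      pred <- pred[pred$ltr_similarity >= ltr_similarity, , drop = FALSE]
    }
    # Record which genome each row came from.
    pred$species <- rep(basename(dirname(f)), nrow(pred))
    pred
  })
  # Step 3: concatenate all genomes into one data.frame.
  do.call(rbind, per_genome)
}

meta_tbl <- summarize_sketch(meta_dir)
```

With the toy data, each genome contributes the two elements whose similarity is at least 70, and the species column identifies their origin in the combined table.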

Quality Filtering

The aim of the quality filtering step is to reduce the number of potential false positive LTR transposons predicted by LTRpred. These false positives can be duplicated genes or other homologous repetitive elements that fulfill the LTR similarity criterion but lack a Primer Binding Site, Open Reading Frames, Gag and Pol proteins, etc. To discard such false positives, the following filters are applied.

ltr.similarity: minimum similarity between LTRs. All TEs not matching this criterion are discarded.
n.orfs: minimum number of Open Reading Frames that must be found between the LTRs. All TEs not matching this criterion are discarded.
PBS or Protein Match: elements must have either a predicted Primer Binding Site or a match to at least one protein (Gag, Pol, Rve, ...) between their LTRs. All TEs not matching this criterion are discarded.
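
The filter criteria listed above, together with the additional "stringent" criterion, can be expressed as a logical mask over a prediction table. The sketch below is an illustration under stated assumptions, not the package's code: the helper quality_filter_sketch and the columns ltr_similarity, orfs, and has_pbs are hypothetical, and a non-missing protein_domain is assumed to stand in for "protein match". Only protein_domain and dfam_target_name are names taken from this page.

```r
# Toy prediction table; ltr_similarity, orfs, and has_pbs are hypothetical columns.
pred <- data.frame(
  ID               = c("te1", "te2", "te3", "te4"),
  ltr_similarity   = c(95, 60, 88, 91),
  orfs             = c(2, 3, 0, 1),
  has_pbs          = c(TRUE, TRUE, FALSE, FALSE),
  protein_domain   = c(NA, "Pol", NA, "Gag"),
  dfam_target_name = c("unknown", "Copia", "unknown", "unknown"),
  stringsAsFactors = FALSE
)

# Hypothetical helper applying the three documented criteria,
# plus the extra criterion when strategy = "stringent".
quality_filter_sketch <- function(pred, ltr_similarity = 70, n_orfs = 0,
                                  strategy = c("default", "stringent")) {
  strategy <- match.arg(strategy)
  keep <- pred$ltr_similarity >= ltr_similarity &          # LTR similarity
    pred$orfs >= n_orfs &                                  # minimum ORFs
    (pred$has_pbs | !is.na(pred$protein_domain))           # PBS or protein match
  if (strategy == "stringent") {
    keep <- keep &
      ((!is.na(pred$protein_domain)) | (pred$dfam_target_name != "unknown"))
  }
  pred[keep, , drop = FALSE]
}

kept_default   <- quality_filter_sketch(pred)
kept_stringent <- quality_filter_sketch(pred, strategy = "stringent")
```

In this toy table, te2 fails the similarity threshold and te3 has neither a PBS nor a protein match, so the default strategy keeps te1 and te4. The stringent strategy additionally drops te1, which has a PBS but no protein domain and an unknown Dfam target.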

Author

Hajk-Georg Drost