quality.filter.meta for several different ltr similarity thresholds.
Source: R/generate.multi.quality.filter.meta.R
generate.multi.quality.filter.meta.Rd
A helper function to apply the quality.filter function to diverse LTRpred annotations while probing different ltr similarity thresholds.
generate.multi.quality.filter.meta(kingdom, genome.folder, ltrpred.meta.folder, sim.options, cut.range.options, n.orfs = 0, strategy = "default", update = FALSE)
Argument | Description |
---|---|
kingdom | the taxonomic kingdom of the species for which |
genome.folder | a file path to a folder storing the genome assembly files in fasta format that were used to generate |
ltrpred.meta.folder | a file path to a folder storing |
sim.options | a numeric vector storing the ltr similarity thresholds that shall be probed. |
cut.range.options | a numeric vector storing the similarity cut range thresholds that shall be probed. |
n.orfs | minimum number of open reading frames a predicted retroelement shall possess. |
strategy | quality filter strategy. Options are |
update | shall already existing |
A list with two list elements, sim_file and gm_file. Each list element stores a data.frame:
sim_file (similarity file)
gm_file (genome metrics file)
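A call might look like the following sketch. It only illustrates the argument layout from the usage line above. The kingdom label, both folder paths, and the numeric values chosen for sim.options and cut.range.options are placeholders and assumptions, not values taken from the package documentation. The call is guarded so the snippet runs even when LTRpred or the folders are missing.

```r
# Hypothetical inputs: the kingdom label, both paths, and the threshold
# values below are placeholders chosen for illustration only.
args <- list(
  kingdom             = "Plants",
  genome.folder       = "path/to/genomes",
  ltrpred.meta.folder = "path/to/ltrpred_meta_results",
  sim.options         = c(70, 80, 90),   # assumed to be on a percent scale
  cut.range.options   = c(2, 3),         # assumed values
  n.orfs              = 0,
  strategy            = "default",
  update              = FALSE
)

result <- NULL
# Only call the function when LTRpred is installed and both folders exist.
if (requireNamespace("LTRpred", quietly = TRUE) &&
    dir.exists(args$genome.folder) &&
    dir.exists(args$ltrpred.meta.folder)) {
  result <- do.call(LTRpred::generate.multi.quality.filter.meta, args)
  # The returned list holds two data.frames.
  str(result$sim_file)
  str(result$gm_file)
}
```

Collecting the arguments in a list first keeps the probed threshold vectors in one place, so they can be inspected or logged before the call is made.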
Quality Control
ltr.similarity: Minimum similarity between LTRs. All TEs not matching this criterion are discarded.
n.orfs: Minimum number of Open Reading Frames that must be found between the LTRs. All TEs not matching this criterion are discarded.
PBS or Protein Match: Elements must have either a predicted Primer Binding Site or a match to at least one protein (Gag, Pol, Rve, ...) between their LTRs. All TEs not matching this criterion are discarded.
The relative number of N's (= nucleotide not known) in a TE must be <= 0.1. The relative number of N's is computed as follows: absolute number of N's in TE / width of TE.
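The criteria above can be sketched in base R on a toy table. This is a minimal illustration, not the package's implementation: the column names, the helper names n_fraction and filter_elements, and the use of >= for the similarity and ORF thresholds are all assumptions made for the example.

```r
# Toy table of predicted elements; every column name here is hypothetical.
pred <- data.frame(
  id             = c("te1", "te2", "te3", "te4"),
  ltr_similarity = c(95, 72, 99, 91),
  orfs           = c(2, 1, 0, 3),
  has_pbs        = c(TRUE, FALSE, TRUE, FALSE),
  has_protein    = c(FALSE, FALSE, TRUE, TRUE),
  sequence       = c("ACGTACGTAC", "ACGTNNACGT", "ACGTACGTAC", "ANNTACGTAC"),
  stringsAsFactors = FALSE
)

# Relative number of N's = absolute number of N's in TE / width of TE.
n_fraction <- function(seq) {
  n_count <- nchar(seq) - nchar(gsub("N", "", seq, ignore.case = TRUE))
  n_count / nchar(seq)
}

# Keep only elements that pass all four criteria (hypothetical helper).
filter_elements <- function(pred, sim, n.orfs = 0, max.n = 0.1) {
  keep <- pred$ltr_similarity >= sim &
    pred$orfs >= n.orfs &
    (pred$has_pbs | pred$has_protein) &
    n_fraction(pred$sequence) <= max.n
  pred[keep, , drop = FALSE]
}

# Apply the filter once with a single similarity threshold.
kept <- filter_elements(pred, sim = 90, n.orfs = 1)

# Probe several similarity thresholds and count the surviving elements.
counts <- sapply(c(70, 90, 98), function(s) {
  nrow(filter_elements(pred, sim = s, n.orfs = 0))
})
```

In this toy data, te4 is dropped at every threshold because two of its ten bases are N, giving a relative N content of 0.2.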
Hajk-Georg Drost