This function implements an interface between R and the LTRharvest command line tool to predict putative LTR retrotransposons from R.
LTRharvest(
  genome.file,
  index.file = NULL,
  range = c(0, 0),
  seed = 30,
  minlenltr = 100,
  maxlenltr = 3500,
  mindistltr = 4000,
  maxdistltr = 25000,
  similar = 70,
  mintsd = 4,
  maxtsd = 20,
  vic = 60,
  overlaps = "no",
  xdrop = 5,
  mat = 2,
  mis = -2,
  ins = -3,
  del = -3,
  motif = NULL,
  motifmis = 0,
  output.path = NULL,
  verbose = TRUE
)
Argument | Description |
---|---|
genome.file | path to the genome file in |
index.file | specify the name of the enhanced suffix array index file that is computed by |
range | define the genomic interval in which predicted LTR transposons shall be reported. In case |
seed | the minimum length for the exact maximal repeats. Only repeats with the specified minimum length are considered in all subsequent analyses. Default is 30. |
minlenltr | minimum LTR length. Default is 100. |
maxlenltr | maximum LTR length. Default is 3500. |
mindistltr | minimum distance of LTR starting positions. Default is 4000. |
maxdistltr | maximum distance of LTR starting positions. Default is 25000. |
similar | minimum similarity value between the two LTRs in percent. |
mintsd | minimum length of target site duplications (TSDs). If no search for TSDs shall be performed, then specify |
maxtsd | maximum length of target site duplications (TSDs). If no search for TSDs shall be performed, then specify |
vic | number of nucleotide positions left and right (the vicinity) of the predicted boundary of an LTR that will be searched for TSDs and/or one motif (if specified). Default is 60. |
overlaps | specify how overlapping LTR retrotransposon predictions shall be treated. If |
xdrop | specify the xdrop value (> 0) for extending a seed repeat in both directions allowing for matches, mismatches, insertions, and deletions. The xdrop extension process stops as soon as the extension involving matches, mismatches, insertions, and deletions has a score smaller than T - X, where T denotes the largest score seen so far and X is the xdrop value. Default is 5. |
mat | specify the positive match score for the X-drop extension process. Default is 2. |
mis | specify the negative mismatch score for the X-drop extension process. Default is -2. |
ins | specify the negative insertion score for the X-drop extension process. Default is -3. |
del | specify the negative deletion score for the X-drop extension process. Default is -3. |
motif | specify 2 nucleotides for the starting motif and 2 nucleotides for the ending motif at the beginning and the ending of each LTR, respectively. Only palindromic motif sequences (where the motif sequence is equal to its complementary sequence read backwards) are allowed, e.g. |
motifmis | allowed number of mismatches in the TSD motif specified in |
output.path | a path/folder to store all results returned by |
verbose | logical value indicating whether or not detailed information shall be printed on the console. |
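Two of the argument descriptions above imply small procedures that can be sketched in R. The block below is only an illustration and is not how LTRharvest is implemented: `xdrop_extend_right()` and `is_palindromic_motif()` are hypothetical helper names, the extension is simplified to an ungapped one (matches and mismatches only, no insertions or deletions), and the motif `"tgca"` is chosen here purely as an example input.

```r
# Hypothetical helper, simplified sketch: ungapped X-drop extension to the
# right of a seed. 'a' and 'b' are character vectors holding the nucleotides
# that follow the seed in the two repeat copies. Insertions and deletions
# (arguments 'ins' and 'del') are deliberately left out of this sketch.
xdrop_extend_right <- function(a, b, xdrop = 5, mat = 2, mis = -2) {
  n <- min(length(a), length(b))
  score <- 0      # running score of the extension
  best <- 0       # T: largest score seen so far
  best_len <- 0   # extension length at which T was reached
  for (i in seq_len(n)) {
    step <- if (a[i] == b[i]) mat else mis
    score <- score + step
    if (score > best) {
      best <- score
      best_len <- i
    }
    # Stop as soon as the score drops below T - X.
    if (score < best - xdrop) break
  }
  list(length = best_len, score = best)
}

# Hypothetical helper: a motif is palindromic in the sense used above if it
# equals its complementary sequence read backwards.
is_palindromic_motif <- function(motif) {
  motif <- tolower(motif)
  bases <- strsplit(motif, "")[[1]]
  complement <- chartr("acgt", "tgca", bases)
  identical(paste(rev(complement), collapse = ""), motif)
}

a <- strsplit("ACGTACGTTTTT", "")[[1]]
b <- strsplit("ACGTACGTAAAA", "")[[1]]
ext <- xdrop_extend_right(a, b)   # eight matches, then the score drops
ok_motif <- is_palindromic_motif("tgca")
bad_motif <- is_palindromic_motif("tgct")
```

With the default scores, the extension keeps the first eight matching positions and stops at the third consecutive mismatch, where the running score falls below T - X.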
The LTRharvest function generates the following output files:
*_BetweenLTRSeqs.fsa : DNA sequences of the region between the LTRs in fasta format.
*_Details.tsv : A spreadsheet containing detailed information about the predicted LTRs.
*_FullLTRRetrotransposonSeqs.fsa : DNA sequences of the entire predicted LTR retrotransposon.
*_index.fsa : The suffix array index file used to predict putative LTR retrotransposons with LTRharvest.
*_Prediction.gff : A spreadsheet containing detailed additional information about the predicted LTRs (partially redundant with the *_Details.tsv file).
The ' * ' is a placeholder for the name of the input genome file.
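The naming scheme above can be sketched as follows. `expected_ltrharvest_files()` is a hypothetical helper, not part of the package, and it assumes that the ' * ' placeholder stands for the base name of the genome file without its extension; the documentation does not state this explicitly, and the path used below is made up for illustration.

```r
# Hypothetical helper (not part of LTRpred): lists the output file names
# described above for a given genome file.
expected_ltrharvest_files <- function(genome.file) {
  # Assumption: '*' is the genome file's base name without its extension.
  prefix <- tools::file_path_sans_ext(basename(genome.file))
  suffixes <- c(
    "_BetweenLTRSeqs.fsa",
    "_Details.tsv",
    "_FullLTRRetrotransposonSeqs.fsa",
    "_index.fsa",
    "_Prediction.gff"
  )
  paste0(prefix, suffixes)
}

files <- expected_ltrharvest_files("data/Hsapiens_ChrY.fa")
print(files)
```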
The LTRharvest function provides an interface to the LTRharvest command line tool and furthermore takes care of the entire folder handling, output parsing, and data processing of the LTRharvest prediction.
Internally, a folder named output.path_ltrharvest is generated and all files returned by LTRharvest are stored in this folder. These files (see section Value) are then parsed and returned as a list of data.frames by this function.
LTRharvest can be used independently or as an initial pre-computation step for detecting LTR retrotransposons with LTRdigest.
D Ellinghaus, S Kurtz and U Willhoeft. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics (2008). 9:18.
Most argument specifications are adapted from the User manual of LTRharvest.
Hajk-Georg Drost
if (FALSE) {
  # Run LTRharvest for the H. sapiens partial Y chromosome using standard parameters
  LTRharvest(genome.file = system.file("Hsapiens_ChrY.fa", package = "LTRpred"))
}
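When a call like the one above is made with non-default settings, the constraints stated in the argument descriptions can be checked up front. The following is a minimal sketch: `check_ltrharvest_args()` is a hypothetical helper that is not part of the package, and the rule that each minimum must not exceed its maximum is an assumption rather than something the documentation states.

```r
# Hypothetical helper (not part of LTRpred): returns a character vector of
# problems found in a named list of settings; empty if none were found.
check_ltrharvest_args <- function(settings) {
  problems <- character(0)
  # Documented: xdrop > 0, positive match score, negative penalties.
  if (settings[["xdrop"]] <= 0) problems <- c(problems, "xdrop must be > 0")
  if (settings[["mat"]] <= 0) problems <- c(problems, "mat must be positive")
  for (nm in c("mis", "ins", "del")) {
    if (settings[[nm]] >= 0) problems <- c(problems, paste(nm, "must be negative"))
  }
  # Assumption: each minimum should not exceed the matching maximum.
  pairs <- list(c("minlenltr", "maxlenltr"),
                c("mindistltr", "maxdistltr"),
                c("mintsd", "maxtsd"))
  for (p in pairs) {
    if (settings[[p[1]]] > settings[[p[2]]]) {
      problems <- c(problems, paste(p[1], "exceeds", p[2]))
    }
  }
  problems
}

# The defaults from the usage section pass the check.
settings <- list(minlenltr = 100, maxlenltr = 3500,
                 mindistltr = 4000, maxdistltr = 25000,
                 mintsd = 4, maxtsd = 20,
                 xdrop = 5, mat = 2, mis = -2, ins = -3, del = -3)
problems_default <- check_ltrharvest_args(settings)

# Two deliberately invalid settings are reported.
bad <- modifyList(settings, list(xdrop = 0, mintsd = 30))
problems_bad <- check_ltrharvest_args(bad)
```

If the check returns no problems, the settings could then be passed on together with genome.file, for example via do.call().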