Detect solo LTR copies and genomic locations of predicted LTR transposons using a BLAST search strategy.

ltr.cn(
  data.sheet,
  LTR.fasta_3ltr,
  LTR.fasta_5ltr,
  genome,
  ltr.similarity = 70,
  scope.cutoff = 0.85,
  perc.ident.cutoff = 70,
  output = NULL,
  max.hits = 500,
  eval = 1e-10,
  cores = 1
)

Arguments

data.sheet

path to the *_LTRpred_DataSheet.csv file generated by LTRpred.

LTR.fasta_3ltr

path to the FASTA file storing the sequences of the 3 prime LTRs generated by LTRpred.

LTR.fasta_5ltr

path to the FASTA file storing the sequences of the 5 prime LTRs generated by LTRpred.

genome

file path to the reference genome in which solo LTRs shall be found (in fasta format).

ltr.similarity

similarity threshold for defining LTR similarity. Default is ltr.similarity = 70.

scope.cutoff

threshold for the scope variable. The scope of a BLAST hit is defined by abs(s_len - alig_length) / s_len and quantifies the length similarity between hit and query in the alignment. Default is scope.cutoff = 0.85, meaning that at least 85% of the length of the query sequence must match the length of the subject sequence.
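The scope formula above can be tried on a few toy hits. This is a minimal sketch, not the package's internal code: the values are made up, and it assumes that the quantity compared against scope.cutoff is the complement of the relative length difference (1 minus the formula), since that reading matches the "at least 85%" interpretation of the default.

```r
# Toy BLAST hits; column names follow the formula in the text, values are invented.
hits <- data.frame(
  s_len       = c(1000, 1000, 1000),
  alig_length = c(1000,  900,  600)
)

# Relative length difference, exactly as written in the argument description.
hits$length_diff <- abs(hits$s_len - hits$alig_length) / hits$s_len

# ASSUMPTION (not confirmed by this page): the value checked against
# scope.cutoff is the complement, so that 1 means identical lengths.
hits$scope <- 1 - hits$length_diff

scope.cutoff <- 0.85
hits$passes_scope <- hits$scope >= scope.cutoff

print(hits)
```

Under this assumption the first two hits pass the default cutoff, while the third, whose alignment covers only 60% of the sequence length, does not.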

perc.ident.cutoff

minimum sequence identity (in percent) between the query sequence and the subject hit sequence, applied to hits with scope >= scope.cutoff (e.g. 0.85). Default is perc.ident.cutoff = 70. Together, the two thresholds select BLAST hits that have a sequence length similar to that of the query (controlled by scope.cutoff; e.g. scope.cutoff = 0.85) and, within this scope, at least e.g. 70% sequence identity (controlled by perc.ident.cutoff).
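The way the two thresholds combine can be illustrated with a small filter. This is only a sketch on invented data: the column names hit_id, scope and perc_ident are hypothetical and may not match the columns of the data.frame that ltr.cn() returns.

```r
# Invented hit table; column names are hypothetical.
hits <- data.frame(
  hit_id     = c("hit_1", "hit_2", "hit_3", "hit_4"),
  scope      = c(0.98, 0.90, 0.95, 0.60),
  perc_ident = c(92.5, 65.0, 71.3, 99.0),
  stringsAsFactors = FALSE
)

scope.cutoff      <- 0.85
perc.ident.cutoff <- 70

# Keep only hits that satisfy both the length criterion and the identity criterion.
kept <- hits[hits$scope >= scope.cutoff & hits$perc_ident >= perc.ident.cutoff, ]

print(kept$hit_id)
```

Here hit_2 is dropped for low identity and hit_4 for low scope, even though hit_4 has the highest identity of all hits.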

output

file name of the BLAST output. If output = NULL (default), the BLAST output file is deleted after this function returns the result data.frame.

max.hits

maximum number of hits that shall be retrieved that still fulfill the e-value criterion. Default is max.hits = 500.

eval

e-value threshold for BLAST hit detection. Default is eval = 1E-10.
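To give a feel for what max.hits, eval and cores control, the sketch below assembles a comparable stand-alone blastn command line. How ltr.cn() invokes BLAST internally is not documented on this page, so the mapping to the BLAST+ options -max_target_seqs, -evalue and -num_threads is an assumption, and the file and database names are hypothetical.

```r
# Values mirror the defaults shown in the usage section.
max.hits      <- 500
evalue_cutoff <- 1e-10
cores         <- 1

# ASSUMPTION: these BLAST+ options correspond to the ltr.cn() arguments.
# "ltr_sequences.fas" and "Genome_db" are hypothetical names.
blast_args <- c(
  "-query",           "ltr_sequences.fas",
  "-db",              "Genome_db",
  "-evalue",          format(evalue_cutoff, scientific = TRUE),
  "-max_target_seqs", max.hits,
  "-num_threads",     cores,
  "-outfmt",          "6"
)

# Only build and show the command; nothing is executed here.
cmd <- paste("blastn", paste(blast_args, collapse = " "))
print(cmd)
```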

cores

number of cores for parallel computations. Default is cores = 1.
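A call with all arguments spelled out might look like the following sketch. The file paths are hypothetical placeholders for the files LTRpred wrote for your genome. The search is only started when the LTRpred package and all input files are actually available.

```r
# Hypothetical input paths; replace them with your own LTRpred output files.
args <- list(
  data.sheet        = "Genome_ltrpred/Genome_LTRpred_DataSheet.csv",
  LTR.fasta_3ltr    = "Genome_ltrpred/Genome_3ltr.fas",
  LTR.fasta_5ltr    = "Genome_ltrpred/Genome_5ltr.fas",
  genome            = "Genome.fa",
  ltr.similarity    = 70,
  scope.cutoff      = 0.85,
  perc.ident.cutoff = 70,
  output            = NULL,   # NULL: the BLAST output file is removed afterwards
  max.hits          = 500,
  eval              = 1e-10,
  cores             = 1
)

# Run only if the package is installed and every input file exists.
inputs  <- unlist(args[c("data.sheet", "LTR.fasta_3ltr", "LTR.fasta_5ltr", "genome")])
can_run <- requireNamespace("LTRpred", quietly = TRUE) && all(file.exists(inputs))

if (can_run) {
  solo_ltrs <- do.call(LTRpred::ltr.cn, args)
  print(head(solo_ltrs))
} else {
  message("LTRpred or the input files are not available; skipping the BLAST search.")
}
```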

References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410.

Gish, W. & States, D.J. (1993) "Identification of protein coding regions by database similarity search." Nature Genet. 3:266-272.

Madden, T.L., Tatusov, R.L. & Zhang, J. (1996) "Applications of network BLAST server." Meth. Enzymol. 266:131-141.

Altschul, S.F., Madden, T.L., Schaeffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402.

Zhang, Z., Schwartz, S., Wagner, L. & Miller, W. (2000) "A greedy algorithm for aligning DNA sequences." J. Comput. Biol. 7(1-2):203-214.

Author

Hajk-Georg Drost