## LTRpred(ict): a de novo functional annotation pipeline to detect potentially mobile LTR retrotransposons in any genome assembly

Transposable elements (TEs) can make up large parts of eukaryotic genomes. In the past, TEs were seen as selfish mobile elements that populate a host genome to increase their chances of survival. In doing so, they leave traces of "junk DNA" in host genomes, which are usually regarded as annoying by-products when sequencing, assembling, and annotating new genomes.

However, this picture is slowly changing (Drost & Sanchez, 2019), and TEs have been shown to be involved in generating a diverse range of novel phenotypes, such as tomato fruit shape (Benoit, Drost et al., 2019), the adaptive cryptic colouration of moths that arose during the industrial revolution (Chuong et al., 2016), and inner cell mass development in human embryonic stem cells (Chuong et al., 2016).

Today, the de novo detection of transposable elements is performed by annotation tools that try to detect any type of repeated sequence, TE family, or remnant DNA locus within a genome assembly that can be associated with a known transposable element. The main goal of such efforts is to retrieve as many TE-associated loci as possible. If successful, such annotations can then be used to mask host genomes and to perform classic (phylo-)genomic studies focusing on host genes.

More than 300 repeat and TE annotation tools have been developed so far. Most of them are designed and optimized to annotate either the entire repeat space or specific superfamilies of TEs and their DNA remnants.

The LTRpred pipeline pursues a different goal from other annotation tools. It focuses specifically on LTR retrotransposons and aims to annotate only functional and potentially mobile elements. This type of annotation is crucial for studying retrotransposon activity in eukaryotic genomes and for understanding whether specific retrotransposon families can be activated artificially and harnessed to mutagenize genomes at a much faster rate.

In detail, LTRpred takes any genome assembly file in fasta format as input and generates a detailed annotation of functional and potentially mobile LTR retrotransposons.

Users can consult a comprehensive Introduction to the LTRpred pipeline to get familiar with the tool.

## Install

Install prerequisite CRAN and Bioconductor packages:

```r
# CRAN packages
install.packages(c("tidyverse", "data.table", "seqinr", "biomartr", "ape", "dtplyr", "devtools"))

# Bioconductor packages
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install()

BiocManager::install(c("rtracklayer", "GenomicFeatures", "GenomicRanges", "GenomeInfoDb", "biomaRt", "ggbio"))

# metablastr from GitHub
devtools::install_github("HajkD/metablastr", build_vignettes = TRUE, dependencies = TRUE)

# Additional CRAN packages
install.packages(c("BSDA", "ggrepel", "gridExtra"))
```

Now users may install LTRpred as follows:

```r
# install.packages("devtools")
devtools::install_github("HajkD/LTRpred")
```

## Tutorials

### Quick Start

The fastest way to generate an LTR retrotransposon prediction for a genome of interest (after installing all prerequisite command line tools) is to use the `LTRpred()` function and rely on its default parameters. In the following example, an LTR retrotransposon prediction is performed for part of the human Y chromosome.

```r
# Perform de novo LTR retrotransposon prediction for part of the human Y chromosome
LTRpred::LTRpred(genome.file = system.file("Hsapiens_ChrY.fa", package = "LTRpred"))
```

When running LTRpred on your own genome, please specify `genome.file = "path/to/your/genome.fasta"` instead of `system.file(..., package = "LTRpred")`. The command `system.file(..., package = "LTRpred")` merely references the path to the example file stored in the LTRpred package itself.
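To catch a mistyped path before a long run starts, the call can be wrapped in a small check. This is a minimal sketch: `run_ltrpred()` is a hypothetical helper that is not part of LTRpred, and the only LTRpred call it uses is `LTRpred::LTRpred(genome.file = ...)` with default parameters, exactly as in the Quick Start example.

```r
# Hypothetical helper, not part of the LTRpred package: it checks that the
# genome assembly file exists before passing it to LTRpred::LTRpred().
run_ltrpred <- function(genome_path) {
  stopifnot(is.character(genome_path), length(genome_path) == 1)
  if (!file.exists(genome_path)) {
    stop("Genome file not found: ", genome_path, call. = FALSE)
  }
  # Run the prediction with the default parameters, as in the Quick Start
  LTRpred::LTRpred(genome.file = genome_path)
}

# Replace the placeholder with the path to your own fasta file:
# run_ltrpred("path/to/your/genome.fasta")
```

Because the existence check runs first, a wrong path fails immediately with a readable message instead of surfacing later from one of the external command line tools.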

## Citation

The LTRpred package is not formally published yet, but a manuscript is in preparation. For now, please cite one of the following papers when using LTRpred for your own research. LTRpred was used in these studies to predict potentially active retrotransposons that were later confirmed experimentally.

M Benoit, HG Drost, M Catoni, Q Gouil, S Lopez-Gomollon, DC Baulcombe, J Paszkowski. Environmental and epigenetic regulation of Rider retrotransposons in tomato. PLoS Genetics, 15(9): e1008370 (2019).

or

J Cho, M Benoit, M Catoni, HG Drost, A Brestovitsky, M Oosterbeek, J Paszkowski. Sensitive detection of pre-integration intermediates of LTR retrotransposons in crop plants. Nature Plants, 5, 26-33 (2019).

Tutorials introducing LTRpred are provided as package vignettes, which users can also read within RStudio:

```r
library(LTRpred)
browseVignettes("LTRpred")
```

## Discussions and Bug Reports

I would be very happy to learn more about potential improvements of the concepts and functions provided in this package.

Furthermore, if you find any bugs or need additional (more flexible) functionality in parts of this package, please let me know:

https://github.com/HajkD/LTRpred/issues

## Acknowledgement

I would like to thank the Paszkowski team for their incredible support and the motivating discussions that led to the realization of this project.