Filter the output of the repbase.query function: quantify the number of hits for each query LTR transposon (duplicates) and retain only those Repbase hits that cover at least a given fraction (the scope, set via scope.value) of the corresponding annotated sequence in Repbase.
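The filtering described above can be pictured with a minimal base-R sketch. It is not the package's implementation: the function name filter_hits_sketch() and the column names query_id, subject_id and scope are hypothetical. The sketch also counts hits per query before applying the threshold, which may differ from what repbase.filter() actually does.

```r
# Hypothetical sketch, NOT the LTRpred implementation: the column names
# query_id, subject_id and scope are assumptions for illustration only.
filter_hits_sketch <- function(hits, scope.value = 0.7, verbose = TRUE) {
  stopifnot(is.data.frame(hits),
            is.numeric(scope.value), length(scope.value) == 1,
            scope.value >= 0, scope.value <= 1)
  # number of Repbase hits per query LTR transposon (duplicates),
  # counted here before filtering (an assumption)
  n_hits <- table(hits$query_id)
  hits$n_hits <- as.integer(n_hits[as.character(hits$query_id)])
  # keep only hits whose scope reaches the threshold; which() drops NA scopes
  kept <- hits[which(hits$scope >= scope.value), , drop = FALSE]
  if (verbose)
    message(nrow(kept), " of ", nrow(hits), " hits have scope >= ", scope.value)
  rownames(kept) <- NULL
  kept
}

# toy input with made-up identifiers
toy <- data.frame(
  query_id   = c("ltr1", "ltr1", "ltr2", "ltr3"),
  subject_id = c("repA", "repB", "repA", "repC"),
  scope      = c(0.95, 0.40, 0.75, 0.10),
  stringsAsFactors = FALSE
)
res <- filter_hits_sketch(toy, scope.value = 0.7)
```

With scope.value = 0.7, only the ltr1/repA and ltr2/repA rows survive, and each keeps the per-query hit count computed on the unfiltered table.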
repbase.filter(query.output, scope.value = 0.7, verbose = TRUE)
Argument | Description
---|---
query.output | a data.frame returned by repbase.query.
scope.value | a value in [0, 1] specifying the minimum sequence similarity, as a fraction, between the LTR transposon and the corresponding annotated sequence found in Repbase.
verbose | a logical value indicating whether additional information should be printed to the console while this function runs.
A data.frame storing the filtered output returned by repbase.query.
Hajk-Georg Drost
```r
if (FALSE) {
# Pre-process Repbase: A. thaliana
# and save the output into the file "Athaliana_repbase.ref"
repbase.clean(repbase.file = "athrep.ref",
              output.file  = "Athaliana_repbase.ref")

# perform a blastn search against the A. thaliana Repbase annotation
AthalianaRepBaseAnnotation <- repbase.query(
  ltr.seqs     = "TAIR10_chr_all-ltrdigest_complete.fas",
  repbase.path = "Athaliana_repbase.ref",
  cores        = 1)

# filter the annotation query output (keep hits with a scope of at least 0.9)
AthalianaAnnot.HighMatches <- repbase.filter(AthalianaRepBaseAnnotation,
                                             scope.value = 0.9)

# build a family label from fields 2 and 3 of each unique subject ID
# and count the subject IDs per label
Ath.TE.Matches.Families <- sort(table(unlist(lapply(
  stringr::str_split(names(table(AthalianaAnnot.HighMatches$subject_id)), "_"),
  function(x) paste0(x[2:3], collapse = ".")))),
  decreasing = TRUE)

# visualize the hits found to have a scope of at least 90%
barplot(Ath.TE.Matches.Families, las = 3, cex.names = 0.8,
        col  = bcolor(length(Ath.TE.Matches.Families)),
        main = "RepBase Annotation: A. thaliana")
}
```
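The family-counting step in the example can be tried without Repbase data. This sketch uses base R's strsplit() in place of stringr::str_split(), and the subject IDs are hypothetical; it only assumes, as the example's paste0(x[2:3], ...) implies, that the second and third underscore-separated fields identify the family. Because names(table(...)) keeps each subject ID once, the counts are distinct subject IDs per label, not total hits.

```r
# Hypothetical subject IDs; only their underscore-separated layout matters.
subject_ids <- c("x_Copia_LTR_1", "x_Copia_LTR_2", "x_Gypsy_LTR_1",
                 "x_Copia_LTR_1")  # a second hit on the same subject

# unique subject IDs, as names(table(...)) yields in the example
unique_ids <- names(table(subject_ids))

# family label = fields 2 and 3 joined by "."
family <- vapply(strsplit(unique_ids, "_", fixed = TRUE),
                 function(x) paste0(x[2:3], collapse = "."),
                 character(1))

# distinct subject IDs per family label, most frequent first
family_counts <- sort(table(family), decreasing = TRUE)
family_counts
```

Here the duplicate "x_Copia_LTR_1" is counted once, so Copia.LTR gets 2 and Gypsy.LTR gets 1.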