Read a file in blast6out format generated by either USEARCH or VSEARCH.

read.blast6out(blast6out.file)

Arguments

blast6out.file

path to the blast6out file (*.blast6out extension).

Value

A data frame (tibble) with the following columns:

  • query: query id.

  • subject: subject id.

  • perc_ident: percent identity between query and subject.

  • align_len: alignment length between query and subject.

  • n_mismatch: number of mismatches between query and subject.

  • n_gap_open: number of gap openings between query and subject.

  • start_q: start position in query. Query coordinates start with 1 at the first base in the sequence as it appears in the input file. For translated searches (nucleotide queries, protein targets), query start < end for +ve frame and start > end for -ve frame.

  • end_q: end position in query.

  • start_s: start position in subject. Subject coordinates start with 1 at the first base in the sequence as it appears in the database. For untranslated nucleotide searches, subject start < end for plus strand, start > end for a reverse-complement alignment.

  • end_s: end position in subject.

  • evalue: E-value calculated using Karlin-Altschul statistics.

  • bit_score: bit score calculated using Karlin-Altschul statistics.
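
The column layout above can be illustrated with base R alone. The following is a minimal sketch, not the implementation used by read.blast6out(): it assumes the file is tab-separated with no header line, and the two records, the sequence ids (seqA, seqB, seqC), and the derived strand_s column are made up for illustration.

```r
# Column names as listed in the Value section.
blast6_cols <- c("query", "subject", "perc_ident", "align_len",
                 "n_mismatch", "n_gap_open", "start_q", "end_q",
                 "start_s", "end_s", "evalue", "bit_score")

# Hypothetical two-record blast6out file written to a temporary path.
tmp <- tempfile(fileext = ".blast6out")
writeLines(c(
  "seqA\tseqB\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t185",
  "seqC\tseqB\t92.0\t50\t4\t0\t1\t50\t200\t151\t1e-10\t60"
), tmp)

# Assumption: tab-separated fields, no header line.
hits <- read.delim(tmp, header = FALSE, col.names = blast6_cols,
                   stringsAsFactors = FALSE)

# For untranslated nucleotide searches, subject start > end marks a
# reverse-complement alignment; strand_s is a helper column added here.
hits$strand_s <- ifelse(hits$start_s <= hits$end_s, "+", "-")

print(hits[, c("query", "subject", "start_s", "end_s", "strand_s")])
```

The second record has start_s greater than end_s, so it is flagged as a reverse-complement alignment.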

Author

Hajk-Georg Drost

Examples

# read example *.blast6out file
test.blast6out <- read.blast6out(system.file("test.blast6out", package = "LTRpred"))

# look at the format in R
head(test.blast6out)
#> # A tibble: 6 × 12
#>   query subject perc_ident align_len n_mismatch n_gap_open start_q end_q start_s
#>   <chr> <chr>        <dbl>     <int>      <int>      <int>   <int> <int>   <int>
#> 1 2_CH… mitoch…       99.6     18868         22         19   18831     1       1
#> 2 mito… 2_CHRO…       99.3     18390         29         35       1 18354       1
#> 3 3_CH… 1_CHRO…       91.3      9070        317         20       1  8652       1
#> 4 4_CH… 3_CHRO…       92.2      8301        429         19       1  8152       1
#> 5 3_CH… 3_CHRO…       92        8337        292         15       1  8080       1
#> 6 4_CH… 3_CHRO…       93        8292        284         12       1  8053       1
#> # … with 3 more variables: end_s <int>, evalue <dbl>, bit_score <dbl>
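
Once a table like this is loaded, hits are typically filtered on identity and alignment length. The sketch below is self-contained and does not require LTRpred: it rebuilds a few columns by hand using the numeric values of the first three rows printed above, with placeholder ids (q1 to q3) because the real ids are truncated in the output. The thresholds (95 percent identity, 10000 aligned positions) and the span_q column are arbitrary choices for illustration.

```r
# Hand-built subset of the columns; ids are placeholders.
hits <- data.frame(
  query      = c("q1", "q2", "q3"),
  perc_ident = c(99.6, 99.3, 91.3),
  align_len  = c(18868L, 18390L, 9070L),
  start_q    = c(18831L, 1L, 1L),
  end_q      = c(1L, 18354L, 8652L),
  stringsAsFactors = FALSE
)

# Query span covered by the alignment; abs() handles start > end.
hits$span_q <- abs(hits$end_q - hits$start_q) + 1L

# Keep long, high-identity hits (illustrative thresholds).
strong <- hits[hits$perc_ident >= 95 & hits$align_len >= 10000, ]
print(strong)
```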