choice between fasta and HMM for alignment to a genome #13

@dcopetti

Hi,
I have a FASTA file of protein sequences from TE genes. It contains some redundancy among closely related sequences, but it covers all TE classes, so the overall variation is high. I am using it to query genome assemblies (200 Mb to 20 Gb) to locate TE coding regions.
I am running BATH as
bathsearch --cpu 20 -o output_bath --tblout output_tab TE_proteins.fa assembly.fa
and it has been running for more than a week.
I wonder whether clustering the sequences by TE family and building an HMM for each cluster would speed up the searches I plan to run on the next assemblies. Would the time invested pay off in the long term? Do you have an estimate of the difference between the two options?
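
For illustration, the grouping step mentioned above could be sketched roughly as follows. This is only a minimal sketch under an assumption that is not stated in the post: that each FASTA header carries a family label after a `#` (a hypothetical `>name#Class/Family` convention). The function names and the example records are made up for the illustration.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_family(text, sep="#"):
    """Group records by the label after `sep` in the first header token.

    The `name#Class/Family` header layout is an assumption for this sketch;
    records without a label are collected under "unclassified".
    """
    groups = defaultdict(list)
    for header, seq in parse_fasta(text):
        name = header.split()[0]
        family = name.split(sep, 1)[1] if sep in name else "unclassified"
        groups[family].append((header, seq))
    return dict(groups)


# Made-up records, only to show the shape of the result.
EXAMPLE = """>te1#LTR/Gypsy
MKV
LLA
>te2#DNA/hAT
MST
>te3#LTR/Gypsy
MKI
>te4
MAA
"""

groups = group_by_family(EXAMPLE)
for family, records in sorted(groups.items()):
    print(family, len(records))
```

Each group would then still need to be aligned and turned into a model; those steps are not shown here.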
Thanks,
Dario
