Hi,
I have a FASTA file of protein sequences from TE genes. Some closely related sequences are redundant, but the set covers all TE classes, so overall variation is high. I am using it to query genome assemblies (200 Mb to 20 Gb) to locate TE coding regions.
I am running BATH as
bathsearch --cpu 20 -o output_bath --tblout output_tab TE_proteins.fa assembly.fa
and it has been running for more than a week now.
I wonder whether clustering the sequences by TE family and building an HMM for each cluster would speed up the searches I plan to run on the next assemblies. Would the time invested pay off in the long term? Do you have an estimate of the difference in run time between the two options?
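To make the clustering idea concrete, here is a rough sketch of the grouping step I have in mind. It is only an illustration under assumptions: it supposes the FASTA headers carry a RepeatMasker-style label such as `name#Class/Family`, which may not match the real file, and the function names (`read_fasta`, `family_of`, `split_by_family`) are hypothetical. Each per-family file would still need to be aligned and built into an HMM separately; that part is not shown.

```python
from collections import defaultdict
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line.strip():
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def family_of(header):
    """Extract the family label from a header.

    ASSUMPTION: headers look like "name#Class/Family". Headers without
    a "#" label are placed in an "unclassified" group.
    """
    fields = header.split()
    first = fields[0] if fields else ""
    _, sep, label = first.partition("#")
    return label if sep and label else "unclassified"


def split_by_family(fasta_path, out_dir):
    """Write one FASTA file per family; return {family: record count}."""
    groups = defaultdict(list)
    for header, seq in read_fasta(fasta_path):
        groups[family_of(header)].append((header, seq))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for family, records in groups.items():
        # "/" is not allowed in file names, so replace it.
        safe_name = family.replace("/", "_")
        with open(out_dir / f"{safe_name}.fa", "w") as out:
            for header, seq in records:
                out.write(f">{header}\n{seq}\n")
        written[family] = len(records)
    return written
```

Calling `split_by_family("TE_proteins.fa", "families")` would write files such as `families/LTR_Gypsy.fa`, assuming the headers follow the labeling convention above.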
Thanks,
Dario