I noticed this when parsing a list of insect taxa from GenBank with available genomic information. When I attempted to make a TaxDict object from the list of taxa, it failed with a ValueError.
host = ['Unclassified Trichoceridae']
resolved_host = Resolver(terms=host)
resolved_host.main()
taxonomy = ['subspecies', 'species', 'genus',
'family', 'order', 'class', 'phylum', 'kingdom']
idents = resolved_host.retrieve('query_name')
lineages = resolved_host.retrieve('classification_path')
ranks = resolved_host.retrieve('classification_path_ranks')
print([(ranks[0][x],lineages[0][x]) for x in range(len(ranks[0]))])
[('superkingdom', 'Eukaryota'), ('', 'Opisthokonta'), ('kingdom', 'Metazoa'), ('', 'Eumetazoa'), ('', 'Bilateria'), ('', 'Protostomia'), ('', 'Ecdysozoa'), ('', 'Panarthropoda'), ('phylum', 'Arthropoda'), ('', 'Mandibulata'), ('', 'Pancrustacea'), ('superclass', 'Hexapoda'), ('class', 'Insecta'), ('', 'Dicondylia'), ('', 'Pterygota'), ('subclass', 'Neoptera'), ('infraclass', 'Endopterygota'), ('order', 'Diptera'), ('suborder', 'Nematocera'), ('infraorder', 'Psychodomorpha'), ('superfamily', 'Trichoceroidea'), ('family', 'Trichoceridae'), ('', 'Unclassified')]
In this case, the terminal lineage entry is 'Unclassified' with no assigned rank, causing _getLevel to fail on initialization of the TaxRef object:
taxdict = TaxDict(idents=idents, ranks=ranks, lineages=lineages,
taxonomy=taxonomy)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-0af07c3e75e4> in <module>()
1 taxdict = TaxDict(idents=idents, ranks=ranks, lineages=lineages,
----> 2 taxonomy=taxonomy)
/Users/jonsanders/Development/git_sw/TaxonNamesResolver/taxon_names_resolver/manip_tools.py in __init__(self, idents, ranks, lineages, taxonomy, **kwargs)
115 # create taxref
116 taxref = TaxRef(ident=idents[i], rank=ranks[i][-1],
--> 117 taxonomy=self.taxonomy)
118 # create key for ident and insert a dictionary of:
119 # lineage, taxref, cident, ident and rank
/Users/jonsanders/Development/git_sw/TaxonNamesResolver/taxon_names_resolver/manip_tools.py in __init__(self, ident, rank, taxonomy)
34 except ValueError as e:
35 print('Error in taxon ident: {}'.format(ident))
---> 36 raise e
37 super(TaxRef, self).__setattr__('counter', 0) # count ident changes
38
/Users/jonsanders/Development/git_sw/TaxonNamesResolver/taxon_names_resolver/manip_tools.py in __init__(self, ident, rank, taxonomy)
31 try:
32 super(TaxRef, self).__setattr__('level',
---> 33 self._getLevel(rank, taxonomy))
34 except ValueError as e:
35 print('Error in taxon ident: {}'.format(ident))
/Users/jonsanders/Development/git_sw/TaxonNamesResolver/taxon_names_resolver/manip_tools.py in _getLevel(self, rank, taxonomy)
54 return taxonomy.index(rank)
55 # else find its closest by using the default taxonomy
---> 56 dlevel = default_taxonomy.index(rank)
57 i = 1
58 d = dlevel + i
ValueError: '' is not in list
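Stripped down, the failure looks like nothing more than list.index being handed the empty rank string. A minimal sketch of that, assuming the same thing happens for any list that lacks '' (it uses the taxonomy list from above, since the actual contents of default_taxonomy are not shown here):

```python
# The user-supplied taxonomy from the example above.
taxonomy = ['subspecies', 'species', 'genus',
            'family', 'order', 'class', 'phylum', 'kingdom']

# Tail of the rank list returned for 'Unclassified Trichoceridae':
# the terminal entry has an empty rank.
ranks = ['superfamily', 'family', '']

message = None
try:
    # list.index raises ValueError when the item is absent.
    taxonomy.index(ranks[-1])
except ValueError as err:
    message = str(err)

print(message)  # '' is not in list
```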
Not sure what the best way to resolve this should be.
- Could add a catch in _getLevel to make sure the query rank is present in the default taxonomy before it tries to index, otherwise return 'Unknown' or similar.
- Rather than simply passing the terminal rank to the TaxRef constructor, look for the most terminal labeled rank that is present in either the provided or default taxonomy.
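To make the two options concrete, here is a rough standalone sketch, not a patch against manip_tools.py. The helper names rank_is_known and last_known_rank are hypothetical, and EXAMPLE_DEFAULT_TAXONOMY is only a stand-in, since the package's real default_taxonomy may contain different ranks:

```python
# Illustrative stand-in only: the real default_taxonomy in manip_tools.py
# may differ from this list.
EXAMPLE_DEFAULT_TAXONOMY = ['subspecies', 'species', 'subgenus', 'genus',
                            'tribe', 'subfamily', 'family', 'superfamily',
                            'order', 'class', 'phylum', 'kingdom']


def rank_is_known(rank, taxonomy, default_taxonomy=EXAMPLE_DEFAULT_TAXONOMY):
    """Option 1: check a rank before calling .index() on either list."""
    return rank in taxonomy or rank in default_taxonomy


def last_known_rank(ranks, lineage, taxonomy,
                    default_taxonomy=EXAMPLE_DEFAULT_TAXONOMY):
    """Option 2: walk the lineage from the tip towards the root and return
    the first (rank, name) pair whose rank is in either taxonomy, or None."""
    for rank, name in zip(reversed(ranks), reversed(lineage)):
        if rank_is_known(rank, taxonomy, default_taxonomy):
            return rank, name
    return None


taxonomy = ['subspecies', 'species', 'genus',
            'family', 'order', 'class', 'phylum', 'kingdom']
# Tail of the lineage returned for 'Unclassified Trichoceridae' above.
ranks = ['order', 'suborder', 'infraorder', 'superfamily', 'family', '']
lineage = ['Diptera', 'Nematocera', 'Psychodomorpha',
           'Trichoceroidea', 'Trichoceridae', 'Unclassified']

print(rank_is_known(ranks[-1], taxonomy))         # False
print(last_known_rank(ranks, lineage, taxonomy))  # ('family', 'Trichoceridae')
```

With the second option, TaxDict could presumably pass the returned rank to TaxRef in place of ranks[i][-1]. Whether the ident should still carry the 'Unclassified' name in that case is a separate question.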
Any thoughts?