
TaxDict breaks if passed lineage with unlabeled terminal rank #4

@tanaes

I noticed this when parsing a list of insect taxa from GenBank that have genomic information available. When I tried to build a TaxDict object from that list of taxa, it failed.

host = ['Unclassified Trichoceridae']

resolved_host = Resolver(terms=host)
resolved_host.main()

taxonomy = ['subspecies', 'species', 'genus',
            'family', 'order', 'class', 'phylum', 'kingdom']

idents = resolved_host.retrieve('query_name')

lineages = resolved_host.retrieve('classification_path')

ranks = resolved_host.retrieve('classification_path_ranks')

print([(ranks[0][x],lineages[0][x]) for x in range(len(ranks[0]))])
[('superkingdom', 'Eukaryota'), ('', 'Opisthokonta'), ('kingdom', 'Metazoa'), ('', 'Eumetazoa'), ('', 'Bilateria'), ('', 'Protostomia'), ('', 'Ecdysozoa'), ('', 'Panarthropoda'), ('phylum', 'Arthropoda'), ('', 'Mandibulata'), ('', 'Pancrustacea'), ('superclass', 'Hexapoda'), ('class', 'Insecta'), ('', 'Dicondylia'), ('', 'Pterygota'), ('subclass', 'Neoptera'), ('infraclass', 'Endopterygota'), ('order', 'Diptera'), ('suborder', 'Nematocera'), ('infraorder', 'Psychodomorpha'), ('superfamily', 'Trichoceroidea'), ('family', 'Trichoceridae'), ('', 'Unclassified')]

In this case, the terminal lineage entry is 'Unclassified' with no assigned rank (an empty string). TaxDict passes that terminal rank, ranks[i][-1], to the TaxRef constructor, and _getLevel then fails while the TaxRef object is being initialized:

taxdict = TaxDict(idents=idents, ranks=ranks, lineages=lineages,
                  taxonomy=taxonomy)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-0af07c3e75e4> in <module>()
      1 taxdict = TaxDict(idents=idents, ranks=ranks, lineages=lineages,
----> 2                   taxonomy=taxonomy)

/Users/jonsanders/Development/git_sw/TaxonNamesResolver/taxon_names_resolver/manip_tools.py in __init__(self, idents, ranks, lineages, taxonomy, **kwargs)
    115             # create taxref
    116             taxref = TaxRef(ident=idents[i], rank=ranks[i][-1],
--> 117                             taxonomy=self.taxonomy)
    118             # create key for ident and insert a dictionary of:
    119             #  lineage, taxref, cident, ident and rank

/Users/jonsanders/Development/git_sw/TaxonNamesResolver/taxon_names_resolver/manip_tools.py in __init__(self, ident, rank, taxonomy)
     34         except ValueError as e:
     35             print('Error in taxon ident: {}'.format(ident))
---> 36             raise e
     37         super(TaxRef, self).__setattr__('counter', 0)  # count ident changes
     38 

/Users/jonsanders/Development/git_sw/TaxonNamesResolver/taxon_names_resolver/manip_tools.py in __init__(self, ident, rank, taxonomy)
     31         try:
     32             super(TaxRef, self).__setattr__('level',
---> 33                                         self._getLevel(rank, taxonomy))
     34         except ValueError as e:
     35             print('Error in taxon ident: {}'.format(ident))

/Users/jonsanders/Development/git_sw/TaxonNamesResolver/taxon_names_resolver/manip_tools.py in _getLevel(self, rank, taxonomy)
     54             return taxonomy.index(rank)
     55         # else find its closest by using the default taxonomy
---> 56         dlevel = default_taxonomy.index(rank)
     57         i = 1
     58         d = dlevel + i

ValueError: '' is not in list

I'm not sure what the best way to resolve this is. Two options come to mind:

  1. Add a check in _getLevel that the query rank is present in the default taxonomy before it tries to index it, and otherwise return 'Unknown' or similar.

  2. Rather than simply passing the terminal rank to the TaxRef constructor, look for the most terminal labeled rank that is present in either the provided or default taxonomy.
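
To make the two options concrete, here is a rough sketch of what I have in mind. Both helpers (`safe_level` and `last_labeled_rank`) are hypothetical and not part of taxon_names_resolver, the example lineage is a shortened version of the one above, and I'm using the taxonomy list from my snippet as a stand-in for whatever set of known ranks the library would actually check against.

```python
# Hypothetical helpers for illustration only; neither exists in
# taxon_names_resolver.

def safe_level(rank, taxonomy, default=None):
    """Option 1: look up a rank's index, returning `default` instead of
    raising ValueError when the rank (e.g. '') is not in the taxonomy."""
    try:
        return taxonomy.index(rank)
    except ValueError:
        return default


def last_labeled_rank(ranks, lineage, known_ranks):
    """Option 2: walk the lineage from the terminal end and return the first
    (rank, name) pair whose rank is in `known_ranks`, or None if none is."""
    for rank, name in zip(reversed(ranks), reversed(lineage)):
        if rank in known_ranks:
            return rank, name
    return None


# Stand-in for the known ranks (assumption: the real check might use the
# provided taxonomy, the default taxonomy, or both).
taxonomy = ['subspecies', 'species', 'genus',
            'family', 'order', 'class', 'phylum', 'kingdom']

# Shortened tail of the lineage from the example above.
ranks = ['order', 'suborder', 'family', '']
lineage = ['Diptera', 'Nematocera', 'Trichoceridae', 'Unclassified']

print(safe_level(ranks[-1], taxonomy))             # None, no exception
print(last_labeled_rank(ranks, lineage, taxonomy))  # ('family', 'Trichoceridae')
```

Option 1 keeps the unranked entry but leaves callers to handle a missing level; option 2 silently moves the reference up to the family, which may or may not be what people expect for an 'Unclassified' terminal node.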

Any thoughts?
