Skip to content

Bug report: update_codelist_version_compatibility's comparison of Hierarchies is erroneously order-sensitive #2909

@Jongmassey

Description

@Jongmassey

Summary

update_codelist_version_compatibility runs when a new release of a coding system is imported.
This updates the compatible_versions property of relevant CodelistVersions.
As part of its compatibility checking, it compares the Hierarchy of a CodelistVersion within the previous CodingSystemRelease to the incoming release using _check_version_by_hierarchy.
This function compares parts of the hierarchies in an order-sensitive manner when an order-insensitive approach is more appropriate, leading to false assertions of incompatibility of a CodelistVersion with a CodingSystemRelease.
There are now several codelist versions presented on OpenCodelists with inaccurate coding system release compatibility information, which misleads researchers as to their suitability for use.

What did you do and what did you expect to happen?

  • Visited this codelist version page
  • Expected to see it was compatible with the latest release of SNOMED CT (confirmed by cloning, creating new version, and comparing).
  • It was not stated to be compatible with this release

Impact of the bug

Renders large parts of the analysis I have done on usage of out-of-date codelists unreliable.
Provides inaccurate information on appropriate codelist use to researchers.

Worked Example

Get the most recent SNOMED CT release, and the previous one

coding_system_id="snomedct"
new_coding_system = CODING_SYSTEMS[coding_system_id](
    ...:         database_alias=new_database_alias
    ...:     )
coding_system=new_coding_system
previous_release = (
    ...:     CodingSystemRelease.objects.filter(
    ...:         coding_system=coding_system.id,
    ...:         valid_from__lt=coding_system.release.valid_from,
    ...:     )
    ...:     .exclude(id=coding_system.release.id)
    ...:     .latest("valid_from")
    ...: )

Ascertain the versions to have their compatibility checked as per update_codelist_version_compatibility, isolate above-linked version of interest.

versions = set(
    ...:     CodelistVersion.objects.exclude(status=Status.DRAFT).filter(
    ...:     Q(compatible_releases__id=previous_release.id)
    ...:     | Q(coding_system_release=previous_release)
    ...:     )
    ...:     )
version = [v for v in versions if v.hash == "7e32e792"][0]

Call the version search and hierarchy compatibility inner functions

_check_version_by_search(
    ...:         coding_system, version
    ...:     )
Out[40]: True
_check_version_by_hierarchy(coding_system, version)
Out[41]: False

Manually compare state of original and new hierarchies as generated within _check_version_by_hierarchy

In [47]: original_hierarchy_data
Out[47]: 
{'root': '138875005',
 'nodes': ['73211009',
  '362965005',
  '170745003',
  '164971000119101',
  '64572001',
  '404684003',
  '170763003',
  '308537004',
  '441742003',
  '24481000000101',
  '75934005',
  '243860001',
  '44054006',
  '126877002',
  '362969004',
  '307824009',
  '444110003',
  '127325009',
  '243851009',
  '138875005',
  '20957000',
  '243815002'],
 'child_map': {'307824009': ['243815002'],
  '362965005': ['362969004'],
  '404684003': ['307824009', '127325009', '64572001'],
  '441742003': ['170763003'],
  '126877002': ['73211009'],
  '170763003': ['444110003'],
  '64572001': ['362965005', '75934005'],
  '138875005': ['404684003'],
  '243851009': ['243860001'],
  '170745003': ['24481000000101'],
  '243815002': ['308537004'],
  '362969004': ['73211009'],
  '20957000': ['126877002'],
  '308537004': ['243851009'],
  '127325009': ['441742003'],
  '444110003': ['164971000119101'],
  '44054006': ['444110003'],
  '75934005': ['20957000'],
  '73211009': ['44054006'],
  '243860001': ['170745003']},
 'parent_map': {'243815002': ['307824009'],
  '362969004': ['362965005'],
  '127325009': ['404684003'],
  '170763003': ['441742003'],
  '73211009': ['362969004', '126877002'],
  '444110003': ['44054006', '170763003'],
  '75934005': ['64572001'],
  '307824009': ['404684003'],
  '404684003': ['138875005'],
  '243860001': ['243851009'],
  '24481000000101': ['170745003'],
  '308537004': ['243815002'],
  '64572001': ['404684003'],
  '126877002': ['20957000'],
  '243851009': ['308537004'],
  '441742003': ['127325009'],
  '164971000119101': ['444110003'],
  '20957000': ['75934005'],
  '44054006': ['73211009'],
  '362965005': ['64572001'],
  '170745003': ['243860001']}}

In [48]: new_hierarchy_data
Out[48]: 
{'root': '138875005',
 'nodes': ['64572001',
  '308537004',
  '307824009',
  '75934005',
  '170763003',
  '20957000',
  '24481000000101',
  '126877002',
  '444110003',
  '362965005',
  '362969004',
  '73211009',
  '138875005',
  '170745003',
  '243815002',
  '44054006',
  '243860001',
  '164971000119101',
  '243851009',
  '127325009',
  '404684003',
  '441742003'],
 'child_map': {'444110003': ['164971000119101'],
  '126877002': ['73211009'],
  '404684003': ['307824009', '127325009', '64572001'],
  '307824009': ['243815002'],
  '64572001': ['75934005', '362965005'],
  '127325009': ['441742003'],
  '243851009': ['243860001'],
  '138875005': ['404684003'],
  '308537004': ['243851009'],
  '170763003': ['444110003'],
  '75934005': ['20957000'],
  '362965005': ['362969004'],
  '362969004': ['73211009'],
  '441742003': ['170763003'],
  '73211009': ['44054006'],
  '20957000': ['126877002'],
  '243815002': ['308537004'],
  '44054006': ['444110003'],
  '170745003': ['24481000000101'],
  '243860001': ['170745003']},
 'parent_map': {'164971000119101': ['444110003'],
  '73211009': ['362969004', '126877002'],
  '127325009': ['404684003'],
  '243815002': ['307824009'],
  '75934005': ['64572001'],
  '441742003': ['127325009'],
  '243860001': ['243851009'],
  '404684003': ['138875005'],
  '243851009': ['308537004'],
  '307824009': ['404684003'],
  '444110003': ['44054006', '170763003'],
  '20957000': ['75934005'],
  '64572001': ['404684003'],
  '362969004': ['362965005'],
  '170763003': ['441742003'],
  '44054006': ['73211009'],
  '126877002': ['20957000'],
  '308537004': ['243815002'],
  '362965005': ['64572001'],
  '24481000000101': ['170745003'],
  '170745003': ['243860001']}}

In [50]: original_hierarchy_data["nodes"] == new_hierarchy_data["nodes"]
Out[50]: False

In [51]: set(original_hierarchy_data["nodes"]) == set(new_hierarchy_data["nodes"])
Out[51]: True

In [55]: set(original_hierarchy_data["child_map"].keys()).difference(set(new_hierarchy_data["child_map"].keys()))
Out[55]: set()

In [56]: set(original_hierarchy_data["parent_map"].keys()).difference(set(new_hierarchy_data["parent_map"].keys()))
Out[56]: set()


In [62]: [(k,v) for k,v in original_hierarchy_data["child_map"].items() if v != new_hierarchy_data["child_map"][k]]
Out[62]: [('64572001', ['362965005', '75934005'])]
In [64]: new_hierarchy_data["child_map"]["64572001"]
Out[64]: ['75934005', '362965005']

We can see that the nodes are the same between these hierarchies, just ordered differently. Since they are compared as lists rather than sets the comparison is order-sensitive.

Equally, the keys for the parent and child maps are the same, and where there are differing values, the codes within those values are the same just in a different order.

Sub-issues

Metadata

Metadata

Assignees

Labels

bugSomething isn't workingdeck-scrubbingTech debt or other between-initiative tidy-up work

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions