@Jongmassey (Contributor) commented Dec 30, 2025

When the October release was published and the September release moved to the historical releases section,
the HTML for the September link was in a new and unexpected format:

<p>September<a href="/sites/default/files/2025-11/BNF%20Snomed%20Mapping%20data%2020251120.zip"> 2025 (ZIP file: 18.77MB)</a></p>

vs

<p><a href="/sites/default/files/2025-10/BNF%20Snomed%20Mapping%20data%2020251017.zip">August 2025 (ZIP file: 18.2MB)</a></p>

The previous code assumed that when the link text didn't fit the "month YYYY" pattern, the link text held only the month part of the "valid from" date and the year part was in the following element. That was the only form of malformed historical release link we'd seen. The assumption fails in this new case, where the missing date part (here, the month) is in the preceding element and there is no following element.

This commit accounts for both these types of malformed link, with an accompanying test.
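
A minimal sketch of the fallback described above, for illustration only and not the code in this PR. It uses only the standard library, and the names `parse_release` and `_LinkContext` are hypothetical. It assumes that the "preceding" and "following" parts are plain text sitting either side of the `<a>` within the same fragment, and that "valid from" is taken as the first day of the month. The `/z.zip` case in the usage note is an invented example of the older malformed shape.

```python
import calendar
import re
from datetime import date
from html.parser import HTMLParser

# Month names come from the calendar module (English in the default locale).
MONTHS = list(calendar.month_name)[1:]
MONTH_YEAR = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{4})\b")


class _LinkContext(HTMLParser):
    """Collect the text before, inside and after the first <a> in a fragment."""

    def __init__(self):
        super().__init__()
        self.parts = {"before": "", "link": "", "after": ""}
        self.href = None
        self._where = "before"

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self._where == "before":
            self.href = dict(attrs).get("href")
            self._where = "link"

    def handle_endtag(self, tag):
        if tag == "a" and self._where == "link":
            self._where = "after"

    def handle_data(self, data):
        self.parts[self._where] += data


def parse_release(fragment):
    """Return (valid_from, href) for one historical release paragraph."""
    parser = _LinkContext()
    parser.feed(fragment)
    parser.close()
    parts = parser.parts
    candidates = [
        parts["link"],                         # well-formed: "August 2025 (...)"
        parts["before"] + " " + parts["link"],  # month precedes the link
        parts["link"] + " " + parts["after"],   # year follows the link
    ]
    for text in candidates:
        match = MONTH_YEAR.search(text)
        if match:
            month = MONTHS.index(match.group(1)) + 1
            return date(int(match.group(2)), month, 1), parser.href
    raise ValueError(f"no 'month YYYY' date found in {fragment!r}")
```

Trying the link text on its own first means well-formed links never pick up stray text from their neighbours; the two fallbacks only apply when that fails.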

Fixes #2942

@Jongmassey Jongmassey force-pushed the Jongmassey/fix-bnf-dmd-mapping-scraper branch from 73e4a2a to b67c424 Compare December 30, 2025 12:44
@Jongmassey (Contributor, Author) commented:

(The additional line added to the test fixture uses September 2023 rather than 2025, so as not to break the existing "latest release" test logic.)

@Jongmassey (Contributor, Author) commented:

If you're wondering why this didn't affect the OpenPrescribing code that this was based on: the error is in the "valid from" date parsing, which OpenPrescribing doesn't do but OpenCodelists requires.

@lucyb lucyb merged commit 76a4d2e into main Jan 6, 2026
6 checks passed
@lucyb lucyb deleted the Jongmassey/fix-bnf-dmd-mapping-scraper branch January 6, 2026 12:31
Development

Successfully merging this pull request may close these issues.

BNF-dmd mapping site scraping fails to parse historical releases
