Conversation

@SGallagherMet
Contributor

Iris has a built-in 'shortcut' mechanism when the only constraint passed to an iris.load call is a single STASH code or a single field name. This PR changes the loading process for recipes that only use a single parameter to first load the data, then apply the other constraints as a filter step.

For models with many parameters and frequent output this can significantly speed up loading, as Iris can skip fields without fully loading their metadata to check the remaining constraints. A quick test on one of our UKV trial outputs suggested the runtime of the aggregation step was approximately halved.
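
For illustration, a minimal sketch of the load-then-filter pattern described above. The file names and constraints here are hypothetical, not CSET's actual recipes:

```python
import iris

files = ["forecast_000.pp", "forecast_012.pp"]  # hypothetical input files

# Fast path: a single name constraint lets Iris use its loading shortcut,
# skipping fields without fully parsing their metadata.
cubes = iris.load(files, "air_temperature")

# Remaining constraints are applied afterwards as an in-memory filter step.
extra_constraints = iris.Constraint(pressure=850) & iris.Constraint(
    cube_func=lambda cube: cube.cell_methods == ()
)
cubes = cubes.extract(extra_constraints)
```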

Contribution checklist

Aim to have all relevant checks ticked off before merging. See the developer's guide for more detail.

  • Documentation has been updated to reflect change.
  • New code has tests, and affected old tests have been updated.
  • All tests and CI checks pass.
  • Ensured the pull request title is descriptive.
  • Conda lock files have been updated if dependencies have changed.
  • Attributed any Generative AI, such as GitHub Copilot, used in this PR.
  • Marked the PR as ready to review.

@jfrost-mo
Member

commented Oct 29, 2025

Given this is a performance change, I'd ideally like to do it without changing the API. Specifically, that would involve doing it inside read.read_cubes, by first doing a load with the varname constraint, then filtering the remaining cubes with the other constraints. This may require splitting apart the combined constraints object passed in, which I'm not sure is possible.
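
For what it's worth, a hypothetical sketch of how that split might look. It assumes Iris's *private* ConstraintCombination class, which stores the two operands of `a & b` as .lhs and .rhs; being private, this is unverified and could break between Iris versions:

```python
import iris
from iris._constraints import ConstraintCombination  # private API, assumption


def flatten_constraints(constraint):
    """Recursively unpack `&`-combined constraints into a flat list."""
    if isinstance(constraint, ConstraintCombination):
        return flatten_constraints(constraint.lhs) + flatten_constraints(constraint.rhs)
    return [constraint]


combined = iris.Constraint("air_temperature") & iris.Constraint(pressure=850)
parts = flatten_constraints(combined)  # [name constraint, pressure constraint]
```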

The other aspect to this is that we currently apply a bunch of load-time callbacks to the loaded cubes, which crucially run before the cubes are filtered. Therefore we are already paying a significant price to look at all that metadata. We may need to split that processing into two as well, so that only the variable name normalisation happens at load time. Alternatively, we might avoid it entirely by filtering on all possible names for a phenomenon instead of normalising the cube's variable names. Would that require our name list to be comprehensive, however?
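
A minimal sketch of what the split-out, name-only callback could look like. The VARNAME_SYNONYMS mapping is a hypothetical stand-in for CSET's actual name table; only renaming happens at load time, leaving heavier metadata fix-ups to run after filtering:

```python
import iris

# Hypothetical mapping of known synonyms to a canonical variable name.
VARNAME_SYNONYMS = {"temp_at_screen_level": "air_temperature"}


def normalise_varname_callback(cube, field, filename):
    """Load-time callback that only normalises the cube's variable name."""
    canonical = VARNAME_SYNONYMS.get(cube.name())
    if canonical is not None:
        cube.rename(canonical)


cubes = iris.load(["forecast_000.pp"], callback=normalise_varname_callback)
```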

That said, if you are already seeing a significant performance improvement with this change, it is definitely worth investigating. Load speed has always been a bit slow in CSET, especially when loading multiple files. We previously had a pre-processor that munged each case into a single file before running the diagnostics on it, which gave a significant speedup when running many tasks, but slowed down small numbers of tasks due to having to copy the whole dataset. It also added significant complexity (and bugs) to the loading, so it was dropped.
