Conversation

@SGallagherMet
Contributor

Iris has a built-in 'shortcut' mechanism when the only constraint passed to an iris.load call is a single STASH code or a single field name. This PR changes the loading process for recipes that only use a single parameter to first load the data, then apply the other constraints as a filter step.

For models with many parameters and frequent output this can significantly speed up loading, as Iris can skip fields without fully loading their metadata to check the remaining constraints. A quick test on one of our UKV trial outputs suggested the runtime of the aggregation step was approximately halved.
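
For illustration, a minimal sketch of the load-then-filter pattern described above. The file names and constraints here are hypothetical, not CSET's actual recipes:

```python
import iris

files = ["forecast_000.pp", "forecast_012.pp"]  # hypothetical input files

# Fast path: a single name constraint lets Iris use its loading shortcut,
# skipping fields without fully parsing their metadata.
cubes = iris.load(files, "air_temperature")

# Remaining constraints are applied afterwards as an in-memory filter step.
extra_constraints = iris.Constraint(pressure=850) & iris.Constraint(
    cube_func=lambda cube: cube.cell_methods == ()
)
cubes = cubes.extract(extra_constraints)
```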

Contribution checklist

Aim to have all relevant checks ticked off before merging. See the developer's guide for more detail.

  • Documentation has been updated to reflect change.
  • New code has tests, and affected old tests have been updated.
  • All tests and CI checks pass.
  • Ensured the pull request title is descriptive.
  • Conda lock files have been updated if dependencies have changed.
  • Attributed any Generative AI, such as GitHub Copilot, used in this PR.
  • Marked the PR as ready to review.

@jfrost-mo
Member

commented Oct 29, 2025

Given this is a performance change, I'd ideally like to do it without changing the API. Specifically, that would involve doing it inside read.read_cubes, by first doing a load with the varname constraint, then filtering the remaining cubes with the other constraints. This may require splitting apart the combined constraints object passed in, which I'm not sure is possible.
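
For what it's worth, a hypothetical sketch of how that split might look. It assumes Iris's *private* ConstraintCombination class, which stores the two operands of `a & b` as .lhs and .rhs; being private, this is unverified and could break between Iris versions:

```python
import iris
from iris._constraints import ConstraintCombination  # private API, assumption


def flatten_constraints(constraint):
    """Recursively unpack `&`-combined constraints into a flat list."""
    if isinstance(constraint, ConstraintCombination):
        return flatten_constraints(constraint.lhs) + flatten_constraints(constraint.rhs)
    return [constraint]


combined = iris.Constraint("air_temperature") & iris.Constraint(pressure=850)
parts = flatten_constraints(combined)  # [name constraint, pressure constraint]
```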

The other aspect to this is that we currently apply a bunch of load-time callbacks to the loaded cubes, which crucially run before the cubes are filtered. Therefore we are already paying a significant price to look at all that metadata. We may need to split that processing into two as well, so that only the variable name normalisation happens at load time. Alternatively, we might avoid it entirely by filtering on all possible names for a phenomenon instead of normalising the cube's variable names. Would that require our name list to be comprehensive, however?
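
A minimal sketch of what the split-out, name-only callback could look like. The VARNAME_SYNONYMS mapping is a hypothetical stand-in for CSET's actual name table; only renaming happens at load time, leaving heavier metadata fix-ups to run after filtering:

```python
import iris

# Hypothetical mapping of known synonyms to a canonical variable name.
VARNAME_SYNONYMS = {"temp_at_screen_level": "air_temperature"}


def normalise_varname_callback(cube, field, filename):
    """Load-time callback that only normalises the cube's variable name."""
    canonical = VARNAME_SYNONYMS.get(cube.name())
    if canonical is not None:
        cube.rename(canonical)


cubes = iris.load(["forecast_000.pp"], callback=normalise_varname_callback)
```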

That said, if you are already seeing a significant performance improvement with this change, it is definitely worth investigating. Load speed has always been a bit slow in CSET, especially when loading multiple files. We previously had a pre-processor that munged each case into a single file before running the diagnostics on it, which gave a significant speedup when running many tasks, but slowed down small numbers of tasks due to having to copy the whole dataset. It also added significant complexity (and bugs) to the loading, so it was dropped.
