Skip to content

Conversation

@victorlin
Copy link
Member

@victorlin victorlin commented Oct 20, 2025

Note

Based on nextstrain/shared#63

Description of proposed changes

This PR has a working implementation of custom config merging at workflow startup time within Snakemake, a practical solution to remove the need for custom_subsample.

Related issue(s)

Checklist

These should go in nextstrain/shared.
With a custom config merge, it is now possible to use 'subsample'
directly without defaults leaking into user config.
@victorlin victorlin force-pushed the victorlin/custom-config-merge branch from 3980377 to e78aadc Compare October 29, 2025 22:17
@joverlee521 joverlee521 self-requested a review October 31, 2025 20:57
Comment on lines +18 to +40
def load_config():
"""
Load Snakemake configuration.
This approach differs from Snakemake's built-in merge logic in that it uses
overrides for dicts in addition to other types. That is, when an overlay
contains a dict key, it completely replaces the existing value rather than
merging with it.
"""
global workflow

merged_config = {}

# Add values from --configfile(s).
for config_file in get_config_files():
if not Path(config_file).is_absolute():
config_file = Path(os.getcwd()) / Path(config_file)
merged_config.update(load(config_file))

# Add values from --config.
merged_config.update(workflow.config_settings.config)

return merged_config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saying this aloud to think through the changes

This is replacing the merge for all nested dicts within the config, which means that overriding any of the nested params requires copying the whole dict. E.g. if I wanted to change one of the files, I have to include the full files dict in my custom configfile. Similarly, overriding via --config will be more verbose (reminding me of my aversion to nested configs before Snakemake fixed their config merging bug).

As mentioned in nextstrain/public#28, if this custom merge is only used for the subsample param (and other future params that require this merge), then we have the problem of explaining how some keys are merged differently than others. Either way, I think this will be surprising for people who are familiar with Snakemake config merging behavior...

I think it'd be worth comparing this approach with the alternative of just sticking with Snakemake's merge behavior and allowing users to sidestep the merge with the --replace-workflow-config flag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also like to (re-)echo this sentiment (see discussion in related issue).

The solution which we seem to have completely discarded is the normal snakemake merge behaviour with the ability to use null (~) keys. This would require documentation of course, but I still think it could work just fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Jover and James. I don't have very much experience with workflow configs in practice, so I appreciate your perspectives. Point taken that no single merge approach works well, and that this PR is just making it simpler for subsample (which naturally wants dict overrides) while complicating things for files (which naturally wants dict merges).

I think it'd be worth comparing this approach with the alternative of just sticking with Snakemake's merge behavior and allowing users to sidestep the merge with the --replace-workflow-config flag.

For the alternative: how will users know when to sidestep the merge with --replace-workflow-config? It doesn't seem obvious. It might be simpler to always ignore the merge and normalize the flow with --replace-workflow-config. As mentioned in nextstrain/public#28 (comment), I'll look into prototyping this flow.

The solution which we seem to have completely discarded is the normal snakemake merge behaviour with the ability to use null (~) keys. This would require documentation of course, but I still think it could work just fine.

I explored this in nextstrain/rsv#106 (options 2 and 3) and didn't think it worked well. Maybe I've overlooked something though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants