-
Notifications
You must be signed in to change notification settings - Fork 8
Remove custom_subsample with custom config merge
#89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
These should go in nextstrain/shared.
With a custom config merge, it is now possible to use 'subsample' directly without defaults leaking into user config.
3980377 to
e78aadc
Compare
| def load_config(): | ||
| """ | ||
| Load Snakemake configuration. | ||
| This approach differs from Snakemake's built-in merge logic in that it uses | ||
| overrides for dicts in addition to other types. That is, when an overlay | ||
| contains a dict key, it completely replaces the existing value rather than | ||
| merging with it. | ||
| """ | ||
| global workflow | ||
|
|
||
| merged_config = {} | ||
|
|
||
| # Add values from --configfile(s). | ||
| for config_file in get_config_files(): | ||
| if not Path(config_file).is_absolute(): | ||
| config_file = Path(os.getcwd()) / Path(config_file) | ||
| merged_config.update(load(config_file)) | ||
|
|
||
| # Add values from --config. | ||
| merged_config.update(workflow.config_settings.config) | ||
|
|
||
| return merged_config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Saying this aloud to think through the changes
This is replacing the merge for all nested dicts within the config, which means that overriding any of the nested params requires copying the whole dict. E.g. if I wanted to change one of the files, I have to include the full files dict in my custom configfile. Similarly, overriding via --config will be more verbose (reminding me of my aversion to nested configs before Snakemake fixed their config merging bug).
As mentioned in nextstrain/public#28, if this custom merge is only used for the subsample param (and other future params that require this merge), then we have the problem of explaining how some keys are merged differently than others. Either way, I think this will be surprising for people who are familiar with Snakemake config merging behavior...
I think it'd be worth comparing this approach with the alternative of just sticking with Snakemake's merge behavior and allowing users to sidestep the merge with the --replace-workflow-config flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also like to (re-)echo this sentiment (see discussion in related issue).
The solution which we seem to have completely discarded is the normal snakemake merge behaviour with the ability to use null (~) keys. This would require documentation of course, but I still think it could work just fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Jover and James. I don't have very much experience with workflow configs in practice, so I appreciate your perspectives. Point taken that no single merge approach works well, and that this PR is just making it simpler for subsample (which naturally wants dict overrides) while complicating things for files (which naturally wants dict merges).
I think it'd be worth comparing this approach with the alternative of just sticking with Snakemake's merge behavior and allowing users to sidestep the merge with the
--replace-workflow-configflag.
For the alternative: how will users know when to sidestep the merge with --replace-workflow-config? It doesn't seem obvious. It might be simpler to always ignore the merge and normalize the flow with --replace-workflow-config. As mentioned in nextstrain/public#28 (comment), I'll look into prototyping this flow.
The solution which we seem to have completely discarded is the normal snakemake merge behaviour with the ability to use null (
~) keys. This would require documentation of course, but I still think it could work just fine.
I explored this in nextstrain/rsv#106 (options 2 and 3) and didn't think it worked well. Maybe I've overlooked something though.
Note
Based on nextstrain/shared#63
Description of proposed changes
This PR has a working implementation of custom config merging at workflow startup time within Snakemake, a practical solution to remove the need for
custom_subsample.Related issue(s)
Checklist