-
Notifications
You must be signed in to change notification settings - Fork 7
Use augur subsample #45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
augur subsample is built to replace the simple augur filter call with a command that supports more configuration options. Note that this is a breaking change and the old configuration will no longer work.
|
Locally, I did a test run with the full input and confirmed the output messages are the same as before with an extra validation line: |
|
Also confirmed that |
j23414
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving for:
- concordance with WNV sampling (also being used by WA-DOH)
- these changes do not break anything in the existing pipeline
- the mumps pipeline is simple enough (only two) that the sampling config is not unreasonably long or tedious to edit
Local testing of the following does indeed work:
mkdir phylo-out
nextstrain run mumps@victorlin/use-augur-subsample phylogenetic phylo-out| north-america: | ||
| samples: | ||
| all: | ||
| min_length: 8000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[non-blocking commentary]
Okay, I can see how this approach makes every filter parameter explicit in the config.yaml. I'm still not a fan of the duplication (e.g., identical settings for min_length, group_by, max_sequences, etc.) under each build scope (north-america and global). But since there are only two builds here (compared to dengue with 10, norovirus with 14, and flu potentially with 3×8), I can live with it for mumps.
I could see an improvement by using something like YAML anchors at some point in the future:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could imagine someone adding "SH" gene builds (would double the builds to 4) but as of now, we only do full genome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After using YAML anchors in nextstrain/rsv#103, I don't think it's great and am seeking alternatives in nextstrain/public#27.
| metadata = "data/metadata.tsv", | ||
| exclude = resolve_config_path(config["filter"]["exclude"]), | ||
| include = resolve_config_path(config["filter"]["include"]), | ||
| config = "results/run_config.yaml", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[non-blocking commentary]
Will this no longer detect changes to the exclude.txt or include.txt files?
Perhaps this is a moot point — I’m not sure how often people simply update the exclude list and expect to rerun from cache, versus just forcing a rerun each time anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it will no longer detect changes, this is a good point. I mentioned this in nextstrain/pathogen-repo-guide#98, and I think the functionality could potentially be brought back with a workflow-level helper function that dynamically adds inputs.
|
[non-blocking optional task] Could also update subsampling in the nextclade workflow for concordance to phylogenetic mumps/nextclade/defaults/config.yaml Lines 20 to 24 in b06a43d
|
|
I'll wait for a decision in nextstrain/public#27 before continuing here. |
Description of proposed changes
augur subsample is built to replace the simple augur filter call with a command that supports more configuration options.
Note that this is a breaking change and the old configuration will no longer work.
Related issue(s)
Closes #44
Checklist
Update changelogno changelog, might be good to start one