Add support to model AA haplotypes #26
cd111d6 to 1b72d97
Reorganizes the workflow to support fitting the MLR with different variant classifications, as in the forecasts-ncov workflow. In addition to the original "emerging_haplotype" variant classification, I've added an "aa_haplotype" classification, which uses more granular, automated haplotype assignments based on the current clade annotation and all HA1 substitutions from that parent clade. We will likely need to tune the minimum number of "clade" sequences per AA haplotype to allow rarer haplotypes to appear in the analysis. For now, though, I've kept the same thresholds for both variant classifications.

As part of this reorganization, I've also added support for different data provenances.

I also realized that it did not make sense to implement different date thresholds for each potential model output; all of the analyses we run should represent the same time span. For this reason, I've moved "min_date" and "max_date" to the top-level config.
More regions or countries are close to the original 150-sequence threshold but don't get included. Lowering the threshold allows more regions to be included while keeping the same minimum for clade inclusion.
7261427 to 1c17e4b
jameshadfield
left a comment
The viz part is working well. A nice way to finish 2025; hopefully we can make 2026 the year we move this forward.
In terms of the modelling, I was surprised at the ~1y forecast window given only ~6m of fitted data. The CIs also look a little odd, e.g. H3N2 / Spain / K:88I is unexpectedly jagged, and many CIs rise and then abruptly end (maybe because the mean has reached zero?).
joverlee521
left a comment
Can't comment on the actual results, but the workflow + viz changes look reasonable to me! Left a question about whether we should centralize where emerging/aa haplotypes get derived.
This script looked familiar to me and I realized it's almost exactly the same as add_derived_haplotypes.py in seasonal-flu. Should we be running this (and assign_haplotypes.py) as part of run-nextclade.smk in seasonal-flu so that they can just be part of the Nextclade TSV that is downloaded here?
Good point, @joverlee521. That's probably the right place to end up. For now, I like having the flexibility to reannotate the original Nextclade file in each workflow, but I made an issue in the seasonal-flu repo for this proposed change.
Thanks, @jameshadfield!
The 1-year horizon is there to match our original eLife model's horizon. We will most likely change this in the near future (ha!), since there is so much uncertainty that it's often not informative to have that long of a horizon.
I'll look into this more, but I think these are separate issues. I only notice the last one you mention when the HPDIs converge to the same value as the median as a variant is predicted to fix. There is some jaggedness in the log-transform that you don't see as much in the standard view, which probably reflects some numerical rounding issues.
b61cf5a to 25b85fa
Update the forecasts viz interface to display MLR results for multiple variant classifications on a single page per subtype and geographic resolution, following the pattern from forecasts-ncov [1].

Since the `useModelData` function lives outside of this repo and expects a single object with a `modelUrl` attribute, this commit creates two separate config objects, one per variant classification. We call the model URL function once per config after the initial config object is copied for each variant classification. This function now accepts arguments for the variant classification (to get the correct S3 URL) and for the model date (so we only need the date-based update logic in one place).

As I implemented this expanded version of the display, I found it simpler to write the panel titles (the `h2` tags) and descriptions right in the HTML instead of maintaining a separate title per variant classification, subtype, and geographic resolution. This new implementation mimics forecasts-ncov at the expense of some flexibility in defining different titles and descriptions per subtype (which we have never needed to do).

[1] https://github.com/nextstrain/forecasts-ncov/blob/940791b/viz/src/App.jsx

Co-authored-by: james hadfield <hadfield.james@gmail.com>
25b85fa to b25557b
Description of proposed changes
The main goal of this PR is to automatically fit the MLR to more granular amino acid haplotypes based on current clade annotation and all HA1 substitutions from that parent clade. This finer granularity will allow us to detect new haplotypes that we should be tracking.
To accomplish this goal, I've reorganized the workflow to support fitting the MLR with different variant classifications as in the forecasts-ncov workflow.
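The AA haplotype assignment described above could look roughly like the following. This is a minimal sketch, not the script from this PR: the function names, the record fields (`clade`, `ha1_substitutions`), the `-` delimiter between substitutions, and the fallback to the parent clade for rare haplotypes are all assumptions made for illustration. Only the `clade:substitution` label shape (as in `K:88I`) and the existence of a minimum sequence count come from the discussion here.

```python
from collections import Counter


def aa_haplotype(clade, ha1_substitutions):
    """Build a haplotype label from a clade and its HA1 substitutions.

    Hypothetical label format: "<clade>:<sub>-<sub>...", sorted by position.
    """
    if not ha1_substitutions:
        return clade
    # Sort substitutions by their numeric position so labels are stable.
    subs = sorted(
        ha1_substitutions,
        key=lambda s: int("".join(ch for ch in s if ch.isdigit())),
    )
    return f"{clade}:{'-'.join(subs)}"


def assign_haplotypes(records, min_count):
    """Assign a haplotype per record, collapsing rare ones to the clade.

    Assumption: haplotypes seen fewer than `min_count` times fall back
    to the parent clade name.
    """
    labels = [aa_haplotype(r["clade"], r["ha1_substitutions"]) for r in records]
    counts = Counter(labels)
    return [
        label if counts[label] >= min_count else record["clade"]
        for label, record in zip(labels, records)
    ]
```

Lowering `min_count` in a sketch like this is what would let rarer haplotypes appear as their own variants instead of being folded back into the clade.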
Changes include:
- Variant classifications (`emerging_haplotype` and new `aa_haplotype`)
- Data provenances (`gisaid`)
- `results/{data_provenance}/{variant_classification}/{lineage}/{geo_resolution}/mlr/MLR_results.json`
- `s3://nextstrain-data/files/workflows/forecasts-flu/trial/forecast-aa-haplotypes/gisaid/emerging_haplotype/h3n2/region/mlr/MLR_results.json`
- `s3://nextstrain-data/files/workflows/forecasts-flu/gisaid/emerging_haplotype/h3n2/region/mlr/MLR_results.json`
- Move `min_date` and `max_date` to top-level config, since we want the same time periods for all models in a given run

Testing locally
To test the new visualization interface locally, run the following commands from this repo's directory on this branch:
Open http://127.0.0.1:8000/
The following screenshot shows H1N1pdm regional results with log transform and raw data turned on for both variant classifications:
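When checking a local run, it can help to know where each model's output should land. The following is a minimal sketch that expands the results path template from the changes list for every combination of wildcard values. The helper name `expected_results` is hypothetical and not part of the workflow; the example values (`gisaid`, `emerging_haplotype`, `aa_haplotype`, `h3n2`, `region`) are the ones mentioned in this PR.

```python
from itertools import product

# Path template as listed in this PR's description.
RESULTS_TEMPLATE = (
    "results/{data_provenance}/{variant_classification}/"
    "{lineage}/{geo_resolution}/mlr/MLR_results.json"
)


def expected_results(data_provenances, variant_classifications, lineages, geo_resolutions):
    """Return one results path per combination of the four wildcards."""
    return [
        RESULTS_TEMPLATE.format(
            data_provenance=provenance,
            variant_classification=classification,
            lineage=lineage,
            geo_resolution=resolution,
        )
        for provenance, classification, lineage, resolution in product(
            data_provenances, variant_classifications, lineages, geo_resolutions
        )
    ]


if __name__ == "__main__":
    for path in expected_results(
        ["gisaid"], ["emerging_haplotype", "aa_haplotype"], ["h3n2"], ["region"]
    ):
        print(path)
```

Each variant classification gets its own directory under the data provenance, so the two MLR fits for a given subtype and geographic resolution never overwrite each other.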
Outstanding issues
Checklist