Fixup: Add date annotations for rare genotypes #38
Description of proposed changes
This PR adds collection dates to the ingest metadata output for six samples.
These samples were force-included in the Nextclade dataset tree to increase its representation of rare genotypes. However, their date fields are empty in the metadata output from NCBI Datasets, so the TreeTime clock filter removes them.
Fortunately, the NCBI metadata includes strain names for these six samples, and the collection dates can be extracted from the strain names.
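For illustration only, the extraction could be sketched in Python as below. This is not what the PR did (the dates were taken by hand), and several things are assumptions: that the strain name ends in a four-digit year after a final slash, that year-only dates are written as `YYYY-XX-XX`, that an annotation row has three tab-separated columns (accession, field name, value), and that the field is called `date`. The helper names and the sample accession are hypothetical.

```python
import re
from typing import Optional

# Assumed strain-name layout: "<virus>/<country>/<isolate>/<year>".
# The actual strain names for the six samples may be formatted differently.
YEAR_SUFFIX = re.compile(r"/((?:19|20)\d{2})$")


def year_from_strain(strain: str) -> Optional[str]:
    """Return a year-only date such as '1998-XX-XX', or None if no year is found."""
    match = YEAR_SUFFIX.search(strain.strip())
    if match is None:
        return None
    return f"{match.group(1)}-XX-XX"


def annotation_row(accession: str, strain: str, field: str = "date") -> Optional[str]:
    """Build one tab-separated annotation row (assumed layout: accession, field, value)."""
    date = year_from_strain(strain)
    if date is None:
        return None
    return "\t".join([accession, field, date])


if __name__ == "__main__":
    # Hypothetical accession and strain name, not taken from the PR.
    print(annotation_row("XX000000", "VIRUS/USA/ABC-1/1998"))
```

Returning `None` when no year is found keeps a malformed strain name from silently producing a wrong date; such samples would still need manual review.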
This PR adds the collection dates for the six samples (extracted manually from the strain names) to ingest/defaults/annotations.tsv. As a result, the ingest metadata output includes the collection dates, and TreeTime retains the samples in the Nextclade dataset tree.
Related issue(s)
#28
Checklist