@joverlee521
Contributor

Description of proposed changes

Modify the ingest workflow to upload OPEN and RESTRICTED sequences in separate files, removing the duplicate OPEN records hosted on S3. Update the phylogenetic inputs to start from the separate files.

Related issue(s)

Follow up to #342
Similar changes to nextstrain/rsv#112

Checklist

  • Checks pass
  • Update changelog

Extract "OPEN" and "RESTRICTED" data into separate files that are
uploaded to S3 separately. This will reduce the amount of duplicate data
that we host on S3.
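
For illustration, the extraction step could look roughly like the sketch below. It is not the workflow's actual rule: the column name `dataUseTerms`, the function name, and the file paths are all assumptions made for this example, and the real ingest may select records differently.

```python
import csv

# Hypothetical column name; the real ingest metadata may use a different one.
TERMS_COLUMN = "dataUseTerms"


def split_by_terms(metadata_tsv, open_tsv, restricted_tsv, column=TERMS_COLUMN):
    """Write the OPEN and RESTRICTED rows of a TSV to two separate files.

    Returns a (n_open, n_restricted) tuple of row counts.
    """
    counts = {"OPEN": 0, "RESTRICTED": 0}
    with open(metadata_tsv, newline="") as src, \
            open(open_tsv, "w", newline="") as open_out, \
            open(restricted_tsv, "w", newline="") as restricted_out:
        reader = csv.DictReader(src, delimiter="\t")
        outputs = {"OPEN": open_out, "RESTRICTED": restricted_out}
        writers = {
            terms: csv.DictWriter(
                handle,
                fieldnames=reader.fieldnames,
                delimiter="\t",
                lineterminator="\n",
            )
            for terms, handle in outputs.items()
        }
        for writer in writers.values():
            writer.writeheader()
        for row in reader:
            terms = row[column]
            if terms not in writers:
                # Fail loudly rather than silently dropping a record.
                raise ValueError(f"unexpected {column} value: {terms!r}")
            writers[terms].writerow(row)
            counts[terms] += 1
    return counts["OPEN"], counts["RESTRICTED"]
```

Because every record lands in exactly one output, the OPEN records are no longer duplicated in a combined file.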

Outside of the changes in the workflow, we should delete the previously
uploaded "*_with_restricted" files from S3 so that they are not confused
with the new "*_restricted" files added here.

Since the previous commit separates the OPEN and RESTRICTED files on S3,
update the phylo config to start from these multiple inputs.
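
As a rough sketch of what starting from multiple inputs implies, the snippet below concatenates several metadata TSVs and fails on a repeated record ID. The `accession` ID column and the function name are assumptions for illustration; the phylogenetic workflow presumably uses its own merge mechanism rather than this code.

```python
import csv


def merge_metadata(paths, id_column="accession"):
    """Concatenate metadata TSVs, rejecting records that appear twice.

    Returns (fieldnames, rows), where fieldnames is the union of all
    columns in first-seen order and rows is a list of dicts.
    """
    fieldnames = []
    rows = []
    seen = set()
    for path in paths:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                record_id = row[id_column]
                if record_id in seen:
                    # The separate files should not overlap; a repeat suggests
                    # a stale combined file was passed in by mistake.
                    raise ValueError(f"duplicate record {record_id!r} in {path}")
                seen.add(record_id)
                rows.append(row)
    return fieldnames, rows
```

The duplicate check is a design choice for this sketch: it would flag the case where an old "*_with_restricted" file is used alongside the new files.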
@joverlee521
Contributor Author

Ah, an issue with splitting into separate files is that we then need to modify trigger_build to work with the separate files. (Would also need to update notify_on_metadata_diff, but maybe that just gets removed depending on discussion in #346).
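
To make the concern concrete, here is a minimal sketch of a change check that covers several files instead of one. It is not the existing trigger_build logic: the function names and the idea of comparing per-file digests are assumptions for illustration only.

```python
def changed_files(previous, current):
    """Return sorted names of files whose digest is new or different.

    Both arguments map a file name to a content digest (for example a
    SHA-256 hex string) recorded before and after an ingest run.
    """
    return sorted(
        name for name, digest in current.items() if previous.get(name) != digest
    )


def should_trigger_build(previous, current):
    """Trigger a rebuild when any tracked file has changed."""
    return bool(changed_files(previous, current))
```

With separate OPEN and RESTRICTED files, a change to either one would need to count as a reason to rebuild, which is what checking every tracked file achieves here.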

I'm considering taking this opportunity to update mpox to follow "Standardize GitHub Action workflows for pathogen repos".

2 participants