Skip to content

Conversation

@tsibley
Copy link
Contributor

@tsibley tsibley commented Sep 17, 2025

Resolves: #420

Checklist

  • Checks pass
  • Update changelog

Copy link
Contributor

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worked as expected for me when I tried it locally with zika.

It did flag an unexpected behavior for workflows that are not fully nextstrain run compatible, specifically ones that define workdir in the Snakefile (e.g. rabies/ingest). The outputs were written to the unmanaged pathogen repository instead of the analysis directory. As expected, I see the same behavior with "normally" setup pathogens

nextstrain setup rabies@main 
nextstrain run rabies@main ingest /tmp/rabies-ingest

The results end up in ~/.nextstrain/pathogens/rabies/main\=NVQWS3Q\=/ingest/.

Should nextstrain run specifically guard against this by specifying --directory=build_volume?

self.registration_path = self.path / "nextstrain-pathogen.yaml"

if self.registration_path.exists():
self.registration = read_pathogen_registration(self.registration_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breifly wondered if this should also check for compatibility similar to test_compatibility. Probably not since, we already get a warning if the workflow is not registered

if opts.workflow not in pathogen.registered_workflows():
print(f"The {opts.workflow!r} workflow is not registered as a compatible workflow, but trying to run anyways.")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we get #472 in then yeah, I'd add a check for nextstrain run compatibility too.

@tsibley
Copy link
Contributor Author

tsibley commented Sep 23, 2025

The outputs were written to the unmanaged pathogen repository instead of the analysis directory. As expected, I see the same behavior with "normally" setup pathogens […]

The results end up in ~/.nextstrain/pathogens/rabies/main\=NVQWS3Q\=/ingest/.

Well, that's an undesirable behaviour! Good catch.

In general, there's only so much we can do if you're ignoring warnings and trying to run non-compatible workflows under nextstrain run. The workflow can do arbitrary things, so we can really only trust what it declares about itself.

That said, I think your suggestion for avoiding this behaviour is good and is a measure we can take.

Should nextstrain run specifically guard against this by specifying --directory=build_volume?

Probably, yeah.

(Explaining aloud, I know you know this…)

This essentially tells Snakemake to ignore the workflow's own workdir directive. For generated inputs/outputs that'll be fine, but it's gonna break workflow-provided inputs (e.g. data/reference.fasta or what not). For example, when I implement this guard:

diff --git a/nextstrain/cli/command/run.py b/nextstrain/cli/command/run.py
index d278fb9..d307780 100644
--- a/nextstrain/cli/command/run.py
+++ b/nextstrain/cli/command/run.py
@@ -274,6 +274,14 @@ def run(opts):
         *(["--forceall"]
             if opts.force else []),
 
+        # Explicitly use Snakemake's current working directory as the
+        # workflow's workdir, overriding any "workdir:" directive the workflow
+        # may include.  Snakemake uses the cwd by default in the absence of any
+        # "workdir:" directive, but we want to _always_ use it to avoid writing
+        # into the pathogen/workflow source directories if a non-compatible
+        # workflow is run.
+        "--directory=.",
+
         # Workdir will be the analysis volume (/nextstrain/build in a
         # containerized runtime), so explicitly point to the Snakefile.
         "--snakefile=%s/Snakefile" % (

the behaviour becomes:

$ nextstrain run rabies@main ingest /tmp/rabies
The 'ingest' workflow is not registered as a compatible workflow, but trying to run anyways.
Running the 'ingest' workflow for pathogen rabies@main
WorkflowError in file "/nextstrain/pathogen/ingest/Snakefile", line 9:
Workflow defines configfile defaults/config.yaml but it is not present or accessible (full checked path: /nextstrain/build/defaults/config.yaml).

which is probably better even though (because!) it doesn't "work"?

…rectives in workflows

Suggested by @joverlee521 in review¹ to avoid writing into the
pathogen/workflow source directories when non-compatible (or broken)
workflows are used with `nextstrain run`.

¹ <#476 (review)>

Co-authored-by: Jover Lee <joverlee521@gmail.com>
@tsibley
Copy link
Contributor Author

tsibley commented Sep 23, 2025

For example, when I implement this guard:

Pushed as acd485d.

@tsibley tsibley merged commit 0dd4d40 into master Sep 23, 2025
1 check passed
@tsibley tsibley deleted the trs/run/unmanaged-pathogens branch September 23, 2025 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

nextstrain run: unmanaged pathogen support?

4 participants