2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,8 @@ releases are available on [PyPI](https://pypi.org/project/pytask) and
default pickle protocol.
- {pull}`???` adapts the interactive debugger integration to Python 3.14's
updated `pdb` behaviour and keeps pytest-style capturing intact.
- {pull}`???` updates the comparison to other tools documentation and adds a section on
the Common Workflow Language (CWL) and WorkflowHub.

## 0.5.7 - 2025-11-22

159 changes: 73 additions & 86 deletions docs/source/explanations/comparison_to_other_tools.md
@@ -10,124 +10,111 @@ in other WMFs.

## [snakemake](https://github.com/snakemake/snakemake)

Pros

- Very mature library and probably the most adopted library in the realm of scientific
  workflow software.
- Can scale to clusters and use Docker images.
- Supports Python and R.
- Automatic test case generation.

Cons

- Need to learn snakemake's syntax, which is a mixture of Make and Python.
- No debug mode.
- Seems to have no plugin system.

Snakemake is one of the most widely adopted workflow systems in scientific computing. It
scales from local execution to clusters and cloud environments, with built-in support
for containers and conda environments. Workflows are defined using a DSL that combines
Make-style rules with Python, and can be exported to CWL for portability.

## [ploomber](https://github.com/ploomber/ploomber)

General

- Strong focus on machine learning pipelines, training, and deployment.
- Integration with tools such as MLflow, Docker, AWS Batch.
- Tasks can be defined in YAML, Python files, Jupyter notebooks, or SQL.

Pros

- Conversion from Jupyter notebooks to tasks via
[soorgeon](https://github.com/ploomber/soorgeon).

Cons

- Programming in Jupyter notebooks increases the risk of coding errors (e.g.
side-effects).
- Supports parametrizations in the form of Cartesian products in `yaml` files, but not
  more powerful parametrizations.

Ploomber focuses on machine learning pipelines with strong integration into MLflow,
Docker, and AWS Batch. Tasks can be defined in YAML, Python files, Jupyter notebooks, or
SQL, and it can convert notebooks into pipeline tasks.

## [Waf](https://waf.io)

Pros

- Mature library.
- Can be extended.

Cons

- Focus on compiling binaries, not research projects.
- Bus factor of 1.

Waf is a mature build system primarily designed for compiling software projects. It
handles complex build dependencies and can be extended with Python.

## [nextflow](https://github.com/nextflow-io/nextflow)

- Tasks are scripted using Groovy, which is a superset of Java.
- Supports AWS, Google, Azure.
- Supports Docker, Shifter, Podman, etc.

Nextflow is a workflow system popular in bioinformatics that runs on AWS, Google Cloud,
and Azure. It uses Groovy (a JVM language) for scripting and has strong support for
containers including Docker, Singularity, and Podman.

## [Kedro](https://github.com/kedro-org/kedro)

Pros

- Mature library, used by some institutions and companies. Created inside McKinsey.
- Provides the full package: templates, pipelines, deployment.

Kedro is a mature workflow framework developed at McKinsey that provides project
templates, data catalogs, and deployment tooling. It is designed for production machine
learning pipelines with a focus on software engineering best practices.

## [pydoit](https://github.com/pydoit/doit)

General

- A general task runner which focuses on command-line tools.
- You can think of it as a replacement for Make.
- Powers Nikola, a static site generator.

pydoit is a general-purpose task runner that serves as a Python replacement for Make. It
focuses on executing command-line tools and powers projects like Nikola, a static site
generator.
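
As a rough sketch of the Make-like model, a pydoit task is a `task_*` function in a
`dodo.py` file that returns a dictionary of actions, file dependencies, and targets. The
script and data paths below are placeholders, not part of any real project.

```python
# dodo.py -- run with `doit`; clean.py and the data paths are hypothetical.
def task_clean_data():
    """Run a command-line step that turns a raw CSV into a cleaned one."""
    return {
        "actions": ["python clean.py data/raw.csv data/clean.csv"],
        "file_dep": ["clean.py", "data/raw.csv"],
        "targets": ["data/clean.csv"],
    }
```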

## [Luigi](https://github.com/spotify/luigi)

General

- A build system written by Spotify.
- Designed for any kind of long-running batch processes.
- Integrates with many other tools like databases, Hadoop, Spark, etc.

Cons

- Very complex interface and a lot of stuff you probably don't need.
- [Development](https://github.com/spotify/luigi/graphs/contributors) seems to have
  stalled.

Luigi is a workflow system built by Spotify for long-running batch processes. It
integrates with Hadoop, Spark, and various databases for large-scale data pipelines.
Development has slowed in recent years.

## [sciluigi](https://github.com/pharmbio/sciluigi)

sciluigi aims to be a lightweight wrapper around luigi.

Cons

- [Development](https://github.com/pharmbio/sciluigi/graphs/contributors) has basically
stalled since 2018.
- Not very popular given its age.

sciluigi is a lightweight wrapper around Luigi aimed at simplifying scientific workflow
development. It reduces some of Luigi's boilerplate for research use cases. Development
has stalled since 2018.

## [scipipe](https://github.com/scipipe/scipipe)

Cons

- [Development](https://github.com/scipipe/scipipe/graphs/contributors) slowed down.
- Written in Go.

SciPipe is a workflow library written in Go for building robust, flexible pipelines
using Flow-Based Programming principles. It compiles workflows to fast binaries and is
designed for bioinformatics and cheminformatics applications involving command-line
tools.

## [SCons](https://github.com/SCons/scons)

Pros

- Mature library.

Cons

- Seems to have no plugin system.

SCons is a mature, cross-platform software construction tool that serves as an improved
substitute for Make. Build configurations are written as Python scripts, and it ships
with built-in support for C, C++, Java, and Fortran as well as automatic dependency
analysis.
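
To show the Python-based configuration, here is a minimal `SConstruct` sketch; the
`hello.c` source file is a placeholder.

```python
# SConstruct -- run with `scons`; SCons injects Environment into this file's namespace.
env = Environment()
env.Program(target="hello", source=["hello.c"])  # compiles hello.c into an executable
```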

## [pypyr](https://github.com/pypyr/pypyr)

General

- A general task-runner with tasks defined in yaml files.

pypyr is a task-runner for automation pipelines defined in YAML. It provides built-in
steps for common operations like loops, conditionals, retries, and error handling
without requiring custom code, and is often used for CI/CD and DevOps automation.

## [ZenML](https://github.com/zenml-io/zenml)

ZenML is an MLOps framework for building portable ML pipelines that can run on various
orchestrators including Kubernetes, AWS SageMaker, GCP Vertex AI, Kubeflow, and Airflow.
It focuses on productionizing ML workflows with features like automatic
containerization, artifact tracking, and native caching.

## [Flyte](https://github.com/flyteorg/flyte)

Flyte is a Kubernetes-native workflow orchestration platform for building
production-grade data and ML pipelines. It provides automatic retries, checkpointing,
failure recovery, and scales dynamically across cloud providers including AWS, GCP, and
Azure.

## [pipefunc](https://github.com/pipefunc/pipefunc)

A tool for executing graphs made out of functions, focused more on computational graphs
than on workflow graphs.

pipefunc is a lightweight library for creating function pipelines as directed acyclic
graphs (DAGs) in pure Python. It automatically handles execution order, supports
map-reduce operations, parallel execution, and provides resource profiling.
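
As an illustration of the function-to-DAG model, the sketch below assumes pipefunc's
decorator-based API; the function names and inputs are made up.

```python
from pipefunc import Pipeline, pipefunc


@pipefunc(output_name="c")
def add(a, b):
    return a + b


@pipefunc(output_name="d")
def multiply(b, c):
    return b * c


# pipefunc infers the execution order from argument and output names.
pipeline = Pipeline([add, multiply])
result = pipeline("d", a=1, b=2)  # add runs first: c = 3, then multiply: d = 6
```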

## [Common Workflow Language (CWL)](https://www.commonwl.org/)

CWL is an open standard for describing data analysis workflows in a portable,
language-agnostic format. Its primary goal is to enable workflows to be written once and
executed across different computing environments—from local workstations to clusters,
cloud, and HPC systems—without modification. Workflows described in CWL can be
registered on [WorkflowHub](https://workflowhub.eu/) for sharing and discovery following
FAIR (Findable, Accessible, Interoperable, Reusable) principles.

CWL is particularly prevalent in bioinformatics and life sciences where reproducibility
across institutions is critical. Tools that support CWL include
[cwltool](https://github.com/common-workflow-language/cwltool) (the reference
implementation), [Toil](https://github.com/DataBiosphere/toil),
[Arvados](https://arvados.org/), and [REANA](https://reanahub.io/). Some workflow
systems like Snakemake and Nextflow can export workflows to CWL format.

pytask is not a CWL-compliant tool because it operates on a fundamentally different
model. CWL describes workflows as graphs of command-line tool invocations where data
flows between tools via files. pytask, in contrast, orchestrates Python functions that
can execute arbitrary code, manipulate data in memory, call APIs, or perform any
operation available in Python. This Python-native approach enables features like
interactive debugging but means pytask workflows cannot be represented in CWL's
command-line-centric specification.
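
To make the contrast concrete, here is a minimal pytask task: a plain Python function
whose product is declared through a type annotation. The file name and output path are
illustrative.

```python
from pathlib import Path
from typing import Annotated

from pytask import Product


def task_write_report(path: Annotated[Path, Product] = Path("report.txt")) -> None:
    # Any Python code can run here: in-memory data work, API calls, plotting, ...
    path.write_text("Results computed entirely in Python.")
```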