diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 99fe6c8..f31ec54 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -17,8 +17,8 @@ jobs:
include:
- os: ubuntu-latest
python: "3.12"
- - os: macOS-latest
- python: "3.12"
+ # - os: macOS-latest
+ # python: "3.12"
timeout-minutes: 15
steps:
@@ -39,6 +39,6 @@ jobs:
import lamindb as ln
ln.Project(name='Arrayloader benchmarks v2').save()
"
- - run: python scripts/run_data_loading_benchmark_on_tahoe100m.py MappedCollection
- - run: python scripts/run_data_loading_benchmark_on_tahoe100m.py scDataset
- - run: python scripts/run_data_loading_benchmark_on_tahoe100m.py annbatch
+ - run: python scripts/run_loading_benchmark_on_collection.py MappedCollection
+ - run: python scripts/run_loading_benchmark_on_collection.py scDataset
+ - run: python scripts/run_loading_benchmark_on_collection.py annbatch
diff --git a/README.md b/README.md
index c958dff..d84d070 100644
--- a/README.md
+++ b/README.md
@@ -1,31 +1,43 @@
-# `arrayloader-benchmarks`: Data loader benchmarks for scRNA-seq counts et al.
+# Data loader benchmarks for scRNA-seq counts et al.
_A collaboration between scverse, Lamin, and anyone interested in contributing!_
This repository contains benchmarking scripts & utilities for scRNA-seq data loaders and allows to collaboratively contribute new benchmarking results.
-A user can choose between different benchmarking dataset collections:
+## Quickstart
-https://lamin.ai/laminlabs/arrayloader-benchmarks/collections
+Setup:
-
+```bash
+git clone https://github.com/laminlabs/arrayloader-benchmarks
+cd arrayloader-benchmarks
+uv pip install -e ".[scdataset,annbatch]"  # the extras install the optional data loaders you'd like to benchmark
+lamin connect laminlabs/arrayloader-benchmarks  # to contribute results to the hosted lamindb instance; or run `lamin init` to track results in your own instance
+```
Typical calls of the main benchmarking script are:
+```bash
+cd scripts
+python run_loading_benchmark_on_collection.py annbatch # run annbatch on collection Tahoe100M_tiny, n_datasets = 1
+python run_loading_benchmark_on_collection.py MappedCollection # run MappedCollection
+python run_loading_benchmark_on_collection.py scDataset # run scDataset
+python run_loading_benchmark_on_collection.py annbatch --n_datasets -1 # run against all datasets in the collection
+python run_loading_benchmark_on_collection.py annbatch --collection Tahoe100M --n_datasets -1 # run against the full 100M cells
+python run_loading_benchmark_on_collection.py annbatch --collection Tahoe100M --n_datasets 1 # run against the first dataset, 2M cells
+python run_loading_benchmark_on_collection.py annbatch --collection Tahoe100M --n_datasets 5 # run against the first 5 datasets, 10M cells
```
-python scripts/run_data_loading_benchmark_on_tahoe100m.py annbatch # run with collection Tahoe100M_tiny, n_datasets = 1
-python scripts/run_data_loading_benchmark_on_tahoe100m.py MappedCollection # run MappedCollection
-python scripts/run_data_loading_benchmark_on_tahoe100m.py scDataset # run scDataset
-python scripts/run_data_loading_benchmark_on_tahoe100m.py annbatch --n_datasets -1 # run against all datasets in the collection
-python scripts/run_data_loading_benchmark_on_tahoe100m.py annbatch --collection Tahoe100M --n_datasets -1 # run against the full 100M cells
-python scripts/run_data_loading_benchmark_on_tahoe100m.py annbatch --collection Tahoe100M --n_datasets 1 # run against the the first dataset, 2M cells
-python scripts/run_data_loading_benchmark_on_tahoe100m.py annbatch --collection Tahoe100M --n_datasets 5 # run against the the first dataset, 10M cells
-```
-Parameters and results for each run are automatically tracked in a parquet file. Source code and datasets are tracked via data lineage.
+You can choose between different benchmarking [dataset collections](https://lamin.ai/laminlabs/arrayloader-benchmarks/collections).
+
-
+When running the script, [parameters and results](https://lamin.ai/laminlabs/arrayloader-benchmarks/artifact/0EiozNVjberZTFHa) are automatically tracked in a parquet file, along with source code, run environment, and input and output datasets.
-Results can be downloaded and reproduced from here: https://lamin.ai/laminlabs/arrayloader-benchmarks/artifact/0EiozNVjberZTFHa
+
Note: A previous version of this repo contained the benchmarking scripts accompanying the 2024 blog post: [lamin.ai/blog/arrayloader-benchmarks](https://lamin.ai/blog/arrayloader-benchmarks).
diff --git a/scripts/run_data_loading_benchmark_on_tahoe100m.py b/scripts/run_loading_benchmark_on_collection.py
similarity index 100%
rename from scripts/run_data_loading_benchmark_on_tahoe100m.py
rename to scripts/run_loading_benchmark_on_collection.py
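
For orientation, the renamed script's command-line interface (a positional loader name plus `--collection` and `--n_datasets`, with `-1` meaning all datasets) suggests a harness of roughly the following shape. This is a hypothetical sketch only: toy iterables stand in for the real dataset collection, and the actual script's lamindb tracking of parameters and results is not shown.

```python
import argparse
import time


def benchmark(loader_name: str, n_datasets: int) -> dict:
    """Time one full pass over toy stand-in datasets for a given loader."""
    # -1 mirrors the script's "all datasets" convention; 3 toy datasets here.
    datasets = [range(1000)] * (3 if n_datasets == -1 else n_datasets)
    start = time.perf_counter()
    n_batches = 0
    for ds in datasets:
        for _batch in ds:  # a real loader would yield minibatches of counts
            n_batches += 1
    return {
        "loader": loader_name,
        "n_batches": n_batches,
        "seconds": time.perf_counter() - start,
    }


# CLI shape matching the documented calls; parsed from an explicit argument
# list here so the sketch stays self-contained.
parser = argparse.ArgumentParser()
parser.add_argument("loader", choices=["MappedCollection", "scDataset", "annbatch"])
parser.add_argument("--n_datasets", type=int, default=1)
args = parser.parse_args(["annbatch", "--n_datasets", "2"])
print(benchmark(args.loader, args.n_datasets))
```

In the real script, each such result row is appended to the tracked parquet file rather than printed, which is what makes runs comparable across contributors.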