Kompot Stability and Runtime Benchmarking Pipeline

A comprehensive stability and performance benchmarking pipeline for differential abundance (DA) and differential expression (DE) analysis using the kompot package on single-cell RNA-seq data. This pipeline provides systematic testing of parameter stability, subsampling robustness, and runtime scaling characteristics on SLURM-based compute clusters.

🚀 Quick Navigation

New Users:

  1. Installation Guide → Set up your environment
  2. Complete Workflow → Run your first analysis
  3. Documentation Library → Find specific guides

Existing Users:

Overview

This repository implements a production-ready benchmarking framework with five main components:

  1. 📋 Configuration Management: Centralized dataset and parameter configurations
  2. 🔬 Data Subsampling: Generate multiple rounds of subsampled data for robustness testing
  3. 📊 Stability Testing: Test stability across parameters (ls_factor, sigma, n_components) and cell counts (subsampling)
  4. 📈 Stability Analysis: Correlation analysis and publication-ready visualization of DA/DE stability
  5. ⚡ Runtime Analysis: Performance scaling analysis across parameters, cell counts, and gene counts

The pipeline supports:

  • Parameter stability: Test sensitivity of DA and DE results to kompot parameters
  • Subsampling stability: Measure robustness across different cell counts (100 to 400k+ cells)
  • Runtime profiling: Systematic measurement of computational performance
  • Multi-tool benchmarking: Compare kompot against other DA/DE methods
  • SLURM array jobs: Massive parallelization on compute clusters
  • Publication-ready outputs: Automated visualization and correlation analysis

Complete Workflow

Step 1: Configuration Setup

The pipeline uses centralized configuration management for all datasets and parameters.

# List all available datasets and parameters
python processing_config.py --list

# List subsampling configurations
python subsampling_config.py --list

Key configuration files:

  • processing_config.py: Dataset configs, parameter stability settings, SLURM parameters
  • subsampling_config.py: Subsampling parameters and resource allocation
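
For orientation, a dataset entry in processing_config.py can be pictured roughly as in the sketch below. The field names are illustrative assumptions, not the repository's exact schema; the real configuration combines the column mappings, conditions, and SLURM resources described in this README.

# Hypothetical sketch of a dataset entry (field names are illustrative, not the real schema)
DATASET_CONFIGS = {
    "aging": {
        "original_file": "data/raw/aging.h5ad",           # input for parameter stability runs
        "sample_column": "sample",                         # per-sample identifier
        "group_column": "age",                             # two-group comparison column
        "conditions": ("Young", "Old"),                    # condition pair used for DA/DE
        "slurm": {"mem": "32G", "time": "8:00:00", "cpus": 8},  # default resources
    },
}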

Step 2: Data Subsampling (For Cell Count Stability)

Generate subsampled datasets from the original files. This step is required for subsampling stability analysis, which tests robustness across different cell counts; parameter stability tests (ls_factor, sigma, n_components) use the original datasets directly.

# Submit subsampling for a single dataset
python subsampling_config.py --submit aging

# Submit multiple datasets
python subsampling_config.py --submit aging covid bcr-xl

# Submit all configured datasets
python subsampling_config.py --submit-all

# Monitor subsampling progress
squeue -u $USER --name=subsample_aging

What this creates:

data/subsampled/{dataset}/
├── round_001_n_8009.h5ad      # Multiple rounds of subsampled data
├── round_002_n_7988.h5ad      # Each with decreasing cell counts
├── round_003_n_7967.h5ad      # For testing stability across cell counts
└── ...

📖 Detailed documentation: README_SUBSAMPLING.md
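
To sanity-check the subsampled rounds before launching processing jobs, a quick look with anndata (already part of the environment) might look like this; the path follows the layout shown above.

# List each subsampled round and its cell count without loading the expression matrix.
from pathlib import Path
import anndata as ad

for path in sorted(Path("data/subsampled/aging").glob("round_*.h5ad")):
    adata = ad.read_h5ad(path, backed="r")   # backed mode keeps X on disk
    print(f"{path.name}: {adata.n_obs} cells x {adata.n_vars} genes")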

Step 3: Stability Testing

Test the stability of both DA and DE results across different parameters and cell counts. Runtime is recorded automatically for performance analysis.

# Parameter stability: Test sensitivity to kompot parameters
python processing_config.py --submit-stability aging ls_factor      # LS factor (201 values)
python processing_config.py --submit-stability aging sigma          # Sigma (201 values)
python processing_config.py --submit-stability aging n_components   # Diffusion components (1-100)
python processing_config.py --submit-stability aging n_landmarks    # GP landmarks (affects runtime!)
python processing_config.py --submit-stability aging topn_genes     # Gene count (runtime scaling)

# Subsampling stability: Test robustness across cell counts (requires Step 2)
python processing_config.py --submit-stability aging subsampling    # Cell count robustness

# Test all stability parameters for a dataset
python processing_config.py --submit-all-stability aging

# Preview what would be generated
python processing_config.py --preview aging ls_factor
python processing_config.py --preview aging subsampling

What this creates:

data/processed/{dataset}_{parameter}/
├── original_ls_factor_0.125.h5ad                        # Parameter sweeps
├── original_ls_factor_0.250.h5ad                        # DA and DE results
├── original_ls_factor_0.250.h5ad.runinfo.yml            # Runtime data
└── ...

data/processed/{dataset}_kompot_subsampling/
├── round_001_n_8009_subsampling_subsampled_round_001.h5ad      # Subsampling stability
├── round_002_n_7988_subsampling_subsampled_round_002.h5ad      # DA/DE + runtime data
└── ...
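
Because the swept parameter value is encoded in each output filename (as shown above), it can be recovered for ad-hoc checks. This sketch assumes the ls_factor sweep directory for the aging dataset.

# Map each parameter-sweep output file to its ls_factor value parsed from the filename.
import re
from pathlib import Path

out_dir = Path("data/processed/aging_ls_factor")
for path in sorted(out_dir.glob("original_ls_factor_*.h5ad")):
    value = float(re.search(r"ls_factor_([0-9.]+)\.h5ad$", path.name).group(1))
    print(f"{path.name}: ls_factor = {value}")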

Step 4: Stability Analysis and Visualization

Analyze stability of DA and DE results with correlation analysis and publication-ready plots.

# Run the stability analysis notebook
jupyter notebook plot_stability.ipynb

Or programmatically:

from gather_stability_columns import analyze_parameter_stability

# DA stability analysis
results = analyze_parameter_stability(
    stability_dir="data/processed/aging_ls_factor",
    column_name="kompot_da_log_fold_change_Young_to_Old",
    table_type="obs",
    save_plots=True,
    output_dir="plots/stability_analysis"
)

# DE stability analysis
results = analyze_parameter_stability(
    stability_dir="data/processed/aging_ls_factor",
    column_name="kompot_de_mahalanobis_Young_to_Old",
    table_type="var",
    save_plots=True,
    output_dir="plots/stability_analysis"
)

📖 Detailed documentation: README_GATHER_RESULTS.md

Step 5: Runtime Analysis

Analyze runtime scaling across parameters, cell counts, and gene counts for both DA and DE.

# Run the runtime analysis notebook
jupyter notebook runtime_analysis.ipynb

Or programmatically:

from runtime_utils import collect_all_runtime_data, plot_runtime_scaling
from pathlib import Path

# Collect all runtime data
all_runtime_data = collect_all_runtime_data()

# Plot DA runtime scaling across cell counts
plot_runtime_scaling(
    all_runtime_data['subsampling'],
    parameter_type='subsampling',
    analysis_type='da',
    output_dir=Path('runtime_analysis_plots'),
    show_legend_separately=True,
    min_param_value=101
)

# Plot DE runtime scaling across gene counts
plot_runtime_scaling(
    all_runtime_data['topn_genes'],
    parameter_type='topn_genes',
    analysis_type='de',
    output_dir=Path('runtime_analysis_plots'),
    show_legend_separately=True
)

Runtime analysis capabilities:

  • Parameter scaling: How runtime changes with ls_factor, sigma, n_components, n_landmarks
  • Cell count scaling: Performance from 100 to 400k+ cells (subsampling)
  • Gene count scaling: Performance across different numbers of genes (topn_genes)
  • CPU parallelization: Compare 1-CPU vs 16-CPU performance
  • DA vs DE performance: Separate analysis for differential abundance and expression

Key Features

Centralized Configuration Management

  • Unified parameter management across all datasets and analysis stages
  • Dynamic parameter sequence generation for stability testing
  • Support for multiple dataset "flavors" (e.g., with/without read counts)
  • SLURM resource optimization per dataset

Production-Ready Subsampling Infrastructure

  • Asynchronous I/O with configurable parallel writers
  • Cell-only or cell+read subsampling options
  • PCA computation and UMAP preservation
  • Comprehensive logging and progress monitoring
  • Built-in infinite loop protection

Comprehensive Stability Testing

  • Parameter stability: ls_factor (201 values), sigma (201 values), n_components (1-100), n_landmarks (131 values)
  • Subsampling stability: Original dataset vs. multiple subsampling rounds (100 to 400k+ cells)
  • Runtime profiling: Automatic timing for all DA and DE operations
  • Both DA and DE: Test stability for differential abundance and differential expression simultaneously

High-Performance Processing

  • SLURM array job support for massive parallelization
  • Memory-optimized processing with configurable disk storage
  • Automatic environment detection and activation
  • Intelligent error handling and recovery
  • CPU parallelization testing (1-CPU vs 16-CPU comparisons)

Advanced Analysis and Visualization

  • Stability analysis: Parallel data loading, multiple correlation methods (Spearman, Pearson, low-density)
  • Runtime analysis: Performance scaling plots across parameters, cell counts, and gene counts
  • Parameter-aware plot formatting with scientific notation
  • Automated heatmap generation for all dataset/parameter combinations
  • Publication-ready outputs for both DA and DE results

Data Structure

data/
├── raw/                                 # Original datasets (configured in processing_config.py)
├── subsampled/{dataset}/                # Step 2: Subsampled data
│   ├── round_001_n_8009.h5ad           # Multiple rounds with decreasing cell counts
│   ├── round_002_n_7988.h5ad
│   └── subsampling_aging_*.log         # Subsampling logs
├── processed/{dataset}/                 # Step 3: Standard processing results
│   ├── round_001.h5ad                  # Kompot analysis results
│   ├── round_001.h5ad.obs.csv          # Cell metadata with DA results
│   ├── round_001.h5ad.var.csv          # Gene metadata with DE results
│   └── round_001.h5ad.runinfo.yml      # Processing metadata and timing
└── processed/{dataset}_{parameter}/     # Step 4: Parameter stability results
    ├── original_ls_factor_0.125.h5ad   # Results for each parameter value
    ├── original_ls_factor_0.250.h5ad   # Using original dataset
    └── ...

plots/stability_heatmaps/               # Step 5: Analysis outputs
├── aging_ls_factor_obs_kompot_da_log_fold_change_Young_to_Old_spearman.png
├── analysis_summary.csv               # Comprehensive results summary
└── ...

logs/                                  # SLURM job logs
├── subsample_aging_*.{out,err}        # Subsampling logs
├── kompot_*.{out,err}                 # Processing logs
└── ...

Supported Datasets

Currently Configured

  • aging: Aging hematopoiesis dataset (Young vs Old, ~8000 cells → 398 subsampled files)
  • aging_with_reads: Aging dataset with read count normalization (314 subsampled files)
  • covid: COVID-19 PBMC dataset (healthy vs COVID-19, 790 subsampled files)
  • post-covid: COVID-19 PBMC dataset (healthy vs Post-COVID-19, 790 subsampled files)
  • bcr-xl: BCR-XL stimulation dataset (control vs BCR-XL)

Dataset Requirements

Each dataset configuration includes:

  • Original file path: For parameter stability testing
  • Output patterns: For subsampling and processing
  • Column mappings: Sample, grouping, cell type columns
  • Analysis conditions: Two-group comparisons
  • Resource requirements: Memory, time, CPU allocation per analysis type

Analysis Outputs

Differential Abundance Results (in .obs.csv)

  • kompot_da_log_fold_change_{cond1}_to_{cond2}: DA without sample variance
  • kompot_da_log_fold_change_zscore_{cond1}_to_{cond2}: DA z-score
  • kompot_da_neg_log10_fold_change_pvalue_{cond1}_to_{cond2}: DA p-value
  • kompot_da_log_fold_change_{cond1}_to_{cond2}_sample_var: DA with sample variance

Differential Expression Results (in .var.csv)

  • kompot_de_mahalanobis_{cond1}_to_{cond2}: DE Mahalanobis distance
  • kompot_de_mean_lfc_{cond1}_to_{cond2}: Mean log fold change
  • kompot_de_mahalanobis_{cond1}_to_{cond2}_sample_var: DE with sample variance
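
As a minimal, hand-rolled complement to analyze_parameter_stability, the DA and DE columns of two runs can also be compared directly from the processed .h5ad files. Paths and column names below follow the patterns documented above; this is a sketch, not the pipeline's own analysis.

# Spearman correlation of DA (per cell) and DE (per gene) scores between two parameter values.
import anndata as ad
from scipy.stats import spearmanr

a = ad.read_h5ad("data/processed/aging_ls_factor/original_ls_factor_0.125.h5ad")
b = ad.read_h5ad("data/processed/aging_ls_factor/original_ls_factor_0.250.h5ad")

da_col = "kompot_da_log_fold_change_Young_to_Old"
de_col = "kompot_de_mahalanobis_Young_to_Old"

rho_da, _ = spearmanr(a.obs[da_col], b.obs[da_col], nan_policy="omit")
rho_de, _ = spearmanr(a.var[de_col], b.var[de_col], nan_policy="omit")
print(f"DA stability (cells): rho = {rho_da:.3f}")
print(f"DE stability (genes): rho = {rho_de:.3f}")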

Runtime Data (in .runinfo.yml)

  • DA timing: da_no_sample_seconds, da_with_sample_seconds
  • DE timing: de_no_sample_seconds, de_with_sample_seconds
  • PCA/diffusion maps timing: Preprocessing step timings
  • Configuration: CPU count, batch size, null genes, node information
  • Dataset metadata: Number of cells, genes, parameter values
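
The timing keys listed above can be read straight from a .runinfo.yml file with PyYAML. This sketch assumes the keys sit at the top level of the file, which may not match the exact layout.

# Read one run's timing information (key names taken from the list above).
import yaml

with open("data/processed/aging_ls_factor/original_ls_factor_0.250.h5ad.runinfo.yml") as fh:
    runinfo = yaml.safe_load(fh)

print("DA, no sample variance:  ", runinfo.get("da_no_sample_seconds"), "s")
print("DA, with sample variance:", runinfo.get("da_with_sample_seconds"), "s")
print("DE, no sample variance:  ", runinfo.get("de_no_sample_seconds"), "s")
print("DE, with sample variance:", runinfo.get("de_with_sample_seconds"), "s")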

Stability Analysis Outputs

  • Correlation heatmaps: DA and DE parameter stability visualization
  • Correlation vs parameter plots: Stability trends across parameter ranges
  • Summary statistics: Mean/min correlations for DA and DE
  • Interactive analysis: Jupyter notebooks (plot_stability.ipynb, runtime_analysis.ipynb)

Runtime Analysis Outputs

  • Scaling plots: Performance vs parameters, cell counts, gene counts
  • DA vs DE comparison: Separate runtime analysis for each
  • CPU parallelization: 1-CPU vs 16-CPU performance comparison
  • Publication-ready figures: PNG plots with separate legends

Performance Characteristics

Resource Requirements by Stage

Based on analysis of 2,776 real processing jobs across all datasets:

Standard Processing (Recommended Production Settings)

sbatch --mem=32G --time=8:00:00 --cpus-per-task=8 submit_benchmark.sh DATASET

  • Success rate: 98-100% for datasets <200k cells
  • Typical runtime: 3.7-4.9 hours
  • Memory usage: 16-32GB sufficient for most datasets
  • Processing time breakdown:
    • PCA: 0s (often pre-computed)
    • Diffusion Maps: 2-65s
    • DA (no sample): 36-874s
    • DA (with sample): 76-1,732s
    • DE (no sample): 31-569s
    • DE (with sample): 13,137s (main bottleneck)

Large Dataset Processing (>400k cells)

sbatch --mem=64G --time=12:00:00 --cpus-per-task=8 submit_benchmark.sh DATASET TOPN_GENES=200

  • COVID datasets (422k cells): 92.7% failure rate with 16GB, requires optimization
  • Memory scaling: ~109GB per 1M cells (based on allocation errors)
  • Optimization required: Reduce TOPN_GENES, enable batching, consider subsampling
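
The ~109GB per 1M cells figure above can be turned into a rough planning heuristic; treat the sketch below as back-of-the-envelope guidance derived from that observation, not a guarantee.

# Rough peak-memory estimate based on the observed ~109 GB per 1M cells scaling.
def estimate_memory_gb(n_cells: int, gb_per_million_cells: float = 109.0) -> float:
    return n_cells / 1_000_000 * gb_per_million_cells

print(f"422k cells -> ~{estimate_memory_gb(422_000):.0f} GB peak memory")  # COVID-scale dataset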

Parameter Stability Testing

sbatch --array=1-201 --mem=32G --time=8:00:00 --cpus-per-task=8 submit_benchmark.sh DATASET_PARAM

  • Scale factor: 100-201x base processing requirements
  • Total compute: 201 jobs × 4-8 hours = 800-1,600 CPU-hours per dataset
  • Success pattern: Same as base processing, scales linearly

Resource Requirements by Dataset Size

Dataset Size                Recommended Memory   Time Limit   Success Rate   Common Issues
Small (<100k cells)         32GB                 8 hours      99.6%          Rare memory errors
Medium (100k-200k cells)    32GB                 8 hours      98-100%        Stable processing
Large (200k-400k cells)     32-64GB              12 hours     Variable       Monitor closely
Very Large (>400k cells)    64GB+                12+ hours    7-53%          Requires optimization

Memory Failure Analysis

  • Primary failure mode: DE without sample variance (16.3% overall failure rate)
  • Memory error pattern: "Unable to allocate X GiB for array with shape (cells, genes)"
  • Critical threshold: ~400k cells with standard settings
  • Mitigation strategies: Reduce TOPN_GENES, enable STORE_ARRAYS_ON_DISK, use batching

Scaling

  • Subsampling: 1 job per dataset (generates 50-790 files depending on dataset)
  • Processing: Array jobs (1 job per file, 100-790 parallel jobs)
  • Stability testing: Array jobs (1 job per parameter value, 100-201 parallel jobs)
  • Analysis: Local machine post-processing (minutes for correlation computation)

Prerequisites and Setup

⚠️ Critical Configuration Required

Before running any pipeline jobs, you MUST configure micromamba to disable lock files:

# REQUIRED: Prevent job failures in concurrent environments
micromamba config set use_lockfiles False

Why: Large SLURM array jobs (100-398 concurrent processes) will fail with mamba lock contention errors without this setting.

Environment Setup

📋 Complete setup instructions: INSTALLATION_GUIDE.md

Quick setup for kompot-only usage:

micromamba create -n kompot_v1 python=3.10 -c conda-forge
micromamba activate kompot_v1
micromamba install -c conda-forge -c bioconda pandas numpy scipy matplotlib seaborn anndata scanpy h5py pyyaml tqdm jupyter scikit-learn umap-learn
pip install kompot

Full setup for multi-tool benchmarking: See INSTALLATION_GUIDE.md for complete instructions including all tool environments.

Requirements

  • Micromamba/Mamba 1.4.0+ and Python 3.10+
  • SLURM cluster with array job support
  • Storage: ~500GB per dataset for full workflow

Documentation

📖 Complete Documentation Library

Getting Started

Specialized Workflows

Technical Documentation

Development

  • CLAUDE.md - AI assistant project instructions and codebase context

Key Scripts and Utilities

Configuration and Submission

  • subsampling_config.py - Subsampling job submission and configuration
  • processing_config.py - Processing job submission and parameter stability
  • submit_benchmark.sh - SLURM submission script for processing
  • submit_subsampling.sh - SLURM submission script for subsampling

Core Processing

  • process_single_file.py - Main processing script for individual H5AD files
  • subsample_dataset.py - Dataset subsampling implementation

Analysis and Visualization

  • gather_stability_columns.py - DA/DE stability analysis and correlation computation
  • runtime_utils.py - Runtime data extraction and performance plotting
  • plot_stability.ipynb - DA/DE parameter and subsampling stability analysis notebook
  • runtime_analysis.ipynb - Comprehensive runtime scaling analysis notebook
  • benchmark_plots_accuracy.ipynb - Multi-tool benchmarking accuracy analysis

Testing and Validation

  • test_tool_functionality.py - Test tool implementations
  • test_r_tools_with_renv.py - Test R-based tools with renv
  • run_comprehensive_tests.py - Complete testing suite
  • validate_result_format.py - Validate standardized result formats

Utilities and Inspection

  • count_project_results.py - Count all computational runs across project
  • inspect_subsampling.py - Monitor subsampling progress
  • inspect_processing.py - Monitor processing progress
  • analyze_resource_usage.py - Resource usage analysis from logs

Quick Start Examples

Complete Workflow for New Dataset

# 1. Add dataset to processing_config.py and subsampling_config.py

# 2. Generate subsampled data (for cell count stability)
python subsampling_config.py --submit new_dataset

# 3. Process subsampled data (generates DA, DE, and runtime data)
python processing_config.py --submit-default new_dataset

# 4. Test parameter stability (ls_factor, sigma, n_components, etc.)
python processing_config.py --submit-all-stability new_dataset

# 5. Analyze DA/DE stability
jupyter notebook plot_stability.ipynb

# 6. Analyze runtime scaling
jupyter notebook runtime_analysis.ipynb

Monitoring and Debugging

# Monitor all jobs
squeue -u $USER

# Check specific job types
squeue -u $USER --name=subsample_aging
squeue -u $USER --name=kompot_benchmark

# View logs
tail -f logs/subsample_aging_*.out
tail -f data/subsampled/aging/subsampling_*.log

# Inspect subsampling progress
python inspect_subsampling.py --dataset aging

Contributing

Adding New Datasets

  1. Configure subsampling: Add to subsampling_config.py with input path and parameters
  2. Configure processing: Add to processing_config.py with column mappings and conditions
  3. Test pipeline: Run subsampling → processing → stability testing
  4. Validate results: Check outputs and run correlation analysis

Adding New Parameters

  1. Define parameter range: Add to STABILITY_CONFIGS in processing_config.py (see the sketch after this list)
  2. Implement parameter application: Update process_single_file.py
  3. Update result parsing: Modify gather_stability_columns.py filename patterns
  4. Test stability analysis: Verify correlation plots work correctly
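
A new entry in STABILITY_CONFIGS (step 1 above) might be sketched as follows; the field names are assumptions for illustration and the real schema in processing_config.py may differ.

# Hypothetical STABILITY_CONFIGS entry for a new parameter sweep (illustrative field names).
import numpy as np

STABILITY_CONFIGS = {
    "my_new_param": {
        "values": np.linspace(0.1, 10.0, 101).tolist(),          # one value per array task
        "slurm": {"mem": "32G", "time": "8:00:00", "cpus": 8},    # per-job resources
    },
}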

Citation

Please cite the kompot package and relevant datasets when using this pipeline:

@software{kompot2024,
  title={kompot: Differential abundance and expression analysis for single-cell data},
  author={Setty Lab},
  year={2024},
  url={https://github.com/settylab/kompot}
}

Troubleshooting

Common Issues by Stage

Subsampling Issues:

  • Input file not found: Check file paths in subsampling_config.py
  • Memory errors: Increase SLURM memory or reduce max_write_workers
  • Infinite loops: Fixed in current version with proper stopping conditions

Processing Issues:

  • Environment activation failures: Ensure conda/mamba is properly configured
  • Missing kompot columns: Check if subsampling completed successfully
  • Parameter stability using wrong files: Verify ORIGINAL_INPUT_FILE environment variable

Analysis Issues:

  • Missing columns in stability analysis: Use quick_column_check() to verify available data
  • Correlation computation errors: Check for sufficient non-null data (>10% required)
  • Plot generation failures: Verify parameter value parsing and scientific notation handling

Getting Help

  • Check stage-specific documentation (README_SUBSAMPLING.md, README_GATHER_RESULTS.md)
  • Examine log files in logs/ directory and dataset subdirectories
  • Use --dry-run and --verbose flags for debugging
  • Inspect intermediate outputs with provided utility scripts

License

MIT License

Copyright (c) 2025 Setty Lab

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
