A comprehensive stability and performance benchmarking pipeline for differential abundance (DA) and differential expression (DE) analysis using the kompot package on single-cell RNA-seq data. This pipeline provides systematic testing of parameter stability, subsampling robustness, and runtime scaling characteristics on SLURM-based compute clusters.
New Users:
- Installation Guide → Set up your environment
- Complete Workflow → Run your first analysis
- Documentation Library → Find specific guides
Existing Users:
- Processing Commands → Submit jobs and analyses
- Runtime Analysis → Analyze performance scaling
- Troubleshooting → Solve common issues
- Resource Requirements → Optimize your runs
This repository implements a production-ready benchmarking framework with five main components:
- 📋 Configuration Management: Centralized dataset and parameter configurations
- 🔬 Data Subsampling: Generate multiple rounds of subsampled data for robustness testing
- 📊 Stability Testing: Test stability across parameters (ls_factor, sigma, n_components) and cell counts (subsampling)
- 📈 Stability Analysis: Correlation analysis and publication-ready visualization of DA/DE stability
- ⚡ Runtime Analysis: Performance scaling analysis across parameters, cell counts, and gene counts
The pipeline supports:
- Parameter stability: Test sensitivity of DA and DE results to kompot parameters
- Subsampling stability: Measure robustness across different cell counts (100 to 400k+ cells)
- Runtime profiling: Systematic measurement of computational performance
- Multi-tool benchmarking: Compare kompot against other DA/DE methods
- SLURM array jobs: Massive parallelization on compute clusters
- Publication-ready outputs: Automated visualization and correlation analysis
The pipeline uses centralized configuration management for all datasets and parameters.
# List all available datasets and parameters
python processing_config.py --list
# List subsampling configurations
python subsampling_config.py --list

Key configuration files:
- processing_config.py: Dataset configs, parameter stability settings, SLURM parameters
- subsampling_config.py: Subsampling parameters and resource allocation
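A minimal sketch of what a dataset entry in processing_config.py might look like; the key names (input_file, sample_col, conditions, slurm) are illustrative assumptions, not the actual schema:

# Hypothetical dataset entry; the real schema in processing_config.py may differ.
DATASETS = {
    "aging": {
        "input_file": "data/raw/aging.h5ad",    # original file for parameter stability
        "sample_col": "sample",                  # sample identifier column
        "group_col": "age_group",                # grouping column (Young vs Old)
        "conditions": ("Young", "Old"),          # two-group comparison
        "slurm": {"mem": "32G", "time": "8:00:00", "cpus": 8},
    },
}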
Generate subsampled datasets from original files. Required for subsampling stability analysis to test robustness across different cell counts. Parameter stability tests (ls_factor, sigma, n_components) use original datasets directly.
# Submit subsampling for a single dataset
python subsampling_config.py --submit aging
# Submit multiple datasets
python subsampling_config.py --submit aging covid bcr-xl
# Submit all configured datasets
python subsampling_config.py --submit-all
# Monitor subsampling progress
squeue -u $USER --name=subsample_aging

What this creates:
data/subsampled/{dataset}/
├── round_001_n_8009.h5ad # Multiple rounds of subsampled data
├── round_002_n_7988.h5ad # Each with decreasing cell counts
├── round_003_n_7967.h5ad # For testing stability across cell counts
└── ...
📖 Detailed documentation: README_SUBSAMPLING.md
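To sanity-check the generated rounds, a short sketch using anndata (assuming only the round_*_n_*.h5ad naming shown above):

import glob
import anndata as ad

# Report cell counts for each subsampled round of the aging dataset.
for path in sorted(glob.glob("data/subsampled/aging/round_*_n_*.h5ad")):
    adata = ad.read_h5ad(path, backed="r")  # backed mode avoids loading X into memory
    print(f"{path}: {adata.n_obs} cells x {adata.n_vars} genes")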
Test stability of both DA and DE results across different parameters and cell counts. Automatically measures runtime for performance analysis.
# Parameter stability: Test sensitivity to kompot parameters
python processing_config.py --submit-stability aging ls_factor # LS factor (201 values)
python processing_config.py --submit-stability aging sigma # Sigma (201 values)
python processing_config.py --submit-stability aging n_components # Diffusion components (1-100)
python processing_config.py --submit-stability aging n_landmarks # GP landmarks (affects runtime!)
python processing_config.py --submit-stability aging topn_genes # Gene count (runtime scaling)
# Subsampling stability: Test robustness across cell counts (requires Step 2)
python processing_config.py --submit-stability aging subsampling # Cell count robustness
# Test all stability parameters for a dataset
python processing_config.py --submit-all-stability aging
# Preview what would be generated
python processing_config.py --preview aging ls_factor
python processing_config.py --preview aging subsampling

What this creates:
data/processed/{dataset}_{parameter}/
├── original_ls_factor_0.125.h5ad # Parameter sweeps
├── original_ls_factor_0.250.h5ad # DA and DE results
├── original_ls_factor_0.250.h5ad.runinfo.yml # Runtime data
└── ...
data/processed/{dataset}_kompot_subsampling/
├── round_001_n_8009_subsampling_subsampled_round_001.h5ad # Subsampling stability
├── round_002_n_7988_subsampling_subsampled_round_002.h5ad # DA/DE + runtime data
└── ...
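Because each swept value is encoded in the output filename, the parameter grid can be recovered with a small sketch like this (the regex assumes the original_{parameter}_{value}.h5ad pattern shown above):

import re
from pathlib import Path

# Recover the ls_factor sweep values from the stability output filenames.
stability_dir = Path("data/processed/aging_ls_factor")
pattern = re.compile(r"original_ls_factor_([0-9.]+)\.h5ad$")
values = sorted(
    float(m.group(1))
    for f in stability_dir.glob("*.h5ad")
    if (m := pattern.search(f.name))
)
print(f"{len(values)} parameter values, from {values[0]} to {values[-1]}")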
Analyze stability of DA and DE results with correlation analysis and publication-ready plots.
# Run the stability analysis notebook
jupyter notebook plot_stability.ipynb

Or programmatically:
from gather_stability_columns import analyze_parameter_stability
# DA stability analysis
results = analyze_parameter_stability(
stability_dir="data/processed/aging_ls_factor",
column_name="kompot_da_log_fold_change_Young_to_Old",
table_type="obs",
save_plots=True,
output_dir="plots/stability_analysis"
)
# DE stability analysis
results = analyze_parameter_stability(
stability_dir="data/processed/aging_ls_factor",
column_name="kompot_de_mahalanobis_Young_to_Old",
table_type="var",
save_plots=True,
output_dir="plots/stability_analysis"
)

📖 Detailed documentation: README_GATHER_RESULTS.md
Analyze runtime scaling across parameters, cell counts, and gene counts for both DA and DE.
# Run the runtime analysis notebook
jupyter notebook runtime_analysis.ipynb

Or programmatically:
from runtime_utils import collect_all_runtime_data, plot_runtime_scaling
from pathlib import Path
# Collect all runtime data
all_runtime_data = collect_all_runtime_data()
# Plot DA runtime scaling across cell counts
plot_runtime_scaling(
all_runtime_data['subsampling'],
parameter_type='subsampling',
analysis_type='da',
output_dir=Path('runtime_analysis_plots'),
show_legend_separately=True,
min_param_value=101
)
# Plot DE runtime scaling across gene counts
plot_runtime_scaling(
all_runtime_data['topn_genes'],
parameter_type='topn_genes',
analysis_type='de',
output_dir=Path('runtime_analysis_plots'),
show_legend_separately=True
)

Runtime analysis capabilities:
- Parameter scaling: How runtime changes with ls_factor, sigma, n_components, n_landmarks
- Cell count scaling: Performance from 100 to 400k+ cells (subsampling)
- Gene count scaling: Performance across different numbers of genes (topn_genes)
- CPU parallelization: Compare 1-CPU vs 16-CPU performance
- DA vs DE performance: Separate analysis for differential abundance and expression
- Unified parameter management across all datasets and analysis stages
- Dynamic parameter sequence generation for stability testing
- Support for multiple dataset "flavors" (e.g., with/without read counts)
- SLURM resource optimization per dataset
- Asynchronous I/O with configurable parallel writers
- Cell-only or cell+read subsampling options
- PCA computation and UMAP preservation
- Comprehensive logging and progress monitoring
- Built-in infinite loop protection
- Parameter stability: ls_factor (201 values), sigma (201 values), n_components (1-100), n_landmarks (131 values)
- Subsampling stability: Original dataset vs. multiple subsampling rounds (100 to 400k+ cells)
- Runtime profiling: Automatic timing for all DA and DE operations
- Both DA and DE: Test stability for differential abundance and differential expression simultaneously
- SLURM array job support for massive parallelization
- Memory-optimized processing with configurable disk storage
- Automatic environment detection and activation
- Intelligent error handling and recovery
- CPU parallelization testing (1-CPU vs 16-CPU comparisons)
- Stability analysis: Parallel data loading, multiple correlation methods (Spearman, Pearson, low-density)
- Runtime analysis: Performance scaling plots across parameters, cell counts, and gene counts
- Parameter-aware plot formatting with scientific notation
- Automated heatmap generation for all dataset/parameter combinations
- Publication-ready outputs for both DA and DE results
data/
├── raw/ # Original datasets (configured in processing_config.py)
├── subsampled/{dataset}/ # Step 2: Subsampled data
│ ├── round_001_n_8009.h5ad # Multiple rounds with decreasing cell counts
│ ├── round_002_n_7988.h5ad
│ └── subsampling_aging_*.log # Subsampling logs
├── processed/{dataset}/ # Step 3: Standard processing results
│ ├── round_001.h5ad # Kompot analysis results
│ ├── round_001.h5ad.obs.csv # Cell metadata with DA results
│ ├── round_001.h5ad.var.csv # Gene metadata with DE results
│ └── round_001.h5ad.runinfo.yml # Processing metadata and timing
└── processed/{dataset}_{parameter}/ # Step 4: Parameter stability results
├── original_ls_factor_0.125.h5ad # Results for each parameter value
├── original_ls_factor_0.250.h5ad # Using original dataset
└── ...
plots/stability_heatmaps/ # Step 5: Analysis outputs
├── aging_ls_factor_obs_kompot_da_log_fold_change_Young_to_Old_spearman.png
├── analysis_summary.csv # Comprehensive results summary
└── ...
logs/ # SLURM job logs
├── subsample_aging_*.{out,err} # Subsampling logs
├── kompot_*.{out,err} # Processing logs
└── ...
- aging: Aging hematopoiesis dataset (Young vs Old, ~8000 cells → 398 subsampled files)
- aging_with_reads: Aging dataset with read count normalization (314 subsampled files)
- covid: COVID-19 PBMC dataset (healthy vs COVID-19, 790 subsampled files)
- post-covid: COVID-19 PBMC dataset (healthy vs Post-COVID-19, 790 subsampled files)
- bcr-xl: BCR-XL stimulation dataset (control vs BCR-XL)
Each dataset configuration includes:
- Original file path: For parameter stability testing
- Output patterns: For subsampling and processing
- Column mappings: Sample, grouping, cell type columns
- Analysis conditions: Two-group comparisons
- Resource requirements: Memory, time, CPU allocation per analysis type
- kompot_da_log_fold_change_{cond1}_to_{cond2}: DA without sample variance
- kompot_da_log_fold_change_zscore_{cond1}_to_{cond2}: DA z-score
- kompot_da_neg_log10_fold_change_pvalue_{cond1}_to_{cond2}: DA p-value
- kompot_da_log_fold_change_{cond1}_to_{cond2}_sample_var: DA with sample variance
- kompot_de_mahalanobis_{cond1}_to_{cond2}: DE Mahalanobis distance
- kompot_de_mean_lfc_{cond1}_to_{cond2}: Mean log fold change
- kompot_de_mahalanobis_{cond1}_to_{cond2}_sample_var: DE with sample variance
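For example, the DA columns land in adata.obs (per cell) and the DE columns in adata.var (per gene); a minimal access sketch for the aging Young-vs-Old comparison:

import anndata as ad

adata = ad.read_h5ad("data/processed/aging_ls_factor/original_ls_factor_0.125.h5ad")

# Per-cell DA result (obs) and per-gene DE result (var).
da_lfc = adata.obs["kompot_da_log_fold_change_Young_to_Old"]
de_dist = adata.var["kompot_de_mahalanobis_Young_to_Old"]
print(da_lfc.describe())
print(de_dist.nlargest(10))  # genes with the largest Mahalanobis distance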
- DA timing: da_no_sample_seconds, da_with_sample_seconds
- DE timing: de_no_sample_seconds, de_with_sample_seconds
- PCA/diffusion maps timing: Preprocessing step timings
- Configuration: CPU count, batch size, null genes, node information
- Dataset metadata: Number of cells, genes, parameter values
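The timing fields can be read straight from the YAML sidecars; a minimal sketch, assuming only the *_seconds keys listed above are present:

import yaml

# Summarize DA/DE wall times from one run's .runinfo.yml sidecar.
with open("data/processed/aging_ls_factor/original_ls_factor_0.250.h5ad.runinfo.yml") as fh:
    runinfo = yaml.safe_load(fh)

for key in ("da_no_sample_seconds", "da_with_sample_seconds",
            "de_no_sample_seconds", "de_with_sample_seconds"):
    if key in runinfo:
        print(f"{key}: {runinfo[key]:.1f}s")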
- Correlation heatmaps: DA and DE parameter stability visualization
- Correlation vs parameter plots: Stability trends across parameter ranges
- Summary statistics: Mean/min correlations for DA and DE
- Interactive analysis: Jupyter notebooks (plot_stability.ipynb, runtime_analysis.ipynb)
- Scaling plots: Performance vs parameters, cell counts, gene counts
- DA vs DE comparison: Separate runtime analysis for each
- CPU parallelization: 1-CPU vs 16-CPU performance comparison
- Publication-ready figures: PNG plots with separate legends
Based on analysis of 2,776 real processing jobs across all datasets:
sbatch --mem=32G --time=8:00:00 --cpus-per-task=8 submit_benchmark.sh DATASET

- Success rate: 98-100% for datasets <200k cells
- Typical runtime: 3.7-4.9 hours
- Memory usage: 16-32GB sufficient for most datasets
- Processing time breakdown:
- PCA: 0s (often pre-computed)
- Diffusion Maps: 2-65s
- DA (no sample): 36-874s
- DA (with sample): 76-1,732s
- DE (no sample): 31-569s
- DE (with sample): 13,137s (main bottleneck)
sbatch --mem=64G --time=12:00:00 --cpus-per-task=8 submit_benchmark.sh DATASET TOPN_GENES=200

- COVID datasets (422k cells): 92.7% failure rate with 16GB; requires optimization
- Memory scaling: ~109GB per 1M cells (based on allocation errors)
- Optimization required: Reduce TOPN_GENES, enable batching, consider subsampling
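As a back-of-the-envelope check using the ~109GB-per-1M-cells figure:

# Rough peak-memory estimate from the observed ~109 GB per 1M cells scaling.
cells = 422_000                          # COVID dataset size
est_gb = cells / 1_000_000 * 109
print(f"Estimated peak memory: {est_gb:.0f} GB")  # ~46 GB, far above the 16 GB that failed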
sbatch --array=1-201 --mem=32G --time=8:00:00 --cpus-per-task=8 submit_benchmark.sh DATASET_PARAM

- Scale factor: 100-201x base processing requirements
- Total compute: 201 jobs × 4-8 hours = 800-1,600 CPU-hours per dataset
- Success pattern: Same as base processing, scales linearly
| Dataset Size | Recommended Memory | Time Limit | Success Rate | Common Issues |
|---|---|---|---|---|
| Small (<100k cells) | 32GB | 8 hours | 99.6% | Rare memory errors |
| Medium (100k-200k cells) | 32GB | 8 hours | 98-100% | Stable processing |
| Large (200k-400k cells) | 32-64GB | 12 hours | Variable | Monitor closely |
| Very Large (>400k cells) | 64GB+ | 12+ hours | 7-53% | Requires optimization |
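When scripting submissions, the table above can be encoded as a simple lookup; a minimal sketch (thresholds taken from the table, return format assumed):

def recommended_resources(n_cells: int) -> dict:
    """Map dataset size to the memory/time recommendations in the table above."""
    if n_cells < 200_000:                      # small/medium: stable processing
        return {"mem": "32G", "time": "8:00:00"}
    if n_cells < 400_000:                      # large: monitor closely
        return {"mem": "64G", "time": "12:00:00"}
    # very large: 64GB+ and 12+ hours; also reduce TOPN_GENES and enable batching
    return {"mem": "64G", "time": "12:00:00"}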
- Primary failure mode: DE without sample variance (16.3% overall failure rate)
- Memory error pattern: "Unable to allocate X GiB for array with shape (cells, genes)"
- Critical threshold: ~400k cells with standard settings
- Mitigation strategies: Reduce TOPN_GENES, enable STORE_ARRAYS_ON_DISK, use batching
- Subsampling: 1 job per dataset (generates 50-790 files depending on dataset)
- Processing: Array jobs (1 job per file, 100-790 parallel jobs)
- Stability testing: Array jobs (1 job per parameter value, 100-201 parallel jobs)
- Analysis: Local machine post-processing (minutes for correlation computation)
Before running any pipeline jobs, you MUST configure micromamba to disable lock files:
# REQUIRED: Prevent job failures in concurrent environments
micromamba config set use_lockfiles False

Why: Large SLURM array jobs (100-398 concurrent processes) will fail with mamba lock contention errors without this setting.
📋 Complete setup instructions: INSTALLATION_GUIDE.md
Quick setup for kompot-only usage:
micromamba create -n kompot_v1 python=3.10 -c conda-forge
micromamba activate kompot_v1
micromamba install -c conda-forge -c bioconda pandas numpy scipy matplotlib seaborn anndata scanpy h5py pyyaml tqdm jupyter scikit-learn umap-learn
pip install kompot

Full setup for multi-tool benchmarking: See INSTALLATION_GUIDE.md for complete instructions including all tool environments.
- Micromamba/Mamba 1.4.0+ and Python 3.10+
- SLURM cluster with array job support
- Storage: ~500GB per dataset for full workflow
- INSTALLATION_GUIDE.md - Environment setup for kompot-only and multi-tool benchmarking
- README_PROCESSING.md - Core processing workflow and configuration
- README_SUBSAMPLING.md - Subsampling infrastructure and configuration
- README_BENCHMARK.md - Multi-tool benchmarking system
- README_GATHER_RESULTS.md - Stability analysis and plotting guide
- COMPLETE_TOOL_SUMMARY.md - Tool implementation status and capabilities
- BENCHMARKING_PLAN.md - Technical benchmarking framework design
- final_resource_summary.md - Detailed resource usage analysis
- README_resource_section.md - Resource requirements overview
- CLAUDE.md - AI assistant project instructions and codebase context
- subsampling_config.py - Subsampling job submission and configuration
- processing_config.py - Processing job submission and parameter stability
- submit_benchmark.sh - SLURM submission script for processing
- submit_subsampling.sh - SLURM submission script for subsampling
- process_single_file.py - Main processing script for individual H5AD files
- subsample_dataset.py - Dataset subsampling implementation
- gather_stability_columns.py - DA/DE stability analysis and correlation computation
- runtime_utils.py - Runtime data extraction and performance plotting
- plot_stability.ipynb - DA/DE parameter and subsampling stability analysis notebook
- runtime_analysis.ipynb - Comprehensive runtime scaling analysis notebook
- benchmark_plots_accuracy.ipynb - Multi-tool benchmarking accuracy analysis
- test_tool_functionality.py - Test tool implementations
- test_r_tools_with_renv.py - Test R-based tools with renv
- run_comprehensive_tests.py - Complete testing suite
- validate_result_format.py - Validate standardized result formats
- count_project_results.py - Count all computational runs across project
- inspect_subsampling.py - Monitor subsampling progress
- inspect_processing.py - Monitor processing progress
- analyze_resource_usage.py - Resource usage analysis from logs
# 1. Add dataset to processing_config.py and subsampling_config.py
# 2. Generate subsampled data (for cell count stability)
python subsampling_config.py --submit new_dataset
# 3. Process subsampled data (generates DA, DE, and runtime data)
python processing_config.py --submit-default new_dataset
# 4. Test parameter stability (ls_factor, sigma, n_components, etc.)
python processing_config.py --submit-all-stability new_dataset
# 5. Analyze DA/DE stability
jupyter notebook plot_stability.ipynb
# 6. Analyze runtime scaling
jupyter notebook runtime_analysis.ipynb

# Monitor all jobs
squeue -u $USER
# Check specific job types
squeue -u $USER --name=subsample_aging
squeue -u $USER --name=kompot_benchmark
# View logs
tail -f logs/subsample_aging_*.out
tail -f data/subsampled/aging/subsampling_*.log
# Inspect subsampling progress
python inspect_subsampling.py --dataset aging

- Configure subsampling: Add to subsampling_config.py with input path and parameters
- Configure processing: Add to processing_config.py with column mappings and conditions
- Test pipeline: Run subsampling → processing → stability testing
- Validate results: Check outputs and run correlation analysis
- Define parameter range: Add to STABILITY_CONFIGS in processing_config.py (see the sketch after this list)
- Implement parameter application: Update process_single_file.py
- Update result parsing: Modify gather_stability_columns.py filename patterns
- Test stability analysis: Verify correlation plots work correctly
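A sketch of what the new STABILITY_CONFIGS entry could look like; the key names and log-spaced grid are illustrative assumptions, not the actual schema in processing_config.py:

import numpy as np

# Hypothetical STABILITY_CONFIGS entry; adapt to the real schema in processing_config.py.
STABILITY_CONFIGS = {
    "my_new_param": {
        "values": np.logspace(-3, 3, num=201).tolist(),  # 201 values, like ls_factor/sigma
        "filename_pattern": "original_my_new_param_{value}.h5ad",
        "slurm": {"mem": "32G", "time": "8:00:00", "cpus": 8},
    },
}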
Please cite the kompot package and relevant datasets when using this pipeline:
@software{kompot2024,
title={kompot: Differential abundance and expression analysis for single-cell data},
author={Setty Lab},
year={2024},
url={https://github.com/settylab/kompot}
}

Subsampling Issues:
- Input file not found: Check file paths in subsampling_config.py
- Memory errors: Increase SLURM memory or reduce max_write_workers
- Infinite loops: Fixed in the current version with proper stopping conditions
Processing Issues:
- Environment activation failures: Ensure conda/mamba is properly configured
- Missing kompot columns: Check if subsampling completed successfully
- Parameter stability using wrong files: Verify the ORIGINAL_INPUT_FILE environment variable
Analysis Issues:
- Missing columns in stability analysis: Use quick_column_check() to verify available data
- Correlation computation errors: Check for sufficient non-null data (>10% required)
- Plot generation failures: Verify parameter value parsing and scientific notation handling
- Check stage-specific documentation (README_SUBSAMPLING.md, README_GATHER_RESULTS.md)
- Examine log files in the logs/ directory and dataset subdirectories
- Use the --dry-run and --verbose flags for debugging
MIT License
Copyright (c) 2025 Setty Lab
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.