QCatch: Automated quality control of single-cell quantifications from alevin-fry and simpleaf.
View the complete QCatch documentation with interactive examples, FAQs, and detailed usage guides.
You need to have Python 3.11, 3.12, or 3.13 installed on your system.
There are several alternative options to install QCatch:
You can install using Conda from Bioconda.
conda install -c bioconda qcatch
You can also install from PyPI using pip:
pip install qcatch
Tips: If you run into environment issues, you can also use the provided Conda .yml file, which specifies the exact versions of all dependencies to ensure consistency.
conda env create -f qcatch_conda_env.yml
Provide the path to the parent folder for quantification results, or the direct path to a .h5ad file generated by alevin-fry or simpleaf. QCatch will automatically scan the input path, assess data quality, and generate an interactive HTML report that can be viewed directly in your browser.
# --output: only needed if you want another folder for output
qcatch \
  --input path/to/your/quantification/result \
  --output path/to/desired/QC/output/folder \
  --chemistry 10X_3p_v3 \
  --save_filtered_h5ad
For details on how to configure chemistries, see the chemistry section.
#!/bin/bash
set -e # Exit immediately if a command exits with a non-zero status
echo "Downloading QCatch example dataset..."
# Define where to run the tutorial (you can change this path if desired)
CWD=$(pwd) # Current working directory
TUTORIAL_DIR="${CWD}/qcatch_tutorial"
# Clean any existing tutorial directory to ensure a fresh download
rm -rf "$TUTORIAL_DIR" && mkdir -p "$TUTORIAL_DIR"
ZIP_FILE="data.zip"
# Download from Box
wget -O "$ZIP_FILE" "https://umd.box.com/shared/static/zd4sai70uw9fs24e1qx6r41ec50pf45g.zip?dl=1"
# Unzip and clean up
unzip "$ZIP_FILE" -d "$TUTORIAL_DIR"
rm "$ZIP_FILE"
echo "Test data downloaded to $TUTORIAL_DIR"
All set! Now let's run QCatch:
# Set up output directory
OUT_DIR="${TUTORIAL_DIR}/output"
mkdir -p "$OUT_DIR"
# Step 2 - Run QCatch
qcatch --input "${TUTORIAL_DIR}/test_data/simpleaf_with_map/quants.h5ad" \
  --output "${OUT_DIR}"
Provide either:
- the path to the parent directory containing quantification results, or
- the direct path to a .h5ad file generated by alevin-fry or simpleaf.
QCatch will automatically detect the input type:
- If a .h5ad file is provided, QCatch will process it directly.
- If a directory is provided, QCatch will first look for an existing .h5ad file inside. If not found, it will fall back to processing the mtx-based quantification results.
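The detection order described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not QCatch's actual implementation: the function name `resolve_input` is hypothetical, and searching the directory recursively for any `.h5ad` file is an assumption.

```python
from pathlib import Path


def resolve_input(input_path: str) -> tuple[str, Path]:
    """Classify an input path as 'h5ad' or 'mtx' (hypothetical helper, not QCatch's API)."""
    path = Path(input_path)
    # Case 1: the user points directly at an .h5ad file.
    if path.is_file() and path.suffix == ".h5ad":
        return "h5ad", path
    # Case 2: a directory; prefer an existing .h5ad file if one is found.
    if path.is_dir():
        # Assumption: any .h5ad below the directory counts as a match.
        candidates = sorted(path.rglob("*.h5ad"))
        if candidates:
            return "h5ad", candidates[0]
        # Otherwise fall back to the mtx-based quantification results.
        return "mtx", path
    raise FileNotFoundError(f"Input path is neither an .h5ad file nor a directory: {path}")
```

Calling `resolve_input("path/to/your/quantification/result")` on a directory without any `.h5ad` file would return `("mtx", Path(...))` under these assumptions.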
See the example directory structures at the end of the Tips section for reference:
If you do not want any modifications to your input folder or files, specify an output path; QCatch will save any new results and the QC HTML report there.
By default, QCatch saves the QC report and all output files in the input directory. Specifying a different output location is optional.
Specifically:
- If QCatch detects an existing quants.h5ad file in the input directory and the output path is the same as the input path, QCatch will modify the original .h5ad file in place by appending cell-filtering results to anndata.obs. In addition, it will generate a separate HTML QC report in the input directory.
- For MTX-based inputs (i.e., when not using simpleaf v0.19.5 or newer), QCatch will generate a new .h5ad file containing metadata produced during QCatch processing. This file does NOT include metadata from the original alevin-fry quantification, which remains stored in the original files.
The --chemistry information is used to estimate --n_partitions, which represents the total partition capacity (i.e., the total number of physical droplets or wells generated in an experiment, regardless of whether they contain a cell). This value is critical for accurately defining the "ambient pool" used to model empty droplets. (NOTE: this is distinct from the --chemistry argument in alevin-fry, which refers to the barcode/UMI geometry.)
If you used a standard 10X chemistry, QCatch will first attempt to infer the chemistry from the metadata and look up the corresponding number of partitions in its internal database. If this inference fails, QCatch will stop and prompt you to explicitly provide the chemistry version using the --chemistry argument before rerunning the command. Supported chemistries currently include: '10X_3p_v2', '10X_3p_v3', '10X_3p_v4', '10X_3p_LT', '10X_5p_v3', and '10X_HT'.
For non-10x or custom assays (e.g., sci-RNA-seq3, Drop-seq), users can manually specify the capacity using --n_partitions. We recommend setting this value by rounding the number of processed barcodes (found in the alevin-fry/simpleaf log or the number of rows in the .h5ad file) up to the next 10% increment of the current order of magnitude. For example, if 79,000 barcodes are detected, --n_partitions should be set to 80,000; if 144,000 barcodes are detected, --n_partitions should be set to 150,000. This option overrides any chemistry-based setting for cell calling.
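As a rough aid, the rounding rule can be written as a small Python helper. The function name `suggest_n_partitions` is hypothetical and not part of QCatch. The sketch assumes one reading of the rule that reproduces both examples above: the step is one tenth of the largest power of ten not exceeding the barcode count, and the result is the next multiple of that step strictly above the count.

```python
def suggest_n_partitions(n_barcodes: int) -> int:
    """Suggest a value for --n_partitions (hypothetical helper, not QCatch's API)."""
    if n_barcodes < 1:
        raise ValueError("n_barcodes must be a positive integer")
    # Largest power of ten not exceeding the count, e.g. 10_000 for 79_000.
    magnitude = 10 ** (len(str(n_barcodes)) - 1)
    # A 10% increment of that order of magnitude (at least 1 for tiny inputs).
    step = max(magnitude // 10, 1)
    # Assumption: always move to the next multiple strictly above the count.
    return (n_barcodes // step + 1) * step


print(suggest_n_partitions(79_000))   # 80000
print(suggest_n_partitions(144_000))  # 150000
```

The resulting number would then be passed on the command line via --n_partitions.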
If you are using simpleaf v0.19.3 or later, the generated .h5ad file already includes gene names. In this case, you do not need to specify the --gene_id2name_file option.
To provide a gene ID to gene name mapping, supply a TSV file with two columns, 'gene_id' (e.g., ENSG00000284733) and 'gene_name' (e.g., OR4F29), without a header row. If no file is provided, the program will attempt to retrieve the mapping from a remote registry. If that lookup fails, the mitochondria plots will not be displayed, but the rest of the QC report is unaffected.
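Before passing a mapping file to --gene_id2name_file, it can help to check that it matches the layout described above. The following is a minimal sketch using only the standard library; `check_gene_id2name` is a hypothetical helper, and the header heuristic (a first field literally named `gene_id`) is an assumption.

```python
def check_gene_id2name(path: str) -> list[str]:
    """Return a list of problems found in a two-column, header-less TSV (hypothetical helper)."""
    problems = []
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            fields = line.rstrip("\r\n").split("\t")
            # Each row must hold exactly one gene ID and one gene name.
            if len(fields) != 2 or not all(field.strip() for field in fields):
                problems.append(f"line {line_no}: expected 2 non-empty tab-separated fields")
            # Assumption: a first row starting with 'gene_id' is a header row.
            elif line_no == 1 and fields[0].strip().lower() == "gene_id":
                problems.append("line 1: looks like a header row; the file must not have one")
    return problems
```

An empty list means that no problems were detected by these checks; it does not guarantee that the gene IDs match those in the quantification output.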
To save the filtered .h5ad file separately, specify --save_filtered_h5ad. This option only applies when QCatch detects an .h5ad file as the input.
If you want to use a specified list of valid cell barcodes, you can provide the file path with --valid_cell_list. QCatch will then skip the default cell calling step and use the supplied list instead. The updated .h5ad file will include only one additional column, 'is_retained_cells', containing boolean values based on the specified list.
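To make the effect of a user-supplied barcode list concrete, the sketch below reads a one-column, header-less file and derives a boolean flag per barcode, analogous to the 'is_retained_cells' column. It is illustrative only: both function names are hypothetical, and QCatch's internal handling may differ.

```python
def load_valid_barcodes(path: str) -> set[str]:
    """Read one barcode per line, ignoring blank lines (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def flag_retained(all_barcodes: list[str], valid: set[str]) -> list[bool]:
    """Return True for each barcode that appears in the valid list."""
    return [barcode in valid for barcode in all_barcodes]
```

With a list file containing `AAAC` and `GGGT`, `flag_retained(["AAAC", "CCCA", "GGGT"], valid)` yields `[True, False, True]`.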
To reduce runtime, you may enable the --skip_umap_tsne option to bypass dimensionality reduction and visualization steps.
To export the summary metrics, enable the --export_summary_table flag. The summary table will be saved as a separate CSV file in the output directory.
To get debug-level messages and additional intermediate computation details from the cell-calling step, specify --verbose.
If you re-run QCatch analysis on a modified .h5ad file (i.e., an .h5ad file with additional columns added for cell calling results), the existing cell calling-related columns will be removed and then replaced with new results. The new cell calling can be generated either through QCatch's internal method or based on a user-specified list of valid cell barcodes.
# simpleaf
parent_quant_dir/
├── af_map/
├── af_quant/
│   ├── alevin/
│   │   ├── quants_mat_cols.txt
│   │   ├── quants_mat_rows.txt
│   │   ├── quants_mat.mtx
│   │   └── quants.h5ad (available if you use simpleaf after v0.19.3)
│   │   ...
│   ├── featureDump.txt
│   └── quant.json
└── simpleaf_quant_log.json
# alevin-fry
parent_quant_dir/
├── alevin/
│   ├── quants_mat_cols.txt
│   ├── quants_mat_rows.txt
│   └── quants_mat.mtx
├── featureDump.txt
└── quant.json
For more advanced options and usage details, see the sections below.
| Flag | Short | Type | Description |
|---|---|---|---|
| --input | -i | str (Required) | Path to the input directory containing the quantification output files, or to the .h5ad file itself. |
| --output | -o | str (Required) | Path to the output directory. |
| --chemistry | -c | str (Recommended) | Specifies the chemistry used in the experiment, which determines the partition range for the empty_drops step. Supported options: '10X_3p_v2', '10X_3p_v3', '10X_3p_v4', '10X_5p_v3', '10X_3p_LT', '10X_HT'. If you used a standard 10X chemistry (e.g., '10X_3p_v2', '10X_3p_v3') and performed quantification with simpleaf (v0.19.5 or later), QCatch will try to infer the correct chemistry from the metadata. If inference fails, QCatch will stop and prompt you to provide the chemistry explicitly via this flag. |
| --save_filtered_h5ad | -s | flag (Optional) | If enabled, QCatch will save a separate .h5ad file containing only the final retained cells. |
| --gene_id2name_file | -g | str (Optional) | File providing a mapping from gene IDs to gene names. The file must be a TSV with two columns, 'gene_id' (e.g., ENSG00000284733) and 'gene_name' (e.g., OR4F29), without a header row. If not provided, the program will attempt to retrieve the mapping from a remote registry. If that lookup fails, mitochondria plots will not be displayed. |
| --valid_cell_list | -l | str (Optional) | File providing a user-specified list of valid cell barcodes. The file must be a TSV containing one column of cell barcodes, without a header row. If provided, QCatch will skip the internal cell-calling steps and use the supplied list instead. |
| --n_partitions | -n | int (Optional) | Number of partitions (maximum number of barcodes to consider for ambient estimation). Use --n_partitions only when working with a custom or unsupported chemistry. When provided, this value overrides the chemistry-based configuration during the cell-calling step. |
| --remove_doublets | -d | flag (Optional) | If enabled, QCatch will perform doublet detection (using Scrublet) and remove detected doublets from the cells retained after cell calling. |
| --skip_umap_tsne | -u | flag (Optional) | If provided, skips generation of UMAP and t-SNE plots. |
| --export_summary_table | -x | flag (Optional) | If enabled, QCatch will export the summary metrics as a separate CSV file. |
| --verbose | -b | flag (Optional) | Enable verbose logging with debug-level messages. |
| --version | -v | flag (Optional) | Display the installed version of QCatch. |