A powerful, fast, and user-friendly Python package for deep characterization of single-cell feature expression patterns.
scfeatureprofiler provides a suite of statistical tools to analyze single-cell data (e.g., scRNA-seq, CITE-seq) and answer fundamental biological questions:
- Cluster Quality: Are my clusters well-defined and biologically meaningful?
- Marker Discovery: Which features are robust and specific markers for each cell group?
- Feature Activity: In which cell groups is a specific feature actively expressed?
The package is designed for performance, with a parallelized backend that can handle extremely large datasets, including out-of-core analysis for data that doesn't fit into memory.
- Multi-Interface: Use it as a Python library in your Jupyter notebooks or as a command-line tool for script-based workflows.
- Flexible Input: Works directly with `AnnData` objects, `pandas.DataFrame`, or `numpy` arrays.
- Robust Cluster Validation: Includes an `evaluate_clustering` function using silhouette scores to quantify cluster quality before marker discovery.
- Data-Driven Marker Selection: Implements a dynamic, clustering-based method to automatically identify the best markers without arbitrary thresholds.
- High Performance: Parallelized using `joblib` to use all available CPU cores for rapid analysis.
- Scalable: Supports out-of-core computation for memory-mapped `AnnData` objects, enabling analysis of millions of cells.
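To make the cluster-validation idea concrete, here is a minimal pure-Python sketch of what a silhouette score measures. It is not the package's implementation of `evaluate_clustering`; the function name `silhouette_scores` and the toy data are assumptions made purely for illustration.

```python
import math

def silhouette_scores(points, labels):
    """Return one silhouette value per point (pure Python, O(n^2))."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Convention: a point alone in its cluster gets a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

# Toy data (hypothetical): two tight, well-separated groups in 2D.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = ["A", "A", "A", "B", "B", "B"]

scores = silhouette_scores(points, labels)
mean_score = sum(scores) / len(scores)
print(f"mean silhouette: {mean_score:.2f}")
```

Values near 1 indicate that a point sits much closer to its own cluster than to the nearest other cluster, values near 0 indicate overlap, and negative values suggest a possible misassignment.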
You can install scfeatureprofiler directly from PyPI:
pip install scfeatureprofiler
To include support for AnnData objects (recommended), install with the [anndata] extra (quoted so that shells such as zsh do not expand the brackets):
pip install "scfeatureprofiler[anndata]"
To install all dependencies for development, use:
# Clone the repository first
git clone https://github.com/zqzneptune/SingleCellFeatureProfiler.git
cd SingleCellFeatureProfiler
pip install -e ".[all]"
scfeatureprofiler is designed to be intuitive. Here are two examples for the most common use cases.
This is the recommended workflow inside a Jupyter notebook after you have performed clustering.
import scanpy as sc
from scfeatureprofiler import evaluate_clustering, find_marker_features, select_robust_markers
# 1. Load your clustered single-cell data
adata = sc.read_h5ad("path/to/your_clustered_data.h5ad")
# 2. (Recommended) Evaluate clustering quality first
# This helps ensure your clusters are meaningful before finding markers.
cluster_report = evaluate_clustering(adata, cluster_key='leiden')
# As a rule of thumb, a silhouette score above 0.25 suggests reasonably well-separated clusters.
# 3. Find all potential marker features for your clusters
# This returns a comprehensive pandas DataFrame for deep exploration.
all_markers_df = find_marker_features(
    data=adata,
    group_by='leiden'
)
# 4. Automatically select the top 10 best markers per cluster
# This function uses a data-driven method to find natural cutoffs.
top_markers_df = select_robust_markers(all_markers_df, top_n=10)
print("--- Top 5 Robust Markers for each Cluster ---")
print(top_markers_df.groupby('group').head(5))
# 5. Convert to a dictionary for Scanpy plotting functions
top_markers_dict = top_markers_df.groupby('group')['feature_id'].apply(list).to_dict()
sc.pl.dotplot(adata, top_markers_dict, groupby='leiden')
If you prefer to work from the terminal, you can perform the entire marker discovery pipeline with a single command.
Input File:
my_data.h5ad: An AnnData file with clustering results in `adata.obs['leiden']`.
Command:
scfeatureprofiler find-markers \
    --input my_data.h5ad \
    --group-by leiden \
    --output ranked_markers.csv
Output (ranked_markers.csv):
This produces a detailed CSV file with all statistically significant markers, ranked by group and significance.
feature_id,group,stability_score,norm_score,pct_expressing,log2fc_all,fdr_marker,...
CD8A,CD8 T-cell,1.0,1.0,95.4,8.2,1.2e-250,...
GZMB,CD8 T-cell,1.0,0.98,92.1,7.5,4.5e-245,...
MS4A1,B-cell,1.0,1.0,98.2,9.5,8.1e-280,...
...
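As a rough sketch of how the CSV output could be consumed downstream, the snippet below groups feature IDs by cluster using only the standard library. It relies only on the columns visible in the excerpt above (the elided `...` columns are left out), and the helper name `markers_by_group` and the `min_pct` and `max_fdr` thresholds are hypothetical, chosen for illustration.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the output excerpt; the elided "..." columns are omitted.
SAMPLE_CSV = """\
feature_id,group,stability_score,norm_score,pct_expressing,log2fc_all,fdr_marker
CD8A,CD8 T-cell,1.0,1.0,95.4,8.2,1.2e-250
GZMB,CD8 T-cell,1.0,0.98,92.1,7.5,4.5e-245
MS4A1,B-cell,1.0,1.0,98.2,9.5,8.1e-280
"""

def markers_by_group(csv_file, min_pct=0.0, max_fdr=0.05):
    """Group feature IDs by cluster, keeping rows that pass both filters.

    min_pct and max_fdr are hypothetical thresholds, not package defaults.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        passes_pct = float(row["pct_expressing"]) >= min_pct
        passes_fdr = float(row["fdr_marker"]) <= max_fdr
        if passes_pct and passes_fdr:
            grouped[row["group"]].append(row["feature_id"])
    return dict(grouped)

# Keep only markers expressed in at least 93% of the cells of their group.
markers = markers_by_group(io.StringIO(SAMPLE_CSV), min_pct=93.0)
print(markers)
```

For a real run, the same function can be given an open file handle, for example `open("ranked_markers.csv", newline="")`, in place of the `io.StringIO` wrapper.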
- `scfeatureprofiler find-markers`: A full pipeline to select, profile, and rank robust marker features.
- `scfeatureprofiler profile`: Generate a detailed statistical profile for a user-provided list of features.
- `scfeatureprofiler activity`: Summarize a profile to show in which groups features are "ON".
Use scfeatureprofiler --help or scfeatureprofiler find-markers --help for a full list of options.
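For script-based workflows, the `find-markers` command can also be driven from Python. The sketch below uses only the three flags shown in the example above (`--input`, `--group-by`, `--output`); the wrapper functions `build_find_markers_cmd` and `run_find_markers` are hypothetical helpers, not part of the package.

```python
import shutil
import subprocess

def build_find_markers_cmd(input_path, group_by, output_path):
    """Assemble the find-markers command as an argument list."""
    return [
        "scfeatureprofiler", "find-markers",
        "--input", str(input_path),
        "--group-by", group_by,
        "--output", str(output_path),
    ]

def run_find_markers(input_path, group_by, output_path):
    """Run the CLI; raise if it is missing or exits with a non-zero status."""
    cmd = build_find_markers_cmd(input_path, group_by, output_path)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not on PATH; is the package installed?")
    subprocess.run(cmd, check=True)

# Build (but do not execute) the command from the example above.
cmd = build_find_markers_cmd("my_data.h5ad", "leiden", "ranked_markers.csv")
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.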