Single-Cell Differential Gene Expression #906

@rcannood

Task Motivation

Single-cell RNA sequencing allows for unprecedented resolution in studying cellular heterogeneity and gene expression dynamics. However, analyzing data from multiple experiments is often complicated by batch effects, which can significantly impact the identification of differentially expressed genes (DEGs). Accurate DEG identification in multi-batch scRNA-seq data is crucial for understanding cell type identity, gene regulatory networks, and disease states.

This task, inspired by Nguyen et al. (2023) and Soneson & Robinson (2018), aims to benchmark DEG methods in multi-batch scRNA-seq data. By incorporating diverse datasets and performance metrics, we can provide researchers with a clear understanding of the strengths and weaknesses of various approaches, leading to more reliable and reproducible DEG analysis.

Task Description

Problem: Identify genes differentially expressed between two or more groups of cells in a multi-batch scRNA-seq dataset, while accounting for batch effects.

Input:

  • Count matrix: Genes (rows) x Cells (columns) with expression counts.
  • Cell metadata: Data frame with cell annotations (cell type labels, experimental conditions, batch information).

Output:

  • Ranked DEG list: Genes with associated statistics (p-value, fold change, effect size) indicating the magnitude and significance of differential expression, adjusted for batch effects.
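
As a rough illustration of the output format, the sketch below represents each gene's statistics as a small record and ranks the records. The `DEGResult` class, its field names, and the tie-breaking rule are hypothetical choices made for this example, not part of the task specification.

```python
from dataclasses import dataclass


@dataclass
class DEGResult:
    # Hypothetical record for one gene; field names are illustrative only.
    gene: str
    p_value: float
    log2_fold_change: float


def rank_degs(results):
    """Sort by ascending p-value; break ties by larger absolute fold change."""
    return sorted(results, key=lambda r: (r.p_value, -abs(r.log2_fold_change)))


results = [
    DEGResult("GeneA", 0.04, 1.2),
    DEGResult("GeneB", 0.001, -2.5),
    DEGResult("GeneC", 0.04, -3.0),
]
ranked = rank_degs(results)
print([r.gene for r in ranked])  # ['GeneB', 'GeneC', 'GeneA']
```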

Assumptions:

  • Preprocessed count matrix (quality control, normalization).
  • Accurate cell type annotations.
  • Balanced study design (each batch contains cells from all conditions/groups).
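
The balanced-design assumption can be checked before running any method. The following is a minimal sketch, assuming the cell metadata is available as a list of dictionaries; the function name and the `batch` and `condition` keys are placeholders, not names defined by the task.

```python
from collections import defaultdict


def unbalanced_batches(cell_metadata, batch_key="batch", group_key="condition"):
    """Return {batch: missing groups} for batches that lack one or more groups."""
    all_groups = {cell[group_key] for cell in cell_metadata}
    groups_per_batch = defaultdict(set)
    for cell in cell_metadata:
        groups_per_batch[cell[batch_key]].add(cell[group_key])
    return {
        batch: sorted(all_groups - groups)
        for batch, groups in groups_per_batch.items()
        if groups != all_groups
    }


cells = [
    {"batch": "b1", "condition": "control"},
    {"batch": "b1", "condition": "treated"},
    {"batch": "b2", "condition": "control"},
]
print(unbalanced_batches(cells))  # {'b2': ['treated']}
```

An empty result means every batch contains cells from all groups.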

Constraints:

  • Methods should handle high dimensionality, sparsity, and batch effects.
  • Methods and metrics should be computationally efficient for large datasets.

Proposed Datasets

  • Nguyen et al. (2023) datasets:
    • Model-based simulated data (splatter): Controlled environment with varying batch effects, sequencing depths, and zero rates.
    • Model-free simulated data: Real scRNA-seq data with simulated DEGs, incorporating realistic batch effects.
  • dyngen generated datasets: Dynamic gene expression data with complex regulatory networks and batch effects.
  • Soneson & Robinson (2018) datasets (from conquer repository): Consistently processed, analysis-ready public scRNA-seq datasets with abundance estimates for genes and transcripts, including both full-length and UMI protocols.
  • cellxgene census datasets: Pairs of well-annotated cell types across multiple studies, providing real-world data with diverse batch effects.

Initial Methods

DE methods as evaluated in Soneson & Robinson:

  • Bulk RNA-seq methods: edgeR, DESeq2, limma (voom, trend), SAMseq
  • Single-cell specific methods: MAST, SCDE, monocle, scDD, BPSC, DEsingle, D3E

Types of approaches as evaluated by Nguyen et al.:

  • Naïve Methods: Standard DE analysis of pooled uncorrected data.
  • Covariate Models: Parametric DE analysis with a batch covariate (DESeq2, edgeR, limma, MAST).
  • Batch Effect Correction: MNN, scVI, Scanorama.
  • Meta-Analysis: Methods for combining DE results across batches (e.g., Fisher's method, fixed/random effects models).
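
To make the meta-analysis approach concrete, here is a minimal sketch of Fisher's method for one gene, combining per-batch p-values that are assumed to be independent. The function name is hypothetical. The chi-square tail probability is computed with its closed form for even degrees of freedom, so only the standard library is needed.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values for one gene with Fisher's method."""
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    # Test statistic: -2 * sum(log p), chi-square with 2k degrees of freedom.
    half = -sum(math.log(p) for p in p_values)  # statistic / 2
    # Survival function for even dof 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


print(fisher_combined_p([0.05, 0.05]))
```

Fisher's method ignores the direction of the effect, so a gene that is up-regulated in one batch and down-regulated in another can still receive a small combined p-value.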

Control Methods

  • Positive Control: Known marker genes for each cell type.
  • Negative Control: Randomly permuted cell labels (maintaining batch assignments) or a random permutation of genes.
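
One way to build the permuted-label negative control is to shuffle the group labels separately within each batch, so that batch assignments and per-batch group sizes stay unchanged. This is a sketch under that assumption; the function name and the fixed seed are illustrative.

```python
import random
from collections import defaultdict


def permute_labels_within_batch(labels, batches, seed=0):
    """Shuffle group labels among the cells of each batch; batches are untouched."""
    if len(labels) != len(batches):
        raise ValueError("labels and batches must have the same length")
    rng = random.Random(seed)
    indices_by_batch = defaultdict(list)
    for idx, batch in enumerate(batches):
        indices_by_batch[batch].append(idx)
    permuted = list(labels)
    for indices in indices_by_batch.values():
        shuffled = [labels[i] for i in indices]
        rng.shuffle(shuffled)
        for i, label in zip(indices, shuffled):
            permuted[i] = label
    return permuted


labels = ["A", "A", "B", "B", "A", "B"]
batches = ["b1", "b1", "b1", "b2", "b2", "b2"]
permuted = permute_labels_within_batch(labels, batches)
print(permuted)
```

Because the labels no longer carry biological signal, any gene a method reports as differentially expressed on this input is a false positive.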

Proposed Metrics

  • Generalized F-score and partial AUPR score (as used in Nguyen et al. 2023).
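
The sketch below shows one possible way to compute these two metrics from a ranked gene list and a set of ground-truth DEGs. The defaults `beta=0.5` and `max_recall=0.5`, the step-wise integration, the clipping of the final step, and the lack of normalization are assumptions made for illustration; the exact definitions should be taken from Nguyen et al. (2023).

```python
def f_beta(precision, recall, beta=0.5):
    """Generalized F-score; beta < 1 weights precision more than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def partial_aupr(ranked_genes, true_degs, max_recall=0.5):
    """Unnormalized step-wise area under the PR curve for recall <= max_recall."""
    true_degs = set(true_degs)
    n_true = len(true_degs)
    if n_true == 0:
        raise ValueError("need at least one true DEG")
    area = 0.0
    hits = 0
    prev_recall = 0.0
    for rank, gene in enumerate(ranked_genes, start=1):
        if gene not in true_degs:
            continue
        hits += 1
        recall = hits / n_true
        precision = hits / rank
        if recall > max_recall:
            # Clip the final step at the recall cutoff, then stop.
            area += precision * max(0.0, max_recall - prev_recall)
            break
        area += precision * (recall - prev_recall)
        prev_recall = recall
    return area


ranked_genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
true_degs = {"g1", "g3", "g4", "g6"}
print(partial_aupr(ranked_genes, true_degs))
print(f_beta(precision=0.5, recall=1.0))
```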

References

Soneson, C., Robinson, M. Bias, robustness and scalability in single-cell differential expression analysis. Nat Methods 15, 255–261 (2018). https://doi.org/10.1038/nmeth.4612

Nguyen, H.C.T., Baik, B., Yoon, S. et al. Benchmarking integration of single-cell differential expression. Nat Commun 14, 1570 (2023). https://doi.org/10.1038/s41467-023-37126-3
