Task motivation
Gene Regulatory Network (GRN) inference is a central problem in systems biology: it aims to recover the mechanisms that govern gene expression and cellular behavior. The resulting networks help explain biological processes and matter for medical research, particularly for developing targeted therapies and understanding disease mechanisms.
Computational Challenges
Despite its importance, GRN inference from single-cell RNA-Seq data is challenged by the high dimensionality, noise, and sparsity of the data, the sparsity of the networks to be inferred, the lack of known negative edges in the GRN (a positive-unlabeled setting), and the ambiguity of possible causal explanations for the data. Available computational approaches often struggle with these issues, leading to inaccurate or overfitted models.
Research Gap
Current methods range from statistical correlations to advanced machine learning, each with limitations in accuracy, data requirements, and interpretability. Multiple benchmarking studies exist, but they differ in their evaluation choices, such as the negative sampling strategy, the metrics used, and the use of synthetic versus experimental data. What is missing is a more standardized way of benchmarking that uses biologically meaningful metrics.
Task description
The task focuses on the inference of GRNs from scRNA-Seq data. It is divided into two subtasks based on the availability of prior knowledge:
- GRN Inference without prior knowledge: Inferring a GRN solely from scRNA-Seq data.
- GRN Inference with prior knowledge: Inferring a GRN from scRNA-Seq data using an additional prior knowledge graph (a subset of edges from the ground-truth GRN).
Input Data
- For Subtask 1: Normalized and preprocessed scRNA-Seq data
- For Subtask 2: In addition to the scRNA-Seq data, a given subset of edges of the ground-truth GRN as prior knowledge
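One way the Subtask 2 input could be derived from a ground-truth edge list is sketched below. This is only an illustration: the function name `split_prior`, the `prior_fraction` parameter, and the idea of holding out the remaining edges for evaluation are assumptions, not something the proposal specifies.

```python
import random

def split_prior(true_edges, prior_fraction=0.2, seed=0):
    """Split ground-truth edges into a prior set and a held-out set.

    true_edges: iterable of (regulator, target) pairs.
    prior_fraction: hypothetical share of edges revealed as prior knowledge.
    """
    edges = sorted(set(true_edges))      # deduplicate, fix a reproducible order
    rng = random.Random(seed)
    rng.shuffle(edges)
    n_prior = int(round(prior_fraction * len(edges)))
    return edges[:n_prior], edges[n_prior:]

# Toy ground truth with 10 edges; 30% are given to the method as prior.
truth = [(f"TF{i}", f"G{i}") for i in range(10)]
prior, held_out = split_prior(truth, prior_fraction=0.3)
```

Keeping the two sets disjoint means the held-out edges can still serve as positives when scoring a method that saw the prior.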
Expected Output
The output for both subtasks is a predicted GRN, represented as a graph where nodes are genes and edges indicate regulatory interactions. The quality of the predicted networks can be evaluated in two main ways:
- Binary Classification: Each potential interaction (edge) is classified as either present or absent (like this)
- Topological Evaluation: The overall structure and properties of the predicted network are assessed (like this)
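As a rough illustration of the output format, a method that produces a gene-by-gene score matrix could be turned into a ranked, directed edge list as follows. The helper name `top_edges`, the convention that rows are regulators and columns are targets, and the cutoff `k` are assumptions made for this sketch.

```python
import numpy as np

def top_edges(scores, gene_names, k):
    """Return the k highest-scoring directed edges as (source, target, weight)."""
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    mask = ~np.eye(n, dtype=bool)            # exclude self-loops
    src, dst = np.nonzero(mask)
    weights = scores[src, dst]
    order = np.argsort(-weights, kind="stable")[:k]
    return [(gene_names[src[i]], gene_names[dst[i]], float(weights[i]))
            for i in order]

scores = [[0.0, 0.9, 0.1],
          [0.2, 0.0, 0.8],
          [0.3, 0.4, 0.0]]
edges = top_edges(scores, ["A", "B", "C"], k=2)
```

The full ranked list suits the binary classification view, while a thresholded list defines the graph used for topological evaluation.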
Proposed ground-truth datasets
- Synthetic, Curated and Experimental datasets from (BEELINE)
- Experimental datasets from (this paper)
Initial set of methods to implement
- MLPs
- Graph Neural Network based diffusion models (GCN / GAT)
Proposed control methods
- Pearson / Spearman correlation
- Random predictor
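The two control methods could look roughly like the sketch below. It assumes the expression matrix has cells in rows and genes in columns, scores a gene pair by absolute correlation (so the result is symmetric and carries no direction), and uses a simplified Spearman variant that does not average tied ranks. The function names are hypothetical.

```python
import numpy as np

def correlation_baseline(expr, method="pearson"):
    """Score gene pairs by absolute correlation.

    expr: array of shape (n_cells, n_genes), assumed already normalized.
    Returns an (n_genes, n_genes) matrix with a zero diagonal.
    """
    x = np.asarray(expr, dtype=float)
    if method == "spearman":
        # Rank-transform each gene; ties are broken by position, which is a
        # simplification of the usual average-rank treatment.
        x = np.argsort(np.argsort(x, axis=0), axis=0).astype(float)
    scores = np.abs(np.corrcoef(x, rowvar=False))
    scores = np.nan_to_num(scores)       # constant genes give NaN correlations
    np.fill_diagonal(scores, 0.0)
    return scores

def random_baseline(n_genes, seed=0):
    """Assign uniformly random scores to every ordered gene pair."""
    rng = np.random.default_rng(seed)
    scores = rng.random((n_genes, n_genes))
    np.fill_diagonal(scores, 0.0)
    return scores
```

Both functions return a score matrix of the same shape, so they can be passed through the same evaluation code as any learned method.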
Proposed Metrics
Binary classification:
- Link-equality metrics (AUROC / AUPRC)
- Node-equality metrics (Mean Average Precision)
- Precision@Top k
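For concreteness, the link-level metrics can be sketched in plain NumPy as below. This assumes every candidate edge has a binary label and a predicted score, flattened into two arrays. AUROC is computed through the rank-sum identity, and AUPRC is approximated by average precision, which is one common summary of the precision-recall curve. An established library implementation would normally be preferred over this sketch.

```python
import numpy as np

def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic, with tied scores sharing a rank."""
    labels = np.asarray(labels, dtype=bool).ravel()
    scores = np.asarray(scores, dtype=float).ravel()
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative edge")
    _, inverse, counts = np.unique(scores, return_inverse=True,
                                   return_counts=True)
    last_rank = np.cumsum(counts)
    avg_rank = last_rank - (counts - 1) / 2.0   # mean 1-based rank per value
    ranks = avg_rank[inverse]
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))

def average_precision(labels, scores):
    """Mean of the precision values at the rank of each true edge."""
    labels = np.asarray(labels, dtype=bool).ravel()
    order = np.argsort(-np.asarray(scores, dtype=float).ravel(), kind="stable")
    hits = labels[order]
    precision = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision[hits].mean())

def precision_at_k(labels, scores, k):
    """Fraction of true edges among the k highest-scoring predictions."""
    labels = np.asarray(labels, dtype=bool).ravel()
    order = np.argsort(-np.asarray(scores, dtype=float).ravel(), kind="stable")
    return float(labels[order][:k].mean())
```

Because GRNs are sparse, positives are rare among candidate edges, so average precision and Precision@Top k tend to be more informative than AUROC alone.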
Topological evaluation:
- Information Exchange (Average Shortest Path Length, Global and Local Efficiency)
- Hub Topology (Assortativity, Clustering Coefficient, Centralization)
- Hub Identification (PageRank, Betweenness, Radiality, Centrality)
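Two of the information-exchange metrics can be sketched in plain Python as shown below. The sketch treats the predicted network as undirected and unweighted, which is an assumption (regulatory edges are directed), and it averages the shortest path length over reachable pairs only. A graph library would normally be used for the full metric set.

```python
from collections import deque

def build_undirected(edges):
    """Build an adjacency map, ignoring edge direction."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_metrics(edges):
    """Return (average shortest path length, global efficiency)."""
    adj = build_undirected(edges)
    n = len(adj)
    total_len, total_inv, reachable = 0, 0.0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total_len += d
                total_inv += 1.0 / d
                reachable += 1
    n_pairs = n * (n - 1)
    aspl = total_len / reachable if reachable else float("nan")
    # Unreachable pairs contribute zero to global efficiency.
    geff = total_inv / n_pairs if n_pairs else 0.0
    return aspl, geff

aspl, geff = path_metrics([("A", "B"), ("B", "C")])
```

Comparing these values between the predicted and the ground-truth network shows whether a method reproduces the overall connectivity, independent of individual edge matches.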