Gene regulatory network inference (with prior knowledge) #900

@stkmrc

Description

Task motivation

Gene regulatory network (GRN) inference is a central problem in systems biology: it aims to recover the regulatory mechanisms that govern gene expression and cellular behavior. Inferred networks help explain biological processes and disease mechanisms, and can inform the development of targeted therapies.

Computational Challenges

Despite its importance, GRN inference from single-cell RNA-Seq (scRNA-Seq) data is difficult for several reasons: the data are high-dimensional, noisy, and sparse; the networks to be inferred are themselves sparse; negative edges are not known, which makes this a positive-unlabeled setting; and several causal explanations can be consistent with the same data. Available computational approaches often struggle with these issues and produce inaccurate or overfitted models.

Research Gap

Current methods range from statistical correlations to advanced machine learning, each with limitations in accuracy, data requirements, and interpretability. Multiple benchmarking studies exist, but they differ in their evaluation choices, such as the negative sampling strategy, the metrics used, and whether synthetic or experimental data are used. What is missing is a more standardized benchmark built on biologically meaningful metrics.

Task description

The task focuses on the inference of GRNs from scRNA-Seq data. It is divided into two subtasks based on the availability of prior knowledge:

  1. GRN inference without prior knowledge: Inferring a GRN solely from scRNA-Seq data.
  2. GRN inference with prior knowledge: Inferring a GRN from scRNA-Seq data together with a prior knowledge graph (a subset of the edges of the ground-truth GRN).

Input Data

  • For Subtask 1: Normalized and preprocessed scRNA-Seq data
  • For Subtask 2: The same scRNA-Seq data, plus a given subset of the GRN's edges as prior knowledge
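One possible way to derive the Subtask 2 prior is to set aside a random fraction of the ground-truth edges and evaluate on the remainder. The sketch below is only an illustration under that assumption: the function name `split_prior_edges`, the `(regulator, target)` tuple format, the toy gene names, and the 40% fraction are all hypothetical and not part of the proposal.

```python
import random


def split_prior_edges(true_edges, prior_fraction=0.2, seed=0):
    """Split ground-truth edges into a prior set and a held-out set.

    true_edges: iterable of (regulator, target) tuples (assumed format).
    Returns two disjoint sets whose union is the deduplicated input.
    """
    edges = sorted(set(true_edges))      # fixed order so the seed is reproducible
    rng = random.Random(seed)
    rng.shuffle(edges)
    n_prior = int(round(prior_fraction * len(edges)))
    return set(edges[:n_prior]), set(edges[n_prior:])


# Toy ground truth (hypothetical gene names).
true_edges = [
    ("TF1", "G1"), ("TF1", "G2"),
    ("TF2", "G1"), ("TF2", "G3"),
    ("TF3", "G2"),
]
prior, held_out = split_prior_edges(true_edges, prior_fraction=0.4)
```

Keeping the prior and the held-out edges disjoint avoids scoring a method on edges it was handed as input.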

Expected Output

The output for both subtasks is a predicted GRN, represented as a graph where nodes are genes and edges indicate regulatory interactions. The quality of the predicted networks can be evaluated in two main ways:

  1. Binary Classification: Each potential interaction (edge) is classified as either present or absent (like this)
  2. Topological Evaluation: The overall structure and properties of the predicted network are assessed (like this)

Proposed ground-truth in datasets

  1. Synthetic, curated, and experimental datasets from BEELINE
  2. Experimental datasets from (this paper)

Initial set of methods to implement

  1. MLPs
  2. Graph-neural-network-based diffusion models (GCN / GAT)

Proposed control methods

  1. Pearson / Spearman correlation
  2. Random predictor
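A minimal sketch of the two controls, assuming expression is available as a mapping from gene name to a list of per-cell values and that a prediction is a score for every ordered gene pair. The names `correlation_control` and `random_control`, the data layout, and the toy values are illustrative assumptions. Pearson is shown; Spearman would apply the same formula to ranks.

```python
import math
import random
from itertools import permutations


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def correlation_control(expression):
    """Score each ordered gene pair by absolute Pearson correlation."""
    return {
        (g, h): abs(pearson(expression[g], expression[h]))
        for g, h in permutations(expression, 2)
    }


def random_control(genes, seed=0):
    """Score each ordered gene pair with a uniform random number in [0, 1)."""
    rng = random.Random(seed)
    return {(g, h): rng.random() for g, h in permutations(genes, 2)}


# Toy expression data: gene -> values across four cells (hypothetical).
expression = {
    "TF1": [1.0, 2.0, 3.0, 4.0],
    "G1": [2.0, 4.0, 6.0, 8.0],
    "G2": [4.0, 1.0, 3.0, 2.0],
}
corr_scores = correlation_control(expression)
rand_scores = random_control(expression)
```

Because correlation is symmetric, this control assigns the same score to both directions of a pair and so cannot distinguish regulator from target.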

Proposed Metrics

Binary classification:

  1. Link-equality metrics (AUROC / AUPRC)
  2. Node-equality metrics (Mean Average Precision)
  3. Precision@Top k
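The binary-classification metrics above can be sketched on a scored edge list as follows. This is an illustration under several assumptions: predictions are a dict from `(regulator, target)` to a score, every candidate pair that is not a known edge is treated as a negative (a simplification in the positive-unlabeled setting), AUPRC is approximated by average precision, and the node-equality metric is read as average precision computed per regulator and then averaged. All function names and toy values are hypothetical.

```python
from collections import defaultdict


def ranked_labels(scores, true_edges):
    """Return 0/1 labels of candidate edges, sorted by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return [1 if e in true_edges else 0 for e in ordered]


def precision_at_k(labels, k):
    """Fraction of true edges among the k top-ranked candidates."""
    return sum(labels[:k]) / k


def average_precision(labels):
    """Mean of precision values at the rank of each true edge."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def auroc(scores, true_edges):
    """Probability that a true edge outscores a non-edge (ties count 0.5)."""
    pos = [s for e, s in scores.items() if e in true_edges]
    neg = [s for e, s in scores.items() if e not in true_edges]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_average_precision(scores, true_edges):
    """Average precision per regulator, averaged over regulators with a true edge."""
    by_regulator = defaultdict(dict)
    for (reg, tgt), s in scores.items():
        by_regulator[reg][(reg, tgt)] = s
    aps = [
        average_precision(ranked_labels(sub, true_edges))
        for sub in by_regulator.values()
        if any(e in true_edges for e in sub)
    ]
    return sum(aps) / len(aps) if aps else 0.0


# Toy prediction and ground truth (hypothetical).
scores = {("TF1", "G1"): 0.9, ("TF1", "G2"): 0.8, ("TF2", "G1"): 0.7, ("TF2", "G2"): 0.1}
true_edges = {("TF1", "G1"), ("TF2", "G1")}

labels = ranked_labels(scores, true_edges)
p_at_2 = precision_at_k(labels, 2)
ap = average_precision(labels)
roc = auroc(scores, true_edges)
mean_ap = mean_average_precision(scores, true_edges)
```

In this toy case the per-regulator score is perfect while the pooled ranking is not, which shows how link-level and node-level metrics can disagree on the same prediction.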

Topological evaluation:

  1. Information Exchange (Average Shortest Path Length, Global and Local Efficiency)
  2. Hub Topology (Assortativity, Clustering Coefficient, Centralization)
  3. Hub Identification (PageRank, Betweenness, Radiality, Centrality)
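As an illustration of the information-exchange metrics, the sketch below computes average shortest path length and global efficiency with a breadth-first search, then compares a predicted network against the ground truth. It assumes edges are treated as undirected, that average path length is taken over reachable pairs only, and that the two networks are compared by absolute difference; the proposal does not fix any of these choices, and the function names and toy graphs are hypothetical.

```python
from collections import deque


def shortest_path_lengths(edges, nodes):
    """BFS hop counts between all reachable ordered node pairs (undirected)."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    dist = {}
    for source in nodes:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        for target, d in seen.items():
            if target != source:
                dist[(source, target)] = d
    return dist


def average_shortest_path_length(edges, nodes):
    """Mean hop count over reachable pairs; inf if no pair is connected."""
    dist = shortest_path_lengths(edges, nodes)
    return sum(dist.values()) / len(dist) if dist else float("inf")


def global_efficiency(edges, nodes):
    """Mean of 1/d over all ordered pairs; unreachable pairs contribute 0."""
    n = len(nodes)
    if n < 2:
        return 0.0
    dist = shortest_path_lengths(edges, nodes)
    return sum(1.0 / d for d in dist.values()) / (n * (n - 1))


# Toy networks on the same gene set (hypothetical).
nodes = ["A", "B", "C", "D"]
true_net = [("A", "B"), ("B", "C"), ("C", "D")]
pred_net = [("A", "B"), ("B", "C")]          # gene D is left isolated

aspl_true = average_shortest_path_length(true_net, nodes)
aspl_pred = average_shortest_path_length(pred_net, nodes)
eff_true = global_efficiency(true_net, nodes)
eff_pred = global_efficiency(pred_net, nodes)
eff_gap = abs(eff_true - eff_pred)
```

Here the predicted network has a shorter average path length than the ground truth only because the isolated gene is ignored, whereas global efficiency penalizes the missing connection; this is one reason to report both.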