
Define ontology #249

@rcannood

Description


something like this?
@slobentanzer @scottgigante-immunai Regardless of how we decide to resolve this issue, I'm sure there are already many items we can define.

Originally posted by @rcannood in #247 (comment)

For instance:

Common dataset workflow

```mermaid
graph LR
  classDef component fill:#decbe4,stroke:#333,color:#000
  classDef anndata fill:#d9d9d9,stroke:#333,color:#000
  normalization:::group
  dataset_processors:::group
  raw_dataset["Raw dataset"]:::anndata
  common_dataset[Common<br/>dataset]:::anndata
  dataset_loader[/Dataset<br/>loader/]:::component
  subgraph normalization [Normalization methods]
    log_cpm[/"Log CPM"/]:::component
    l1_sqrt[/"L1 sqrt"/]:::component
    log_scran_pooling[/"Log scran<br/>pooling"/]:::component
    sqrt_cpm[/Sqrt CPM/]:::component
  end
  subgraph dataset_processors[Dataset processors]
    pca[/PCA/]:::component
    hvg[/HVG/]:::component
    knn[/KNN/]:::component
  end
  dataset_loader --> raw_dataset --> log_cpm & l1_sqrt & log_scran_pooling & sqrt_cpm --> pca --> hvg --> knn --> common_dataset
```
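To make the components concrete, here is a minimal Python sketch of how this workflow could be chained with AnnData/scanpy calls. The function names, parameter choices, and file path are illustrative assumptions, not the actual OpenProblems interfaces:

```python
import anndata as ad
import scanpy as sc


def dataset_loader(path: str) -> ad.AnnData:
    """Dataset loader: read a raw dataset from disk (hypothetical helper)."""
    return ad.read_h5ad(path)


def log_cpm(adata: ad.AnnData) -> ad.AnnData:
    """Normalization method: counts-per-million followed by log1p."""
    sc.pp.normalize_total(adata, target_sum=1e6)
    sc.pp.log1p(adata)
    return adata


def dataset_processors(adata: ad.AnnData) -> ad.AnnData:
    """Dataset processors: HVG selection, PCA embedding and KNN graph."""
    sc.pp.highly_variable_genes(adata, n_top_genes=2000)
    sc.pp.pca(adata, n_comps=50)
    sc.pp.neighbors(adata, n_neighbors=15)
    return adata


# Chain the components as in the diagram:
# dataset_loader --> raw_dataset --> normalization --> dataset_processors --> common_dataset
common_dataset = dataset_processors(log_cpm(dataset_loader("raw_dataset.h5ad")))
```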

Task-specific benchmarking workflow

```mermaid
graph LR
  classDef component fill:#decbe4,stroke:#333,color:#000
  classDef anndata fill:#d9d9d9,stroke:#333,color:#000
  common_dataset[Common<br/>dataset]:::anndata
  dataset_processor[/Dataset<br/>processor/]:::component
  solution[Ground-truth]:::anndata
  masked_data[Input data]:::anndata
  method[/Method/]:::component
  control_method[/Control<br/>method/]:::component
  output[Prediction]:::anndata
  metric[/Metric/]:::component
  score[Score]:::anndata
  common_dataset --> dataset_processor --> masked_data
  dataset_processor --> solution
  masked_data --> method --> output
  masked_data & solution --> control_method --> output
  solution & output --> metric --> score
```
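Analogously, a minimal sketch of the benchmarking workflow, using a made-up masking task so the example is self-contained; the dataset processor, baseline method, and metric below are placeholders for whatever a concrete task would define:

```python
import anndata as ad
import numpy as np


def dataset_processor(common: ad.AnnData) -> tuple[ad.AnnData, ad.AnnData]:
    """Split the common dataset into masked input data and a ground-truth solution.
    Hypothetical task: hide 10% of the entries so a method has to impute them."""
    solution = common.copy()
    masked = common.copy()
    mask = np.random.default_rng(0).random(masked.X.shape) < 0.1
    masked.X = np.where(mask, 0.0, masked.X)
    return masked, solution


def method(masked: ad.AnnData) -> ad.AnnData:
    """Method: produce a prediction from the masked input.
    This baseline just returns the input; a real method would impute the masked entries."""
    return masked.copy()


def metric(solution: ad.AnnData, prediction: ad.AnnData) -> float:
    """Metric: score the prediction against the ground-truth solution (mean squared error)."""
    return float(np.mean((solution.X - prediction.X) ** 2))


# common_dataset --> dataset_processor --> masked_data / solution --> method --> metric --> score
common_dataset = ad.AnnData(np.random.default_rng(1).random((100, 50)))
masked_data, solution = dataset_processor(common_dataset)
prediction = method(masked_data)
score = metric(solution, prediction)
```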

Discussion

However, this workflow might not be applicable to all tasks:

  • Multimodal datasets will have to be processed differently from regular unimodal datasets.
  • Some tasks don't really have a ground truth and instead rely on internal scores. IMO these "benchmarks" should not be part of OpenProblems, since they don't really count as benchmarks.
