
Define ontology #249

@rcannood

Description


something like this?
@slobentanzer @scottgigante-immunai Regardless of how we decide to resolve this issue, I'm sure there are already many items we can define.

Originally posted by @rcannood in #247 (comment)

For instance:

Common dataset workflow

```mermaid
graph LR
  classDef component fill:#decbe4,stroke:#333,color:#000
  classDef anndata fill:#d9d9d9,stroke:#333,color:#000
  normalization:::group
  dataset_processors:::group
  raw_dataset["Raw dataset"]:::anndata
  common_dataset[Common<br/>dataset]:::anndata
  dataset_loader[/Dataset<br/>loader/]:::component
  subgraph normalization [Normalization methods]
    log_cpm[/"Log CPM"/]:::component
    l1_sqrt[/"L1 sqrt"/]:::component
    log_scran_pooling[/"Log scran<br/>pooling"/]:::component
    sqrt_cpm[/Sqrt CPM/]:::component
  end
  subgraph dataset_processors[Dataset processors]
    pca[/PCA/]:::component
    hvg[/HVG/]:::component
    knn[/KNN/]:::component
  end
  dataset_loader --> raw_dataset --> log_cpm & l1_sqrt & log_scran_pooling & sqrt_cpm --> pca --> hvg --> knn --> common_dataset
```
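To make the components concrete, here is a minimal Python sketch of how this workflow could be chained with AnnData/scanpy calls. The function names, parameter choices, and file path are illustrative assumptions, not the actual OpenProblems interfaces:

```python
import anndata as ad
import scanpy as sc


def dataset_loader(path: str) -> ad.AnnData:
    """Dataset loader: read a raw dataset from disk (hypothetical helper)."""
    return ad.read_h5ad(path)


def log_cpm(adata: ad.AnnData) -> ad.AnnData:
    """Normalization method: counts-per-million followed by log1p."""
    sc.pp.normalize_total(adata, target_sum=1e6)
    sc.pp.log1p(adata)
    return adata


def dataset_processors(adata: ad.AnnData) -> ad.AnnData:
    """Dataset processors: HVG selection, PCA embedding and KNN graph."""
    sc.pp.highly_variable_genes(adata, n_top_genes=2000)
    sc.pp.pca(adata, n_comps=50)
    sc.pp.neighbors(adata, n_neighbors=15)
    return adata


# Chain the components as in the diagram:
# dataset_loader --> raw_dataset --> normalization --> dataset_processors --> common_dataset
common_dataset = dataset_processors(log_cpm(dataset_loader("raw_dataset.h5ad")))
```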

Task-specific benchmarking workflow

```mermaid
graph LR
  classDef component fill:#decbe4,stroke:#333,color:#000
  classDef anndata fill:#d9d9d9,stroke:#333,color:#000
  common_dataset[Common<br/>dataset]:::anndata
  dataset_processor[/Dataset<br/>processor/]:::component
  solution[Ground-truth]:::anndata
  masked_data[Input data]:::anndata
  method[/Method/]:::component
  control_method[/Control<br/>method/]:::component
  output[Prediction]:::anndata
  metric[/Metric/]:::component
  score[Score]:::anndata
  common_dataset --> dataset_processor --> masked_data
  dataset_processor --> solution
  masked_data --> method --> output
  masked_data & solution --> control_method --> output
  solution & output --> metric --> score
```
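Analogously, a minimal sketch of the benchmarking workflow, using a made-up masking task so the example is self-contained; the dataset processor, baseline method, and metric below are placeholders for whatever a concrete task would define:

```python
import anndata as ad
import numpy as np


def dataset_processor(common: ad.AnnData) -> tuple[ad.AnnData, ad.AnnData]:
    """Split the common dataset into masked input data and a ground-truth solution.
    Hypothetical task: hide 10% of the entries so a method has to impute them."""
    solution = common.copy()
    masked = common.copy()
    mask = np.random.default_rng(0).random(masked.X.shape) < 0.1
    masked.X = np.where(mask, 0.0, masked.X)
    return masked, solution


def method(masked: ad.AnnData) -> ad.AnnData:
    """Method: produce a prediction from the masked input.
    This baseline just returns the input; a real method would impute the masked entries."""
    return masked.copy()


def metric(solution: ad.AnnData, prediction: ad.AnnData) -> float:
    """Metric: score the prediction against the ground-truth solution (mean squared error)."""
    return float(np.mean((solution.X - prediction.X) ** 2))


# common_dataset --> dataset_processor --> masked_data / solution --> method --> metric --> score
common_dataset = ad.AnnData(np.random.default_rng(1).random((100, 50)))
masked_data, solution = dataset_processor(common_dataset)
prediction = method(masked_data)
score = metric(solution, prediction)
```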

Discussion

However, this workflow might not be applicable to all tasks:

  • Multimodal datasets will have to be processed differently from regular unimodal datasets.
  • Some tasks don't really have a ground truth and instead rely on internal scores. IMO these "benchmarks" should not be part of OpenProblems, since they don't really count as benchmarks.
