This repo contains the package that computes the evaluation metrics for the TopBrain 2025 challenge on grand-challenge.org (GC).
The root folder contains a `pyproject.toml` config file that sets up the evaluation code
as a local pip package called `topbrain25_eval`, so you can run the evaluations from your own Python project.
To set up and install the `topbrain25_eval` package:

```bash
# from the topbrain25_eval root
bash ./setup.sh
# activate the env with topbrain25_eval installed
source env_py310/bin/activate
```

First, go to `topbrain25_eval/configs.py` and configure the `track` and `expected_num_cases`.
The `expected_num_cases` is required and must match the number of cases to evaluate, i.e. the number of ground-truth cases. See below for the expected directory layout.
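As a rough illustration, the relevant part of `topbrain25_eval/configs.py` might look like the sketch below; the variable names and the example track value are assumptions, so follow whatever the actual file defines:

```python
# topbrain25_eval/configs.py -- illustrative sketch only; names and values are assumptions
track = "head-angio"      # hypothetical track identifier; use the value defined by the challenge
expected_num_cases = 5    # must equal the number of ground-truth cases to be evaluated
```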
When not running in a Docker environment, the paths for the predictions, ground truth, ROI etc.
default to folders on the same level as the package dir `topbrain25_eval`:

```
# mkdir and put your gt, pred etc like this:
├── ground-truth
├── predictions
├── topbrain25_eval
```

Simply put the ground-truth and prediction files in the folders `ground-truth/` and `predictions/`,
and run:
```bash
python3 topbrain25_eval/evaluation.py
```

Note: you can also specify your own custom paths for the ground truth, predictions, output etc. when you call the `TopBrainEvaluation` object in your own code:
```python
# example from topbrain25_eval/test_evaluation_task_1_seg.py
from topbrain25_eval.evaluation import TopBrainEvaluation

evalRun = TopBrainEvaluation(
    track,
    expected_num_cases,
    predictions_path=predictions_path,
    ground_truth_path=ground_truth_path,
    output_path=output_path,
)

evalRun.evaluate()
```

The naming of the ground-truth and prediction files can be arbitrary, as long as their filelist dataframes are sorted the same way by `.sort_values()`!
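To illustrate what "sorted the same way" means, here is a minimal sketch (using pandas with hypothetical file names, not the package's own code) of how the two file lists pair up after sorting:

```python
# Hypothetical file names, purely to illustrate matched sorting of gt and pred file lists.
import pandas as pd

gt_files = pd.DataFrame({"path": ["gt_case_002.nii.gz", "gt_case_001.nii.gz"]})
pred_files = pd.DataFrame({"path": ["pred_case_002.nii.gz", "pred_case_001.nii.gz"]})

gt_sorted = gt_files.sort_values("path").reset_index(drop=True)
pred_sorted = pred_files.sort_values("path").reset_index(drop=True)

# After sorting, row i of the gt list must correspond to row i of the pred list.
for gt_path, pred_path in zip(gt_sorted["path"], pred_sorted["path"]):
    print(gt_path, "<->", pred_path)
```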
The accepted file formats for ground truth and predictions are:
- NIfTI (`.nii.gz`, `.nii`) or SimpleITK-compatible images (`.mha`) for images and masks
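As a quick sanity check, such files can be read with SimpleITK; the file name below is just a placeholder:

```python
# Read a mask with SimpleITK and convert it to a numpy array (file name is a placeholder).
import SimpleITK as sitk

mask = sitk.ReadImage("predictions/example_case.nii.gz")
mask_array = sitk.GetArrayFromImage(mask)  # numpy array with shape (z, y, x)
print(mask_array.shape, mask_array.dtype)
```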
In topbrain25_eval/metrics/, you will find our implementations for evaluating the submitted segmentation predictions.
Six evaluation metrics, with equal weights, are used for the head-angio multiclass (TopBrain anatomical vessels) segmentation task:
- Class-average Dice similarity coefficient (see the sketch below this list)
- Class-average centerline Dice (clDice)
- Class-average error in the number of connected components (B0)
- Class-average 95th percentile Hausdorff Distance (HD95)
- Class-average error in the number of invalid neighbors
- Average F1 score (the harmonic mean of precision and recall) for detection of the "side road" vessels
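For intuition, here is a minimal sketch of a class-average Dice computation over multiclass masks; it assumes integer class labels with 0 as background and is not the package's implementation:

```python
# Illustrative class-average Dice over multiclass masks (not the package's implementation).
import numpy as np

def class_average_dice(gt: np.ndarray, pred: np.ndarray) -> float:
    """Average the binary Dice score over all non-background labels present in gt or pred."""
    labels = np.union1d(np.unique(gt), np.unique(pred))
    labels = labels[labels != 0]  # assume label 0 is background
    dice_scores = []
    for label in labels:
        gt_bin = gt == label
        pred_bin = pred == label
        overlap = np.logical_and(gt_bin, pred_bin).sum()
        dice_scores.append(2.0 * overlap / (gt_bin.sum() + pred_bin.sum()))
    return float(np.mean(dice_scores)) if dice_scores else 1.0

# toy 2D example with two vessel classes
gt = np.array([[0, 1, 1], [2, 2, 0]])
pred = np.array([[0, 1, 0], [2, 2, 2]])
print(class_average_dice(gt, pred))  # averages the per-class Dice of labels 1 and 2
```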
The documentation for our code comes in the form of unit tests. Please check our test cases to see the expected inputs, outputs, behaviors, and calculations.
Files whose names follow the pattern test_*.py contain the test cases for the evaluation metrics:
- Dice: `topbrain25_eval/metrics/test_cls_avg_dice.py`
- clDice: `topbrain25_eval/metrics/test_cls_avg_clDice.py`
- Connected component B0 number error: `topbrain25_eval/metrics/test_cls_avg_b0.py`
- HD95: `topbrain25_eval/metrics/test_cls_avg_hd95.py`
- Invalid neighbor error: `topbrain25_eval/metrics/test_cls_avg_invalid_neighbors.py`
- Detections: `topbrain25_eval/metrics/test_detection_sideroad_labels.py`
Test asset files used in the test cases are stored in the folder `test_assets/`.
Simply invoke the tests with `pytest .`:
```
# simply run pytest
$ pytest .
topbrain25_eval/aggregate/test_aggregate_all_detection_dicts.py .... [ 4%]
topbrain25_eval/metrics/test_cls_avg_b0.py ............... [ 19%]
topbrain25_eval/metrics/test_cls_avg_clDice.py ..... [ 24%]
topbrain25_eval/metrics/test_cls_avg_dice.py ............. [ 37%]
topbrain25_eval/metrics/test_cls_avg_hd95.py ............. [ 51%]
topbrain25_eval/metrics/test_cls_avg_invalid_neighbors.py ........ [ 59%]
topbrain25_eval/metrics/test_detection_sideroad_labels.py .......... [ 69%]
topbrain25_eval/metrics/test_generate_cls_avg_dict.py .......... [ 79%]
topbrain25_eval/metrics/test_valid_neighbors.py .. [ 81%]
topbrain25_eval/test_constants.py . [ 82%]
topbrain25_eval/test_evaluation_task_1_seg.py .. [ 84%]
topbrain25_eval/test_evaluation_task_1_seg_2.py .. [ 86%]
topbrain25_eval/test_score_case_task_1_seg.py .. [ 88%]
topbrain25_eval/utils/test_get_neighbor_per_mask.py ....... [ 95%]
topbrain25_eval/utils/test_utils_mask.py .... [100%]
==================================================================== 98 passed in 3.17s ====================================================================
```