AvgMinMax median approximation is inconsistent

**Describe the bug**

The median reported in the dataset metrics computed in `train_data_utils.py` can differ between runs, even with identical input data. This causes validation failures when newly computed metrics are compared against existing metrics files: `_validate_aggregate_metrics` detects a difference in the median field and raises a ValueError about conflicting aggregate metrics.

**Steps/Code to reproduce bug**

Run data preparation on any dataset, e.g.:
```bash
config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\
resources_servers/library_judge_math/configs/bytedtsinghua_dapo17k.yaml"
ng_prepare_data "+config_paths=[${config_paths}]" \
    +output_dirpath=data/bytedtsinghua_dapo17k \
    +mode=train_preparation +should_download=true
```
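Rerunning data preparation, so that the new metrics are compared against the previously written metrics files, intermittently fails with a ValueError about conflicting aggregate metrics: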
```text
Differences found in aggregate metrics:
[
    'Numeric mismatch at {field_name}.Median: 80.33 != 80.44'
]

...

Found conflicting aggregate metrics that need to be corrected:
- resources_servers/math_with_judge/data/dapo17k_train_metrics_conflict.json
- resources_servers/math_with_judge/data/dapo17k_validation_metrics_conflict.json

This could be due to a change in how metrics are calculated, leading to outdated metrics. Try deleting the below file(s) and rerunning data preparation:
- resources_servers/math_with_judge/data/dapo17k_train_metrics.json
- resources_servers/math_with_judge/data/dapo17k_validation_metrics.json
```

**Expected behavior**

Metrics should be deterministic. Running data preparation multiple times on the same dataset should produce identical metrics, including the median. The validation check should pass when re-running with unchanged data.

**Configs**
Any dataset configuration.

**Environment details**

N/A

**Additional context**

The `AvgMinMax` class uses TDigest to estimate the median. TDigest returns an approximation of the median, and that estimate is not guaranteed to be identical on each run, which is consistent with the small drift in the error above (80.33 vs. 80.44).
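For comparison, here is a minimal sketch of an exact alternative, assuming the per-field values fit in memory: collect every value and compute the median with the standard library's `statistics.median`. The names `ExactMedianAccumulator` and `median_is_order_independent` are hypothetical and are not part of the existing `AvgMinMax` API. Because an exact median depends only on the values themselves, not on insertion order or on any internal compression, shuffling the input cannot change the result.

```python
import random
import statistics
from typing import Iterable, List


class ExactMedianAccumulator:
    """Hypothetical exact-median accumulator (not the existing AvgMinMax API).

    Stores every value, so memory grows linearly with the number of samples,
    but the result depends only on the values, not on the order they arrive in.
    """

    def __init__(self) -> None:
        self._values: List[float] = []

    def add(self, value: float) -> None:
        self._values.append(value)

    def extend(self, values: Iterable[float]) -> None:
        self._values.extend(values)

    def median(self) -> float:
        if not self._values:
            raise ValueError("median of an empty accumulator is undefined")
        # statistics.median sorts a copy and, for an even count, averages the
        # two middle values, so the result is exact and order-independent.
        return statistics.median(self._values)


def median_is_order_independent(values: List[float], trials: int = 5, seed: int = 0) -> bool:
    """Return True if shuffling the input never changes the computed median."""
    rng = random.Random(seed)  # seeded so the check itself is reproducible
    reference = ExactMedianAccumulator()
    reference.extend(values)
    expected = reference.median()
    for _ in range(trials):
        shuffled = values[:]
        rng.shuffle(shuffled)
        acc = ExactMedianAccumulator()
        acc.extend(shuffled)
        if acc.median() != expected:
            return False
    return True


if __name__ == "__main__":
    sample = [80.0, 81.5, 79.0, 80.5, 82.0, 78.5]
    acc = ExactMedianAccumulator()
    acc.extend(sample)
    print(acc.median())  # 80.25: mean of the two middle values 80.0 and 80.5
    print(median_is_order_independent(sample))  # True
```

The trade-off is memory: this sketch keeps every value, whereas a TDigest keeps a bounded summary, so whether an exact median is practical depends on dataset size.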


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

AvgMinMax median approximation is inconsistent #360

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

AvgMinMax median approximation is inconsistent #360

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions