[WIP] Add IFEval #4

nikil-ravi · 2025-12-25T23:50:24Z

This PR adds IFEval. So far, I've tested baseline evaluation with Qwen 3 1.7B on an H100, and it runs in just under 15 minutes.

max-andr · 2025-12-26T11:28:05Z

thanks! we need to discuss internally whether we want to add IFEval to our "official" suite of benchmarks. but i personally like IFEval, since IFEval scores can be improved a lot by doing smarter post-training.

Add IFEval benchmark and update evaluation/caching scripts

6b72fc2
