
Conversation

@omkar-334

This is a multimodal math benchmark consisting of two question types: free-form and MCQ.
I've separated each type into its own subset.

The benchmark can be evaluated in two ways: either by providing the problem solution or by providing the problem code.
For now I'm implementing the solution method.
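
For reference, here's a rough sketch of how the two subsets would be loaded from the Hub; the repo id and the subset names `mcq` and `freeform` are placeholders, not the actual ones:

```python
from datasets import load_dataset

# Placeholder repo id and subset names; substitute the real ones.
mcq = load_dataset("org/multimodal-math-benchmark", name="mcq", split="test")
freeform = load_dataset("org/multimodal-math-benchmark", name="freeform", split="test")

# Each row is assumed to carry the question text, the image(s), and the
# gold answer (plus the answer choices for the MCQ subset).
print(mcq[0].keys())
print(freeform[0].keys())
```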

I still need to figure out the proper metric for this. I've tried `Metrics.expr_gold_metric` and `Metrics.exact_match`, but neither is working; I'm looking into it now.

NathanHB (Member) commented Nov 24, 2025

hey @omkar-334!

> I still need to figure out the proper metric for this. I've tried `Metrics.expr_gold_metric` and `Metrics.exact_match`, but neither is working; I'm looking into it now.

Don't worry about this; what's important for new evals like this is the inspect-ai implementation :)

There are examples here, documentation on how to write an eval here, and the inspect-ai documentation is here.
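
For the free-form subset, a minimal inspect-ai task could look roughly like this; the dataset path, subset name, and field names (`question`, `answer`) are placeholders for the actual schema:

```python
from inspect_ai import Task, task
from inspect_ai.dataset import Sample, hf_dataset
from inspect_ai.scorer import match
from inspect_ai.solver import generate


def record_to_sample(record) -> Sample:
    # Map one raw dataset row to an inspect-ai Sample.
    # Multimodal inputs would use a ChatMessageUser with image content
    # instead of a plain string; left out to keep the sketch minimal.
    return Sample(
        input=record["question"],
        target=str(record["answer"]),
    )


@task
def multimodal_math_freeform() -> Task:
    return Task(
        dataset=hf_dataset(
            path="org/multimodal-math-benchmark",  # placeholder repo id
            name="freeform",                       # placeholder subset name
            split="test",
            sample_fields=record_to_sample,
        ),
        solver=generate(),
        scorer=match(numeric=True),  # numeric answer matching for free-form
    )
```

The MCQ subset would follow the same shape, but with `choices` set on each `Sample`, the `multiple_choice()` solver, and the `choice()` scorer.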

