feat: add MathVista benchmark #1081

Closed
omkar-334 wants to merge 2 commits into huggingface:main from omkar-334:vista

Conversation

@omkar-334
Contributor

MathVista is a multimodal math benchmark with 2 types of questions: free-form and MCQ.
I've separated each type into a different subset.

The benchmark can be evaluated in 2 ways - either provide the problem solution or the problem code.
For now I'm implementing the solution method.

I need to figure out the proper metric for this - I've tried Metrics.expr_gold_metric and Metrics.exact_match but these are not working. Working on this right now.
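
For illustration only, here is a minimal sketch of how the two question types could be scored differently. It is not the metric used in this PR or in lighteval. The helper names (`normalize`, `score_mcq`, `score_free_form`) are hypothetical, and it assumes that the MCQ gold answer is stored as the option text alongside a list of choices, and that free-form answers are either numbers or short strings.

```python
import re


def normalize(text: str) -> str:
    """Trim whitespace and a trailing/leading period, then lowercase."""
    return text.strip().strip(".").strip().lower()


def score_mcq(prediction: str, gold: str, choices: list[str]) -> bool:
    """Accept either the option text or an option letter such as "B" or "(b)".

    Assumption: `gold` is the text of the correct option, not its letter.
    """
    pred = normalize(prediction)
    gold_norm = normalize(gold)
    if pred == gold_norm:
        return True
    # Map a bare letter to the corresponding entry in `choices`.
    match = re.fullmatch(r"\(?([a-z])\)?", pred)
    if match:
        idx = ord(match.group(1)) - ord("a")
        return 0 <= idx < len(choices) and normalize(choices[idx]) == gold_norm
    return False


def score_free_form(prediction: str, gold: str, tol: float = 1e-6) -> bool:
    """Compare numerically when both sides parse as floats, else as strings."""
    try:
        return abs(float(prediction) - float(gold)) <= tol
    except ValueError:
        return normalize(prediction) == normalize(gold)
```

Plain exact match would reject "3.50" against a gold answer of "3.5", or "B" against the option text, which is one possible reason a single string metric struggles across both subsets.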

@NathanHB
Member

NathanHB commented Nov 24, 2025

hey @omkar-334 !

> I need to figure out the proper metric for this - I've tried Metrics.expr_gold_metric and Metrics.exact_match but these are not working. Working on this right now.

Don't worry about this, what's important for new evals like this is the inspect-ai implementation :)

There are examples here and documentation on how to use here, the inspect-ai documentation is here

@NathanHB
Member

hey @omkar-334 closing here as i branched from it to make it compatible with inspect-ai.
I kept your contributions in the branch, thanks for the efforts on this !

@NathanHB NathanHB closed this Jan 12, 2026
@NathanHB NathanHB mentioned this pull request Jan 12, 2026