Reproducible LLM proof grading benchmark + API for Olympiad-style math.
python benchmarking fastapi math-education llm-evaluation llm-evals proof-grading rubric-aware olympiad-math math-ed
Updated Apr 24, 2026 - Python