[Arxiv'25] A biologically-inspired visual benchmarking approach for large models
benchmark evaluation arena-battle-game large-language-models biological-motion multimodal-large-language-models
Updated Sep 10, 2025 - Python