Add VL-RewardBench #703

Merged
kennymckormick merged 15 commits into open-compass:main from TobiasLee:vlrewardbench
Jan 1, 2025
Conversation

@TobiasLee
Contributor

Hi there,

Thanks for your awesome project, which helps a lot with LMM evaluation & development!

This PR incorporates our recently released VL-RewardBench.
Example script:

python run.py --data VL-RewardBench --model GPT4o 

Saved results for GPT4o-mini:

hallucination        0.4552736982643525
reasoning            0.6477987421383647
general              0.4371584699453552
Macro Accuracy       0.5134103034493575
Overall Consistency  0.5016

and GPT4o:

hallucination        0.7076101468624834
reasoning            0.6509433962264151
general              0.4918032786885246
Macro Accuracy       0.616785607259141
Overall Consistency  0.6616

The results are consistent with those we reported, with only small variance.
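For reference, the Macro Accuracy figure above is the unweighted mean of the three per-category accuracies (hallucination, reasoning, general) — a minimal sketch, using the saved numbers from this PR:

```python
# Macro Accuracy = unweighted mean of the per-category accuracies.
# The values below are copied from the saved GPT4o-mini / GPT4o results above.

def macro_accuracy(per_category):
    """Average the category accuracies without weighting by sample count."""
    return sum(per_category.values()) / len(per_category)

gpt4o_mini = {
    "hallucination": 0.4552736982643525,
    "reasoning": 0.6477987421383647,
    "general": 0.4371584699453552,
}
gpt4o = {
    "hallucination": 0.7076101468624834,
    "reasoning": 0.6509433962264151,
    "general": 0.4918032786885246,
}

print(macro_accuracy(gpt4o_mini))  # ~0.5134, matching the saved Macro Accuracy
print(macro_accuracy(gpt4o))       # ~0.6168
```

(Overall Consistency is reported separately and is not derived from these three numbers.)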

@kennymckormick kennymckormick merged commit 276d90a into open-compass:main Jan 1, 2025
@kennymckormick
Member

Evaluation Results of GPT4o-20241120


hallucination 0.753004
reasoning 0.676101
general 0.535519
Macro Accuracy 0.654875
Overall Consistency 0.7016


Mercury7353 pushed a commit to Mercury7353/VLMEvalKit that referenced this pull request Apr 28, 2025
* update vlrewardbench

* pre-commit fix

* formatter

* [Improvement] Better `AUTO_SPLIT` and model split for InternVL2

* [Minor] Improve CC-OCR Import

* [Model] Support QVQ

* [Model] Update Molmo Eval to Match Official Implementation (open-compass#648)

* add molmo prompts

* fix lint format

* [Fix] Refine Qwen-VL2 device assignment

* [Fix] Fix RealWorldQA md5

* update MMMU_DEV_VAL tsv

* [Fix] Fix confusing image width&height (open-compass#704)

Co-authored-by: Yuan Ye <yuany2@chinatelecom.cn>

* Update llama_vision.py (open-compass#705)

* [Fix] Fix Lint

* Fix Lint

* Fix Lint

---------

Co-authored-by: kennymckormick <dhd.efz@gmail.com>
Co-authored-by: jamespark3922 <jspark96@cs.washington.edu>
Co-authored-by: CMeteor <CMeteor@users.noreply.github.com>
Co-authored-by: Yuan Ye <yuany2@chinatelecom.cn>
Co-authored-by: Guowei Xu <113534787+XuGW-Kevin@users.noreply.github.com>
Koii2k3 pushed a commit to wjnwjn59/VLMEvalKit that referenced this pull request Nov 13, 2025


5 participants