This repository contains scripts to evaluate Vision-Language Models (VLMs) on:
- Counting: Predict how many objects are in an image.
- Pointing: Locate objects using referring expressions.
Counting scripts:
- `gemma3_1b_it_count.py`
- `gemma3_4b_it_count.py`
- `granite_vision_count.py`
- `janus_count.py` (select model inside the script)
- `moondream_count.py`
- `moondream_halfB_count.py`
- `smolvlm_count.py`
- `evaluate_count.py` → Evaluate counting predictions

Pointing scripts:
- `molmo_point.py`
- `moon_dream_point.py`
- `pointing_score.py` → Evaluate pointing predictions
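`evaluate_count.py`'s exact metric is not reproduced in this README; a typical counting evaluation compares predicted and ground-truth counts via exact-match accuracy and mean absolute error. A minimal sketch (function and field names here are illustrative, not the script's actual API):

```python
# Sketch of a typical counting evaluation: exact-match accuracy and
# mean absolute error (MAE). evaluate_count.py's actual metric and
# prediction file format may differ.

def score_counts(predictions, ground_truth):
    """predictions, ground_truth: lists of integer counts, same order."""
    assert len(predictions) == len(ground_truth)
    n = len(predictions)
    accuracy = sum(p == g for p, g in zip(predictions, ground_truth)) / n
    mae = sum(abs(p - g) for p, g in zip(predictions, ground_truth)) / n
    return {"accuracy": accuracy, "mae": mae}

# Hypothetical example: two of four counts match exactly.
print(score_counts([3, 5, 2, 7], [3, 4, 2, 9]))  # {'accuracy': 0.5, 'mae': 0.75}
```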
Datasets are automatically downloaded and cached on first run.
Pixmo-Count (Counting)

```python
from datasets import load_dataset

dataset_path = "/your/path/to/cache/dataset"  # Path to store downloaded dataset
data = load_dataset("allenai/pixmo-count", split="test", cache_dir=dataset_path)
```

RefCOCO (Pointing)

```python
from datasets import load_dataset

dataset_path = "/your/path/to/cache/dataset"  # Path to store downloaded dataset
ds = load_dataset("lmms-lab/RefCOCO", split="testA", cache_dir=dataset_path)
```

Install dependencies:

```
pip install -r requirements.txt
```

Run the script for the desired model:
```
python gemma3_1b_it_count.py
python gemma3_4b_it_count.py
python granite_vision_count.py
python moondream_count.py
python moondream_halfB_count.py
python smolvlm_count.py
```

For Janus:
Select the model inside `janus_count.py`:

```python
model_path = "deepseek-ai/Janus-Pro-1B"  # Change as needed
```

then run:

```
python janus_count.py
```

Evaluate counting results:
```
python evaluate_count.py
```

Run the pointing scripts:
```
python molmo_point.py
python moon_dream_point.py
```

Evaluate pointing results:

```
python pointing_score.py
```

Modify these in `pointing_score.py` as needed:
```python
sigma = 0.8  # Lower for stricter scoring
file = 'moondream2_points_testA.json'  # Change result file
```

Notes:
- Model weights should be downloaded beforehand if needed.
- Datasets will download and cache automatically on first run.
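The `sigma` parameter suggests a distance-based pointing score, commonly a Gaussian of the distance between the predicted and ground-truth points, score = exp(-d² / (2σ²)), where a smaller sigma penalizes distance more sharply. A minimal sketch of this idea (`pointing_score.py`'s exact formula and coordinate normalization may differ; the function name is illustrative):

```python
import math

def point_score(pred, gt, sigma=0.8):
    """Gaussian reward for a predicted point (x, y) against ground truth.

    A perfect prediction scores 1.0; the score decays toward 0 with
    distance, and a lower sigma makes the falloff (scoring) stricter.
    """
    d = math.dist(pred, gt)  # Euclidean distance between the two points
    return math.exp(-d ** 2 / (2 * sigma ** 2))

# An exact hit scores 1.0; a farther point scores lower than a nearer one.
print(point_score((0.5, 0.5), (0.5, 0.5)))  # -> 1.0
```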