zzumri/AI702_Project

Counting and Pointing Evaluation

This repository contains scripts to evaluate Vision-Language Models (VLMs) on:

  • Counting: Predict how many objects are in an image.
  • Pointing: Locate objects using referring expressions.

📦 Scripts

Counting

  • gemma3_1b_it_count.py
  • gemma3_4b_it_count.py
  • granite_vision_count.py
  • janus_count.py (select model inside script)
  • moondream_count.py
  • moondream_halfB_count.py
  • smolvlm_count.py
  • evaluate_count.py → Evaluate counting predictions

Pointing

  • molmo_point.py
  • moon_dream_point.py
  • pointing_score.py → Evaluate pointing predictions

📥 Dataset Preparation

Datasets are automatically downloaded and cached on first run.

Pixmo-Count (Counting)

from datasets import load_dataset

dataset_path = "/your/path/to/cache/dataset"  # Path to store downloaded dataset
data = load_dataset("allenai/pixmo-count", split="test", cache_dir=dataset_path)
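Before running a counting script, it can help to check how the ground-truth counts are distributed. The sketch below is illustrative only: `summarize_counts` is a hypothetical helper, and it assumes each record exposes an integer `count` field, which should be verified against the dataset card. Stand-in records are used so the snippet runs without downloading anything.

```python
from collections import Counter


def summarize_counts(records):
    """Return (number of records, histogram of ground-truth counts).

    Assumes every record is a mapping with a `count` key (an assumption
    about the dataset schema, not something this repository documents).
    """
    histogram = Counter(int(record["count"]) for record in records)
    return sum(histogram.values()), dict(sorted(histogram.items()))


# Stand-in records; with the real dataset, pass `data` from load_dataset instead.
sample = [
    {"label": "apples", "count": 3},
    {"label": "dogs", "count": 1},
    {"label": "cars", "count": 3},
]
total, hist = summarize_counts(sample)
print(total, hist)
```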

RefCOCO (Pointing)

from datasets import load_dataset

dataset_path = "/your/path/to/cache/dataset"  # Path to store downloaded dataset
ds = load_dataset("lmms-lab/RefCOCO", split="testA", cache_dir=dataset_path)

📦 Installation

pip install -r requirements.txt

🚀 Usage

Counting

Run the script for the desired model:

python gemma3_1b_it_count.py
python gemma3_4b_it_count.py
python granite_vision_count.py
python moondream_count.py
python moondream_halfB_count.py
python smolvlm_count.py

For Janus, select the model inside janus_count.py:

model_path = "deepseek-ai/Janus-Pro-1B"  # Change as needed

Then run the script:

python janus_count.py
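If editing the script for every model switch becomes tedious, the path could instead be read from the command line. This is a minimal sketch, not how janus_count.py currently works: the `--model-path` flag and the `parse_args` helper are hypothetical, and only the default value comes from the script.

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical --model-path option; argv=None reads sys.argv."""
    parser = argparse.ArgumentParser(description="Counting evaluation with a Janus model")
    parser.add_argument(
        "--model-path",
        default="deepseek-ai/Janus-Pro-1B",  # default taken from the script
        help="Hugging Face model ID or local path of the Janus checkpoint",
    )
    return parser.parse_args(argv)


# An empty list means "no command-line arguments", so the default is used.
args = parse_args([])
model_path = args.model_path
print(model_path)
```

With such a change, a different checkpoint could be selected at launch time rather than by editing the source.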

Evaluate counting results:

python evaluate_count.py
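The exact metrics computed by evaluate_count.py are not described here. As a rough illustration of what a counting evaluation typically involves, the sketch below extracts the first integer from each free-text model answer and reports exact-match accuracy and mean absolute error. The helper names, the parsing rule, and the choice of metrics are all assumptions, not the script's actual behavior.

```python
import re


def parse_count(text):
    """Return the first integer found in a model answer, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def count_metrics(predictions, targets):
    """Compute exact-match accuracy, MAE over parsed answers, and unparsed count."""
    parsed = [parse_count(p) for p in predictions]
    exact = sum(p == t for p, t in zip(parsed, targets))
    # Answers with no integer count as wrong for accuracy and are skipped for MAE.
    abs_errors = [abs(p - t) for p, t in zip(parsed, targets) if p is not None]
    return {
        "accuracy": exact / len(targets),
        "mae": sum(abs_errors) / len(abs_errors) if abs_errors else None,
        "unparsed": sum(p is None for p in parsed),
    }


metrics = count_metrics(["There are 3 apples.", "5", "none visible"], [3, 4, 2])
print(metrics)
```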

Pointing

Run pointing scripts:

python molmo_point.py
python moon_dream_point.py

Evaluate pointing results:

python pointing_score.py

Modify these in pointing_score.py as needed:

sigma = 0.8  # Lower for stricter scoring
file = 'moondream2_points_testA.json'  # Change result file
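To give an intuition for why a lower sigma is stricter, here is one plausible way a sigma-controlled pointing score could work: a Gaussian falloff with distance from the center of the target box, normalized by the box size. This is a sketch under stated assumptions, not the formula used in pointing_score.py. The function names are hypothetical, and boxes are assumed to be in COCO `[x, y, width, height]` format with positive width and height.

```python
import math


def point_in_box(point, bbox):
    """Hard hit test: is the point inside the [x, y, w, h] box (edges included)?"""
    x, y = point
    bx, by, bw, bh = bbox
    return bx <= x <= bx + bw and by <= y <= by + bh


def gaussian_point_score(point, bbox, sigma=0.8):
    """Soft score in (0, 1]: 1.0 at the box center, decaying with distance."""
    x, y = point
    bx, by, bw, bh = bbox
    # Offsets from the center, scaled so the box edge lies at distance 1 on each axis.
    dx = (x - (bx + bw / 2)) / (bw / 2)
    dy = (y - (by + bh / 2)) / (bh / 2)
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma**2))


bbox = [10, 20, 100, 50]
center, edge = (60, 45), (110, 45)
print(gaussian_point_score(center, bbox))
print(gaussian_point_score(edge, bbox, sigma=0.8))
print(gaussian_point_score(edge, bbox, sigma=0.4))
```

Under this formulation, a point on the box edge scores lower as sigma shrinks, which matches the "lower for stricter scoring" comment above.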

📌 Notes

  • If a model's weights cannot be fetched automatically at run time, download them beforehand.
  • Datasets will download and cache automatically on first run.

About

This is the repository for the AI702 course project
