Skip to content

aashish2000/RONA

Repository files navigation

RONA: Pragmatically Diverse Image Captioning with Coherence Relations

Aashish Anantha Ramakrishnan, Aadarsh Anantha Ramakrishnan, Dongwon Lee

Accepted in the 4th In2Writing Workshop, co-located with NAACL 2025.

TL;DR

We propose RONA, a Coherence Relation-based pragmatic prompting strategy for MLLMs. Our approach generates pragmatically diverse captions, improving over existing baselines that only focus on syntax and semantic variations.

Arch Diagram

Setup Instructions

Installing Packages

We recommend using Conda, to create a virtual environment for evaluation. Clone this repository, and use the following commands:

conda create -n <your-env> python=3.10 -y
conda activate <your-env>
pip install -r requirements.txt

Setting up MLLMs

We have used the Vertex AI API to access Claude 3.5 Sonnet v2. If you want to setup Google Cloud Access for Claude, please install the gcloud CLI and follow this guide.

In the case of GPT4o, we used the Azure OpenAI service to create a custom deployment. To create this, please check this guide.

Datasets Used

We used the Tweet Subtitles and ANNA datasets for our experiments. The test set metadata used for our evaluation is present here.

Please follow the below directory structure for all datasets. Use the same names for the image folder as mentioned below, as the metadata filenames depend on it.

├── tweet_subtitles
|   ├── tweet_subtitles_test_metadata.json
|   └── multimodal_discourse_dataset/
└── anna
    ├── anna_test_metadata.json
    └── test/

Environment Constants

Rename the file environment.py.example to environment.py and fill all the constants.

  • DATASET_ROOT_DIRS: This is the location of the test set images and metadata for each dataset.
  • LOCATION: Location of the Vertex AI API for Claude
  • AZURE_OPENAI_ENDPOINT: Endpoint URL of the Azure GPT4o Deployment
  • AZURE_OPENAI_API_KEY: API Key for the Azure GPT4o Deployment
  • AZURE_API_VERSION: Version used for the Azure OpenAI API

Generating RONA Captions

python eval_mllm.py --model <model> --dataset <dataset> --seed <seed> --with_pair --with_dc

This command creates captions using the RONA Prompting Strategy (image-caption pair + coherence relation). If you want to just use images without the captions, omit the with_pair flag. If you do not want to use coherence relations, omit the with_dc flag. The results of the evaluation will be saved in the llm_outputs/ folder.

Calculating Metrics

# To calculate metrics for captions generated in all datasets
python results_bench.py --model <model> --all_datasets

# To calculate metrics for captions generated in a particular dataset
python results_bench.py --model <model> --dataset <dataset>

# To calculate metrics for a particular results file
# (add --dc if it was generated using coherence relations)
python results_bench.py --model <model> --dataset <dataset> --results_file <path-to-file>

# Collecting average scores for each dataset
python results_bench.py --model <model> --dataset <dataset> --get_dataset_avg_scores

All scores will be saved in the same llm_outputs/ folder. Collected average scores will be present in the all_average_scores/ folder.

Citing

If you find our work useful, please consider citing:

@inproceedings{anantha-ramakrishnan-etal-2025-rona,
    title = "{RONA}: Pragmatically Diverse Image Captioning with Coherence Relations",
    author = "Anantha Ramakrishnan, Aashish  and
      Ramakrishnan, Aadarsh Anantha  and
      Lee, Dongwon",
    editor = "Padmakumar, Vishakh  and
      Gero, Katy  and
      Wambsganss, Thiemo  and
      Sterman, Sarah  and
      Huang, Ting-Hao  and
      Zhou, David  and
      Chung, John",
    booktitle = "Proceedings of the Fourth Workshop on Intelligent and Interactive Writing Assistants (In2Writing 2025)",
    month = may,
    year = "2025",
    address = "Albuquerque, New Mexico, US",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.in2writing-1.8/",
    pages = "74--86",
    ISBN = "979-8-89176-239-8"
}

About

[NAACL In2Writing 2025] This repo is the official implementation of "RONA: Pragmatically Diverse Image Captioning with Coherence Relations"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages