🚀 (AAAI 2026) INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval

School of Software, Shandong University
✉ Corresponding author

INTENT Teaser


📌 Introduction

Welcome to the official repository for INTENT. This repository provides the codebase for our paper, which introduces a novel approach to Composed Image Retrieval with Noisy Correspondence built on the BLIP-2 architecture.

Disclaimer: This codebase is intended for research purposes.

📰 News and Updates

  • [Mar 2026] 🚀 The official paper is released at AAAI 2026.
  • [Mar 2026] 🚀 We have officially released the training and testing code for INTENT!
  • [Nov 2025] ⏳ INTENT was accepted to AAAI 2026.

INTENT Pipeline (based on LAVIS)


πŸƒβ€β™‚οΈ Experiment Results

CIR Task Performance

💡 Note for Fully-Supervised CIR Benchmarking: 🎯 The 0% noise setting in the table below is equivalent to the traditional fully-supervised CIR paradigm. We highlight this 0% block to facilitate direct and fair comparisons for researchers working on conventional supervised methods.

CIRR:

Table 1. Performance comparison on the CIRR test set in terms of R@K (%) and Rsub@K (%). The best and second-best results are highlighted in bold and underlined, respectively.

FIQ:

Table 2. Performance comparison on FashionIQ in terms of R@K (%). The best result under each noise ratio is highlighted in bold, while the second-best result is underlined.

Image Intervention

📂 Project Structure

To help you navigate our codebase quickly, here is an overview of the main components:

├── lavis/                 # Core model directory (built upon LAVIS)
│   └── models/
│       └── blip2_models/
│           └── blip2_cir.py   # 🧠 The core INTENT model implementation
├── train_INTENT.py        # 🚂 Main training script
├── test.py                # 🧪 General evaluation script
├── cirr_sub_BLIP2.py      # 📤 Script to generate submission files for the CIRR dataset
├── datasets.py            # 📊 Data loading and processing utilities
└── utils.py               # 🛠️ Helper functions (logging, metrics, etc.)

πŸ› οΈ Setup

We recommend running this code on a Linux system with an NVIDIA GPU.

1. Clone the repository

git clone https://github.com/ZivChen-Ty/INTENT.git
cd INTENT

2. Create a virtual environment

conda create -n intent_env python=3.9
conda activate intent_env

# Install PyTorch (the evaluated environment uses Torch 2.1.0 with CUDA 12.1)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# Install core dependencies
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16

3. Install dependencies

pip install -r requirements.txt
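
After installing, you can sanity-check the environment with a small stdlib-only probe. The module names below are the standard import names for the pip packages listed above (e.g. `sklearn` for scikit-learn, `open_clip` for open-clip-torch, `lavis` for salesforce-lavis); this helper script is an illustration, not part of the repo.

```python
# Stdlib-only sanity check: verify the core dependencies are importable.
# The script itself imports none of them, so it runs even in a broken env.
import importlib.util

REQUIRED = ["torch", "torchvision", "open_clip", "sklearn", "transformers", "lavis", "timm"]

def missing_packages(names):
    """Return the subset of module names that cannot be found on this interpreter."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All core dependencies found.")
```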

💾 Data Preparation

Before training or testing, you need to download and structure the datasets.

Download the CIRR and FashionIQ datasets from the CIRR official repo and the FashionIQ official repo.

Organize the data as follows:

1) FashionIQ:

├── FashionIQ
│   ├── captions
│   │   ├── cap.dress.[train | val].json
│   │   ├── cap.toptee.[train | val].json
│   │   ├── cap.shirt.[train | val].json
│   ├── image_splits
│   │   ├── split.dress.[train | val | test].json
│   │   ├── split.toptee.[train | val | test].json
│   │   ├── split.shirt.[train | val | test].json
│   ├── dress
│   │   ├── [B000ALGQSY.jpg | B000AY2892.jpg | B000AYI3L4.jpg | ...]
│   ├── shirt
│   │   ├── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]
│   ├── toptee
│   │   ├── [B0000DZQD6.jpg | B000A33FTU.jpg | B000AS2OVA.jpg | ...]

2) CIRR:

├── CIRR
│   ├── train
│   │   ├── [0 | 1 | 2 | ...]
│   │   │   ├── [train-10108-0-img0.png | train-10108-0-img1.png | ...]
│   ├── dev
│   │   ├── [dev-0-0-img0.png | dev-0-0-img1.png | ...]
│   ├── test1
│   │   ├── [test1-0-0-img0.png | test1-0-0-img1.png | ...]
│   ├── cirr
│   │   ├── captions
│   │   │   ├── cap.rc2.[train | val | test1].json
│   │   ├── image_splits
│   │   │   ├── split.rc2.[train | val | test1].json

(Note: Please modify datasets.py if your local data paths differ from the default setup.)
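
If you want to verify the layout before launching a run, a small stdlib helper like the one below can catch misplaced folders early. The expected subdirectory names mirror the trees above; the function names and root paths are illustrative assumptions, not part of the repo.

```python
# Illustrative pre-flight check for the dataset layout described above.
# Adjust the root paths to match your local setup (see datasets.py).
from pathlib import Path

def check_fashioniq(root):
    """Return the expected FashionIQ subdirectories that are missing under root."""
    expected = ["captions", "image_splits", "dress", "shirt", "toptee"]
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]

def check_cirr(root):
    """Return the expected CIRR subdirectories that are missing under root."""
    expected = ["train", "dev", "test1", "cirr/captions", "cirr/image_splits"]
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]

if __name__ == "__main__":
    for name, missing in [("FashionIQ", check_fashioniq("./FashionIQ")),
                          ("CIRR", check_cirr("./CIRR"))]:
        print(name, "OK" if not missing else f"missing: {missing}")
```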

🚀 Quick Start

1. Training & Evaluating the Model

To train the INTENT model from scratch, use the train_INTENT.py script. You can specify hyperparameters via command line arguments or a config file.

python train_INTENT.py

Evaluation is included in the training loop. (Tip: check utils.py for logging details; checkpoints are saved automatically during training.)

2. Generating Submissions (CIRR Dataset)

If you are evaluating on the CIRR test server, we provide a dedicated script to generate the required JSON submission files.

python cirr_sub_BLIP2.py \
  --checkpoint_path ./checkpoints/intent_run/best_model.pth \
  --output_file ./submission.json
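
For reference, the sketch below shows how such a submission file can be assembled. The field names (`"version": "rc2"`, `"metric"`) and the pair-id-to-ranked-candidates mapping follow the CIRR evaluation server's published format, but please double-check the CIRR official repo before submitting; the `write_submission` helper and the dummy rankings are illustrative, not the repo's actual code.

```python
# Illustrative sketch of writing a CIRR-style submission JSON file.
# Verify field names against the CIRR official repo before submitting.
import json

def write_submission(rankings, path, metric="recall"):
    """rankings: dict mapping pair id (str) -> list of candidate image names,
    ordered best-first. Writes the JSON payload and returns it."""
    payload = {"version": "rc2", "metric": metric}
    payload.update(rankings)
    with open(path, "w") as f:
        json.dump(payload, f)
    return payload
```

A second file with `metric="recall_subset"` covers the subset ranking track, if your run reports both.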

🤔 Some More Discussion

While developing the first module of INTENT, we experimented with a straightforward causal mechanism: explicitly aligning the intervened image with the original one. Conceptually, this operation helps mitigate spurious correlations by blocking potential backdoor paths in our specific setting.

We wonder whether this causal perspective could also inspire the Zero-Shot Composed Image Retrieval (ZS-CIR) community. Since ZS-CIR models rely heavily on large-scale pre-training, they may be influenced by dataset biases and high-frequency co-occurrences. Introducing a similar causal alignment mechanism could be an interesting direction for decoupling the true modification intent from inherent background noise. This is a preliminary thought rather than a definitive conclusion, but we hope it sparks fresh ideas and discussion toward more robust ZS-CIR models.
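
The alignment idea above can be sketched as a simple invariance penalty: pull the embedding of the intervened image toward the embedding of the original, so the encoder learns to ignore the intervention. The toy loss below (plain Python, cosine distance) only illustrates the concept and is not the exact objective used in the paper.

```python
# Toy sketch of the invariance alignment discussed above: penalize the cosine
# distance between the embedding of an intervened (e.g. background-perturbed)
# image and the embedding of the original image.
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def invariance_alignment_loss(orig_emb, intervened_emb):
    """1 - cos(orig, intervened): zero when the two embeddings align,
    encouraging invariance to the intervention."""
    return 1.0 - cosine_similarity(orig_emb, intervened_emb)
```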

πŸ“ Citation

If you find our work or this code useful in your research, please consider leaving a star or citing our paper πŸ₯°:

@inproceedings{INTENT,
  title={INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval},
  author={Chen, Zhiwei and Hu, Yupeng and Fu, Zhiheng and Li, Zixu and Huang, Jiale and Huang, Qinlei and Wei, Yinwei},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}

πŸ™ Acknowledgements

This codebase is heavily inspired by and built upon the excellent Salesforce LAVIS, SPRC and TME library. We thank the authors for their open-source contributions.

βœ‰οΈ Contact

For any questions, issues, or feedback, please open an issue on GitHub or reach out to me at zivczw@gmail.com.

🔗 Related Projects

Ecosystem & Other Works from our Team

  • TEMA (ACL'26): Web | Code
  • ConeSep (CVPR'26): Web | Code
  • Air-Know (CVPR'26): Web | Code
  • HABIT (AAAI'26): Web | Code | Paper
  • ReTrack (AAAI'26): Web | Code | Paper
  • HUD (ACM MM'25): Web | Code | Paper
  • OFFSET (ACM MM'25): Web | Code | Paper
  • ENCODER (AAAI'25): Web | Code | Paper

📄 License

This project is released under the terms of the LICENSE file included in this repository.


If this project helps you, please leave a Star!
