🚀 (AAAI 2026) INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval

School of Software, Shandong University
✉ Corresponding author

INTENT Teaser


📌 Introduction

Welcome to the official repository for INTENT. This repository provides the codebase for our paper, which introduces a novel approach to Composed Image Retrieval with Noisy Correspondence built on the BLIP-2 architecture.

Disclaimer: This codebase is intended for research purposes.

📰 News and Updates

  • [Mar 2026] 🚀 The official paper is released at AAAI 2026.
  • [Mar 2026] 🚀 We have officially released the training and testing code for INTENT!
  • [Nov 2025] ⏳ INTENT was accepted to AAAI 2026.

INTENT Pipeline (based on LAVIS)


πŸƒβ€β™‚οΈ Experiment Results

CIR Task Performance

💡 Note for Fully-Supervised CIR Benchmarking: 🎯 The 0% noise setting in the table below is equivalent to the traditional fully-supervised CIR paradigm. We highlight this 0% block to facilitate direct and fair comparisons for researchers working on conventional supervised methods.

CIRR:

Table 1. Performance comparison on the CIRR test set in terms of R@K (%) and Rsub@K (%). The best and second-best results are highlighted in bold and underlined, respectively.

FIQ:

Table 2. Performance comparison on FashionIQ in terms of R@K (%). The best result under each noise ratio is highlighted in bold, while the second-best result is underlined.

Image Intervention

📂 Project Structure

To help you navigate our codebase quickly, here is an overview of the main components:

├── lavis/                 # Core model directory (built upon LAVIS)
│   └── models/
│       └── blip2_models/
│           └── blip2_cir.py   # 🧠 The core INTENT model implementation
├── train_INTENT.py        # 🚂 Main training script
├── test.py                # 🧪 General evaluation script
├── cirr_sub_BLIP2.py      # 📤 Script to generate submission files for the CIRR dataset
├── datasets.py            # 📊 Data loading and processing utilities
└── utils.py               # 🛠️ Helper functions (logging, metrics, etc.)

πŸ› οΈ Setup

We recommend running this code on a Linux system with an NVIDIA GPU.

1. Clone the repository

git clone https://github.com/ZivChen-Ty/INTENT.git
cd INTENT

2. Create a virtual environment

conda create -n intent_env python=3.9
conda activate intent_env

# Install PyTorch (the evaluated environment uses Torch 2.1.0 with CUDA 12.1)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# Install core dependencies
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16

3. Install dependencies

pip install -r requirements.txt
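
After installing, you can sanity-check the environment with a small stdlib-only probe. The module names below are the standard import names for the pip packages listed above (e.g. `sklearn` for scikit-learn, `open_clip` for open-clip-torch, `lavis` for salesforce-lavis); this helper script is an illustration, not part of the repo.

```python
# Stdlib-only sanity check: verify the core dependencies are importable.
# The script itself imports none of them, so it runs even in a broken env.
import importlib.util

REQUIRED = ["torch", "torchvision", "open_clip", "sklearn", "transformers", "lavis", "timm"]

def missing_packages(names):
    """Return the subset of module names that cannot be found on this interpreter."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All core dependencies found.")
```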

💾 Data Preparation

Before training or testing, you need to download and structure the datasets.

Download the CIRR and FashionIQ datasets from the CIRR official repo and the FashionIQ official repo.

Organize the data as follows:

1) FashionIQ:

├── FashionIQ
│   ├── captions
│   │   ├── cap.dress.[train | val].json
│   │   ├── cap.toptee.[train | val].json
│   │   ├── cap.shirt.[train | val].json
│   ├── image_splits
│   │   ├── split.dress.[train | val | test].json
│   │   ├── split.toptee.[train | val | test].json
│   │   ├── split.shirt.[train | val | test].json
│   ├── dress
│   │   ├── [B000ALGQSY.jpg | B000AY2892.jpg | B000AYI3L4.jpg | ...]
│   ├── shirt
│   │   ├── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]
│   ├── toptee
│   │   ├── [B0000DZQD6.jpg | B000A33FTU.jpg | B000AS2OVA.jpg | ...]

2) CIRR:

├── CIRR
│   ├── train
│   │   ├── [0 | 1 | 2 | ...]
│   │   │   ├── [train-10108-0-img0.png | train-10108-0-img1.png | ...]
│   ├── dev
│   │   ├── [dev-0-0-img0.png | dev-0-0-img1.png | ...]
│   ├── test1
│   │   ├── [test1-0-0-img0.png | test1-0-0-img1.png | ...]
│   ├── cirr
│   │   ├── captions
│   │   │   ├── cap.rc2.[train | val | test1].json
│   │   ├── image_splits
│   │   │   ├── split.rc2.[train | val | test1].json

(Note: Please modify datasets.py if your local data paths differ from the default setup.)
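
If you want to verify the layout before launching a run, a small stdlib helper like the one below can catch misplaced folders early. The expected subdirectory names mirror the trees above; the function names and root paths are illustrative assumptions, not part of the repo.

```python
# Illustrative pre-flight check for the dataset layout described above.
# Adjust the root paths to match your local setup (see datasets.py).
from pathlib import Path

def check_fashioniq(root):
    """Return the expected FashionIQ subdirectories that are missing under root."""
    expected = ["captions", "image_splits", "dress", "shirt", "toptee"]
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]

def check_cirr(root):
    """Return the expected CIRR subdirectories that are missing under root."""
    expected = ["train", "dev", "test1", "cirr/captions", "cirr/image_splits"]
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]

if __name__ == "__main__":
    for name, missing in [("FashionIQ", check_fashioniq("./FashionIQ")),
                          ("CIRR", check_cirr("./CIRR"))]:
        print(name, "OK" if not missing else f"missing: {missing}")
```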

🚀 Quick Start

1. Training & Evaluating the Model

To train the INTENT model from scratch, use the train_INTENT.py script. You can specify hyperparameters via command line arguments or a config file.

python train_INTENT.py

Evaluation is included in the training loop. (Tip: check utils.py for logging details; checkpoints are saved automatically during training.)

2. Generating Submissions (CIRR Dataset)

If you are evaluating on the CIRR test server, we provide a dedicated script to generate the required JSON submission files.

python cirr_sub_BLIP2.py \
  --checkpoint_path ./checkpoints/intent_run/best_model.pth \
  --output_file ./submission.json
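
For reference, the sketch below shows how such a submission file can be assembled. The field names (`"version": "rc2"`, `"metric"`) and the pair-id-to-ranked-candidates mapping follow the CIRR evaluation server's published format, but please double-check the CIRR official repo before submitting; the `write_submission` helper and the dummy rankings are illustrative, not the repo's actual code.

```python
# Illustrative sketch of writing a CIRR-style submission JSON file.
# Verify field names against the CIRR official repo before submitting.
import json

def write_submission(rankings, path, metric="recall"):
    """rankings: dict mapping pair id (str) -> list of candidate image names,
    ordered best-first. Writes the JSON payload and returns it."""
    payload = {"version": "rc2", "metric": metric}
    payload.update(rankings)
    with open(path, "w") as f:
        json.dump(payload, f)
    return payload
```

A second file with `metric="recall_subset"` covers the subset ranking track, if your run reports both.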

🤔 Some More Discussion

While developing the first module of INTENT, we experimented with a straightforward causal mechanism: explicitly aligning the intervened image with the original one. Conceptually, this operation helps mitigate spurious correlations by blocking potential backdoor paths in our specific setting.

We wonder whether this causal perspective could also inspire the Zero-Shot Composed Image Retrieval (ZS-CIR) community. Since ZS-CIR models rely heavily on large-scale pre-training, they may be influenced by dataset biases and high-frequency co-occurrences. Introducing a similar causal alignment mechanism could be an interesting direction for decoupling the true modification intent from inherent background noise. This is a preliminary thought rather than a definitive conclusion, but we hope it sparks fresh ideas and discussion toward more robust ZS-CIR models.
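
The alignment idea above can be sketched as a simple invariance penalty: pull the embedding of the intervened image toward the embedding of the original, so the encoder learns to ignore the intervention. The toy loss below (plain Python, cosine distance) only illustrates the concept and is not the exact objective used in the paper.

```python
# Toy sketch of the invariance alignment discussed above: penalize the cosine
# distance between the embedding of an intervened (e.g. background-perturbed)
# image and the embedding of the original image.
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def invariance_alignment_loss(orig_emb, intervened_emb):
    """1 - cos(orig, intervened): zero when the two embeddings align,
    encouraging invariance to the intervention."""
    return 1.0 - cosine_similarity(orig_emb, intervened_emb)
```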

πŸ“ Citation

If you find our work or this code useful in your research, please consider leaving a star or citing our paper πŸ₯°:

@inproceedings{INTENT,
  title={INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval},
  author={Chen, Zhiwei and Hu, Yupeng and Fu, Zhiheng and Li, Zixu and Huang, Jiale and Huang, Qinlei and Wei, Yinwei},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}

πŸ™ Acknowledgements

This codebase is heavily inspired by and built upon the excellent Salesforce LAVIS, SPRC and TME library. We thank the authors for their open-source contributions.

βœ‰οΈ Contact

For any questions, issues, or feedback, please open an issue on GitHub or reach out to me at zivczw@gmail.com.

🔗 Related Projects

Ecosystem & Other Works from our Team

  • TEMA (ACL'26): Web | Code
  • ConeSep (CVPR'26): Web | Code
  • Air-Know (CVPR'26): Web | Code
  • HABIT (AAAI'26): Web | Code | Paper
  • ReTrack (AAAI'26): Web | Code | Paper
  • HUD (ACM MM'25): Web | Code | Paper
  • OFFSET (ACM MM'25): Web | Code | Paper
  • ENCODER (AAAI'25): Web | Code | Paper

📄 License

This project is released under the terms of the LICENSE file included in this repository.


If this project helps you, please leave a Star!
