INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval (AAAI 2026)
† Corresponding author
Welcome to the official repository for INTENT. This project provides the codebase for our paper, which presents a novel approach to Composed Image Retrieval under noisy correspondence, built on the BLIP-2 architecture.
Disclaimer: This codebase is intended for research purposes.
- [Mar 2026] The official paper is released at AAAI 2026.
- [Mar 2026] We have officially released the training and testing code for INTENT!
- [Nov 2025] INTENT is accepted by AAAI 2026.
INTENT Pipeline (based on LAVIS)
- Experiment Results
- Project Structure
- Setup
- Data Preparation
- Quick Start
- Some More Discussion
- Citation
- Acknowledgement
Table 1. Performance comparison on the CIRR test set in terms of R@K (%) and Rsub@K (%). The best and second-best results are highlighted in bold and underlined, respectively.

Table 2. Performance comparison on FashionIQ in terms of R@K (%). The best result under each noise ratio is highlighted in bold, while the second-best result is underlined.

Note for fully-supervised CIR benchmarking: the 0% noise setting in the table above is equivalent to the traditional fully-supervised CIR paradigm. We highlight the 0% block to facilitate direct and fair comparisons for researchers working on conventional supervised methods.
To help you navigate our codebase quickly, here is an overview of the main components:
├── lavis/                 # Core model directory (built upon LAVIS)
│   └── models/
│       └── blip2_models/
│           └── blip2_cir.py   # The core INTENT model implementation
├── train_INTENT.py        # Main training script
├── test.py                # General evaluation script
├── cirr_sub_BLIP2.py      # Script to generate submission files for the CIRR dataset
├── datasets.py            # Data loading and processing utilities
└── utils.py               # Helper functions (logging, metrics, etc.)
We recommend running this code on a Linux system with an NVIDIA GPU.
git clone https://github.com/ZivChen-Ty/INTENT.git
cd INTENT
conda create -n intent_env python=3.9
conda activate intent_env
# Install PyTorch (The evaluated environment uses Torch 2.1.0 with CUDA 12.1 compatibility)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
# Install core dependencies
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
pip install -r requirements.txt
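To confirm that the core dependencies installed correctly, a quick check like the one below can be used. The package list mirrors the pip commands above (note the import names differ from the pip names: `salesforce-lavis` imports as `lavis`, `open-clip-torch` as `open_clip`, and `scikit-learn` as `sklearn`):

```python
import importlib.util

# Import names for the core packages installed by the commands above
required = ["torch", "torchvision", "transformers", "timm", "lavis", "open_clip", "sklearn"]

# find_spec returns None for packages that are not installed
missing = [pkg for pkg in required if importlib.util.find_spec(pkg) is None]
if missing:
    print(f"Missing packages: {missing} -- re-run the install steps above")
else:
    print("All core dependencies found")
```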
Before training or testing, you need to download and structure the datasets.
Download the CIRR / FashionIQ dataset from CIRR official repo and FashionIQ official repo.
Organize the data as follows:
├── FashionIQ
│   ├── captions
│   │   ├── cap.dress.[train | val].json
│   │   ├── cap.toptee.[train | val].json
│   │   └── cap.shirt.[train | val].json
│   ├── image_splits
│   │   ├── split.dress.[train | val | test].json
│   │   ├── split.toptee.[train | val | test].json
│   │   └── split.shirt.[train | val | test].json
│   ├── dress
│   │   └── [B000ALGQSY.jpg | B000AY2892.jpg | B000AYI3L4.jpg | ...]
│   ├── shirt
│   │   └── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]
│   └── toptee
│       └── [B0000DZQD6.jpg | B000A33FTU.jpg | B000AS2OVA.jpg | ...]
└── CIRR
    ├── train
    │   └── [0 | 1 | 2 | ...]
    │       └── [train-10108-0-img0.png | train-10108-0-img1.png | ...]
    ├── dev
    │   └── [dev-0-0-img0.png | dev-0-0-img1.png | ...]
    ├── test1
    │   └── [test1-0-0-img0.png | test1-0-0-img1.png | ...]
    └── cirr
        ├── captions
        │   └── cap.rc2.[train | val | test1].json
        └── image_splits
            └── split.rc2.[train | val | test1].json
(Note: Please modify datasets.py if your local data paths differ from the default setup.)
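A small sanity check on the directory layout can save a failed run. The snippet below is a minimal sketch: `DATA_ROOT` and the subpaths in `EXPECTED` are assumptions based on the tree shown above, and may need adjusting to match the paths configured in `datasets.py`.

```python
from pathlib import Path

# Assumed dataset root -- adjust to match the paths configured in datasets.py
DATA_ROOT = Path("./data")

# Expected subdirectories, per the layout shown above
EXPECTED = [
    "FashionIQ/captions",
    "FashionIQ/image_splits",
    "FashionIQ/dress",
    "FashionIQ/shirt",
    "FashionIQ/toptee",
    "CIRR/train",
    "CIRR/dev",
    "CIRR/test1",
    "CIRR/cirr/captions",
    "CIRR/cirr/image_splits",
]

def check_layout(root: Path) -> list:
    """Return the list of expected subdirectories missing under root."""
    return [sub for sub in EXPECTED if not (root / sub).is_dir()]

missing = check_layout(DATA_ROOT)
if missing:
    print("Missing directories:", *missing, sep="\n  ")
else:
    print("Dataset layout looks good")
```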
To train the INTENT model from scratch, use the train_INTENT.py script. You can specify hyperparameters via command line arguments or a config file.
python train_INTENT.py
Evaluation is run automatically during training. (Tip: see utils.py for logging details; checkpoints are saved automatically.)
If you are evaluating on the CIRR test server, we provide a dedicated script to generate the required JSON submission files.
python cirr_sub_BLIP2.py \
--checkpoint_path ./checkpoints/intent_run/best_model.pth \
--output_file ./submission.json
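Before uploading, it can be worth confirming that the generated file is well-formed. The exact submission schema is defined by the CIRR evaluation server, so the helper below (a hypothetical utility, not part of this repo) only checks that the file parses as a top-level JSON object:

```python
import json

def validate_submission(path: str) -> dict:
    """Load a submission file, raising if it is not a well-formed JSON object.

    This only checks well-formedness; the required schema is defined by
    the CIRR evaluation server.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError(f"Expected a JSON object at the top level, got {type(data).__name__}")
    return data

# Usage: validate_submission("./submission.json")
```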
While developing the first module of INTENT, we experimented with a straightforward causal mechanism: explicitly aligning the intervened image with the original one. Conceptually, this operation seems to help mitigate spurious correlations by blocking potential backdoor paths in our specific setting. We wonder if this causal perspective could also provide some inspiration for the Zero-Shot Composed Image Retrieval (ZS-CIR) community. Given that ZS-CIR models heavily rely on large-scale pre-training, they might occasionally be influenced by dataset biases and high-frequency co-occurrences. Introducing a similar causal alignment mechanism could potentially be an interesting direction to explore for decoupling the true modification intent from inherent background noise. Although this is just a preliminary thought rather than a definitive conclusion, we hope it might spark some fresh ideas and discussions for future research towards more robust ZS-CIR models.
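To make the alignment idea above concrete, here is a minimal, dependency-free sketch: pull the feature of an intervened image toward the feature of the original image, so the representation becomes invariant to the intervention. The function names and the cosine-based loss are illustrative assumptions, not the exact objective implemented in blip2_cir.py.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors (plain lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def invariance_alignment_loss(orig_feat, intervened_feat):
    """Illustrative alignment loss: 1 - cos(f(x), f(do(x))).

    Minimizing this pulls the intervened image's feature toward the
    original one, encouraging invariance to the intervention -- a sketch
    of the causal-alignment idea, not the paper's exact objective.
    """
    return 1.0 - cosine_similarity(orig_feat, intervened_feat)

# Identical features incur zero loss; orthogonal features are fully penalized
print(invariance_alignment_loss([1.0, 0.0], [1.0, 0.0]))  # 0.0
print(invariance_alignment_loss([1.0, 0.0], [0.0, 1.0]))  # 1.0
```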
If you find our work or this code useful in your research, please consider leaving a star or citing our paper π₯°:
@inproceedings{INTENT,
title={INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval},
author={Chen, Zhiwei and Hu, Yupeng and Fu, Zhiheng and Li, Zixu and Huang, Jiale and Huang, Qinlei and Wei, Yinwei},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026}
}
This codebase is heavily inspired by and built upon the excellent Salesforce LAVIS, SPRC, and TME projects. We thank the authors for their open-source contributions.
For any questions, issues, or feedback, please open an issue on GitHub or reach out to me at zivczw@gmail.com.
Ecosystem & Other Works from our Team
- TEMA (ACL'26): Web | Code
- ConeSep (CVPR'26): Web | Code
- Air-Know (CVPR'26): Web | Code
- HABIT (AAAI'26): Web | Code | Paper
- ReTrack (AAAI'26): Web | Code | Paper
- HUD (ACM MM'25): Web | Code | Paper
- OFFSET (ACM MM'25): Web | Code | Paper
- ENCODER (AAAI'25): Web | Code | Paper
This project is released under the terms of the LICENSE file included in this repository.













