iLearn-Lab/AAAI26-ReTrack
Repository files navigation

🎬 (AAAI 2026) ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval

1 School of Software, Shandong University
2 School of Computer Science and Technology, Shandong Jianzhu University
✉ Corresponding author


Accepted by AAAI 2026: An evidence-driven framework tackling both the 🎬 Composed Video Retrieval (CVR) and 🌁 Composed Image Retrieval (CIR) tasks.

📖 Introduction

ReTrack is an advanced open-source PyTorch framework designed to improve multi-modal query understanding by calibrating directional bias in composed features. It achieves state-of-the-art (SOTA) performance across both Composed Video Retrieval (CVR) and Composed Image Retrieval (CIR) benchmarks.

⬆ Back to top

📢 News

  • [2026-03-20] 🚀 The official paper is released at AAAI 2026.
  • [2026-03-19] 🚀 Released all training and evaluation code for ReTrack.
  • [2025-11-08] 🔥 Our paper "ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval" has been accepted by AAAI 2026!

⬆ Back to top

✨ Key Features

  • 🎯 Dual-Stream Directional Anchor Calibration: Explicitly identifies and calibrates visual and textual semantic contributions to resolve directional bias in multi-modal composition.
  • ⚖️ Reliable Evidence-Driven Alignment: Leverages Dempster-Shafer Theory to evaluate similarity reliability, greatly reducing uncertainty caused by highly similar retrieval candidates.
  • 🧩 Unified Framework: Built on top of BLIP-2 (via the Salesforce LAVIS library), seamlessly supporting both video (CVR) and image (CIR) retrieval tasks.
  • ⚙️ Modular & Scalable: Entirely managed by Hydra and Lightning Fabric for flexible configuration, easy hyperparameter overrides, and scalable multi-GPU training.
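For readers unfamiliar with Dempster-Shafer Theory, the sketch below shows Dempster's rule of combination, the standard way two evidence sources are fused. It is illustrative background only, not the paper's implementation, and the mass values are invented:

```python
# Illustrative sketch of Dempster's rule of combination (the core of
# Dempster-Shafer Theory). NOT the ReTrack implementation; the mass
# values below are made-up examples.
from itertools import product

def combine(m1, m2):
    """Fuse two mass functions (dicts mapping frozenset focal elements to mass)."""
    fused, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:  # compatible evidence reinforces the intersection
            fused[inter] = fused.get(inter, 0.0) + wa * wb
        else:      # fully conflicting evidence is discarded, then renormalized
            conflict += wa * wb
    return {k: v / (1.0 - conflict) for k, v in fused.items()}

MATCH = frozenset({"match"})
NONMATCH = frozenset({"nonmatch"})
THETA = MATCH | NONMATCH  # the whole frame: "don't know"

# Hypothetical evidence from two streams (e.g., visual and textual):
m_visual = {MATCH: 0.6, NONMATCH: 0.1, THETA: 0.3}
m_text = {MATCH: 0.5, NONMATCH: 0.2, THETA: 0.3}

fused = combine(m_visual, m_text)  # agreement sharpens belief in MATCH
```

Note how two moderately confident, agreeing sources yield a fused belief in MATCH higher than either source alone, while conflicting mass (one source saying match, the other nonmatch) is normalized away.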

⬆ Back to top

πŸ—οΈ Architecture

ReTrack architecture

Figure 1. The proposed ReTrack consists of three key modules: (a) Semantic Contribution Disentanglement, (b) Composition Geometry Calibration, and (c) Reliable Evidence-driven Alignment.

⬆ Back to top

πŸƒβ€β™‚οΈ Experiment-Results

CVR Task Performance

Table 1. Performance comparison on the test set of the CVR dataset, WebVid-CoVR, in terms of R@k (%). The overall best results are in bold, while the best results over baselines are underlined.

CIR Task Performance

Table 2. Performance comparison on the CIR datasets, FashionIQ and CIRR, in terms of R@k (%). The overall best results are in bold, while the best results over baselines are underlined.

⬆ Back to top


🚀 Quick Start & Installation

We recommend using Anaconda to manage your environment, following the CoVR project. Note: this project was developed and tested with Python 3.8 and PyTorch 2.1.0.

# 1. Clone the repository
git clone https://github.com/Lee-zixu/ReTrack.git
cd ReTrack

# 2. Create a virtual environment
conda create -n retrack python=3.8 -y
conda activate retrack

# 3. Install PyTorch (Adjust CUDA version based on your hardware)
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# 4. Install other dependencies
pip install -r requirements.txt

⬆ Back to top

📂 Repository Structure

Our codebase is highly modular. Here is a brief overview of the core files and directories:

ReTrack/
├── configs/               # ⚙️ Hydra configuration files (data, model, trainer, etc.)
├── src/                   # 🧠 Source code (dataloaders, model implementations, testing)
├── train_CVR.py           # ▶️ Training entry point for WebVid-CoVR
├── train_CIR.py           # ▶️ Training entry point for FashionIQ & CIRR
├── test.py                # 🧪 Evaluation entry point
└── requirements.txt       # 📦 Project dependencies

⬆ Back to top

βš™οΈ Configuration Overview

All hyperparameters and paths are managed by Hydra under the configs/ directory. The key configuration groups are:

  • configs/data/ — Dataset loaders and dataset-specific path definitions.
  • configs/model/ — Model architecture, checkpoints, optimizers, schedulers, and loss functions.
  • configs/trainer/ — Lightning Fabric training settings (devices, precision, checkpointing).
  • configs/machine/ — Hardware/machine settings (batch size, num workers, default root paths).
  • configs/test/ — Evaluation presets across different test splits.
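To make the override mechanics concrete, here is a stdlib-only Python sketch of how a dotted Hydra-style override (such as the `machine.default.datasets_dir=...` form used later in this README) maps onto nested config groups. Hydra itself does far more (composition, interpolation, type checking), so this is only an illustration:

```python
# Stdlib sketch of a dotted "a.b.c=value" override applied to a nested
# config dict. Illustrative only; Hydra's real resolution is richer.
import copy

def apply_override(cfg, override):
    """Apply one dotted override to a nested dict, returning a modified copy."""
    key_path, value = override.split("=", 1)
    new_cfg = copy.deepcopy(cfg)
    node = new_cfg
    *parents, leaf = key_path.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return new_cfg

# Key names below come from this README's own override example:
base = {"machine": {"default": {"datasets_dir": "/root/autodl-tmp/data"}}}
updated = apply_override(base, "machine.default.datasets_dir=/data/covr")
```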

⬆ Back to top

🗃️ Data Preparation

By default, the datasets are expected to be placed under a common root directory (e.g., /root/autodl-tmp/data/).

💡 Path Configuration: You must adjust these paths for your local setup. There are two recommended ways to do this:

  1. Edit YAML directly (Preferred): Modify configs/machine/default.yaml or the specific files in configs/data/*.yaml.
  2. Override via CLI: Append machine.default.datasets_dir=/path/to/data to your run commands.
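As an illustration of option 1, the machine config might look like the fragment below. The exact key layout inside configs/machine/default.yaml is an assumption here (`datasets_dir` is the key name suggested by the CLI override above), so verify it against the shipped file:

```yaml
# configs/machine/default.yaml (illustrative; check against the real file)
datasets_dir: /path/to/data   # common root holding WebVid-CoVR, FashionIQ, CIRR
```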

1. Composed Video Retrieval (CVR)

Dataset: WebVid-CoVR

Expected directory structure (configs/data/webvid-covr.yaml):

datasets_dir/
└── WebVid-CoVR/
    ├── videos/
    │   ├── 2M/
    │   └── 8M/
    └── annotation/
        ├── webvid2m-covr_train.csv
        ├── webvid8m-covr_val.csv
        └── webvid8m-covr_test.csv

2. Composed Image Retrieval (CIR)

Datasets: FashionIQ and CIRR

Expected directory structure:

datasets_dir/
├── FashionIQ/
│   ├── captions/
│   │   ├── cap.dress.[train|val|test].json
│   │   └── ...
│   ├── image_splits/
│   │   ├── split.dress.[train|val|test].json
│   │   └── ...
│   ├── dress/
│   ├── shirt/
│   └── toptee/
└── CIRR/
    ├── train/
    ├── dev/
    ├── test1/
    └── cirr/
        ├── captions/
        │   └── cap.rc2.[train|val|test1].json
        └── image_splits/
            └── split.rc2.[train|val|test1].json
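Before launching training, a quick layout check can catch path mistakes early. The helper below is a hypothetical convenience (not part of the repository); the expected sub-paths mirror the CIRR tree above:

```python
# Hypothetical sanity check for the dataset layout documented above.
# Not part of the ReTrack codebase; adjust the path list as needed.
from pathlib import Path

# Sub-paths mirror the CIRR directory tree in this README.
EXPECTED_CIRR = [
    "CIRR/train",
    "CIRR/dev",
    "CIRR/test1",
    "CIRR/cirr/captions",
    "CIRR/cirr/image_splits",
]

def missing_paths(datasets_dir, expected=EXPECTED_CIRR):
    """Return every expected sub-path that is not a directory under datasets_dir."""
    root = Path(datasets_dir)
    return [p for p in expected if not (root / p).is_dir()]
```

An empty return value means the layout matches; otherwise the list tells you exactly which directories to fix before editing the configs.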

⬆ Back to top

▶️ Training

You can easily override hyperparameters, datasets, and paths directly from the command line using Hydra syntax.

Train CVR Model (WebVid-CoVR)

python train_CVR.py

Train CIR Model (FashionIQ or CIRR)

python train_CIR.py

⚠️ Before running CIR training, make sure to update the dataset selection in configs/train_CIR.yaml (data and test in defaults) to your target dataset (e.g. fashioniq or cirr).

For example:

defaults:
  - data: fashioniq
  - test: fashioniq

or:

defaults:
  - data: cirr
  - test: cirr-all

⬆ Back to top

🧪 Evaluation / Testing

To evaluate a trained model, use test.py and specify the target benchmark.

python test.py

(Make sure to specify the dataset and path to your trained checkpoint via the config overrides or by updating the relevant configs/test/*.yaml file).

⬆ Back to top

📌 Output & Checkpoints

Hydra automatically manages your experiment logs and weights.

  • Outputs are systematically written to: outputs/<dataset>/<model>/<ckpt>/<experiment>/<run_name>/.
  • Checkpoints are saved inside the run directory as ckpt_last.ckpt (or ckpt_<epoch>.ckpt if configured via save_ckpt=all).

⬆ Back to top

🤝 Acknowledgements

This codebase is built upon several great open-source projects. We thank the authors of:

  • CoVR and CoVR-2 for the foundational Composed Video Retrieval baselines and datasets.
  • LAVIS for providing robust Vision-Language models like BLIP-2.

⬆ Back to top

βœ‰οΈ Contact

For any questions, issues, or feedback, please open an issue on GitHub or reach out to me at lizixu.cs@gmail.com.

⬆ Back to top

🔗 Related Projects

Ecosystem & Other Works from our Team

  • TEMA (ACL'26): Web | Code
  • ConeSep (CVPR'26): Web | Code
  • Air-Know (CVPR'26): Web | Code
  • HABIT (AAAI'26): Web | Code | Paper
  • INTENT (AAAI'26): Web | Code | Paper
  • HUD (ACM MM'25): Web | Code | Paper
  • OFFSET (ACM MM'25): Web | Code | Paper
  • ENCODER (AAAI'25): Web | Code | Paper

πŸ“β­οΈ Citation

If you find our work or this code useful in your research, please consider leaving a Star ⭐️ or citing 📝 our paper 🥰. Your support is our greatest motivation!

@inproceedings{ReTrack,
  title={ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval},
  author={Li, Zixu and Hu, Yupeng and Chen, Zhiwei and Huang, Qinlei and Qiu, Guozhi and Fu, Zhiheng and Liu, Meng},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}

⬆ Back to top

🫑 Support & Contributing

We welcome all forms of contributions! If you have any questions, ideas, or find a bug, please feel free to:

  • Open an Issue for discussions or bug reports.
  • Submit a Pull Request to improve the codebase.

⬆ Back to top

📄 License

This project is released under the terms of the LICENSE file included in this repository.
