(AAAI 2026) ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval
School of Computer Science and Technology, Shandong Jianzhu University
† Corresponding author
Accepted by AAAI 2026: an evidence-driven framework tackling both the Composed Video Retrieval (CVR) and Composed Image Retrieval (CIR) tasks.
ReTrack is an advanced open-source PyTorch framework designed to improve multi-modal query understanding by calibrating directional bias in composed features. It achieves state-of-the-art (SOTA) performance across both Composed Video Retrieval (CVR) and Composed Image Retrieval (CIR) benchmarks.
- [2026-03-20] Official paper is released at AAAI 2026.
- [2026-03-19] Released all training and evaluation code for ReTrack.
- [2025-11-08] Our paper "ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval" has been accepted by AAAI 2026!
- Dual-Stream Directional Anchor Calibration: Explicitly identifies and calibrates visual and textual semantic contributions to resolve directional bias in multi-modal composition.
- Reliable Evidence-Driven Alignment: Leverages Dempster-Shafer Theory to evaluate similarity reliability, greatly reducing uncertainty caused by highly similar retrieval candidates.
- Unified Framework: Built on top of BLIP-2 (via the Salesforce LAVIS library), seamlessly supporting both video (CVR) and image (CIR) retrieval tasks.
- Modular & Scalable: Entirely managed by Hydra and Lightning Fabric for flexible configuration, easy hyperparameter overrides, and scalable multi-GPU training.
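To illustrate the evidence-fusion idea behind the Evidence-Driven Alignment feature, the sketch below implements the generic Dempster-Shafer rule of combination. This is a minimal, self-contained illustration of the theory, not the exact formulation used in ReTrack; the frame elements and mass values are made up for the example.

```python
from itertools import product


def combine(m1, m2):
    """Fuse two mass functions with Dempster's rule of combination.

    Masses are dicts mapping frozenset hypotheses to belief mass.
    Generic illustration only -- not the paper's exact formulation.
    """
    combined = {}
    conflict = 0.0
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        inter = x & y
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mx * my
        else:
            conflict += mx * my  # mass on contradictory evidence (K)
    # Normalize by the non-conflicting mass (1 - K)
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


# Two evidence sources over the frame {relevant, irrelevant}
R, I = frozenset({"rel"}), frozenset({"irr"})
m1 = {R: 0.6, I: 0.1, R | I: 0.3}
m2 = {R: 0.7, I: 0.2, R | I: 0.1}
fused = combine(m1, m2)  # agreement on "relevant" is reinforced
```

When both sources lean toward the same hypothesis, the fused mass concentrates on it; when they conflict, the conflict mass K grows and the result becomes less certain, which is the intuition behind using this theory to judge similarity reliability.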
- Introduction
- News
- Key Features
- Architecture
- Experiment Results
- Quick Start & Installation
- Repository Structure
- Configuration Overview
- Data Preparation
- Training
- Evaluation/Testing
- Output & Checkpoints
- Acknowledgement
- Contact
- Citation
- Support & Contributing
We recommend using Anaconda to manage your environment, following the CoVR project. Note: this project was developed and tested with Python 3.8 and PyTorch 2.1.0.
# 1. Clone the repository
git clone https://github.com/Lee-zixu/ReTrack.git
cd ReTrack
# 2. Create a virtual environment
conda create -n retrack python=3.8 -y
conda activate retrack
# 3. Install PyTorch (Adjust CUDA version based on your hardware)
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# 4. Install other dependencies
pip install -r requirements.txt

Our codebase is highly modular. Here is a brief overview of the core files and directories:
ReTrack/
├── configs/           # Hydra configuration files (data, model, trainer, etc.)
├── src/               # Source code (dataloaders, model implementations, testing)
├── train_CVR.py       # Training entry point for WebVid-CoVR
├── train_CIR.py       # Training entry point for FashionIQ & CIRR
├── test.py            # Evaluation entry point
└── requirements.txt   # Project dependencies
All hyperparameters and paths are managed by Hydra under the configs/ directory. The key configuration groups are:
- configs/data/ — Dataset loaders and dataset-specific path definitions.
- configs/model/ — Model architecture, checkpoints, optimizers, schedulers, and loss functions.
- configs/trainer/ — Lightning Fabric training settings (devices, precision, checkpointing).
- configs/machine/ — Hardware/machine settings (batch size, num workers, default root paths).
- configs/test/ — Evaluation presets across different test splits.
By default, the datasets are expected to be placed under a common root directory (e.g., /root/autodl-tmp/data/).
Path configuration: you must adjust these paths for your local setup. There are two recommended ways to do this:
- Edit YAML directly (preferred): modify configs/machine/default.yaml or the specific files in configs/data/*.yaml.
- Override via CLI: append machine.default.datasets_dir=/path/to/data to your run commands.
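For orientation, a machine config along these lines is what the CLI override above targets. This is a hypothetical sketch, not the repository's actual configs/machine/default.yaml; the key names are inferred from the override path machine.default.datasets_dir, so check the file in the repo for the authoritative layout.

```yaml
# Hypothetical sketch of configs/machine/default.yaml (key names inferred
# from the CLI override `machine.default.datasets_dir=...`).
default:
  datasets_dir: /root/autodl-tmp/data/   # root folder holding the datasets
  num_workers: 8                          # adjust to your CPU
  batch_size: 32                          # adjust to your GPU memory
```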
Dataset: WebVid-CoVR
Expected directory structure (configs/data/webvid-covr.yaml):
datasets_dir/
└── WebVid-CoVR/
    ├── videos/
    │   ├── 2M/
    │   └── 8M/
    └── annotation/
        ├── webvid2m-covr_train.csv
        ├── webvid8m-covr_val.csv
        └── webvid8m-covr_test.csv
Datasets: FashionIQ & CIRR
Expected directory structure:
datasets_dir/
├── FashionIQ/
│   ├── captions/
│   │   ├── cap.dress.[train|val|test].json
│   │   └── ...
│   ├── image_splits/
│   │   ├── split.dress.[train|val|test].json
│   │   └── ...
│   ├── dress/
│   ├── shirt/
│   └── toptee/
└── CIRR/
    ├── train/
    ├── dev/
    ├── test1/
    └── cirr/
        ├── captions/
        │   └── cap.rc2.[train|val|test1].json
        └── image_splits/
            └── split.rc2.[train|val|test1].json
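Before training, it can save time to verify that your local copy matches the trees above. The helper below is hypothetical (not shipped with the repo) and only checks a few representative folders:

```python
# Illustrative sanity check for the CIR dataset layout described above.
# Hypothetical helper -- not part of the ReTrack codebase.
from pathlib import Path
from typing import List

REQUIRED_DIRS = [
    "FashionIQ/captions",
    "FashionIQ/image_splits",
    "CIRR/cirr/captions",
    "CIRR/cirr/image_splits",
]


def check_layout(datasets_dir):
    # type: (str) -> List[str]
    """Return the expected sub-paths that are missing under datasets_dir."""
    root = Path(datasets_dir)
    return [p for p in REQUIRED_DIRS if not (root / p).is_dir()]
```

Running check_layout on your datasets_dir should return an empty list; any entries it returns point at folders the loaders will fail to find.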
You can easily override hyperparameters, datasets, and paths directly from the command line using Hydra syntax.
# Train on WebVid-CoVR
python train_CVR.py

# Train on FashionIQ / CIRR
python train_CIR.py

Note: before running CIR training, make sure to update the dataset selection in configs/train_CIR.yaml (data and test under defaults) to your target dataset (e.g. fashioniq or cirr). For example:

defaults:
  - data: fashioniq
  - test: fashioniq

or:

defaults:
  - data: cirr
  - test: cirr-all
To evaluate a trained model, use test.py and specify the target benchmark.
python test.py

Make sure to specify the dataset and the path to your trained checkpoint via config overrides or by updating the relevant configs/test/*.yaml file.
Hydra automatically manages your experiment logs and weights.
- Outputs are systematically written to outputs/<dataset>/<model>/<ckpt>/<experiment>/<run_name>/.
- Checkpoints are saved inside the run directory as ckpt_last.ckpt (or ckpt_<epoch>.ckpt if configured via save_ckpt=all).
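Given that layout, a few lines of pathlib are enough to collect saved checkpoints across runs. The helper below is illustrative (not part of the ReTrack codebase) and simply assumes the five-level outputs/<dataset>/<model>/<ckpt>/<experiment>/<run_name>/ structure described above:

```python
# Illustrative helper: list ckpt_*.ckpt files saved under the Hydra output
# layout outputs/<dataset>/<model>/<ckpt>/<experiment>/<run_name>/.
# Hypothetical utility -- not part of the ReTrack codebase.
from pathlib import Path
from typing import List


def find_checkpoints(outputs_root):
    # type: (str) -> List[Path]
    """Return every ckpt_*.ckpt under the five-level run layout, newest first."""
    root = Path(outputs_root)
    ckpts = root.glob("*/*/*/*/*/ckpt_*.ckpt")
    return sorted(ckpts, key=lambda p: p.stat().st_mtime, reverse=True)
```

For example, find_checkpoints("outputs") would surface the most recently written ckpt_last.ckpt first, which is handy when pointing test.py at a trained model.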
This codebase is built upon several great open-source projects. We thank the authors of:
- CoVR and CoVR-2 for the foundational Composed Video Retrieval baselines and datasets.
- LAVIS for providing robust Vision-Language models like BLIP-2.
For any questions, issues, or feedback, please open an issue on GitHub or reach out to me at lizixu.cs@gmail.com.
Ecosystem & Other Works from our Team
- TEMA (ACL'26)
- ConeSep (CVPR'26)
- Air-Know (CVPR'26)
- HABIT (AAAI'26)
- INTENT (AAAI'26)
- HUD (ACM MM'25)
- OFFSET (ACM MM'25)
- ENCODER (AAAI'25)
If you find our work or this code useful in your research, please consider leaving a Star or citing our paper. Your support is our greatest motivation!
@inproceedings{ReTrack,
title={ReTrack: Evidence Driven Dual Stream Directional Anchor Calibration Network for Composed Video Retrieval},
author={Li, Zixu and Hu, Yupeng and Chen, Zhiwei and Huang, Qinlei and Qiu, Guozhi and Fu, Zhiheng and Liu, Meng},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026}
}

We welcome all forms of contributions! If you have any questions, ideas, or find a bug, please feel free to:
- Open an Issue for discussions or bug reports.
- Submit a Pull Request to improve the codebase.
This project is released under the terms of the LICENSE file included in this repository.











