ColTrack: Collaborative Tracking Learning for Frame-Rate-Insensitive Multi-Object Tracking

Yiheng Liu, Junta Wu, Yi Fu

ColTrack not only outperforms state-of-the-art methods on large-scale datasets under high frame rates but also achieves higher and more stable performance under low frame rates. This allows it to obtain a higher equivalent FPS by reducing the frame-rate requirement.
- (2023.07) Our paper has been accepted by ICCV 2023!
| Dataset | HOTA | MOTA | IDF1 |
|---|---|---|---|
| MOT17 | 61.0 | 78.8 | 73.9 |
| DanceTrack | 72.6 | 92.1 | 74.0 |
| DanceTrack (+val) | 75.3 | 92.2 | 77.3 |
The codebase is built on top of DINO and MOTR.
- Install PyTorch using conda (optional)

  ```shell
  conda create -n coltrack python=3.9
  conda activate coltrack
  conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
  ```

- Install the other requirements

  ```shell
  pip install Cython
  pip install -r requirements.txt
  ```

- Compile the CUDA operators

  ```shell
  cd models/dino/ops
  python setup.py build install
  # unit test (should see that all checking is True)
  python test.py
  cd ../../..
  ```
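If the build or the unit test fails, first confirm that the environment matches the pinned versions. A minimal sanity check, assuming the conda setup above (not part of the repo):

```python
# Minimal environment sanity check; assumes the conda environment created above.
import torch
import torchvision

print(torch.__version__)          # expected: 1.12.1
print(torchvision.__version__)    # expected: 0.13.1
print(torch.version.cuda)         # expected: 11.3
print(torch.cuda.is_available())  # must be True to build and run the CUDA operators
```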
Download MOT17, CrowdHuman, and DanceTrack, and unzip them under <Coltrack_HOME> as follows:

```
mot_files
|——————dataset
|        └——————dancetrack
|        |        └—————test
|        |        └—————train
|        |        └—————val
|        └——————MOT17
|                 └—————images
|                          └—————train
|                          └—————test
└——————models
         └——————coltrack
         |        └——————dancetrack_val.pth
         └——————dino
         └——————dino_e2e
```
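Before training, it can help to verify the layout. The snippet below is a hypothetical helper, not part of the repo; the paths mirror the tree above:

```python
# Hypothetical layout check for the expected mot_files tree; not part of the repo.
from pathlib import Path

EXPECTED = [
    "dataset/dancetrack/train",
    "dataset/dancetrack/val",
    "dataset/dancetrack/test",
    "dataset/MOT17/images/train",
    "dataset/MOT17/images/test",
    "models/coltrack/dancetrack_val.pth",
    "models/dino",
    "models/dino_e2e",
]

root = Path("mot_files")
for rel in EXPECTED:
    status = "ok" if (root / rel).exists() else "MISSING"
    print(f"{status:7s} {root / rel}")
```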
- Standard models

  Put these models in the directory `mot_files/models/coltrack`.

  | Model | HOTA | MOTA | IDF1 |
  |---|---|---|---|
  | dancetrack_val | 73.51 | 92.1 | 76.48 |
- Dependency models

  These models are downloaded from DINO.

  | Model | Target folder |
  |---|---|
  | checkpoint0027_5scale_swin | mot_files/models/dino |
  | checkpoint0029_4scale_swin | mot_files/models/dino |
  | checkpoint0031_5scale | mot_files/models/dino |
  | checkpoint0033_4scale | mot_files/models/dino |
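To sanity-check a downloaded checkpoint before training, something like the following works. This is illustrative only: the `.pth` extension and the `'model'` key are assumptions about how the DINO checkpoints are saved.

```python
# Illustrative checkpoint inspection; file name and key layout are assumptions.
import torch

ckpt = torch.load("mot_files/models/dino/checkpoint0033_4scale.pth", map_location="cpu")
print(list(ckpt.keys()))  # typically includes a 'model' state dict
```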
- Pretraining (baseline model): this model is used to initialize the end-to-end model for the next stage.

  ```shell
  cd <Coltrack_HOME>
  bash scripts/train_dancetrack_val_tbd.sh your_log_folder_name
  ```

- E2E model training: select one model from the previous stage and put it at `mot_files/models/dino_e2e/4scale_ablation_res_dancetrack.pth`, as sketched below.

  ```shell
  cd <Coltrack_HOME>
  bash scripts/train_dancetrack_val_coltrack.sh your_log_folder_name
  ```
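A minimal sketch for staging the selected baseline checkpoint at the path the E2E script expects; the source path below is illustrative, so substitute the checkpoint you picked:

```python
# Copy a selected baseline checkpoint to where the E2E training script looks for it.
# The source path is illustrative; substitute your own log folder and checkpoint.
import shutil
from pathlib import Path

src = Path("logs/your_log_folder_name/checkpoint.pth")  # illustrative source
dst = Path("mot_files/models/dino_e2e/4scale_ablation_res_dancetrack.pth")
dst.parent.mkdir(parents=True, exist_ok=True)
shutil.copy2(src, dst)
```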
Testing:

```shell
cd <Coltrack_HOME>
bash scripts/test_dancetrack_val_coltrack.sh your_log_folder_name
# logs/your_log_folder_name/ColTrack/track_results: the tracking results of ColTrack.
# logs/your_log_folder_name/IPTrack/track_results: the interpolated tracking results of ColTrack, which may be better or worse. The interpolation algorithm is motlib/tracker/interpolation/gsi.py
```
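For intuition, interpolation fills in frames where a track briefly disappears. Below is a heavily simplified linear version; the repo's gsi.py additionally smooths the trajectories, so this sketch is a stand-in, not that code:

```python
# Simplified linear interpolation for one track's MOT-style rows
# (frame, id, x, y, w, h), sorted by frame. Not the repo's gsi.py,
# which additionally applies trajectory smoothing.
import numpy as np

def linear_interpolate(track: np.ndarray, max_gap: int = 20) -> np.ndarray:
    track = np.asarray(track, dtype=float)
    out = [track[0]]
    for prev, curr in zip(track[:-1], track[1:]):
        gap = int(curr[0] - prev[0])
        if 1 < gap <= max_gap:
            for step in range(1, gap):
                t = step / gap
                row = prev + t * (curr - prev)  # linearly blend the box
                row[0] = prev[0] + step         # restore an integer frame index
                out.append(row)
        out.append(curr)
    return np.stack(out)
```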
Demo on videos:

```shell
cd <Coltrack_HOME>
# your_videos_path
# |——————videos1.mp4
# |——————videos2.mp4
bash scripts/demo.sh --output_dir your_log_folder_name --infer_data_path your_videos_path --is_mp4 --draw_tracking_results --inference_sampler_interval 1 --resume mot_files/models/coltrack/dancetrack_val.pth --options config_file=config/E2E/coltrack_inference.py
```

- `--inference_sampler_interval 3`: the downsampling interval of the video frames.
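To make the flag concrete: an interval of 3 keeps every third frame, so a 60 FPS video is effectively tracked at 20 FPS. An illustrative snippet, not the repo's sampler code:

```python
# Illustrative only: what --inference_sampler_interval 3 means for frame selection.
frames = list(range(12))   # stand-in for decoded video frames
interval = 3
print(frames[::interval])  # [0, 3, 6, 9] -> every 3rd frame is tracked
```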
Demo on video frames:

```shell
cd <Coltrack_HOME>
# your_video_frames_path
# |——————videos1
# |        └—————frame1.xxx
# |        └—————frame2.xxx
# |——————videos2
# |        └—————xxx.xxx
# |        └—————xxx.xxx
bash scripts/demo.sh --output_dir your_log_folder_name --infer_data_path your_video_frames_path --draw_tracking_results --inference_sampler_interval 1 --resume mot_files/models/coltrack/dancetrack_val.pth --options config_file=config/E2E/coltrack_inference.py
```

Citation:

```bibtex
@inproceedings{liu2023collaborative,
  title={Collaborative Tracking Learning for Frame-Rate-Insensitive Multi-Object Tracking},
  author={Liu, Yiheng and Wu, Junta and Fu, Yi},
  booktitle={ICCV},
  year={2023}
}
```
A large part of the code is borrowed from DINO, MOTR, ByteTrack, and Bot-SORT. Many thanks for their wonderful work.

