COMO: CrOss-Mamba Interaction and Offset-Guided Fusion for Multimodal Object Detection

📖 Introduction

This repository contains the official implementation of COMO, a framework designed to improve multimodal object detection through CrOss-Mamba interaction and Offset-guided fusion.
Our approach mitigates offset effects, reduces computational costs, and improves the accuracy of multimodal object detection in remote sensing scenarios.

🛠 Requirements

Python: 3.11
PyTorch: 2.0.0
Numpy: 1.23.5
CUDA: 11.8~12.5
Other requirements can be seen in requirement.txt

Install dependencies step by step:

# 1. Install dependencies
pip install causal_conv1d-1.1.1+cu118torch2.0cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
pip install mamba_ssm-1.2.0.post1+cu118torch2.0cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
# mamba_ssm-1.2.0链接: https://pan.baidu.com/s/16X_vhdSkRB_9y6GKOgv4cw?pwd=1234 提取码: 1234

# 2. Replace mamba_simple.py & local_scan.py
# (copy modified ./ssm/mamba_simple.py and ./ssm/local_scan.py  into the installed mamba-ssm package folder as fig.1 )

# 3. Replace selective_scan_interface.py 
# (copy  modified ./ssm/selective_scan_interface.py into the correct path in mamba-ssm/ops)

mamba-ssm/
├── modules/
│   ├── mamba_simple.py     ← Replace this file (Step 2)
│   └──  local_scan.py       ← Replace this file (Step 2)
├── ops/
│   └── selective_scan_interface.py ← Replace this file (Step 3)

📊 dataset

The DroneVehicle dataset is available through Baidu Cloud:

🔗 Download Link: Baidu Cloud Drive
📁 Extraction Code: 1234
📦 File Size: 6.46 GB
📄 Format: ZIP archive

🏗 Dataset Structure After downloading and extracting, organize the dataset as follows:

DroneVehicle/
├── visible/
│   ├── train/
│   │   ├── 00001.jpg
│   │   ├── 00002.jpg
│   │   └── ...
│   ├── val/
│   │   └── ...
│   └── test/
│       └── ...
├── infrared/
│   ├── train/
│   │   └── ...
│   ├── val/
│   │   └── ...
│   └── test/
│       └── ...
└── labels/
    ├── train/
    │   └── ...
    ├── val/
    │   └── ...
    └── test/
       └── ...

🚀 Train

To train COMO on your dataset, run:

torchrun --nproc_per_node=2 --master_port=29502 train.py

⚠️ Adjust dataset path and hyperparameters according to your environment.

🧪 Validation

To evaluate a trained model, and the checkpoint is in './checkpoint/best.pt':

python val.py

This will output precision, recall, and mAP results.

🧪 Compute param. and Flops

python param.py

This will output param. and Flops.

🧪 Prediction

python  predict_write_csv.py

This will output predictions.

🙏 Acknowledgements

This repository is built upon Ultralytics YOLO and mamba-ssm. We thank the authors for their excellent work.

📑 Citation

If you find this repository useful, please cite our paper:

@article{liu2026cross,
  title={COMO: Cross-mamba interaction and offset-guided fusion for multimodal object detection},
  author={Liu, Chang and Ma, Xin and Yang, Xiaochen and Zhang, Yuxiang and Dong, Yanni},
  journal={Information Fusion},
  volume={125},
  pages={103414},
  year={2026},
  publisher={Elsevier}
}

📧 Contact

For questions or discussions, please contact liu_chang_@whu.edu.cn.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
checkpoints		checkpoints
docker		docker
docs		docs
ssm		ssm
ultralytics		ultralytics
.gitattributes		.gitattributes
README.md		README.md
causal_conv1d-1.1.1+cu118torch2.0cxx11abiFALSE-cp311-cp311-linux_x86_64.whl		causal_conv1d-1.1.1+cu118torch2.0cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
param.py		param.py
predict_write_csv.py		predict_write_csv.py
requirements.txt		requirements.txt
train.py		train.py
val.py		val.py
yolo11n.pt		yolo11n.pt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

COMO: CrOss-Mamba Interaction and Offset-Guided Fusion for Multimodal Object Detection

📖 Introduction

🛠 Requirements

📊 dataset

🚀 Train

🧪 Validation

🧪 Compute param. and Flops

🧪 Prediction

🙏 Acknowledgements

📑 Citation

📧 Contact

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

COMO: CrOss-Mamba Interaction and Offset-Guided Fusion for Multimodal Object Detection

📖 Introduction

🛠 Requirements

📊 dataset

🚀 Train

🧪 Validation

🧪 Compute param. and Flops

🧪 Prediction

🙏 Acknowledgements

📑 Citation

📧 Contact

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages