byzhaoAI/MCE
MCE

MCE: Towards a General Framework for Handling Missing Modalities under Imbalanced Missing Rates [DOI] [arXiv]

Accepted by Pattern Recognition on Oct. 8, 2025.

Overview

MCE Abstract: Multi-modal learning has made significant advances across diverse pattern recognition applications. However, handling missing modalities, especially under imbalanced missing rates, remains a major challenge. This imbalance triggers a vicious cycle: modalities with higher missing rates receive fewer updates, leading to inconsistent learning progress and representational degradation that further diminishes their contribution. Existing methods typically focus on global dataset-level balancing, often overlooking critical sample-level variations in modality utility and the underlying issue of degraded feature quality. We propose Modality Capability Enhancement (MCE) to tackle these limitations. MCE includes two synergistic components: i) Learning Capability Enhancement (LCE), which introduces multi-level factors to dynamically balance modality-specific learning progress, and ii) Representation Capability Enhancement (RCE), which improves feature semantics and robustness through subset prediction and cross-modal completion tasks. Comprehensive evaluations on four multi-modal benchmarks show that MCE consistently outperforms state-of-the-art methods under various missing configurations.

Updates

  • 2025/10/11: Project created. Supports the emotion recognition task on IEMOCAP and the digit recognition task on AudiovisionMNIST.

Dataset Support

  • nuScenes
  • BraTS2020
  • IEMOCAP
  • AudiovisionMNIST

Quick Start

1. nuScenes

Dataset setup
cd scene_seg
mkdir nuScenes

First, download the raw data from nuScenes into the newly created nuScenes/ directory.
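Once the raw data is in place, a quick layout check can confirm the download unpacked where the code expects it. The folder names below follow the standard nuScenes release layout (maps, samples, sweeps, v1.0-trainval); adjust them if you use a different split, e.g. v1.0-mini:

```python
import os

# Expected top-level folders of a standard nuScenes download.
# These names are taken from the official release layout; adjust if you
# use the mini split ("v1.0-mini" instead of "v1.0-trainval").
EXPECTED = ["maps", "samples", "sweeps", "v1.0-trainval"]

def missing_nuscenes_dirs(root):
    """Return the expected sub-directories that are absent under `root`."""
    return [d for d in EXPECTED if not os.path.isdir(os.path.join(root, d))]

if __name__ == "__main__":
    gaps = missing_nuscenes_dirs("nuScenes")
    if gaps:
        print("Missing under nuScenes/:", ", ".join(gaps))
    else:
        print("nuScenes layout looks complete.")
```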

Pretrained unimodal setup

Coming soon ...

Install and Train

Coming soon ...

2. IEMOCAP

Dataset setup
cd emotion_recog
mkdir inputs
mkdir inputs/IEMOCAP
  1. Download the extracted IEMOCAP features IEMOCAP_features.zip from GoogleDrive into the newly created inputs/IEMOCAP/ directory.
  2. Unzip it: unzip IEMOCAP_features.zip
  3. Verify that the feature paths in the JSON files under data/config/ point to the extracted features.
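The path check in the last step can be automated with a small sketch like the one below. It does not assume any particular key names in the configs; it simply walks every JSON file under a config directory, collects all nested string values that look like filesystem paths, and reports those that do not exist on disk:

```python
import glob
import json
import os

def strings_in(obj):
    """Yield every string value nested anywhere inside parsed JSON."""
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, dict):
        for v in obj.values():
            yield from strings_in(v)
    elif isinstance(obj, list):
        for v in obj:
            yield from strings_in(v)

def stale_paths(config_dir):
    """Return (config_file, value) pairs for path-like strings that do not exist."""
    stale = []
    for cfg in glob.glob(os.path.join(config_dir, "*.json")):
        with open(cfg) as f:
            data = json.load(f)
        for s in strings_in(data):
            # Heuristic: treat a string as a path if it contains a separator
            # or starts with "."; plain labels are skipped.
            if (os.sep in s or s.startswith(".")) and not os.path.exists(s):
                stale.append((cfg, s))
    return stale
```

Running `stale_paths("data/config")` before training surfaces any stale feature path immediately, rather than midway through the first epoch.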
Pretrained unimodal setup
mkdir unimodal_checkpoints

Download the desired pretrained encoders and decoders from GoogleDrive into the newly created unimodal_checkpoints/ directory.

Install
cd emotion_recog
conda env create -f environment.yml
Train

Activate the environment mce_iemocap first, then

sh IEMOCAP_MCE.sh

3. AudiovisionMNIST

Dataset setup
cd digit_recog
mkdir soundmnist
  • Download the data from GoogleDrive and extract it to soundmnist/
Pretrained unimodal setup
mkdir unimodal_ckpts
mkdir unimodal_ckpts/image
mkdir unimodal_ckpts/sound

Download the desired pretrained encoders and decoders from GoogleDrive into the newly created unimodal_ckpts/ directory.
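Since the image and sound checkpoints live in separate sub-directories, it is easy to forget one. A minimal check along these lines catches an empty directory before training; the *.pt / *.pth patterns are an assumption, so match them to the actual file names from GoogleDrive:

```python
import glob
import os

# Checkpoint locations created above. The extension patterns are a guess;
# adjust them to the actual checkpoint file names.
CKPT_DIRS = ["unimodal_ckpts/image", "unimodal_ckpts/sound"]

def empty_ckpt_dirs(dirs=CKPT_DIRS):
    """Return the checkpoint directories that hold no .pt/.pth files."""
    empty = []
    for d in dirs:
        files = (glob.glob(os.path.join(d, "*.pt"))
                 + glob.glob(os.path.join(d, "*.pth")))
        if not files:
            empty.append(d)
    return empty
```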

Install
cd digit_recog
conda env create -f environment.yml
Train

Activate the environment audiovision first, then

python train_mce.py

Citation

If you use our project in your research, please cite the following paper:

Coming soon...

or

Coming soon...
  1. If you use scene_seg, please also cite:
  • nuScenes: A multimodal dataset for autonomous driving
@inproceedings{caesar2020nuscenes,
  title={nuscenes: A multimodal dataset for autonomous driving},
  author={Caesar, Holger and Bankiti, Varun and Lang, Alex H and Vora, Sourabh and Liong, Venice Erin and Xu, Qiang and Krishnan, Anush and Pan, Yu and Baldan, Giancarlo and Beijbom, Oscar},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  pages={11621--11631},
  year={2020}
}
  • Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?
@inproceedings{harley2023simple,
  title={Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?},
  author={Harley, Adam W and Fang, Zhaoyuan and Li, Jie and Ambrus, Rares and Fragkiadaki, Katerina},
  booktitle={2023 IEEE International Conference on Robotics and Automation (ICRA)},
  pages={2759--2765},
  year={2023},
  organization={IEEE}
}
  2. If you use emotion_recog, please also cite:
  • RedCore: Relative Advantage Aware Cross-modal Representation Learning for Missing Modalities with Imbalanced Missing Rates
@inproceedings{sun2024redcore,
  title={RedCore: Relative advantage aware cross-modal representation learning for missing modalities with imbalanced missing rates},
  author={Sun, Jun and Zhang, Xinxin and Han, Shoukang and Ruan, Yu-Ping and Li, Taihao},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={13},
  pages={15173--15182},
  year={2024}
}
  3. If you use digit_recog, please also cite:
  • SMIL: Multimodal Learning with Severely Missing Modality
@inproceedings{ma2021smil,
  title={Smil: Multimodal learning with severely missing modality},
  author={Ma, Mengmeng and Ren, Jian and Zhao, Long and Tulyakov, Sergey and Wu, Cathy and Peng, Xi},
  booktitle={Proceedings of the AAAI conference on artificial intelligence},
  volume={35},
  number={3},
  pages={2302--2310},
  year={2021}
}

Acknowledgements

Thanks to the projects that supported this work.
