MCE: Towards a General Framework for Handling Missing Modalities under Imbalanced Missing Rates [DOI] [Arxiv]
Accepted by Pattern Recognition on Oct. 08, 2025.
Abstract: Multi-modal learning has made significant advances across diverse pattern recognition applications. However, handling missing modalities, especially under imbalanced missing rates, remains a major challenge. This imbalance triggers a vicious cycle: modalities with higher missing rates receive fewer updates, leading to inconsistent learning progress and representational degradation that further diminishes their contribution. Existing methods typically focus on global dataset-level balancing, often overlooking critical sample-level variations in modality utility and the underlying issue of degraded feature quality. We propose Modality Capability Enhancement (MCE) to tackle these limitations. MCE includes two synergistic components: i) Learning Capability Enhancement (LCE), which introduces multi-level factors to dynamically balance modality-specific learning progress, and ii) Representation Capability Enhancement (RCE), which improves feature semantics and robustness through subset prediction and cross-modal completion tasks. Comprehensive evaluations on four multi-modal benchmarks show that MCE consistently outperforms state-of-the-art methods under various missing configurations.
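The imbalanced-missing-rate setting described above can be made concrete with a small sketch. The modality names, missing rates, and the inverse-presence loss weighting below are illustrative assumptions only, not the paper's actual LCE factors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality missing rates (imbalanced): audio is
# missing far more often than video or text.
missing_rates = {"audio": 0.8, "video": 0.3, "text": 0.1}
n_samples = 10_000

# For each sample, a modality is present (1) with probability 1 - rate.
masks = {m: (rng.random(n_samples) >= r).astype(int)
         for m, r in missing_rates.items()}

# A modality with a higher missing rate contributes fewer gradient
# updates. One naive balancing factor (a stand-in for LCE, not the
# paper's formula) upweights its loss in inverse proportion to its
# observed presence rate.
presence = {m: mask.mean() for m, mask in masks.items()}
weights = {m: 1.0 / p for m, p in presence.items()}

for m in missing_rates:
    print(f"{m}: present {presence[m]:.2f}, loss weight {weights[m]:.2f}")
```

With these rates, the mostly-missing audio stream ends up with roughly a 5x loss weight, countering the vicious cycle in which under-observed modalities fall further behind.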
- 2025/10/11: Project created. Supports the emotion recognition task on IEMOCAP and the digit recognition task on AudiovisionMNIST.
- nuScenes
- BraTS2020
- IEMOCAP
- AudiovisionMNIST
```shell
cd scene_seg
mkdir nuScenes
```
First, download the raw data from nuScenes into the created path `nuScenes/`.
Coming soon ...
Coming soon ...
```shell
cd emotion_recog
mkdir inputs
mkdir inputs/IEMOCAP
```
- Download the IEMOCAP extracted features `IEMOCAP_features.zip` from GoogleDrive to the created path `inputs/IEMOCAP/`.
- Unzip `IEMOCAP_features.zip`.
- Set the path to the features in the JSON files under the `data/config/` folder (only check that the paths are correct).
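To sanity-check the last step, a small helper like the one below can flag config values that point at non-existent paths. The flat key/value JSON layout it assumes is an illustration, not the repo's actual config schema:

```python
import json
from pathlib import Path

def check_feature_paths(config_dir="data/config"):
    """Return (file, key, value) triples for path-like string values
    in the config JSONs that do not exist on disk.

    Assumes a flat {key: value} JSON layout; nested configs would
    need a recursive walk instead.
    """
    missing = []
    for cfg_file in Path(config_dir).glob("*.json"):
        cfg = json.loads(cfg_file.read_text())
        for key, value in cfg.items():
            # Treat any string containing a path separator as a path.
            if isinstance(value, str) and "/" in value and not Path(value).exists():
                missing.append((cfg_file.name, key, value))
    return missing

if __name__ == "__main__":
    for fname, key, value in check_feature_paths():
        print(f"{fname}: '{key}' points at missing path {value}")
```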
```shell
mkdir unimodal_checkpoints
```
Download the desired pretrained encoder and decoder from GoogleDrive to the created path `unimodal_checkpoints/`.
```shell
cd emotion_recog
conda env create -f environment.yml
```
Activate the environment `mce_iemocap` first, then run:
```shell
sh IEMOCAP_MCE.sh
```
```shell
cd digit_recog
mkdir soundmnist
```
- Download the data from GoogleDrive and extract it into `soundmnist/`.
```shell
mkdir unimodal_ckpts
mkdir unimodal_ckpts/image
mkdir unimodal_ckpts/sound
```
Download the desired pretrained encoder and decoder from GoogleDrive to the created path `unimodal_ckpts/`.
```shell
cd digit_recog
conda env create -f environment.yml
```
Activate the environment `audiovision` first, then run:
```shell
python train_mce.py
```
If you use our project in your research, please cite the following paper:
Coming soon...
or
Coming soon...
- If you are using `scene_seg`, please also cite:
- nuScenes: A multimodal dataset for autonomous driving
```
@inproceedings{caesar2020nuscenes,
  title={{nuScenes}: A multimodal dataset for autonomous driving},
  author={Caesar, Holger and Bankiti, Varun and Lang, Alex H and Vora, Sourabh and Liong, Venice Erin and Xu, Qiang and Krishnan, Anush and Pan, Yu and Baldan, Giancarlo and Beijbom, Oscar},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11621--11631},
  year={2020}
}
```
- Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?
```
@inproceedings{harley2023simple,
  title={{Simple-BEV}: What Really Matters for Multi-Sensor BEV Perception?},
  author={Harley, Adam W and Fang, Zhaoyuan and Li, Jie and Ambrus, Rares and Fragkiadaki, Katerina},
  booktitle={2023 IEEE International Conference on Robotics and Automation (ICRA)},
  pages={2759--2765},
  year={2023},
  organization={IEEE}
}
```
- If you are using `emotion_recog`, please also cite:
- RedCore: Relative Advantage Aware Cross-modal Representation Learning for Missing Modalities with Imbalanced Missing Rates
```
@inproceedings{sun2024redcore,
  title={{RedCore}: Relative advantage aware cross-modal representation learning for missing modalities with imbalanced missing rates},
  author={Sun, Jun and Zhang, Xinxin and Han, Shoukang and Ruan, Yu-Ping and Li, Taihao},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={13},
  pages={15173--15182},
  year={2024}
}
```
- If you are using `digit_recog`, please also cite:
- SMIL: Multimodal Learning with Severely Missing Modality
```
@inproceedings{ma2021smil,
  title={{SMIL}: Multimodal learning with severely missing modality},
  author={Ma, Mengmeng and Ren, Jian and Zhao, Long and Tulyakov, Sergey and Wu, Cathy and Peng, Xi},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={3},
  pages={2302--2310},
  year={2021}
}
```
Thanks to the projects that supported this work.