This is the official code for "Boosting Multimodal Learning via Disentangled Gradient Learning" (DGL), a flexible framework for improving the optimization process of multimodal learning. Please refer to our ICCV 2025 paper for more details.
- Ubuntu 20.04
- CUDA Version: 11.1
- PyTorch 1.11
- python 3.8.6
Note: the optimal hyperparameter ($\alpha$) for DGL may vary with the dependencies. If your hardware or software differs from the setup listed above, you may need to adjust the hyperparameters accordingly.
Download the datasets: CREMA-D, Kinetics-Sounds. We provide the processed datasets directly.
The original datasets are available at the following links: CREMA-D, Kinetics-Sounds, VGGSound.
If you start from the original data, you need to process it following the instructions below.
For the CREMA-D and VGGSound datasets, we provide code in the directory data/ to pre-process videos into RGB frames and audio WAV files.
Since the original CREMA-D dataset already provides separate audio and video files, we only need to extract the video frames by running:
python data/CREMAD/video_preprecessing.py
Note that the relevant paths and directories should be changed according to your own environment.
Since the original VGGSound dataset only provides the raw video files, we first extract the audio by running:
python data/VGGSound/mp4_to_wav.py
Then extract the video frames:
python data/VGGSound/video_preprecessing.py
Note that the relevant paths and directories should be changed according to your own environment.
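For readers who want to see what the two pre-processing steps amount to, here is a minimal sketch that builds ffmpeg command lines for audio extraction and frame extraction. It is not the code in data/; the function names, the 16 kHz mono audio setting, the 1 frame-per-second rate, and the `frame_%05d.jpg` naming are all assumptions chosen for illustration, and the provided scripts may use different tools and settings.

```python
import subprocess
from pathlib import Path


def wav_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that drops the video stream and writes mono WAV audio.

    The sample rate is an illustrative assumption, not the repository's setting.
    """
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-i", str(video_path),     # input video
        "-vn",                     # ignore the video stream
        "-ac", "1",                # one audio channel (mono)
        "-ar", str(sample_rate),   # audio sample rate in Hz
        str(wav_path),
    ]


def frames_command(video_path, frame_dir, fps=1):
    """Build an ffmpeg command that samples frames at `fps` frames per second.

    The frame rate and file naming pattern are illustrative assumptions.
    """
    pattern = (Path(frame_dir) / "frame_%05d.jpg").as_posix()
    return ["ffmpeg", "-y", "-i", str(video_path), "-vf", f"fps={fps}", pattern]


def run(cmd):
    """Execute a command list and raise if ffmpeg exits with an error."""
    subprocess.run(cmd, check=True)
```

With ffmpeg installed, `run(wav_command("clip.mp4", "clip.wav"))` would write the audio track, and `run(frames_command("clip.mp4", "frames"))` would write the sampled frames into an existing `frames` directory.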
Move the downloaded dataset into the folder train_test_data, or create a soft link to it in this folder.
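If you prefer a soft link over moving the data, the following sketch shows one way to create it from Python. The helper name `link_dataset` and the choice to name the link after the dataset directory are assumptions made for illustration; check the data-loading code for the exact layout it expects under train_test_data.

```python
from pathlib import Path


def link_dataset(downloaded_dir, target_root="train_test_data"):
    """Create target_root/<dataset directory name> as a symlink to downloaded_dir.

    Hypothetical helper: the link name is an assumption, not a documented layout.
    """
    src = Path(downloaded_dir).resolve()
    root = Path(target_root)
    root.mkdir(parents=True, exist_ok=True)
    link = root / src.name
    # Leave any existing file, directory, or link untouched.
    if link.is_symlink() or link.exists():
        return link
    link.symlink_to(src, target_is_directory=True)
    return link
```

For example, `link_dataset("/path/to/CREMAD")` would create `train_test_data/CREMAD` pointing at the downloaded directory; the path shown is a placeholder.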
- Unimodal Regularization. Add a unimodal loss to enhance the modality encoder via parameter sharing (see **_AUXI in models/fusion_modules.py).
- Multimodal Gradient Truncation. Remove the gradient from the multimodal loss to the modality encoder (see .detach() in the fusion module in models/fusion_modules.py).
- Unimodal Gradient Truncation. Remove the gradient from the unimodal loss to the multimodal fusion module (see Line 114 in main_dgl.py).
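The three components above can be illustrated with a toy PyTorch sketch. This is not the repository's implementation (see models/fusion_modules.py and main_dgl.py for that); the class `ToyDGLModel`, the function `dgl_backward`, the linear encoders, the zero-padding used to obtain unimodal predictions from the shared fusion head, and the way fusion gradients are discarded are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyDGLModel(nn.Module):
    """Hypothetical two-modality model with a shared concatenation fusion head."""

    def __init__(self, dim_a=8, dim_v=8, hidden=16, num_classes=4):
        super().__init__()
        self.enc_a = nn.Linear(dim_a, hidden)   # stand-in audio encoder
        self.enc_v = nn.Linear(dim_v, hidden)   # stand-in visual encoder
        self.fusion = nn.Linear(2 * hidden, num_classes)

    def forward(self, xa, xv):
        fa = torch.relu(self.enc_a(xa))
        fv = torch.relu(self.enc_v(xv))
        # Multimodal gradient truncation: the fusion head sees detached features,
        # so the multimodal loss cannot send gradients back to the encoders.
        out_m = self.fusion(torch.cat([fa.detach(), fv.detach()], dim=1))
        # Unimodal regularization with shared parameters: reuse the fusion head,
        # zeroing the other modality (an assumption made for this sketch).
        out_a = self.fusion(torch.cat([fa, torch.zeros_like(fv)], dim=1))
        out_v = self.fusion(torch.cat([torch.zeros_like(fa), fv], dim=1))
        return out_m, out_a, out_v


def dgl_backward(model, xa, xv, y):
    """Populate .grad so encoders learn from unimodal losses and fusion from the multimodal loss."""
    out_m, out_a, out_v = model(xa, xv)
    loss_m = F.cross_entropy(out_m, y)
    loss_uni = F.cross_entropy(out_a, y) + F.cross_entropy(out_v, y)
    model.zero_grad()
    loss_uni.backward()
    # Unimodal gradient truncation: drop what the unimodal loss wrote into the fusion head.
    for p in model.fusion.parameters():
        p.grad = None
    # Only the fusion head receives gradients here, because its inputs were detached.
    loss_m.backward()
    return loss_m.item(), loss_uni.item()
```

After `dgl_backward`, an ordinary `optimizer.step()` would update the encoders using only the unimodal gradients and the fusion head using only the multimodal gradient. How the two losses are weighted, including the role of $\alpha$, is left out of this sketch.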
We provide a bash file for a quick start.
For CREMA-D
bash cramed_dgl.sh
python valid.py
If you have any detailed questions or suggestions, you can email us: shicaiwei@std.uestc.edu.cn