ICCV2025-GDL

This is the official code for "Boosting Multimodal Learning via Disentangled Gradient Learning" (DGL), a flexible framework that enhances the optimization process of multimodal learning. Please refer to our ICCV 2025 paper for more details.

Main Dependencies

Ubuntu 20.04
CUDA Version: 11.1
PyTorch 1.11
Python 3.8.6
Note: The optimal hyperparameter ($\alpha$) for DGL may vary with the dependencies. If your hardware or software differs from the above, you may need to adjust it accordingly.

Usage

Data Preparation

Download the datasets: CREMA-D, Kinetics-Sounds. We provide the processed datasets directly.

The original datasets are available at the following links: CREMA-D, Kinetics-Sounds, VGGSound.

If you start from the original datasets, you need to process them following the instructions below.

Pre-processing

For the CREMA-D and VGGSound datasets, we provide code in the directory data/ to pre-process videos into RGB frames and audio WAV files.

CREMA-D

As the original CREMA-D dataset already provides the audio and video files, we simply extract the video frames by running:

python data/CREMAD/video_preprecessing.py

Note that the relevant paths and directories should be changed according to your own environment.

VGGSound

As the original VGGSound dataset only provides the raw video files, we first extract the audio by running:

python data/VGGSound/mp4_to_wav.py

Then extract the video frames:

python data/VGGSound/video_preprecessing.py

Note that the relevant paths and directories should be changed according to your own environment.
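For readers who want to see what the two pre-processing steps amount to, the following is a minimal sketch that builds ffmpeg commands for audio extraction and frame extraction. It is not the code in data/VGGSound/; the function names, the 16 kHz mono audio setting, the 1 fps frame rate, and the frame_%05d.jpg naming are all assumptions made for illustration, and the repository scripts may use different tools and settings.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_audio_cmd(video: Path, wav: Path, sample_rate: int = 16000) -> List[str]:
    """ffmpeg command that drops the video stream and writes mono audio.

    The sample rate is a hypothetical default, not a value taken from the repo.
    """
    return ["ffmpeg", "-y", "-i", str(video),
            "-vn", "-ac", "1", "-ar", str(sample_rate), str(wav)]


def build_frames_cmd(video: Path, frame_dir: Path, fps: int = 1) -> List[str]:
    """ffmpeg command that samples `fps` frames per second as JPEG images.

    The fps value and the output file pattern are hypothetical.
    """
    pattern = Path(frame_dir) / "frame_%05d.jpg"
    return ["ffmpeg", "-y", "-i", str(video), "-vf", f"fps={fps}", str(pattern)]


def run(cmd: List[str]) -> None:
    """Execute a command, failing early if ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(cmd, check=True)


# Only print the commands here; call run(...) on real files to execute them.
print(" ".join(build_audio_cmd(Path("clip.mp4"), Path("clip.wav"))))
print(" ".join(build_frames_cmd(Path("clip.mp4"), Path("frames"))))
```

Keeping command construction separate from execution makes it easy to check the paths for your own environment before any file is written.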

Data path

Move the downloaded dataset into the folder train_test_data, or create a soft link to it in this folder.
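The soft-link option can be sketched as follows. The helper name link_dataset and the dataset folder name CREMAD are hypothetical, and the sketch assumes the link should carry the same name as the downloaded folder; check the dataset loaders in dataset/ for the layout they actually expect.

```python
import os
import tempfile
from pathlib import Path


def link_dataset(source, project_root=".", folder="train_test_data"):
    """Create <project_root>/<folder>/<source name> as a soft link to `source`.

    Returns the link path. An existing entry with that name is left untouched.
    """
    source = Path(source).resolve()
    target_dir = Path(project_root) / folder
    target_dir.mkdir(parents=True, exist_ok=True)
    link = target_dir / source.name
    if link.is_symlink() or link.exists():
        return link
    link.symlink_to(source, target_is_directory=True)
    return link


# Demonstration in a temporary directory so nothing on disk is changed.
with tempfile.TemporaryDirectory() as tmp:
    downloaded = Path(tmp) / "downloads" / "CREMAD"  # hypothetical folder name
    downloaded.mkdir(parents=True)
    link = link_dataset(downloaded, project_root=Path(tmp) / "repo")
    print(link, "->", os.readlink(link))
```

A soft link avoids copying a large dataset while still letting the training scripts find it under train_test_data.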

Key point of DGL

Unimodal Regularization. Add a unimodal loss to enhance each modality encoder via parameter sharing (see **_AUXI in models/fusion_modules.py).
Multimodal Gradient Truncation. Remove the gradient from the multimodal loss to the modality encoders (see .detach() in the fusion modules in models/fusion_modules.py).
Unimodal Gradient Truncation. Remove the gradient from the unimodal loss to the multimodal fusion module (see Line 114 in main_dgl.py).
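The three points above can be illustrated with a toy PyTorch sketch. This is not the code in models/fusion_modules.py or main_dgl.py: the class ToyDGL, the linear encoders, the concatenation fusion, and the way the unimodal heads reuse a frozen slice of the fusion weights are all assumptions chosen to make the gradient paths easy to inspect.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class ToyDGL(nn.Module):
    """Toy two-modality model with disentangled gradient paths (illustrative)."""

    def __init__(self, audio_dim=8, visual_dim=8, feat_dim=4, num_classes=3):
        super().__init__()
        self.feat_dim = feat_dim
        self.audio_encoder = nn.Linear(audio_dim, feat_dim)
        self.visual_encoder = nn.Linear(visual_dim, feat_dim)
        self.fusion = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, audio, visual):
        a = self.audio_encoder(audio)
        v = self.visual_encoder(visual)
        # Multimodal gradient truncation: the fusion head sees detached
        # features, so the multimodal loss cannot update the encoders.
        fused_logits = self.fusion(torch.cat([a.detach(), v.detach()], dim=1))
        # Unimodal regularization with shared parameters: each unimodal head
        # reuses the matching slice of the fusion weights. Detaching the
        # weights gives unimodal gradient truncation, so the unimodal loss
        # cannot update the fusion module.
        w = self.fusion.weight.detach()
        b = self.fusion.bias.detach()
        audio_logits = F.linear(a, w[:, :self.feat_dim], b)
        visual_logits = F.linear(v, w[:, self.feat_dim:], b)
        return fused_logits, audio_logits, visual_logits


def reaches(loss, param):
    """Return True if `loss` has a gradient path to `param`."""
    (grad,) = torch.autograd.grad(loss, [param], retain_graph=True,
                                  allow_unused=True)
    return grad is not None


model = ToyDGL()
audio, visual = torch.randn(5, 8), torch.randn(5, 8)
labels = torch.randint(0, 3, (5,))

fused, a_logits, v_logits = model(audio, visual)
loss_mm = F.cross_entropy(fused, labels)
loss_uni = F.cross_entropy(a_logits, labels) + F.cross_entropy(v_logits, labels)

print("multimodal loss -> audio encoder:", reaches(loss_mm, model.audio_encoder.weight))
print("multimodal loss -> fusion module:", reaches(loss_mm, model.fusion.weight))
print("unimodal loss   -> audio encoder:", reaches(loss_uni, model.audio_encoder.weight))
print("unimodal loss   -> fusion module:", reaches(loss_uni, model.fusion.weight))
```

In this sketch the encoders are trained only by the unimodal loss and the fusion module only by the multimodal loss, which is the separation the three key points describe.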

Train

We provide bash scripts for a quick start.

For CREMA-D

bash cramed_dgl.sh

Test

python valid.py

Contact us

If you have any detailed questions or suggestions, you can email us: shicaiwei@std.uestc.edu.cn
