Xin Jin1,*, Siyuan Li1,2,*, Siyong Jian1, Kai Yu1, Huan Wang1,†
1Westlake University, 2Zhejiang University
* Equal Contribution † Corresponding Author
[2025/01/26] 🎉 Accepted by ICLR 2026!
[2025/10/28] We release the MergeMix code on LLaVA codebase.
[2025/10/14] We've updated our method on openmixup.
You can use the OpenMixup environment and codebase if you want to apply MergeMix to image classification tasks.
OpenMixup is an open-source mixup toolbox and benchmark for visual representation learning; we highly recommend it if you want to use mixup augmentation techniques in image classification.
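As background, the core mixup operation that these toolkits build on is a convex combination of two inputs and their labels. Below is a minimal NumPy sketch of vanilla mixup; the function name and toy data are illustrative, not OpenMixup's actual API:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Vanilla input mixup: convex-combine two samples and their labels.

    `alpha` parameterizes the Beta distribution the mixing weight is drawn from.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing weight in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2       # blended input
    y = lam * y1 + (1.0 - lam) * y2       # soft label
    return x, y, lam

# Toy usage: mix two 2x2 "images" with one-hot labels.
xa, ya = np.zeros((2, 2)), np.array([1.0, 0.0])
xb, yb = np.ones((2, 2)), np.array([0.0, 1.0])
x_mixed, y_soft, lam = mixup(xa, ya, xb, yb)
```

MergeMix and the other methods in OpenMixup refine how the blended input is constructed (e.g., via token merging or saliency), but the soft-label recipe follows this same pattern.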
OpenMixup is compatible with Python 3.6/3.7/3.8/3.9 and PyTorch >= 1.6. Quick installation in development mode:
conda create -n openmixup python=3.8 pytorch=1.12 cudatoolkit=11.3 torchvision -c pytorch -y
conda activate openmixup
pip install openmim
mim install mmcv-full
git clone https://github.com/Westlake-AI/openmixup.git
cd openmixup
python setup.py develop
Installation with PyTorch 2.x requires a different process:
conda create -n openmixup python=3.9
conda activate openmixup
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install https://download.openmmlab.com/mmcv/dist/cu118/torch2.1.0/mmcv_full-1.7.2-cp39-cp39-manylinux1_x86_64.whl
git clone https://github.com/Westlake-AI/openmixup.git
cd openmixup
pip install -r requirements/runtime.txt
python setup.py develop
OpenMixup supports Linux and macOS. It enables easy implementation and extension of mixup data augmentation methods in existing supervised, self-, and semi-supervised visual recognition models. Please see get_started.md for the basic usage of OpenMixup.
Here, we provide scripts for starting a quick end-to-end training with multiple GPUs and the specified CONFIG_FILE.
bash tools/dist_train.sh ${CONFIG_FILE} ${GPUS} [optional arguments]
For example, you can run the script below to train a ResNet-50 classifier on ImageNet with 4 GPUs:
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash tools/dist_train.sh configs/classification/imagenet/resnet/resnet50_4xb64_cos_ep100.py 4
After training, you can test the trained models with the corresponding evaluation script:
bash tools/dist_test.sh ${CONFIG_FILE} ${GPUS} ${PATH_TO_MODEL} [optional arguments]
We use the same environment as LLaVA.
- Install Package
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip # enable PEP 660 support
pip install -e .
- Install additional packages for training cases
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
git pull
pip install -e .
# if you see some import errors when you upgrade,
# please try running the command below (without #)
# pip install flash-attn --no-build-isolation --no-cache-dir
We use LLaVA as our codebase:
- Install the environment as for LLaVA-v1.5
- Open the training script scripts/v1_5/finetune.sh and add the following arguments:
  - --use_augment True
  - --augment_type "mixup"
  - --use_ranking True
  - --use_tome True
  - --tome_ratio 0
  - --tome_merge_num 288
- Run the script:
  scripts/v1_5/finetune.sh
For evaluation, you need to find the config.json of the checkpoint and add the following parameters:
- "use_tome": true
- "tome_ratio": 0
- "tome_merge_num": 432
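If you prefer to apply the edit programmatically, a small helper along these lines works; the function name and path are illustrative, while the keys and values come from the list above:

```python
import json

def patch_config(path, merge_num=432):
    """Add the ToMe inference settings to a checkpoint's config.json.

    `path` should point at the config.json inside the checkpoint directory;
    existing keys are preserved.
    """
    with open(path) as f:
        cfg = json.load(f)
    cfg.update({"use_tome": True, "tome_ratio": 0, "tome_merge_num": merge_num})
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
```

For example, `patch_config("checkpoints/llava-v1.5-7b/config.json")` would add the three keys while leaving the rest of the config untouched.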
- This work is built upon LLaVA and OpenMixup. We thank them for their excellent open-source contributions.
If you find our work helpful to your research, please consider citing it, and please don't forget to cite OpenMixup if you use this project.
@article{jin2025mergemix,
title={MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding},
author={Xin Jin and Siyuan Li and Siyong Jian and Kai Yu and Huan Wang},
journal={arXiv},
year={2025},
}
@article{li2022openmixup,
title = {OpenMixup: A Comprehensive Mixup Benchmark for Visual Classification},
author = {Siyuan Li and Zedong Wang and Zicheng Liu and Di Wu and Cheng Tan and Stan Z. Li},
  journal = {arXiv},
year = {2022},
volume = {abs/2209.04851}
}