[ICLR 2026] MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding


📆 News

[2026/01/26] 🎉 Accepted by ICLR 2026!

[2025/10/28] We released the MergeMix code on the LLaVA codebase.

[2025/10/14] We updated our method in OpenMixup.

🛠 Installation

1️⃣ For Image Classification

You can use the OpenMixup environment and codebase to apply MergeMix to image classification tasks.

OpenMixup is an open-source mixup toolbox and benchmark for visual representation learning; we highly recommend it if you want to use mixup augmentation techniques in your image classification research.


OpenMixup is compatible with Python 3.6/3.7/3.8/3.9 and PyTorch >= 1.6. Here is a quick installation in development mode:

conda create -n openmixup python=3.8 pytorch=1.12 cudatoolkit=11.3 torchvision -c pytorch -y
conda activate openmixup
pip install openmim
mim install mmcv-full
git clone https://github.com/Westlake-AI/openmixup.git
cd openmixup
python setup.py develop
Installation with PyTorch 2.x requires a different process:
conda create -n openmixup python=3.9
conda activate openmixup
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install https://download.openmmlab.com/mmcv/dist/cu118/torch2.1.0/mmcv_full-1.7.2-cp39-cp39-manylinux1_x86_64.whl
git clone https://github.com/Westlake-AI/openmixup.git
cd openmixup
pip install -r requirements/runtime.txt
python setup.py develop

Getting Started

OpenMixup supports Linux and macOS. It enables easy implementation and extensions of mixup data augmentation methods in existing supervised, self-, and semi-supervised visual recognition models. Please see get_started.md for the basic usage of OpenMixup.

Training and Evaluation Scripts

Here, we provide scripts for starting a quick end-to-end training with multiple GPUs and the specified CONFIG_FILE.

bash tools/dist_train.sh ${CONFIG_FILE} ${GPUS} [optional arguments]

For example, you can run the script below to train a ResNet-50 classifier on ImageNet with 4 GPUs:

CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash tools/dist_train.sh configs/classification/imagenet/resnet/resnet50_4xb64_cos_ep100.py 4

After training, you can test the trained models with the corresponding evaluation script:

bash tools/dist_test.sh ${CONFIG_FILE} ${GPUS} ${PATH_TO_MODEL} [optional arguments]

2️⃣ For LLaVA

We use the same environment as LLaVA.

Getting Started

  1. Install Package
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
  2. Install additional packages for training
pip install -e ".[train]"
pip install flash-attn --no-build-isolation

Upgrade to latest code base

git pull
pip install -e .

# if you see some import errors when you upgrade,
# please try running the command below (without #)
# pip install flash-attn --no-build-isolation --no-cache-dir

🛫 Training

We use LLaVA as our codebase:

  1. Set up the environment as in LLaVA-v1.5.
  2. Open scripts/v1_5/finetune.sh.
  3. Add the following arguments to the bash file:
    • --use_augment True
    • --augment_type "mixup"
    • --use_ranking True
    • --use_tome True
    • --tome_ratio 0
    • --tome_merge_num 288
  4. Run scripts/v1_5/finetune.sh.
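As a sketch, the flags from step 3 can be collected into a single shell variable and appended to the existing training command inside finetune.sh. The variable name EXTRA_ARGS is hypothetical; only the flag names and values come from the list above:

```shell
# Illustrative only: gather the MergeMix flags from step 3 so they can be
# appended to the training command already present in scripts/v1_5/finetune.sh.
# The EXTRA_ARGS name is hypothetical; flag names/values are from the README.
EXTRA_ARGS="--use_augment True \
  --augment_type mixup \
  --use_ranking True \
  --use_tome True \
  --tome_ratio 0 \
  --tome_merge_num 288"
echo "$EXTRA_ARGS"
```

You would then pass "$EXTRA_ARGS" at the end of the training invocation in the script, leaving all of its original arguments unchanged.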

🛬 Inference

You need to find the config.json of the checkpoint and add the following parameters:

  1. "use_tome": true
  2. "tome_ratio": 0
  3. "tome_merge_num": 432
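For reference, the three entries above would look like the fragment below once merged into the checkpoint's config.json. This is a fragment only, not a complete config: keep every key the file already contains and add these alongside them.

```json
{
  "use_tome": true,
  "tome_ratio": 0,
  "tome_merge_num": 432
}
```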

❤ Acknowledgement

  • This work is built upon LLaVA and OpenMixup. We thank them for their excellent open-source contributions.

🤗 Citation

If our work has contributed to your research, please consider citing it, and don't forget to cite OpenMixup if you use this project.

@article{jin2025mergemix,
      title={MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding}, 
      author={Xin Jin and Siyuan Li and Siyong Jian and Kai Yu and Huan Wang},
      journal={arXiv},
      year={2025},
}

@article{li2022openmixup,
  title = {OpenMixup: A Comprehensive Mixup Benchmark for Visual Classification},
  author = {Siyuan Li and Zedong Wang and Zicheng Liu and Di Wu and Cheng Tan and Stan Z. Li},
  journal = {arXiv},
  year = {2022},
  volume = {abs/2209.04851}
}
