[CVPR 2026 Highlight] AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition

Release

  • [2026.04.09] AdaptVision has been selected as a CVPR 2026 Highlight paper.
  • [2026.02.21] AdaptVision is accepted by CVPR 2026.
  • [2025.11.18] 🔥 AdaptVision is coming! We release the project page, paper, code and models.

Demo

The full runnable notebook is available in cookbooks/adaptvision.ipynb.

Question: Is there a stop sign facing us?

Figure: the downsampled global view, and the high-resolution local crop that AdaptVision requests from it.

Global view -> local zoom -> final answer: Yes, there is a stop sign facing us.

# Equivalent to the example in cookbooks/adaptvision.ipynb
bot = AdaptVision(model_path="AdaptVision/AdaptVision-7B")
result = bot.run("assets/test_img2.png", "Is there a stop sign facing us?")
show_result(result)
--- Round 1 ---
<think>...I need to zoom in on that area.</think>
<tool_call>{"name": "request_local_region", "arguments": {"bbox_2d": [418, 189, 440, 214]}}</tool_call>

--- Round 2 ---
<answer>Yes, there is a stop sign facing us.</answer>

AdaptVision first reasons over a downsampled global image, then requests a high-resolution local crop before producing the final answer. This active-vision loop helps preserve efficiency while recovering small but decisive details.
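The tool-call format shown in the demo output can be handled with a few lines of standard-library code. The sketch below is illustrative only (the helper names `parse_local_region` and `to_full_resolution` are our own; the actual implementation lives in cookbooks/adaptvision.ipynb): it extracts the requested bbox from a model turn and rescales it from the downsampled global view back to full-resolution pixel coordinates.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def parse_local_region(turn: str):
    """Return the requested [x1, y1, x2, y2] bbox, or None if the turn is a final answer."""
    m = TOOL_CALL_RE.search(turn)
    if m is None:
        return None
    call = json.loads(m.group(1))
    if call.get("name") != "request_local_region":
        return None
    return call["arguments"]["bbox_2d"]

def to_full_resolution(bbox, scale: float):
    """Map a bbox from the downsampled global view to full-resolution pixels."""
    return [round(v * scale) for v in bbox]

turn = ('<think>...I need to zoom in on that area.</think>'
        '<tool_call>{"name": "request_local_region", '
        '"arguments": {"bbox_2d": [418, 189, 440, 214]}}</tool_call>')
bbox = parse_local_region(turn)           # [418, 189, 440, 214]
crop_box = to_full_resolution(bbox, 4.0)  # [1672, 756, 1760, 856]
```

The crop defined by `crop_box` is then taken from the original high-resolution image and fed back to the model for the next round.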

Installation

The environment setup follows veRL.

git clone https://github.com/AdaptVision/AdaptVision.git
cd AdaptVision
conda create -n adaptvision python=3.11 -y
conda activate adaptvision
# veRL
pip3 install -e .
# flash-attn
pip3 install flash-attn==2.7.3 --no-build-isolation

pip install transformers==4.51.0
pip install math_verify
pip install "ray[default]"
pip install tensordict==0.6.2
pip install qwen_vl_utils
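After installation, a quick sanity check can confirm the pinned packages are present before launching training. This is a generic snippet, not part of the repo:

```python
from importlib.metadata import PackageNotFoundError, version

def check_packages(required):
    """Return {name: installed_version_or_None} for each required distribution."""
    report = {}
    for name in required:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report

report = check_packages(["transformers", "flash-attn", "tensordict", "qwen-vl-utils"])
for name, ver in report.items():
    print(f"{name}: {ver or 'MISSING'}")
```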

Train

Data Preparation

# train file
huggingface-cli download --repo-type dataset --resume-download Senqiao/VisionThink-Smart-Train --local-dir datasets/VisionThink-Smart-Train

# val file
huggingface-cli download --repo-type dataset --resume-download Senqiao/VisionThink-Smart-Val --local-dir datasets/VisionThink-Smart-Val

Train AdaptVision via Reinforcement Learning

To use GPT as the reward model, first set the following environment variables:

export AZURE_API_KEY=
export AZURE_ENDPOINT=
export AZURE_API_VERSION=
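With those variables set, the GPT reward model is reached through the Azure OpenAI API. The sketch below is a minimal illustration of what such a judge call could look like; the prompt wording, the `judge_answer` helper, and the `gpt-4o` deployment name are assumptions, not the repo's actual reward code:

```python
import os

def build_judge_prompt(question: str, prediction: str, reference: str) -> str:
    """Ask the judge to grade a model answer against the reference answer."""
    return (
        "You are grading a visual question answering response.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {prediction}\n"
        "Reply with 1 if the model answer is correct, otherwise 0."
    )

def judge_answer(question, prediction, reference):
    """Query the Azure-hosted GPT judge; requires the env vars above to be set."""
    from openai import AzureOpenAI  # deferred so the sketch imports without openai installed
    client = AzureOpenAI(
        api_key=os.environ["AZURE_API_KEY"],
        azure_endpoint=os.environ["AZURE_ENDPOINT"],
        api_version=os.environ["AZURE_API_VERSION"],
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption: substitute your Azure deployment name
        messages=[{"role": "user",
                   "content": build_judge_prompt(question, prediction, reference)}],
    )
    return resp.choices[0].message.content.strip()
```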

Run AdaptVision Training:

bash scripts/run_adaptvision.sh

Evaluation

We use lmms-eval to evaluate our model. Set up the evaluation environment by following the instructions here.

We provide the evaluation code in scripts/vllm_adaptvision.py.

Citation

If you find this project useful in your research, please consider citing:

@article{lin2025adaptvision,
  title={AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition},
  author={Lin, Zichuan and Liu, Yicheng and Yang, Yang and Tao, Lvfang and Ye, Deheng},
  journal={arXiv preprint arXiv:2512.03794},
  year={2025}
}

Acknowledgement

We would like to thank veRL and lmms-eval for their great work.

License

  • AdaptVision is licensed under the Apache License 2.0.
