🤗 Hugging Face | 📑 Paper | 🌐 Website
- [7/2025] Paper accepted to ICCV 2025!
- [3/2025] VMBench evaluation code & prompt set released!
Video generation has advanced rapidly, driving improvements in evaluation methods, yet assessing the motion in generated videos remains a major challenge. Two key issues stand out: 1) current motion metrics do not fully align with human perception; 2) existing motion prompts are limited in diversity. Based on these findings, we introduce VMBench, a comprehensive Video Motion Benchmark with perception-aligned motion metrics and the most diverse set of motion types to date. VMBench has several appealing properties:
1. **Perception-Driven Motion Evaluation Metrics.** We identify five dimensions of human perception in motion video assessment and develop fine-grained evaluation metrics for them, providing deeper insight into models' strengths and weaknesses in motion quality.
2. **Meta-Guided Motion Prompt Generation.** A structured pipeline that extracts meta-information, generates diverse motion prompts with LLMs, and refines them through human-AI validation, yielding a multi-level prompt library covering six key dynamic-scene dimensions.
3. **Human-Aligned Validation Mechanism.** We provide human preference annotations to validate our benchmark; our metrics achieve an average 35.3% improvement in Spearman's correlation over baseline methods.
To our knowledge, this is the first time the quality of motion in videos has been evaluated from the perspective of alignment with human perception.
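As a rough illustration of the validation protocol, the sketch below shows how Spearman's rank correlation between automatic metric scores and human preference annotations can be computed. The score arrays are made-up placeholders, not data from the paper:

```python
# Illustrative sketch: Spearman's rank correlation between automatic metric
# outputs and mean human preference ratings (the alignment measure used for
# validation). Both arrays below are placeholder values, not VMBench data.
from scipy.stats import spearmanr

metric_scores = [51.6, 53.2, 58.9, 60.6, 63.4, 78.4]  # placeholder metric outputs
human_scores = [2.9, 3.1, 3.4, 3.6, 3.8, 4.5]         # placeholder human ratings

rho, p_value = spearmanr(metric_scores, human_scores)
print(f"Spearman's rho = {rho:.3f} (p = {p_value:.3g})")
```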
Prompt: A tourist joyfully splashes water in an outdoor swimming pool, their arms and legs moving energetically as they playfully splash around.

cogvideo-1.mp4 | hunyuan-1.mp4 | mochi-1.mp4 | opensora-1.mp4 | opensoraplan-1.mp4 | wan-1.mp4

Prompt: Three books are thrown into the air, their pages fluttering as they soar over the soccer field, landing in a scattered pattern.

cogvideo-2.mp4 | hunyuan-2.mp4 | mochi-2.mp4 | opensora-2.mp4 | opensora-plan-2.mp4 | wan-2.mp4

Prompt: Four flickering candles cast shadows as they burn steadily on the balcony, their flames dancing with the gentle breeze.

cogvideo-3.mp4 | hunyuan-3.mp4 | mochi-3.mp4 | opensora-3.mp4 | opensora-plan-3.mp4 | wan-3.mp4

Prompt: Two penguins waddle along the beach, occasionally stopping to preen their feathers before continuing their journey across the ocean shore.

cogvideo-4.mp4 | hunyuan-4.mp4 | mochi-4.mp4 | opensora-4.mp4 | opensora-plan-4.mp4 | wan-4.mp4

Prompt: In the bustling street, two kids run towards a small dog, bending down to carefully comb its fur, their hands moving swiftly.

cogvideo-5.mp4 | hunyuan-5.mp4 | mochi-5.mp4 | opensora-5.mp4 | opensora-plan-5.mp4 | wan-5.mp4

Prompt: In the garage, a young girl twirls gracefully, her arms outstretched, perfectly matching the lively country line dance beat.

cogvideo-6.mp4 | hunyuan-6.mp4 | mochi-6.mp4 | opensora-6.mp4 | opensora-plan-6.mp4 | wan-6.mp4
Abbreviations: CAS = Commonsense Adherence Score, MSS = Motion Smoothness Score, OIS = Object Integrity Score, PAS = Perceptible Amplitude Score, TCS = Temporal Coherence Score; Avg is the overall VMBench score.

| Models | Avg | CAS | MSS | OIS | PAS | TCS |
|---|---|---|---|---|---|---|
| OpenSora-v1.2 | 51.6 | 31.2 | 61.9 | 73.0 | 3.4 | 88.5 |
| Mochi 1 | 53.2 | 37.7 | 62.0 | 68.6 | 14.4 | 83.6 |
| OpenSora-Plan-v1.3.0 | 58.9 | 39.3 | 76.0 | 78.6 | 6.0 | 94.7 |
| CogVideoX-5B | 60.6 | 50.6 | 61.6 | 75.4 | 24.6 | 91.0 |
| HunyuanVideo | 63.4 | 51.9 | 81.6 | 65.8 | 26.1 | 96.3 |
| Wan2.1 | 78.4 | 62.8 | 84.2 | 66.0 | 17.9 | 97.8 |
```bash
git clone https://github.com/Ran0618/VMBench.git
cd VMBench

# Create and activate a conda environment
conda create -n VMBench python=3.10
conda activate VMBench
pip install --upgrade setuptools
pip install torch==2.5.1 torchvision==0.20.1

# Install Grounded-Segment-Anything module
cd Grounded-Segment-Anything
python -m pip install -e segment_anything
pip install --no-build-isolation -e GroundingDINO
pip install -r requirements.txt

# Install Grounded-SAM-2 module
cd ../Grounded-SAM-2
pip install -e .

# Install MMPose toolkit
pip install -U openmim
mim install mmengine
mim install "mmcv==2.1.0"
mim install "mmdet==3.2.0"
cd ../mmpose
pip install -r requirements.txt
pip install -v -e .

# Install Q-Align module
cd ../Q-Align
pip install -e .

# Install VideoMAEv2 module
cd ../VideoMAEv2
pip install -r requirements.txt

cd ..
pip install -r requirements.txt
```

Place the pre-trained checkpoint files in the `.cache` directory.
You can download our model's checkpoints from our HuggingFace repository 🤗.
You also need to download the checkpoints for Q-Align 🤗 and BERT 🤗 from their respective HuggingFace repositories.
```bash
mkdir .cache
huggingface-cli download GD-ML/VMBench --local-dir .cache/
huggingface-cli download q-future/one-align --local-dir .cache/
huggingface-cli download google-bert/bert-base-uncased --local-dir .cache/
```
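If you prefer to script the downloads, the same repositories can be fetched with the `huggingface_hub` Python API, mirroring the CLI commands above:

```python
# Mirror the huggingface-cli commands above with the huggingface_hub API.
from huggingface_hub import snapshot_download

for repo_id in ("GD-ML/VMBench", "q-future/one-align", "google-bert/bert-base-uncased"):
    snapshot_download(repo_id=repo_id, local_dir=".cache/")
```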
Please organize the pretrained models in this structure:

```
VMBench/.cache
├── google-bert
│   └── bert-base-uncased
│       ├── LICENSE
│       ......
├── groundingdino_swinb_cogcoor.pth
├── q-future
│   └── one-align
│       ├── README.md
│       ......
├── sam2.1_hiera_large.pt
├── sam_vit_h_4b8939.pth
├── scaled_offline.pth
└── vit_g_vmbench.pt
```
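Before running the evaluation, a quick sanity check against the layout above can catch missing downloads. A minimal sketch:

```python
# Sanity check: verify the expected checkpoint files/directories from the
# layout above exist under .cache/ before running the evaluation.
from pathlib import Path

expected = [
    "google-bert/bert-base-uncased",
    "groundingdino_swinb_cogcoor.pth",
    "q-future/one-align",
    "sam2.1_hiera_large.pt",
    "sam_vit_h_4b8939.pth",
    "scaled_offline.pth",
    "vit_g_vmbench.pt",
]
missing = [p for p in expected if not (Path(".cache") / p).exists()]
print("All checkpoints found." if not missing else f"Missing: {missing}")
```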
Generate videos with your model using the 1,050 prompts provided in `prompts/prompts.txt` or `prompts/prompts.json`, and organize them in the following structure:

```
VMBench/eval_results/videos
├── 0001.mp4
├── 0002.mp4
...
└── 1050.mp4
```

Note: Ensure that you maintain the correspondence between prompts and video sequence numbers. The index for each prompt can be found in the `prompts/prompts.json` file.
You can follow `sample_video_demo.py` to generate videos, or place your own generated videos, named by prompt index, into the folder above.
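The naming scheme boils down to saving one video per prompt under its zero-padded index. A minimal sketch, where the JSON field names (`index`, `prompt`) are assumptions about the schema of `prompts/prompts.json`, and `generate_video()` is a hypothetical stand-in for your model's sampler:

```python
# Sketch of the required naming scheme: one video per prompt, saved as a
# zero-padded index under eval_results/videos/. The "index" and "prompt"
# field names are assumptions -- check prompts/prompts.json for the schema.
import json
import os

os.makedirs("eval_results/videos", exist_ok=True)
with open("prompts/prompts.json") as f:
    entries = json.load(f)

for entry in entries:
    out_path = f"eval_results/videos/{int(entry['index']):04d}.mp4"
    # generate_video() is a hypothetical stand-in for your model's sampler
    video = generate_video(entry["prompt"])
    video.save(out_path)
```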
To evaluate the generated videos with VMBench, run the following command:

```bash
bash evaluate.sh your_videos_folder
```

The evaluation results for each video will be saved in `./eval_results/${current_time}/results.json`. Scores for each dimension will be saved in `./eval_results/${current_time}/scores.csv`.
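To inspect the outputs programmatically, something like the following works; the internal structure of `results.json` is an assumption, so check your actual file for the exact schema:

```python
# Inspect the outputs of evaluate.sh. The directory name is the run
# timestamp; the structure of results.json is an assumption.
import csv
import json

run_dir = "eval_results/<current_time>"  # replace with the real timestamp
with open(f"{run_dir}/results.json") as f:
    results = json.load(f)
print(f"Loaded results for {len(results)} videos")

with open(f"{run_dir}/scores.csv") as f:
    for row in csv.DictReader(f):
        print(row)  # per-dimension scores
```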
We conducted a test using the following configuration:
- Model: CogVideoX-5B
- Number of Videos: 1,050
- Frames per Video: 49
- Frame Rate: 8 FPS
Here are the time measurements for each evaluation metric:
| Metric | Time Taken |
|---|---|
| PAS (Perceptible Amplitude Score) | 45 minutes |
| OIS (Object Integrity Score) | 30 minutes |
| TCS (Temporal Coherence Score) | 2 hours |
| MSS (Motion Smoothness Score) | 2.5 hours |
| CAS (Commonsense Adherence Score) | 1 hour |
Total Evaluation Time: 6 hours and 45 minutes
We would like to express our gratitude to the following open-source repositories that our work is based on: GroundedSAM, GroundedSAM2, Co-Tracker, MMPose, Q-Align, VideoMAEv2, VideoAlign. Their contributions have been invaluable to this project.
VMBench is licensed under the Apache-2.0 license. You are free to use our code for research purposes.
If you find our repo useful for your research, please consider citing our paper:
```bibtex
@article{ling2025vmbench,
  title={VMBench: A Benchmark for Perception-Aligned Video Motion Generation},
  author={Ling, Xinran and Zhu, Chen and Wu, Meiqi and Li, Hangyu and Feng, Xiaokun and Yang, Cundian and Hao, Aiming and Zhu, Jiashu and Wu, Jiahong and Chu, Xiangxiang},
  journal={arXiv preprint arXiv:2503.10076},
  year={2025}
}
```

