
# 👕 OpenVTON-Bench

**A Large-Scale High-Resolution Benchmark for Controllable Virtual Try-On Evaluation**

arXiv Dataset Python CUDA License

A comprehensive multi-modal evaluation protocol using VLM-based scores and multi-scale representation metrics.

Overview • Dataset • Evaluation • Installation • Quick Start • Citation


๐Ÿ” Overview

OpenVTON-Bench is a large-scale, high-resolution benchmark designed for the systematic evaluation of controllable virtual try-on (VTON) models.

Unlike existing datasets and evaluation protocols that struggle with texture detail and semantic consistency, OpenVTON-Bench provides:

- 🖼️ **~100K image pairs** at resolutions up to 1536×1536, enabling evaluation of fine-grained texture generation.
- 🏷️ **Fine-grained taxonomy**, semantically balanced across 20 garment categories.
- 📏 **Multi-level automated evaluation** covering pixel fidelity, garment consistency, and semantic realism.

This benchmark enables fair, reproducible, and scalable comparison across modern diffusion-based and transformer-based try-on systems.


## 🧬 Data Construction Pipeline


The dataset is constructed through a rigorous three-stage pipeline ensuring category diversity, visual quality, and semantic consistency:

  1. ๐ŸŒ Web-Scale Crawling: With strict resolution filtering to maintain commercial-grade quality.
  2. โœ๏ธ Hybrid Annotation: Combining human verification with Vision-Language Model (VLM) dense captioning.
  3. โš–๏ธ Semantic-Aware Balancing: Utilizing DINOv3 hierarchical clustering for uniform distribution.

## 📦 Dataset

### 🔗 HuggingFace Access

The full dataset is open-source and publicly available on HuggingFace.

### 📊 Statistics

| Property    | Value           | Description                                  |
|-------------|-----------------|----------------------------------------------|
| Image pairs | ~100,000        | High-quality garment and person pairs        |
| Resolution  | Up to 1536×1536 | Critical for fine-grained texture assessment |
| Categories  | 20              | Fine-grained garment taxonomy                |
| Annotation  | Hybrid          | VLM dense captioning + human verification    |

### 👗 Garment Categories


Samples from the 20 garment categories (C0–C19) included in OpenVTON-Bench.


## 📊 Evaluation Protocols

OpenVTON-Bench introduces a hybrid evaluation paradigm featuring four complementary protocols. This multi-view design captures both the perceptual and the structural quality of generated try-on results:

| Type | Description | Key Metrics |
|------|-------------|-------------|
| 🧠 VLM-based | Semantic realism via Vision-Language Models (VLM-as-a-Judge) | Background, Identity, Texture, Shape, Overall |
| ✂️ Garment-based | Region-level evaluation via SAM3 | SSIM, PSNR, LPIPS, Cosine |
| 🖼️ All-based | Full-image feature comparison using DINOv3 | SSIM, LPIPS, Cosine |
| 👾 Pixel-based | Raw pixel structural comparison | MSE, PSNR, SSIM, LPIPS, FID |

### 🧮 Evaluation Metrics Guide

| Metric | Goal | Description |
|--------|------|-------------|
| SSIM | ↑ Higher | Structural Similarity Index |
| PSNR | ↑ Higher | Peak Signal-to-Noise Ratio |
| Cosine Sim | ↑ Higher | Feature-level cosine similarity |
| LPIPS | ↓ Lower | Learned Perceptual Image Patch Similarity |
| FID | ↓ Lower | Fréchet Inception Distance |
| MSE | ↓ Lower | Mean Squared Error |
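
For the pixel-level rows above, MSE and PSNR are simple enough to define directly; this sketch uses plain NumPy (SSIM, LPIPS, and FID require dedicated implementations and are omitted here):

```python
# Minimal reference definitions of MSE (lower is better) and PSNR
# (higher is better) for 8-bit images, in plain NumPy.
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two equally-shaped images."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    m = mse(a, b)
    if m == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / m))

if __name__ == "__main__":
    ref = np.zeros((4, 4), dtype=np.uint8)
    noisy = ref.copy()
    noisy[0, 0] = 16  # one corrupted pixel: MSE = 16**2 / 16 = 16.0
    print(mse(ref, noisy), psnr(ref, noisy))
```

Note the inverse relationship: as MSE grows, PSNR falls, which is why the table lists them with opposite goal directions.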

> [!NOTE]
> VLM evaluation utilizes a 1–5 scoring scale across five semantic dimensions (Background, Identity, Texture, Shape, and Overall realism).


## 🛠 Installation

### Requirements

- Python 3.12+
- CUDA 12.4+
- Recommended: 4× GPUs for large-scale evaluation

### Setup Environment

1. Clone the repository:

   ```bash
   git clone https://github.com/RenxingIntelligence/OpenVTON-Bench.git
   cd OpenVTON-Bench
   ```

2. Create and activate the environment, using conda (recommended):

   ```bash
   conda env create -f env.yaml
   conda activate bench_new
   ```

   Or using pip:

   ```bash
   pip install -r requirements.txt
   ```

## 🚀 Quick Start

### 1. Download Required Models

Before running the benchmark, prepare the necessary backbone weights for feature extraction and segmentation in the models/ directory:

```
models/
├── dinov3-vith16plus/    # Feature extraction
└── sam3/                 # Garment segmentation
```

### 2. Configure Paths

Copy the template configuration file:

```bash
cp benchmark/config.yaml benchmark/config.local.yaml
```

Update the configuration (`benchmark/config.local.yaml`) with your generated-image directories and model paths:

```yaml
data:
  test_jsonl: "./data/test_samples.jsonl"
  generated_dirs:
    - name: "your_model"
      path: "./generated_images/your_model"

models:
  dinov3:
    path: "./models/dinov3-vith16plus"
```
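
Once loaded (for example with `yaml.safe_load`), the structure above can be sanity-checked before a long evaluation run. A minimal sketch; `validate_config` is a hypothetical helper, not part of the repository:

```python
# Hypothetical sanity check for the config layout shown above.
# Field names mirror the YAML snippet; everything else is an assumption.
def validate_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the config looks OK."""
    problems = []
    if "test_jsonl" not in cfg.get("data", {}):
        problems.append("data.test_jsonl is missing")
    for i, entry in enumerate(cfg.get("data", {}).get("generated_dirs", [])):
        for key in ("name", "path"):
            if key not in entry:
                problems.append(f"data.generated_dirs[{i}].{key} is missing")
    if "path" not in cfg.get("models", {}).get("dinov3", {}):
        problems.append("models.dinov3.path is missing")
    return problems

if __name__ == "__main__":
    cfg = {
        "data": {
            "test_jsonl": "./data/test_samples.jsonl",
            "generated_dirs": [{"name": "your_model", "path": "./generated_images/your_model"}],
        },
        "models": {"dinov3": {"path": "./models/dinov3-vith16plus"}},
    }
    print(validate_config(cfg))  # an empty list means all required keys are present
```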

### 3. Run Benchmark

Run the full benchmark suite:

```bash
bash run_benchmark.sh --config benchmark/config.local.yaml
```

Or run specific evaluation types individually:

```bash
bash run_benchmark.sh --eval-type pixel
bash run_benchmark.sh --eval-type garment
bash run_benchmark.sh --eval-type vlm
```

> [!WARNING]
> **Dataset format requirement:** the generated images must keep filenames identical to the source/target images specified in the JSONL, e.g. `{"source": "00001.jpg", "target": "00001.jpg"}`.
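
The filename requirement above can be checked ahead of time. A minimal sketch assuming the JSONL field names from the example line; `missing_outputs` is a hypothetical helper, not part of the repository:

```python
# Hypothetical pre-flight check: every "target" filename listed in the
# test JSONL must exist in the generated-images directory.
import json
from pathlib import Path

def missing_outputs(jsonl_path: str, generated_dir: str) -> list[str]:
    """Return target filenames from the JSONL that are absent from generated_dir."""
    have = {p.name for p in Path(generated_dir).iterdir()}
    missing = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                if record["target"] not in have:
                    missing.append(record["target"])
    return missing
```

Running this before the benchmark avoids discovering a filename mismatch halfway through a multi-hour evaluation.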


๐Ÿ“ Output Structure

The benchmark automatically generates a rich suite of analytics, outputted to the results/ directory:

results/
 โ””โ”€โ”€ YYYYMMDD_HHMMSS/
     โ”œโ”€โ”€ summary.json          # Aggregate metric scores
     โ”œโ”€โ”€ per_model/            # Detailed model-specific data
     โ””โ”€โ”€ visualizations/       # Radar plots, comparison charts, per-sample diagnostics
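
summary.json can then be post-processed. The exact schema is not documented here, so this sketch assumes a flat `{model_name: {metric: value}}` layout purely for illustration:

```python
# Hypothetical post-processing of summary.json. The assumed layout
# ({model_name: {metric: value}}) is an illustration, not the documented schema.
import json

def best_model(summary: dict, metric: str, higher_is_better: bool = True) -> str:
    """Pick the best model name under one metric, respecting its goal direction."""
    pick = max if higher_is_better else min
    return pick(summary, key=lambda m: summary[m][metric])

if __name__ == "__main__":
    with open("results/latest/summary.json", encoding="utf-8") as f:  # path is illustrative
        summary = json.load(f)
    print(best_model(summary, "SSIM"))                       # SSIM: higher is better
    print(best_model(summary, "LPIPS", higher_is_better=False))  # LPIPS: lower is better
```

The `higher_is_better` flag matters because the metrics guide mixes both directions (SSIM/PSNR up, LPIPS/FID/MSE down).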

## 📈 Correlation Analysis

To measure the agreement between our automated multi-modal metrics and human subjective evaluation:

```bash
python benchmark/analyze_correlation.py \
  --result_dir results/... \
  --human_ratings data/human.json
```
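
As a rough stand-in for such an analysis, Spearman rank correlation between one automated metric and human ratings can be computed directly in NumPy (the repository's `analyze_correlation.py` may use a different estimator; this version assumes no tied values):

```python
# Spearman rank correlation via Pearson correlation of ranks.
# Assumes no tied values (ties would need averaged ranks).
import numpy as np

def spearman(x, y) -> float:
    """Rank correlation in [-1, 1]; 1.0 means perfect monotone agreement."""
    rx = np.argsort(np.argsort(x)).astype(np.float64)  # rank of each sample
    ry = np.argsort(np.argsort(y)).astype(np.float64)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))

if __name__ == "__main__":
    metric_scores = [0.71, 0.85, 0.60, 0.92]  # illustrative automated scores
    human_scores = [3, 4, 2, 5]               # illustrative 1-5 human ratings
    print(spearman(metric_scores, human_scores))  # 1.0: identical ordering
```

A value near 1.0 indicates that ranking models by the automated metric matches the human ranking, which is the agreement the analysis script is meant to quantify.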

## 🧱 Project Structure

```
benchmark/
├── metrics/                  # Implementation of all evaluation metrics
├── utils/                    # Helper scripts and visualizers
├── run_benchmark.py          # Main execution entrypoint
└── analyze_correlation.py    # Statistical correlation tools
```

## 🧾 Citation

If you find this benchmark useful in your research, please consider citing:

```bibtex
@misc{li2026openvton,
  title={OpenVTON-Bench: A Large-Scale High-Resolution Benchmark for Controllable Virtual Try-On Evaluation},
  author={Jin Li and Tao Chen and Shuai Jiang and Weijie Wang and Jingwen Luo and Chenhui Wu},
  year={2026},
  eprint={2601.22725},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2601.22725}
}
```

## 📜 License

This project is licensed under the MIT License.
