VLA Leaderboard

A collaborative, open-source leaderboard tracking the state-of-the-art in Vision-Language-Action (VLA) models for robotic manipulation and simulation benchmarks.

🌐 Live Leaderboard: https://vlaleaderboard.com/


🚀 Mission

As the field of robotics moves toward general-purpose foundational models, standardized evaluation becomes critical. This project aims to provide a centralized, easy-to-update repository of performance metrics across diverse simulation environments, enabling researchers to:

  • Track Progress: Follow the latest advances in robotic foundational models.
  • Compare Fairly: Evaluate models using standardized metrics across different benchmarks.
  • Easy Reference: Access original papers, code repositories, and datasets directly from the leaderboard.

📊 Supported Benchmarks

The leaderboard currently tracks performance across these key simulation environments:

| Benchmark | Description | Key Metrics |
|---|---|---|
| LIBERO-PRO | Robustness and generalization on LIBERO tasks | Success Rate (Obj, Pos, Sem, etc.) |
| CALVIN | Language-conditioned long-horizon manipulation | Average task length, Success Rate |
| LIBERO | Lifelong robot learning and knowledge transfer | Success Rate (Spatial, Object, etc.) |
| Meta-World | Multi-task and meta reinforcement learning | Success Rate (Easy to Very Hard) |
| SIMPLER-Env | Real-world policy evaluation in simulation | Average Success Rate |
| VLABench | Diverse robotic primitives and common sense | Semantic Instruction, Cross Category |
| RoboTwin 2.0 | Dual-arm manipulation with digital twins | Success Rate |

🤝 Contributing (Pull Requests Welcome!)

This is a community-driven project. We strongly encourage researchers and developers to submit their results!

How to Add Your Model or Update Scores

  1. Fork the repository.
  2. Add/Update Data:
    • Models: Edit src/data/models.ts to add your model details (name, paper, organization).
    • Scores: Edit src/data/benchmarks.ts to add the evaluation results for specific benchmarks.
  3. Submit a Pull Request: Include a reference (arXiv link, blog post, or code repository) so the results can be verified.

See the Updating Data section below for details on the data structures.


🛠 Technical Setup

This project is built with React and TypeScript.

Development

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build

Deployment

# Build and deploy to Firebase Hosting
npm run deploy

📝 Updating Data (For Contributors)

1. Registering a Model (src/data/models.ts)

'model-id': {
  id: 'model-id',
  name: 'Model Name',
  organization: 'Organization Name',
  paper: {
    title: 'Paper Title',
    authors: ['Author 1', 'et al.'],
    year: 2025,
    arxivId: 'XXXX.XXXXX', // Optional: Auto-generates arXiv link
    url: 'https://...',    // Optional: Direct link
  },
  isOpenSource: true,
  dateAdded: '2025-12-19',
}

2. Adding Scores (src/data/benchmarks.ts)

{
  modelId: 'model-id', // Must match the ID in models.ts
  score: 85.5,         // Primary metric score
  details: {           // Sub-metrics (if applicable)
    easy: 90.0,
    hard: 70.0,
  }
}
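
For reference, the entries above might conform to interfaces like the following. This is a hypothetical sketch inferred from the examples; the actual type definitions live in the repository's src/data files and may differ:

```typescript
// Hypothetical type sketch for the data entries above; the real
// definitions in src/data/*.ts may differ in names and shape.
interface Paper {
  title: string;
  authors: string[];
  year: number;
  arxivId?: string; // optional: auto-generates an arXiv link
  url?: string;     // optional: direct link
}

interface ModelEntry {
  id: string;
  name: string;
  organization: string;
  paper: Paper;
  isOpenSource: boolean;
  dateAdded: string; // ISO date, e.g. '2025-12-19'
}

interface BenchmarkScore {
  modelId: string;                   // must match an id in models.ts
  score: number;                     // primary metric score
  details?: Record<string, number>;  // optional sub-metrics
}

// Example entry matching the shapes above
const exampleScore: BenchmarkScore = {
  modelId: 'model-id',
  score: 85.5,
  details: { easy: 90.0, hard: 70.0 },
};
```

Keeping sub-metrics in an optional `details` map lets benchmarks with different metric breakdowns share one score shape.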

📜 Acknowledgments

  • Development: Maintained by k1000dai.
  • Inspiration: Inspired by the LMArena Leaderboard.
  • Data: All credits go to the respective authors of the VLA models and benchmarks mentioned.

⚖️ License

Distributed under the MIT License. See LICENSE for more information.
