
# MMTEB MCP

MCP endpoint: https://mmteb-api.jina.ai/mcp

REST API + MCP server for cached MTEB benchmark results (526 models, 1319 tasks, 68 benchmarks). Built with FastAPI + Cloudflare Workers.

Data source: the `cached-data` branch of embeddings-benchmark/results, updated daily.

## Usage (MCP)

The recommended way to use this API is via the Model Context Protocol (MCP) server. Add this to your MCP client config:

```json
{
  "mcpServers": {
    "mmteb": {
      "url": "https://mmteb-api.jina.ai/mcp"
    }
  }
}
```

No API key required. The MCP server exposes 10 tools:

| Tool | Description |
| --- | --- |
| `list_benchmarks` | List all available embedding benchmarks |
| `get_benchmark_rankings` | Get model rankings for a benchmark |
| `get_model_weaknesses` | Find tasks where a model performs worst |
| `get_benchmark_gap_to_top` | Show the gap between a model and the top performers |
| `list_models` | List/search all models with benchmark results |
| `get_model_tasks` | Get all task results for a model |
| `get_model_rank` | Get a model's rank in a specific benchmark |
| `compare_models` | Compare models head-to-head on a benchmark |
| `list_tasks` | List all evaluation tasks |
| `get_task_rankings` | Get model rankings for a specific task |

Server info: GET https://mmteb-api.jina.ai/mcp

## Features

- **Official-aligned rankings:** uses per-task-type mean scoring (the mean of each task type's average, not a simple mean over all tasks) to match the official MTEB leaderboard. Models must cover ≥80% of a benchmark's tasks to be ranked.
- **Auto task-type inference:** tasks with missing type metadata are classified from name patterns (e.g. `*Classification` → Classification, `*HardNegatives` → Retrieval).
- **Autoresearch signals:** weaknesses, gap-to-top, neighborhood, and better-than endpoints for competitive analysis and model optimization.
- **Hot reload:** `POST /refresh` re-downloads the data and swaps it in atomically, with no downtime.
- **Fast cold start:** a pre-built pickle baked into the Docker image gives ~5 s startup on Cloud Run.
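The scoring rule above can be illustrated with a short sketch. Function names and data shapes here are illustrative, not the service's actual code; only the rule itself (mean of per-type averages, 80% coverage cutoff) comes from this README:

```python
from collections import defaultdict

def benchmark_score(task_scores, task_types, coverage_threshold=0.8):
    """Mean-of-type-averages score, or None if the model covers
    fewer than 80% of the benchmark's tasks."""
    covered = [t for t in task_types if t in task_scores]
    if len(covered) < coverage_threshold * len(task_types):
        return None  # not enough coverage to be ranked
    # Group scores by task type and average within each type...
    by_type = defaultdict(list)
    for task in covered:
        by_type[task_types[task]].append(task_scores[task])
    type_means = [sum(v) / len(v) for v in by_type.values()]
    # ...then average the per-type means (not a plain mean over tasks).
    return sum(type_means) / len(type_means)

# Two Retrieval tasks and one Classification task: a plain task mean
# would weight Retrieval 2x; the type mean weights each type equally.
scores = {"T1": 60.0, "T2": 40.0, "T3": 90.0}
types = {"T1": "Retrieval", "T2": "Retrieval", "T3": "Classification"}
print(benchmark_score(scores, types))  # 70.0: mean of (50.0, 90.0)
```

On this toy input a plain task mean would give 63.3, so the two rules genuinely rank models differently.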

## Setup

```shell
uv sync
uv run uvicorn src.main:app --reload
```

Data loads in the background on startup (~5 s from the pre-built pickle, ~20 s from JSON). `/health` returns `{"status":"loading"}` until ready.
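Since `/health` reports `loading` until the background load completes, a client can poll before issuing queries. A minimal sketch, assuming only the endpoint shape described above; `fetch` is injectable so the loop can be exercised without a running server:

```python
import json
import time
import urllib.request

def wait_until_ready(base_url, fetch=None, timeout=60, interval=1.0):
    """Poll {base_url}/health until "status" is no longer "loading".

    Returns the final status string ("ok", or "error" if loading failed).
    `fetch` maps a URL to a parsed JSON dict; the default uses urllib.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as resp:
                return json.load(resp)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch(base_url + "/health").get("status")
        if status != "loading":
            return status
        time.sleep(interval)
    raise TimeoutError(f"server still loading after {timeout}s")
```

For a local run this would be called as `wait_until_ready("http://localhost:8000")` (8000 being uvicorn's default port).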

## API Reference

Base URL: https://mmteb-api.jina.ai

Model names in URLs use `__` instead of `/` (e.g. `jinaai__jina-embeddings-v5-text-small`). All scores are percentages (0-100).
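A small helper for building request URLs under these conventions. The function names are illustrative; `quote` with a relaxed `safe` set reproduces the `MTEB(eng,%20v2)` form used in the curl examples:

```python
from urllib.parse import quote

BASE = "https://mmteb-api.jina.ai"

def model_path(model_id):
    # API paths use "__" in place of "/" in Hugging Face-style model ids
    return model_id.replace("/", "__")

def rankings_url(benchmark, top_n=None):
    # Benchmark names contain spaces, so percent-encode them; keep
    # parentheses and commas literal to match the README's URL style
    url = f"{BASE}/benchmarks/{quote(benchmark, safe='(),')}/rankings"
    return url + (f"?top_n={top_n}" if top_n is not None else "")

print(model_path("jinaai/jina-embeddings-v5-text-small"))
# jinaai__jina-embeddings-v5-text-small
print(rankings_url("MTEB(eng, v2)", top_n=10))
# https://mmteb-api.jina.ai/benchmarks/MTEB(eng,%20v2)/rankings?top_n=10
```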

### Health & Stats

| Method | Path | Description |
| --- | --- | --- |
| GET | `/health` | Returns `ok`/`loading`/`error` plus model/task counts |
| GET | `/stats` | Total models, tasks, benchmarks |

### Models

| Method | Path | Description |
| --- | --- | --- |
| GET | `/models` | List models. Filters: `name` (substring), `modality`, `min_tasks` |
| GET | `/models/{model}/tasks` | All task scores for a model, sorted by score descending |
| GET | `/models/{model}/tasks/{task}` | Detailed scores per subset/split for a task |
| GET | `/models/{model}/rank` | Model's rank + percentile on every benchmark it has results for |

### Benchmarks

| Method | Path | Description |
| --- | --- | --- |
| GET | `/benchmarks` | List all benchmarks |
| GET | `/benchmarks/{bench}/rankings` | Leaderboard. Optional `top_n` |
| GET | `/benchmarks/{bench}/models/{model}` | Model's per-task scores on a benchmark |
| GET | `/benchmarks/{bench}/by_type/{model}` | Scores grouped by task type (average, count, task list) |
| GET | `/benchmarks/{bench}/weaknesses/{model}` | Model's weakest tasks by percentile rank |
| GET | `/benchmarks/{bench}/gap_to_top/{model}` | Gap to the top-N models, broken down by task type |

### Tasks

| Method | Path | Description |
| --- | --- | --- |
| GET | `/tasks` | List tasks. Filters: `type`, `language` |
| GET | `/tasks/{task}/rankings` | Model leaderboard for a task. Optional `top_n` |
| GET | `/tasks/{task}/info` | Description, type, domains, score distribution (min/max/mean/p25/p75) |
| GET | `/tasks/{task}/better_than/{model}` | Models scoring higher, with the gap. Optional `top_n` |
| GET | `/tasks/{task}/neighborhood/{model}` | Models ranked around yours. Optional `radius` (default 5) |

### Compare & Admin

| Method | Path | Description |
| --- | --- | --- |
| GET | `/compare?models=m1,m2&benchmark=X` | Side-by-side comparison of multiple models |
| POST | `/refresh` | Re-download data and swap it in atomically (safe while serving) |
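The "safe while serving" property of `/refresh` follows the snapshot-swap pattern: build the new dataset completely off to the side, then replace a single reference, so in-flight requests keep reading the old snapshot. A minimal sketch of the pattern, with illustrative class and method names (not the project's actual code):

```python
import threading

class DataStore:
    """Holds the current data snapshot; reads never block on a refresh."""

    def __init__(self, data):
        self._data = data
        self._lock = threading.Lock()  # serializes concurrent refreshes only

    def get(self):
        # Readers grab whichever snapshot is current; no lock needed,
        # since the swap below is a single reference assignment.
        return self._data

    def refresh(self, load_fn):
        new_data = load_fn()       # slow part: download + rebuild, off to the side
        with self._lock:
            self._data = new_data  # the "atomic swap"

store = DataStore({"models": 526})
store.refresh(lambda: {"models": 527})
print(store.get())  # {'models': 527}
```

Requests that started before the swap finish against the old snapshot; requests after it see the new one, which is why a refresh needs no downtime.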

## Usage Examples

### Quick model overview

```shell
# Where does my model rank across all benchmarks?
curl "https://mmteb-api.jina.ai/models/jinaai__jina-embeddings-v5-text-small/rank"
```

### Autoresearch KPI workflow

```shell
# 1. Find weakest tasks on MTEB(eng, v2)
curl "https://mmteb-api.jina.ai/benchmarks/MTEB(eng,%20v2)/weaknesses/jinaai__jina-embeddings-v5-text-small?top_n=5"

# 2. See gap to top-3 models, broken down by task type
curl "https://mmteb-api.jina.ai/benchmarks/MTEB(eng,%20v2)/gap_to_top/jinaai__jina-embeddings-v5-text-small?top_n=3"

# 3. Scores grouped by task type
curl "https://mmteb-api.jina.ai/benchmarks/MTEB(eng,%20v2)/by_type/jinaai__jina-embeddings-v5-text-small"

# 4. Who beats you on a specific task, and by how much?
curl "https://mmteb-api.jina.ai/tasks/FiQA2018/better_than/jinaai__jina-embeddings-v5-text-small?top_n=5"

# 5. See your competitive neighborhood on a task
curl "https://mmteb-api.jina.ai/tasks/FiQA2018/neighborhood/jinaai__jina-embeddings-v5-text-small?radius=3"

# 6. Understand what a task measures
curl "https://mmteb-api.jina.ai/tasks/FiQA2018/info"

# 7. Compare two models head-to-head
curl "https://mmteb-api.jina.ai/compare?models=jinaai__jina-embeddings-v5-text-small,jinaai__jina-embeddings-v5-text-nano&benchmark=MTEB(eng,%20v2)"

# 8. After publishing new results, refresh data
curl -X POST "https://mmteb-api.jina.ai/refresh"
```

## Deployment

Deploys to GCP Cloud Run (`jinaai-public` project, `us-central1`) via GitHub Actions on push to `main`.

Custom domain: mmteb-api.jina.ai (Cloudflare DNS → GCP domain mapping)

- Memory: 4Gi
- CPU: 2 (with startup CPU boost, no throttling)
- Instances: 0-1 (scales to zero)
- Cold start: ~5 s (pre-built pickle baked into the Docker image)

## Tests

```shell
uv sync --all-groups
uv run pytest tests/ -v
```
