MCP: https://mmteb-api.jina.ai/mcp
REST API + MCP server for cached MTEB benchmark results (526 models, 1319 tasks, 68 benchmarks). Built with FastAPI + Cloudflare Workers.
Data source: embeddings-benchmark/results cached-data branch, updated daily.
The recommended way to use this API is via the Model Context Protocol (MCP) server. Add this to your MCP client config:
```json
{
  "mcpServers": {
    "mmteb": {
      "url": "https://mmteb-api.jina.ai/mcp"
    }
  }
}
```

No API key required. The MCP server exposes 10 tools:
| Tool | Description |
|---|---|
| list_benchmarks | List all available embedding benchmarks |
| get_benchmark_rankings | Get model rankings for a benchmark |
| get_model_weaknesses | Find tasks where a model performs worst |
| get_benchmark_gap_to_top | Show gap between a model and top performers |
| list_models | List/search all models with benchmark results |
| get_model_tasks | Get all task results for a model |
| get_model_rank | Get model rank in a specific benchmark |
| compare_models | Compare models head-to-head on a benchmark |
| list_tasks | List all evaluation tasks |
| get_task_rankings | Get model rankings for a specific task |
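Under the hood, an MCP client invokes these tools with JSON-RPC 2.0 `tools/call` messages. A minimal sketch of building such a payload, assuming the Streamable HTTP transport (the helper and the example arguments are illustrative; your MCP client handles this wire format for you):

```python
import json

def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical call: argument names are illustrative, not the tool's actual schema.
payload = build_tool_call("get_benchmark_rankings",
                          {"benchmark": "MTEB(eng, v2)", "top_n": 10})
print(payload)
```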
Server info: GET https://mmteb-api.jina.ai/mcp
- Official-aligned rankings: Uses per-task-type mean scoring (mean of type averages, not simple task mean) to match the official MTEB leaderboard. Models must cover ≥80% of a benchmark's tasks to be ranked.
- Auto task-type inference: Tasks with missing type metadata are classified from name patterns (e.g., `*Classification` → Classification, `*HardNegatives` → Retrieval).
- Auto research signals: Weaknesses, gap-to-top, neighborhood, and better-than endpoints for competitive analysis and model optimization.
- Hot reload: POST /refresh re-downloads data and atomically swaps it in without downtime.
- Fast cold start: Pre-built pickle baked into the Docker image, ~5s startup on Cloud Run.
```shell
uv sync
uv run uvicorn src.main:app --reload
```

Data loads in the background on startup (~5s from the pre-built pickle, ~20s from JSON). /health returns {"status":"loading"} until ready.
Base URL: https://mmteb-api.jina.ai
Model names in URLs use `__` instead of `/` (e.g. `jinaai__jina-embeddings-v5-text-small`). All scores are percentages (0-100).
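A small sketch of building request URLs under these conventions, using only the standard library (the helper names are hypothetical; note that `quote` percent-encodes parentheses and commas too, which decodes to the same path as the lightly-encoded curl examples below):

```python
from urllib.parse import quote

BASE = "https://mmteb-api.jina.ai"

def model_slug(model_name: str) -> str:
    """Replace '/' with '__' as the API's URL convention requires."""
    return model_name.replace("/", "__")

def model_benchmark_url(benchmark: str, model_name: str) -> str:
    """URL for a model's per-task scores on a benchmark."""
    return f"{BASE}/benchmarks/{quote(benchmark)}/models/{model_slug(model_name)}"

print(model_slug("jinaai/jina-embeddings-v5-text-small"))
# jinaai__jina-embeddings-v5-text-small
print(model_benchmark_url("MTEB(eng, v2)", "jinaai/jina-embeddings-v5-text-small"))
```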
| Method | Path | Description |
|---|---|---|
| GET | /health | Returns ok/loading/error + model/task counts |
| GET | /stats | Total models, tasks, benchmarks |
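Since /health reports `{"status":"loading"}` while data loads in the background, a client may want to gate requests on readiness. A minimal sketch, assuming only the response shape described above (the helper is hypothetical):

```python
import json

def is_ready(health_body: str) -> bool:
    """True once /health reports status 'ok' rather than 'loading'/'error'."""
    return json.loads(health_body).get("status") == "ok"

print(is_ready('{"status": "loading"}'))                              # False
print(is_ready('{"status": "ok", "models": 526, "tasks": 1319}'))     # True
```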
| Method | Path | Description |
|---|---|---|
| GET | /models | List models. Filters: name (substring), modality, min_tasks |
| GET | /models/{model}/tasks | All task scores for a model, sorted by score desc |
| GET | /models/{model}/tasks/{task} | Detailed scores per subset/split for a task |
| GET | /models/{model}/rank | Model's rank + percentile on every benchmark it has results for |
| Method | Path | Description |
|---|---|---|
| GET | /benchmarks | List all benchmarks |
| GET | /benchmarks/{bench}/rankings | Leaderboard. Optional top_n |
| GET | /benchmarks/{bench}/models/{model} | Model's per-task scores on a benchmark |
| GET | /benchmarks/{bench}/by_type/{model} | Scores grouped by task type (avg, count, task list) |
| GET | /benchmarks/{bench}/weaknesses/{model} | Model's weakest tasks by percentile rank |
| GET | /benchmarks/{bench}/gap_to_top/{model} | Gap to top-N models, broken down by task type |
| Method | Path | Description |
|---|---|---|
| GET | /tasks | List tasks. Filters: type, language |
| GET | /tasks/{task}/rankings | Model leaderboard for a task. Optional top_n |
| GET | /tasks/{task}/info | Description, type, domains, score distribution (min/max/mean/p25/p75) |
| GET | /tasks/{task}/better_than/{model} | Models scoring higher, with gap. Optional top_n |
| GET | /tasks/{task}/neighborhood/{model} | Models ranked around yours. Optional radius (default 5) |
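The neighborhood endpoint returns the models ranked within `radius` positions of yours. The idea can be sketched as a pure function (an illustrative reimplementation, not the service's actual code):

```python
def neighborhood(ranking: list[str], model: str, radius: int = 5) -> list[str]:
    """Models within `radius` positions of `model` in a best-first ranking."""
    i = ranking.index(model)
    # Clamp the window at the top of the leaderboard; slicing clamps the bottom.
    return ranking[max(0, i - radius): i + radius + 1]

ranks = ["m1", "m2", "m3", "m4", "m5", "m6", "m7"]
print(neighborhood(ranks, "m4", radius=2))  # ['m2', 'm3', 'm4', 'm5', 'm6']
```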
| Method | Path | Description |
|---|---|---|
| GET | /compare?models=m1,m2&benchmark=X | Side-by-side comparison of multiple models |
| POST | /refresh | Re-download data and atomically swap it in (safe while serving) |
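A sketch of composing a /compare request URL with the standard library; the query parameter names follow the table above, while the helper itself is hypothetical:

```python
from urllib.parse import urlencode

def compare_url(models: list[str], benchmark: str) -> str:
    """Build a /compare URL; models are passed as one comma-separated value."""
    query = urlencode({"models": ",".join(models), "benchmark": benchmark})
    return f"https://mmteb-api.jina.ai/compare?{query}"

print(compare_url(
    ["jinaai__jina-embeddings-v5-text-small", "jinaai__jina-embeddings-v5-text-nano"],
    "MTEB(eng, v2)",
))
```

`urlencode` handles the percent-encoding of commas, spaces, and parentheses in the query string for you.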
```shell
# Where does my model rank across all benchmarks?
curl "https://mmteb-api.jina.ai/models/jinaai__jina-embeddings-v5-text-small/rank"

# 1. Find weakest tasks on MTEB(eng, v2)
curl "https://mmteb-api.jina.ai/benchmarks/MTEB(eng,%20v2)/weaknesses/jinaai__jina-embeddings-v5-text-small?top_n=5"

# 2. See gap to top-3 models, broken down by task type
curl "https://mmteb-api.jina.ai/benchmarks/MTEB(eng,%20v2)/gap_to_top/jinaai__jina-embeddings-v5-text-small?top_n=3"

# 3. Scores grouped by task type
curl "https://mmteb-api.jina.ai/benchmarks/MTEB(eng,%20v2)/by_type/jinaai__jina-embeddings-v5-text-small"

# 4. Who beats you on a specific task, and by how much?
curl "https://mmteb-api.jina.ai/tasks/FiQA2018/better_than/jinaai__jina-embeddings-v5-text-small?top_n=5"

# 5. See your competitive neighborhood on a task
curl "https://mmteb-api.jina.ai/tasks/FiQA2018/neighborhood/jinaai__jina-embeddings-v5-text-small?radius=3"

# 6. Understand what a task measures
curl "https://mmteb-api.jina.ai/tasks/FiQA2018/info"

# 7. Compare two models head-to-head
curl "https://mmteb-api.jina.ai/compare?models=jinaai__jina-embeddings-v5-text-small,jinaai__jina-embeddings-v5-text-nano&benchmark=MTEB(eng,%20v2)"

# 8. After publishing new results, refresh data
curl -X POST "https://mmteb-api.jina.ai/refresh"
```

Deploys to GCP Cloud Run (jinaai-public project, us-central1) via GitHub Actions on push to main.
Custom domain: mmteb-api.jina.ai (Cloudflare DNS -> GCP domain mapping)
- Memory: 4Gi
- CPU: 2 (with startup CPU boost, no throttling)
- Instances: 0-1 (scales to zero)
- Cold start: ~5s (pre-built pickle baked into Docker image)
```shell
uv sync --all-groups
uv run pytest tests/ -v
```