Intelligent model routing for OpenClaw. Automatically picks the best AI model for any task based on real benchmark data from 5 sources.
Instead of hardcoding models or guessing, Smart Spawn analyzes what you're doing and routes to the optimal model for the job — factoring in task type, budget, benchmarks, speed, and your own feedback history.
You don't need to host anything. The public API runs at ss.deeflect.com.
Install the plugin:
```bash
openclaw plugins install @deeflectcom/smart-spawn
openclaw gateway restart
```

Use it in conversation:
"Research the latest developments in WebGPU"
Smart Spawn picks Gemini 2.5 Flash (fast, free, great context) and spawns a research sub-agent on it.
"Build me a React dashboard with auth"
Smart Spawn picks the best coding model in your budget tier and spawns a coder sub-agent.
Plugin config (optional — add to your OpenClaw config under `plugins.entries.smart-spawn.config`):
```json
{
  "apiUrl": "https://ss.deeflect.com/api",
  "defaultBudget": "medium",
  "defaultMode": "single"
}
```

| Setting | Default | Options |
|---|---|---|
| `apiUrl` | `https://ss.deeflect.com/api` | Your own API URL if self-hosting |
| `defaultBudget` | `medium` | `low`, `medium`, `high`, `any` |
| `defaultMode` | `single` | `single`, `collective`, `cascade`, `plan`, `swarm` |
| `collectiveCount` | `3` | Number of models for collective mode (2-5 recommended) |
| `telemetryOptIn` | `false` | Opt in to anonymous community telemetry |
| `communityUrl` | `apiUrl` | Alternate community telemetry endpoint |
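For example, a config that switches to collective mode across four models and opts into telemetry (illustrative values, not the defaults):

```json
{
  "apiUrl": "https://ss.deeflect.com/api",
  "defaultBudget": "high",
  "defaultMode": "collective",
  "collectiveCount": 4,
  "telemetryOptIn": true
}
```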
- Single — Pick one best model, spawn one agent
- Collective — Pick N diverse models, spawn parallel agents, merge results
- Cascade — Start cheap, escalate to premium if quality is insufficient
- Plan — Decompose sequential multi-step tasks and assign best model per step
- Swarm — Decompose complex tasks into a DAG of sub-tasks with optimal model per step
```
┌──────────────────────────────────────────────────────┐
│ Data Sources (5)                                     │
│                                                      │
│ OpenRouter ─── model catalog, pricing, capabilities  │
│ Artificial Analysis ─── intelligence/coding/math idx │
│ HuggingFace Open LLM Leaderboard ─── MMLU, BBH, etc. │
│ LMArena (Chatbot Arena) ─── ELO from human prefs     │
│ LiveBench ─── contamination-free coding/reasoning    │
└──────────────────────┬───────────────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────────────┐
│ Enrichment Pipeline                                  │
│                                                      │
│ 1. Pull raw data from all 5 sources                  │
│ 2. Alias matching (map model names across sources)   │
│ 3. Z-score normalization per benchmark               │
│ 4. Category scoring (coding/reasoning/creative/...)  │
│ 5. Cost-efficiency calculation                       │
│ 6. Tier + capability classification                  │
│ 7. Blend: benchmarks + personal + community scores   │
│                                                      │
│ Refreshes every 6 hours automatically                │
└──────────────────────┬───────────────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────────────┐
│ SQLite Cache → API → Plugin → Agent                  │
└──────────────────────────────────────────────────────┘
```
Z-score normalization — Each benchmark source uses a different scale. An "intelligence index" of 65 from Artificial Analysis means something completely different from an Arena ELO of 1350. We normalize everything:
- Compute mean and stddev for each benchmark across all models
- Convert to z-scores: `(value - mean) / stddev`
- Map to a 0-100 scale: z=-2.5→0, z=0→50, z=+1→70, z=+2→90
This means a model that's 2σ above average on LiveCodeBench gets the same score as one 2σ above average on Arena ELO — both are "equally exceptional" on their metric.
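The anchors above correspond to a linear map of 50 + 20*z clamped to [0, 100]. A minimal sketch of that normalization in TypeScript (names are illustrative, not the actual helpers in src/enrichment/scoring.ts):

```typescript
// Normalize one benchmark's raw values to the shared 0-100 scale.
// Sketch only; names do not match the repo's actual scoring helpers.
function normalizeBenchmark(values: number[]): number[] {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / values.length;
  const stddev = Math.sqrt(variance) || 1; // guard: constant benchmark column
  return values.map((v) => {
    const z = (v - mean) / stddev;
    // 50 + 20*z reproduces the anchors: z=-2.5->0, z=0->50, z=+1->70, z=+2->90
    return Math.min(100, Math.max(0, 50 + 20 * z));
  });
}
```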
Category scores — Models get scored per category (coding, reasoning, creative, vision, research, fast-cheap, general) using weighted combinations of relevant benchmarks:
| Category | Key Benchmarks |
|---|---|
| Coding | LiveCodeBench, Agentic Coding, Coding Index |
| Reasoning | GPQA, Arena ELO, MATH-500, BBH |
| Creative | Arena ELO (human preference), LiveBench Language |
| Vision | Intelligence Index (vision-capable models) |
| Research | Arena ELO, context length bonus |
| Fast-cheap | Speed (tokens/sec), low pricing |
Score blending — Final score = weighted mix of:
- Benchmark score (primary)
- Personal feedback (your own ratings from past spawns)
- Community scores (anonymous aggregated ratings from other instances)
- Context boost (task-specific signals like "needs vision" or "long context")
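A sketch of how such a blend can be computed (the weights below are placeholders; the shipped logic lives in src/model-selection.ts):

```typescript
// Illustrative blend of the four signals; weights are placeholders,
// not the values actually used by Smart Spawn.
interface ScoreInputs {
  benchmark: number;    // 0-100 benchmark score (primary signal)
  personal?: number;    // 0-100 from your own spawn ratings, if any
  community?: number;   // 0-100 from opted-in community telemetry, if any
  contextBoost: number; // additive boost from tags like "needs vision"
}

function blendScore({ benchmark, personal, community, contextBoost }: ScoreInputs): number {
  let score = benchmark;
  if (personal !== undefined) score = 0.7 * score + 0.3 * personal;   // assumed weight
  if (community !== undefined) score = 0.9 * score + 0.1 * community; // assumed weight
  return Math.min(100, score + contextBoost);
}
```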
| Budget | Price Range (per 1M input tokens) | Examples |
|---|---|---|
| `low` | $0 – $1 | DeepSeek, Kimi K2.5, Gemini Flash |
| `medium` | $0 – $5 | Claude Sonnet, GPT-4o, Gemini Pro |
| `high` | $2 – $20 | Claude Opus, GPT-5, o3 |
| `any` | No limit | Best available regardless of cost |
Every model is automatically classified with:
- Tier: premium / standard / budget (based on provider + pricing)
- Categories: which task types it's good at (derived from benchmarks + capabilities)
- Tags: specific traits like "fast", "vision", "reasoning", "large-context"
- Cost efficiency: quality-per-dollar ratio per category (see the sketch below)
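Quality-per-dollar can be read as roughly a category score divided by a blended token price; a sketch under that assumption (the pipeline's actual formula may differ):

```typescript
// Assumed cost-efficiency formula: quality per dollar for one category.
// Prices are USD per 1M tokens, as in the /pick response's pricing field.
function costEfficiency(categoryScore: number, promptPrice: number, completionPrice: number): number {
  const blendedPrice = (promptPrice + completionPrice) / 2;
  return categoryScore / Math.max(blendedPrice, 0.01); // keep free models finite
}
```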
Base URL: https://ss.deeflect.com/api
Pick the single best model for a task.
curl "https://ss.deeflect.com/api/pick?task=build+a+react+app&budget=medium"| Param | Required | Description |
|---|---|---|
task |
Yes | Task description or category name |
budget |
No | low, medium, high, any (default: medium) |
exclude |
No | Comma-separated model IDs to skip |
context |
No | Context tags (e.g. vision,long-context) |
```json
{
  "data": {
    "id": "anthropic/claude-opus-4.6",
    "name": "Claude Opus 4.6",
    "score": 86,
    "pricing": { "prompt": 5, "completion": 25 },
    "budget": "medium",
    "reason": "Best general model at medium budget ($0-5/M) — score: 86"
  }
}
```
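The same call from TypeScript, typed against the response shown above:

```typescript
// Call /pick programmatically; the shape matches the example response.
interface PickResponse {
  data: {
    id: string;
    name: string;
    score: number;
    pricing: { prompt: number; completion: number };
    budget: string;
    reason: string;
  };
}

const params = new URLSearchParams({ task: "build a react app", budget: "medium" });
const res = await fetch(`https://ss.deeflect.com/api/pick?${params}`);
const { data }: PickResponse = await res.json();
console.log(`${data.id} (score ${data.score}): ${data.reason}`);
```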
curl "https://ss.deeflect.com/api/recommend?task=coding&budget=low&count=3"| Param | Required | Description |
|---|---|---|
task or category |
Yes | Task description or category name |
budget |
No | Budget tier (default: medium) |
count |
No | Number of recommendations, 1-5 (default: 1) |
exclude |
No | Comma-separated model IDs to skip |
require |
No | Required capabilities: vision, functionCalling, json, reasoning |
minContext |
No | Minimum context window length |
context |
No | Context tags for routing boost |
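Filters compose. For example, vision-capable models with at least a 200k-token context window (values illustrative):

```bash
curl "https://ss.deeflect.com/api/recommend?task=coding&budget=low&count=3&require=vision&minContext=200000"
```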
Side-by-side model comparison.
curl "https://ss.deeflect.com/api/compare?models=anthropic/claude-opus-4.6,openai/gpt-5.2"| Param | Required | Description |
|---|---|---|
models |
Yes | Comma-separated OpenRouter model IDs |
Browse the full model catalog.
curl "https://ss.deeflect.com/api/models?category=coding&sort=score&limit=10"| Param | Required | Description |
|---|---|---|
category |
No | Filter by category |
tier |
No | Filter by tier: premium, standard, budget |
sort |
No | score (default), cost, efficiency, or any category name |
limit |
No | Results to return, 1-500 (default: 50) |
Break a complex task into sequential steps with optimal model per step.
curl -X POST "https://ss.deeflect.com/api/decompose" \
-H "Content-Type: application/json" \
-d '{"task": "Build and deploy a SaaS landing page", "budget": "medium"}'Decompose a task into a parallel DAG of sub-tasks with dependency tracking.
curl -X POST "https://ss.deeflect.com/api/swarm" \
-H "Content-Type: application/json" \
-d '{"task": "Research competitors and build a pitch deck", "budget": "low"}'API health and data freshness.
curl "https://ss.deeflect.com/api/status"Force a data refresh (pulls from all 5 sources). Protected by API key if REFRESH_API_KEY is set.
curl -X POST "https://ss.deeflect.com/api/refresh" \
-H "Authorization: Bearer YOUR_KEY"Log a spawn event (used by the plugin for feedback/learning).
Report task outcome rating (1-5) for the learning loop.
Anonymous community outcome report for shared intelligence.
Compose a role-enriched prompt from persona/stack/domain blocks.
curl -X POST "https://ss.deeflect.com/api/roles/compose" \
-H "Content-Type: application/json" \
-d '{
"task": "Build a dashboard with auth and billing",
"persona": "fullstack-engineer",
"stack": ["nextjs", "typescript", "postgres", "stripe"],
"domain": "saas",
"format": "full-implementation",
"guardrails": ["code", "security", "production"]
}'Returns:
- `hasRole` — whether any valid blocks were resolved
- `fullPrompt` — composed prompt that includes role blocks + task
- `warnings` — unknown block IDs, if any
List available role block IDs for persona, stack, domain, format, and guardrails.
curl "https://ss.deeflect.com/api/roles/blocks"The API is open source. Run your own if you want full control.
```bash
git clone https://github.com/deeflect/smart-spawn.git
cd smart-spawn
bun install
bun run dev  # starts on http://localhost:3000
```

Smart Spawn now includes a local MCP server that can run async multi-agent workflows and return merged results to Codex/Claude/any MCP client.
```bash
cd mcp-server
npm install
OPENROUTER_API_KEY=your_key_here bun run start
```

Default local storage:
- `<current-working-directory>/.smart-spawn-mcp/db.sqlite`
- `<current-working-directory>/.smart-spawn-mcp/artifacts/<run_id>/...`
Root scripts:
```bash
bun run mcp:dev
bun run mcp:start
bun run mcp:typecheck
bun run mcp:test
```

Required env vars for execution:
- `OPENROUTER_API_KEY`
Optional env vars:
- `SMART_SPAWN_API_URL` (default: `https://ss.deeflect.com/api`)
- `SMART_SPAWN_MCP_HOME` (default: `<cwd>/.smart-spawn-mcp`)
- `MAX_PARALLEL_RUNS` (default: `2`)
- `MAX_PARALLEL_NODES_PER_RUN` (default: `4`)
- `MAX_USD_PER_RUN` (default: `5`)
- `NODE_TIMEOUT_SECONDS` (default: `180`)
- `RUN_TIMEOUT_SECONDS` (default: `1800`)
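For example, starting the server with tighter spend and concurrency caps (values illustrative):

```bash
MAX_USD_PER_RUN=2 MAX_PARALLEL_RUNS=1 OPENROUTER_API_KEY=your_key_here bun run start
```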
Register the MCP server as a stdio process in your MCP client.
Example (claude_desktop_config.json):
```json
{
  "mcpServers": {
    "smart-spawn": {
      "command": "bun",
      "args": [
        "run",
        "--cwd",
        "/absolute/path/to/smart-spawn/mcp-server",
        "start"
      ],
      "env": {
        "OPENROUTER_API_KEY": "your_openrouter_key_here",
        "SMART_SPAWN_API_URL": "https://ss.deeflect.com/api"
      }
    }
  }
}
```

For Codex or any other MCP host, use the same stdio command + env values in that host's MCP server config format.
- `smartspawn_health` — health checks for OpenRouter/API/DB/storage/worker
- `smartspawn_run_create` — create an async run and return `run_id`
- `smartspawn_run_status` — get status/progress for a run
- `smartspawn_run_result` — get merged output (and optional raw outputs)
- `smartspawn_artifact_get` — fetch a stored artifact by `run_id` + `node_id`
- `smartspawn_run_list` — list recent runs
- `smartspawn_run_cancel` — cancel a queued/running run
- Check health:
{"name":"smartspawn_health","arguments":{}}- Create run:
```json
{
  "name": "smartspawn_run_create",
  "arguments": {
    "task": "Design and implement a small REST API with tests",
    "mode": "swarm",
    "budget": "medium",
    "role": {
      "persona": "backend-engineer",
      "stack": ["typescript", "nodejs", "postgres"],
      "format": "full-implementation",
      "guardrails": ["code", "security", "production"]
    }
  }
}
```

- Poll status until a terminal state (`completed`, `failed`, `canceled`):
{"name":"smartspawn_run_status","arguments":{"run_id":"<run_id>"}}- Get merged result:
{"name":"smartspawn_run_result","arguments":{"run_id":"<run_id>"}}- Optional: inspect artifacts directly (example: merged output artifact):
{"name":"smartspawn_artifact_get","arguments":{"run_id":"<run_id>","node_id":"merged"}}docker build -t smart-spawn .
docker run -p 3000:3000 -v smart-spawn-data:/app/data smart-spawnThe repo includes railway.json and Dockerfile. Just connect your repo and deploy.
| Variable | Required | Description |
|---|---|---|
| `PORT` | No | Server port (default: 3000) |
| `REFRESH_API_KEY` | No | Protects the `/refresh` endpoint. If set, requires `Authorization: Bearer <key>` |
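For example, serving on a different port with the refresh endpoint protected (key value illustrative):

```bash
PORT=8080 REFRESH_API_KEY=some_long_random_string bun run dev
```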
- 200 requests/min per IP (all endpoints)
- 2 requests/hour per IP on `/refresh`
- Returns `429 Too Many Requests` with a `Retry-After` header
These are generous enough for agent use. If you're hitting limits, self-host.
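Clients can honor that header with a small retry wrapper; a sketch (the single-retry policy is an assumption, not part of the API):

```typescript
// Retry once on 429, waiting as long as the Retry-After header asks.
async function fetchWithRetry(url: string): Promise<Response> {
  const res = await fetch(url);
  if (res.status !== 429) return res;
  const waitSeconds = Number(res.headers.get("Retry-After") ?? "1");
  await new Promise((resolve) => setTimeout(resolve, waitSeconds * 1000));
  return fetch(url);
}
```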
```
smart-spawn/
├── src/ # API server
│ ├── index.ts # Hono app, middleware, startup
│ ├── db.ts # SQLite (cache, spawn logs, scores)
│ ├── types.ts # All TypeScript types
│ ├── model-selection.ts # Score sorting, blending logic
│ ├── scoring-utils.ts # Category classification, score helpers
│ ├── context-signals.ts # Context tag parsing and boost calculation
│ ├── task-splitter.ts # Task decomposition for cascade/swarm
│ ├── enrichment/
│ │ ├── pipeline.ts # Main pipeline: pull → enrich → cache
│ │ ├── scoring.ts # Z-score normalization, score computation
│ │ ├── rules.ts # Tier classification, category derivation
│ │ ├── alias-map.ts # Cross-source model name matching
│ │ └── sources/ # Data source adapters
│ │ ├── openrouter.ts # OpenRouter model catalog
│ │ ├── artificial.ts # Artificial Analysis benchmarks
│ │ ├── hf-leaderboard.ts # HuggingFace Open LLM Leaderboard
│ │ ├── lmarena.ts # LMArena / Chatbot Arena ELO
│ │ └── livebench.ts # LiveBench scores
│ ├── routes/ # API endpoints
│ ├── roles/ # Role composition blocks
│ ├── middleware/ # Rate limiting, response caching
│ └── utils/ # Input validation
├── smart-spawn/ # OpenClaw plugin
│ ├── index.ts # Plugin entry point (tool registration)
│ ├── openclaw.plugin.json # Plugin manifest
│ ├── src/api-client.ts # API client for plugin
│ └── skills/smart-spawn/ # Companion SKILL.md
├── skills/ # API-only skill (no plugin required)
│ └── SKILL.md
├── mcp-server/ # Universal MCP server (async orchestration)
│ ├── src/index.ts # MCP stdio entrypoint
│ ├── src/tools.ts # MCP tool contracts
│ ├── src/runtime/ # Planner + queue + executor
│ ├── src/db.ts # Run/node/event/artifact persistence
│ └── src/storage.ts # Artifact filesystem manager
├── data/ # SQLite database (auto-created)
├── Dockerfile
├── railway.json
└── .env.example
```
MIT — see LICENSE.
Built by @deeflect.