Data-driven LLM model routing. Fetches cost, intelligence, and performance data from the Artificial Analysis API, computes the Pareto frontier, and generates tier-based routing profiles.
See ROUTER_SPEC.md for the runtime implementation spec (classifier, fallback logic, optimizations).
bun installGet an API key from artificialanalysis.ai (free tier: 1,000 requests/day).
export AA_API_KEY=your_key_herebun run tracker:saveThat's it. The tracker fetches model data, auto-calibrates tier thresholds from the cost/intelligence distribution, builds the model registry, generates all 5 routing profiles, and writes everything to config/.
Tier thresholds are auto-calibrated from the data by default. The system uses percentiles of the cost and intelligence distributions to derive boundaries:
| Tier | Cost ceiling | Intelligence floor |
|---|---|---|
| simple | P35 cost | P5 intelligence |
| medium | P60 cost | P20 intelligence |
| complex | P85 cost | P40 intelligence |
| reasoning | P85 cost | P40 intelligence + P30 math |
| premium | uncapped | P70 intelligence |
Bands deliberately overlap so cost-effective models (e.g. MiniMax) appear across multiple tiers, creating a mixture. Within each tier, the profile strategy (cost-efficiency, maximize intelligence, etc.) determines the ranking.
As the model landscape shifts (new models, price changes, score updates), thresholds automatically adapt on each run.
After each run, config/MODELS.md shows the full breakdown of which models land in which tiers.
To override with manual thresholds, set calibration.method to "manual":
bun run tracker:save # Fetch, calibrate, write config + changelog
bun run tracker:dry-run # Fetch, calibrate, print report (no writes)
bun run tracker:explore # Dump scatter table + Pareto frontier + auto thresholds| Flag | Effect |
|---|---|
--explore |
Dump scatter + Pareto frontier + auto thresholds, no writes |
--dry-run |
Analyze + report, no writes (default) |
--save |
Write config files + changelog |
--no-cache |
Force fresh API fetch (skip 6h local cache) |
Responses are cached locally at config/.aa-cache.json with a 6-hour TTL. Repeated runs within that window don't hit the API. Use --no-cache to bypass.
bun test # run tests
bun run typecheck # tsc --noEmit
bun run lint # biome check
bun run lint:fix # biome check --write.github/workflows/tracker.yml runs the tracker weekly (Monday 9am UTC). On config changes it commits to a branch and opens a PR for review.
Requires AA_API_KEY set as a repository secret.
Model benchmarks, pricing, and performance data provided by Artificial Analysis.
{ "calibration": { "method": "manual" }, "tiers": { "simple": { "maxBlendedCost": 1.0, "minIntelligence": 20 }, "medium": { "maxBlendedCost": 3.0, "minIntelligence": 35 }, "complex": { "maxBlendedCost": 10.0, "minIntelligence": 50 }, "reasoning": { "maxBlendedCost": 10.0, "minIntelligence": 50, "minMathIndex": 40 }, "premium": { "maxBlendedCost": null, "minIntelligence": 60 } } }