Scoring-based multi-model LLM router with circuit breakers, quality gates, and session persistence.
Routes your prompts to the best available LLM based on:
- Capability fit — matches task type (code, reasoning, quick, vision) to model strengths
- Cost — prefers cheaper models when quality is sufficient
- Latency — historical response times from health telemetry
- Reliability — recent success rate per model
- Circuit breaker — automatically stops routing to failing models
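A weighted composite score over those signals can be sketched as follows. This is an illustrative sketch only: the field names, weights, and normalization are assumptions, not the project's actual scorer.

```python
from dataclasses import dataclass

@dataclass
class ModelStats:
    """Illustrative per-model telemetry; field names are hypothetical."""
    capability_fit: float  # 0..1, how well the model matches the task type
    cost_per_1k: float     # USD per 1k tokens
    avg_latency_s: float   # rolling average response time in seconds
    success_rate: float    # 0..1, recent fraction of successful calls
    breaker_open: bool     # circuit breaker state

# Hypothetical weights; the real router's weights may differ.
WEIGHTS = {"fit": 0.4, "cost": 0.2, "latency": 0.2, "reliability": 0.2}

def score(stats: ModelStats) -> float:
    if stats.breaker_open:
        return 0.0  # never route to a model whose breaker is tripped
    cost_term = 1.0 / (1.0 + stats.cost_per_1k)       # cheaper -> higher
    latency_term = 1.0 / (1.0 + stats.avg_latency_s)  # faster -> higher
    return (WEIGHTS["fit"] * stats.capability_fit
            + WEIGHTS["cost"] * cost_term
            + WEIGHTS["latency"] * latency_term
            + WEIGHTS["reliability"] * stats.success_rate)
```

Each term is normalized into 0..1, so the composite also stays in 0..1 and models can be ranked by it directly.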
```
User message
  → Task classifier (code/reasoning/quick/vision)
  → Scoring router (weighted composite score)
  → Provider call (OpenAI/Anthropic/Google/Ollama)
  → Quality gate (empty/greeting/refusal/repetition/JSON checks)
  → Retry/fallback on failure
  → SQLite persistence (sessions, turns, health telemetry)
```
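The quality-gate step above can be sketched as a single predicate that runs the listed checks in order. This is a hedged approximation: the function name, the greeting/refusal patterns, and the repetition heuristic are assumptions, not the project's actual implementation.

```python
import json
import re

def quality_gate(text: str, expect_json: bool = False) -> bool:
    """Hypothetical gate mirroring the checks listed above.
    Returns False when the response should trigger retry/fallback."""
    t = text.strip()
    if not t:
        return False  # empty check
    if re.fullmatch(r"(hi|hello|hey)[.!]?", t, re.IGNORECASE):
        return False  # bare greeting instead of an answer
    if re.search(r"\b(i can't|i cannot|i'm unable to)\b", t.lower()):
        return False  # naive keyword-based refusal check
    words = t.lower().split()
    if len(words) >= 10 and len(set(words)) / len(words) < 0.3:
        return False  # repetition: too few distinct words
    if expect_json:
        try:
            json.loads(t)
        except ValueError:
            return False  # JSON check: caller expected parseable output
    return True
```

A failing gate would feed the retry/fallback step, which re-runs the request against the next-ranked model.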
```shell
hot-source chat "explain this error"       # Auto-route to best model
hot-source chat "@gemini describe this"    # Force specific provider
hot-source rank "fix this bug"             # Show model ranking
hot-source stats                           # Model health + breaker states
hot-source reset-breaker gemini-2.5-flash  # Reset tripped breaker
```

Supports any combination of providers; API keys are needed only for the ones you use:
- OpenAI (`OPENAI_API_KEY`)
- Anthropic (`ANTHROPIC_API_KEY`)
- Google/Gemini (`GOOGLE_API_KEY` or `GEMINI_API_KEY`)
- Ollama (local, no key needed)
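The breaker state that `stats` reports and `reset-breaker` clears can be sketched as a consecutive-failure counter with a cooldown. The class name, threshold, and cooldown below are illustrative assumptions, not the project's actual breaker.

```python
import time

class CircuitBreaker:
    """Illustrative breaker: opens after N consecutive failures,
    half-opens after a cooldown. Not the project's real class."""

    def __init__(self, threshold: int = 3, cooldown_s: float = 60.0):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when tripped

    def record(self, success: bool) -> None:
        if success:
            self.reset()
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            return True  # half-open: let one probe request through
        return False

    def reset(self) -> None:  # what a manual reset-breaker would do
        self.failures = 0
        self.opened_at = None
```

A model with an open breaker is simply skipped by the router until the cooldown elapses or an operator resets it.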
```python
from src.engine import HotSourceEngine
from src.providers.openai_provider import OpenAIProvider

engine = HotSourceEngine()
engine.add_provider(OpenAIProvider())

sid = engine.session()
result = engine.chat("explain quantum computing", session_id=sid)
print(result.content)
```

Install in editable mode with dev extras, then run the test suite:

```shell
pip install -e ".[dev]"
pytest
```

37 tests cover the DB, circuit breaker, quality gate, scorer, and eval framework.
MIT