LLM Router - Multi-Provider Support (Issue #34) #41
Conversation
Add LLM Router implementation
Walkthrough
A new dual-LLM routing module is introduced for Cortex Linux, enabling dynamic routing between the Claude Sonnet 4 and Kimi K2 providers based on task type. The implementation includes cost tracking, fallback mechanisms, and statistics aggregation, accompanied by comprehensive documentation and test coverage.
Sequence Diagram(s)
sequenceDiagram
actor User
participant LLMRouter
participant RouteTask as Route Decision
participant Provider1 as Primary Provider
participant Provider2 as Fallback Provider
participant Stats as Cost/Stats Tracker
User->>LLMRouter: complete(messages, task_type)
LLMRouter->>RouteTask: route_task(task_type)
RouteTask-->>LLMRouter: RoutingDecision(provider)
rect rgb(200, 220, 255)
note over LLMRouter,Provider1: Primary Provider Attempt
LLMRouter->>Provider1: call API
alt Success
Provider1-->>LLMRouter: response + tokens
else Failure & enable_fallback
Provider1--X LLMRouter: error
LLMRouter->>Provider2: call API (fallback)
Provider2-->>LLMRouter: response + tokens
end
end
LLMRouter->>Stats: _update_stats(response)
Stats->>Stats: calculate_cost(tokens)
Stats-->>LLMRouter: updated
LLMRouter-->>User: LLMResponse(content, cost, latency)
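To make the flow above concrete, a hedged usage sketch (the constructor flags mirror the self.enable_fallback / self.track_costs attributes seen in the excerpts below, and the TaskType member is an assumed name, not verified against the PR):

```python
from llm_router import LLMRouter, TaskType

# Hypothetical setup: flag names follow the attributes used in the excerpts
# below; the actual constructor signature may differ.
router = LLMRouter(enable_fallback=True, track_costs=True)

# complete() routes by task type, retries on the other provider if the
# primary call fails, and stamps latency onto the returned LLMResponse.
response = router.complete(
    messages=[{"role": "user", "content": "Summarize this kernel log."}],
    task_type=TaskType.GENERAL,  # assumed enum member; see TaskType(29-38)
)
print(response.content)
print(response.latency_seconds)
```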
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
README_LLM_ROUTER.md (1 hunks)
llm_router.py (1 hunks)
test_llm_router.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
test_llm_router.py (1)
llm_router.py (14)
LLMRouter (68-429)
TaskType (29-38)
LLMProvider (41-44)
LLMResponse (48-56)
RoutingDecision (60-65)
complete_task (433-459)
route_task (156-204)
_calculate_cost (378-388)
_update_stats (390-398)
get_stats (400-422)
reset_stats (424-429)
_complete_claude (275-331)
_complete_kimi (333-376)
complete (206-273)
🪛 markdownlint-cli2 (0.18.1)
README_LLM_ROUTER.md
21-21: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
358-358: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
385-385: Bare URL used
(MD034, no-bare-urls)
535-535: Bare URL used
(MD034, no-bare-urls)
536-536: Bare URL used
(MD034, no-bare-urls)
537-537: Bare URL used
(MD034, no-bare-urls)
548-548: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.14.4)
test_llm_router.py
1-1: Shebang is present but file is not executable
(EXE001)
298-298: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
427-427: Unused method argument: mock_anthropic
(ARG002)
532-532: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
llm_router.py
1-1: Shebang is present but file is not executable
(EXE001)
82-91: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
94-103: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
185-185: f-string without any placeholders
Remove extraneous f prefix
(F541)
188-188: Avoid specifying long messages outside the exception class
(TRY003)
192-192: f-string without any placeholders
Remove extraneous f prefix
(F541)
195-195: Avoid specifying long messages outside the exception class
(TRY003)
251-251: Consider moving this statement to an else block
(TRY300)
254-254: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
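For orientation, a small sketch of how the recurring Ruff findings above could be resolved (string keys stand in for the LLMProvider enum; this is illustrative, not actual PR code):

```python
import logging
from typing import ClassVar, Dict

logger = logging.getLogger(__name__)


class LLMRouter:
    # RUF012: annotate mutable class-level constants with ClassVar
    COSTS: ClassVar[Dict[str, Dict[str, float]]] = {
        "claude": {"input": 3.0, "output": 15.0},
    }

    def _route(self) -> None:
        # F541: plain string once the f prefix carries no placeholders
        logger.info("Routing decision made")
        try:
            raise ValueError("provider call failed")
        except Exception:
            # TRY400: logging.exception records the traceback automatically
            logger.exception("Provider call failed")
```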
    LLMProvider.CLAUDE: {
        "input": 3.0,    # $3 per 1M input tokens
        "output": 15.0   # $15 per 1M output tokens
    },
    LLMProvider.KIMI_K2: {
        "input": 1.0,    # Estimated lower cost
        "output": 5.0    # Estimated lower cost
    }
}
Update Kimi K2 pricing constants to real rates.
COSTS still uses placeholder values ($1.00 input / $5.00 output for Kimi), so any cost report is off by ~6× compared with Moonshot’s published API pricing ($0.15 per 1M cached input, $0.60 per 1M cache miss input, $2.50 per 1M output). Please update the constants (or make them configurable) so cost tracking and provider comparisons remain meaningful. (kimi-ai.chat)
A quick adjustment looks like:
LLMProvider.KIMI_K2: {
- "input": 1.0, # Estimated lower cost
- "output": 5.0 # Estimated lower cost
+ "input_cache_hit": 0.15,
+ "input_cache_miss": 0.60,
+ "output": 2.50
  }

…and remember to update _calculate_cost to account for cache hits vs. misses if you want per-request accuracy.
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.14.4)
82-91: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
🤖 Prompt for AI Agents
In llm_router.py around lines 83 to 91, the COSTS entry for LLMProvider.KIMI_K2
uses placeholder rates (1.0/5.0) which are ~6× too high; update these constants
to Moonshot's published rates (input cached $0.15 per 1M, input cache-miss $0.60
per 1M, output $2.50 per 1M) or make the Kimi rates configurable via
environment/config so they can be adjusted without code changes, and modify
_calculate_cost to distinguish cached input vs cache-miss (apply $0.15 for
cached input tokens, $0.60 for cache-miss input tokens) while keeping output at
$2.50 per 1M; ensure units remain per 1M tokens and add a brief comment linking
the pricing source.
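As a rough illustration of the configurable, cache-aware variant suggested above (the environment variable names and the cached_input_tokens parameter are hypothetical, not part of the PR):

```python
import os

# Hypothetical: load Kimi K2 rates from the environment so pricing updates
# don't require code changes. Defaults follow the rates cited in the review
# comment above (per 1M tokens).
KIMI_COSTS = {
    "input_cache_hit": float(os.getenv("KIMI_INPUT_CACHE_HIT_COST", "0.15")),
    "input_cache_miss": float(os.getenv("KIMI_INPUT_CACHE_MISS_COST", "0.60")),
    "output": float(os.getenv("KIMI_OUTPUT_COST", "2.50")),
}


def calculate_kimi_cost(input_tokens: int, output_tokens: int,
                        cached_input_tokens: int = 0) -> float:
    """Return USD cost; all rates are per 1M tokens."""
    cache_miss_tokens = max(input_tokens - cached_input_tokens, 0)
    return (
        cached_input_tokens / 1_000_000 * KIMI_COSTS["input_cache_hit"]
        + cache_miss_tokens / 1_000_000 * KIMI_COSTS["input_cache_miss"]
        + output_tokens / 1_000_000 * KIMI_COSTS["output"]
    )
```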
try:
    if routing.provider == LLMProvider.CLAUDE:
        response = self._complete_claude(
            messages, temperature, max_tokens, tools
        )
    else:  # KIMI_K2
        response = self._complete_kimi(
            messages, temperature, max_tokens, tools
        )

    response.latency_seconds = time.time() - start_time

    # Track stats
    if self.track_costs:
        self._update_stats(response)

    return response

except Exception as e:
    logger.error(f"❌ Error with {routing.provider.value}: {e}")

    # Try fallback if enabled
    if self.enable_fallback:
        fallback_provider = (
            LLMProvider.KIMI_K2 if routing.provider == LLMProvider.CLAUDE
            else LLMProvider.CLAUDE
        )
        logger.info(f"🔄 Attempting fallback to {fallback_provider.value}")

        return self.complete(
            messages=messages,
            task_type=task_type,
            force_provider=fallback_provider,
            temperature=temperature,
            max_tokens=max_tokens,
            tools=tools
        )
    else:
        raise
Prevent infinite fallback recursion.
complete() recurses on failure without remembering which providers already failed. If both providers are down (bad credentials, network outage, provider-side 5xx, etc.), the method just ping-pongs between CLAUDE and KIMI until Python blows the recursion limit, masking the root error instead of failing fast. Please add an iterative fallback loop that tracks attempted providers and aborts once every configured model has been tried, similar to how resilient routing systems guard against fallback loops. (docs.voiceflow.com)
Apply this diff to guard against repeat providers and avoid recursion:
@@
- start_time = time.time()
-
- # Route to appropriate LLM
- routing = self.route_task(task_type, force_provider)
- logger.info(f"🧭 Routing: {routing.reasoning}")
-
- try:
- if routing.provider == LLMProvider.CLAUDE:
- response = self._complete_claude(
- messages, temperature, max_tokens, tools
- )
- else: # KIMI_K2
- response = self._complete_kimi(
- messages, temperature, max_tokens, tools
- )
-
- response.latency_seconds = time.time() - start_time
-
- # Track stats
- if self.track_costs:
- self._update_stats(response)
-
- return response
-
- except Exception as e:
- logger.error(f"❌ Error with {routing.provider.value}: {e}")
-
- # Try fallback if enabled
- if self.enable_fallback:
- fallback_provider = (
- LLMProvider.KIMI_K2 if routing.provider == LLMProvider.CLAUDE
- else LLMProvider.CLAUDE
- )
- logger.info(f"🔄 Attempting fallback to {fallback_provider.value}")
-
- return self.complete(
- messages=messages,
- task_type=task_type,
- force_provider=fallback_provider,
- temperature=temperature,
- max_tokens=max_tokens,
- tools=tools
- )
- else:
- raise
+ start_time = time.time()
+ attempted: set[LLMProvider] = set()
+ current_force = force_provider
+ last_error: Optional[Exception] = None
+
+ while True:
+ routing = self.route_task(task_type, current_force)
+ logger.info(f"🧭 Routing: {routing.reasoning}")
+ attempted.add(routing.provider)
+
+ try:
+ if routing.provider == LLMProvider.CLAUDE:
+ response = self._complete_claude(
+ messages, temperature, max_tokens, tools
+ )
+ else:
+ response = self._complete_kimi(
+ messages, temperature, max_tokens, tools
+ )
+
+ response.latency_seconds = time.time() - start_time
+ if self.track_costs:
+ self._update_stats(response)
+ return response
+
+ except Exception as exc:
+ last_error = exc
+ logger.error(f"❌ Error with {routing.provider.value}: {exc}")
+
+ if not self.enable_fallback:
+ raise
+
+ fallback_provider = (
+ LLMProvider.KIMI_K2
+ if routing.provider == LLMProvider.CLAUDE
+ else LLMProvider.CLAUDE
+ )
+ if fallback_provider in attempted:
+ raise RuntimeError(
+ "All configured providers failed; aborting fallback chain"
+ ) from exc
+
+ logger.info(f"🔄 Attempting fallback to {fallback_provider.value}")
+            current_force = fallback_provider

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.14.4)
251-251: Consider moving this statement to an else block
(TRY300)
254-254: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
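A test-style sketch of the fail-fast behavior the loop above would give; the router fixture, the mocked method names, and the TaskType member are assumptions layered on the excerpt, not code from the PR:

```python
from unittest.mock import patch

import pytest

from llm_router import TaskType


def test_fallback_aborts_after_all_providers_fail(router):
    # Both providers fail; with the attempted-provider guard, the second
    # failure raises RuntimeError instead of recursing indefinitely.
    with patch.object(router, "_complete_claude", side_effect=RuntimeError("claude down")), \
         patch.object(router, "_complete_kimi", side_effect=RuntimeError("kimi down")):
        with pytest.raises(RuntimeError, match="All configured providers failed"):
            router.complete(
                messages=[{"role": "user", "content": "hello"}],
                task_type=TaskType.GENERAL,  # hypothetical enum member
            )
```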
if system_message:
    kwargs["system"] = system_message

if tools:
    # Convert OpenAI tool format to Claude format if needed
    kwargs["tools"] = tools

response = self.claude_client.messages.create(**kwargs)

# Extract content
content = ""
Map OpenAI-style tools to Claude’s schema.
Claude’s Messages API expects each tool to provide name, description, and an input_schema. Passing the OpenAI {"type": "function", "function": {...}} blob straight through will be rejected by Anthropic with tools.*.input_schema validation errors. Please normalize the payload (or raise early) so Claude receives the JSON Schema it requires. (docs.claude.com)
You can adapt the helper like this:
+ def _format_claude_tools(self, tools: Optional[List[Dict]]) -> Optional[List[Dict]]:
+ if not tools:
+ return None
+
+ formatted = []
+ for tool in tools:
+ if "function" in tool:
+ fn = tool["function"]
+ formatted.append({
+ "name": fn["name"],
+ "description": fn.get("description", ""),
+ "input_schema": fn.get("parameters", {"type": "object", "properties": {}})
+ })
+ else:
+ formatted.append(tool)
+ return formatted
@@
- if tools:
- # Convert OpenAI tool format to Claude format if needed
- kwargs["tools"] = tools
+ claude_tools = self._format_claude_tools(tools)
+ if claude_tools:
+        kwargs["tools"] = claude_tools

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if system_message:
    kwargs["system"] = system_message

claude_tools = self._format_claude_tools(tools)
if claude_tools:
    kwargs["tools"] = claude_tools

response = self.claude_client.messages.create(**kwargs)

# Extract content
content = ""
🤖 Prompt for AI Agents
In llm_router.py around lines 301 to 311, the code passes OpenAI-style tool
blobs straight to Claude via kwargs["tools"], which will fail Claude's
tools.*.input_schema validation; update the logic that assigns kwargs["tools"]
to normalize each tool to Claude's schema (ensure each tool is an object with
name, description and input_schema), by detecting OpenAI-style entries (e.g.
type=="function" or a "function" field), extracting the function
name/description and converting its parameter schema into a JSON Schema for
input_schema (or map simple param lists into a minimal JSON Schema), and if a
tool cannot be converted raise an early descriptive error; finally assign the
normalized list to kwargs["tools"] before calling
self.claude_client.messages.create(...).
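For a quick sanity check of that conversion, here is roughly what the sketched _format_claude_tools helper would produce for a made-up OpenAI-style tool (the get_weather definition is illustrative only, not from the PR):

```python
openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Expected Claude-style result of _format_claude_tools([openai_tool]):
# the function name and description carry over, and "parameters" becomes
# the JSON Schema under "input_schema".
claude_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
```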
Closes #34
Implementation
Files
src/llm_router.py - Main router implementation
src/test_llm_router.py - Test suite
docs/README_LLM_ROUTER.md - Documentation

Ready for testing and integration.
Summary by CodeRabbit
New Features
Documentation
Tests