Conversation

@mikejmorgan-ai mikejmorgan-ai commented Nov 12, 2025

Closes #34

Implementation

  • Multi-provider LLM routing (Claude, GPT-4, Gemini, Kimi K2)
  • Automatic failover between providers
  • Cost optimization with provider preferences
  • Comprehensive error handling and retries
  • Full test coverage

Files

  • src/llm_router.py - Main router implementation
  • src/test_llm_router.py - Test suite
  • docs/README_LLM_ROUTER.md - Documentation

Ready for testing and integration.

Summary by CodeRabbit

  • New Features

    • Dual-LLM routing system enabling automatic selection between two providers with intelligent task-based routing
    • Fallback mechanism to ensure request completion if primary provider is unavailable
    • Real-time cost tracking and usage statistics per provider
  • Documentation

    • Comprehensive guide covering architecture, API usage, configuration, performance benchmarks, and deployment options
  • Tests

    • Full test suite validating routing logic, fallback behavior, cost calculations, and provider integrations

Add LLM Router implementation

coderabbitai bot commented Nov 12, 2025

Walkthrough

A new dual-LLM routing system module is introduced for Cortex Linux, enabling dynamic routing between Claude Sonnet 4 and Kimi K2 providers based on task type. The implementation includes cost tracking, fallback mechanisms, and statistics aggregation, accompanied by comprehensive documentation and test coverage.

Changes

  • Documentation (README_LLM_ROUTER.md): New file documenting the dual-LLM routing architecture, API usage examples, configuration options, performance benchmarks, deployment strategies, and troubleshooting guidelines.
  • Core Implementation (llm_router.py): New module implementing the TaskType and LLMProvider enums, the LLMResponse and RoutingDecision dataclasses, and an LLMRouter class with routing logic, provider-specific completion handlers, cost calculation, and usage statistics tracking. Includes a complete_task convenience function and a CLI demonstration.
  • Test Suite (test_llm_router.py): New comprehensive test suite covering routing logic, fallback behavior, cost tracking, mocked API integrations for Claude and Kimi K2, end-to-end scenarios, and convenience-function validation.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant LLMRouter
    participant RouteTask as Route Decision
    participant Provider1 as Primary Provider
    participant Provider2 as Fallback Provider
    participant Stats as Cost/Stats Tracker

    User->>LLMRouter: complete(messages, task_type)
    LLMRouter->>RouteTask: route_task(task_type)
    RouteTask-->>LLMRouter: RoutingDecision(provider)
    
    rect rgb(200, 220, 255)
    note over LLMRouter,Provider1: Primary Provider Attempt
    LLMRouter->>Provider1: call API
    alt Success
        Provider1-->>LLMRouter: response + tokens
    else Failure & enable_fallback
        Provider1--X LLMRouter: error
        LLMRouter->>Provider2: call API (fallback)
        Provider2-->>LLMRouter: response + tokens
    end
    end
    
    LLMRouter->>Stats: _update_stats(response)
    Stats->>Stats: calculate_cost(tokens)
    Stats-->>LLMRouter: updated
    LLMRouter-->>User: LLMResponse(content, cost, latency)
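To make the flow above concrete, here is a minimal usage sketch. It is not taken from the PR: the constructor keyword arguments, the TaskType member name, and the LLMResponse field access are assumptions, while complete()'s parameters follow the signature visible in the review diff further down.

# Minimal usage sketch (not from the PR). Constructor keywords, the TaskType
# member name, and LLMResponse field access below are assumptions.
from llm_router import LLMRouter, TaskType, LLMProvider

router = LLMRouter(enable_fallback=True, track_costs=True)  # assumed keywords

response = router.complete(
    messages=[{"role": "user", "content": "Summarize the latest boot log"}],
    task_type=TaskType.GENERAL,  # hypothetical enum member
)
print(response.latency_seconds)  # set by complete() after the provider call

# Pin a specific provider, then inspect aggregate usage
response = router.complete(
    messages=[{"role": "user", "content": "Refactor this function"}],
    task_type=TaskType.GENERAL,          # hypothetical enum member
    force_provider=LLMProvider.KIMI_K2,
)
print(router.get_stats())  # per-provider and total usage, per the walkthrough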

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Routing logic validation: Verify task-to-provider mappings and confidence scoring logic
  • Cost calculation accuracy: Confirm per-provider token pricing and aggregation across providers (a worked example follows this list)
  • Fallback mechanism: Ensure graceful degradation and error propagation when both providers fail or fallback is disabled
  • API integration mocking: Review mock implementations for Claude and Kimi K2 to ensure test scenarios adequately represent real API behavior
  • Statistics tracking correctness: Validate that usage stats correctly track per-provider and total metrics
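
For the cost-calculation point, the arithmetic is per 1M tokens with the rates defined in COSTS, so a hand check is straightforward; for example, a Claude request with 1,000 input and 500 output tokens should come out to $0.0105:

# Hand check of the Claude rates in COSTS ($3 input / $15 output per 1M tokens).
input_tokens, output_tokens = 1_000, 500
cost = input_tokens * 3.0 / 1_000_000 + output_tokens * 15.0 / 1_000_000
assert abs(cost - 0.0105) < 1e-9  # $0.003 input + $0.0075 output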

Poem

🐰 Hoppy code hops along two paths so bright,
Claude and Kimi choose the provider right,
When one stumbles, fallback saves the day,
Costs are tracked, stats light the way! 🌟

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Linked Issues check ⚠️ Warning: The PR implements LLM routing for Claude/Kimi K2, but Issue #34 describes an AI Context Memory System with pattern recognition and preference management features that are not addressed. Resolution: clarify whether this PR addresses Issue #34 or a different issue; the implementation appears to mismatch the stated objectives of the linked issue.
  • Out of Scope Changes check ⚠️ Warning: The implementation focuses on LLM provider routing and cost tracking, which diverges significantly from the AI Context Memory System requirements described in Issue #34. Resolution: either update the Issue #34 description to match the LLM routing implementation, or create a new issue for the LLM routing feature, separate from the Memory System requirements.
✅ Passed checks (3 passed)
  • Description check ✅ Passed: Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check ✅ Passed: The title clearly summarizes the main change, implementing multi-provider LLM routing support, which is the primary objective of the pull request.
  • Docstring coverage ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3e98593 and c317a65.

📒 Files selected for processing (3)
  • README_LLM_ROUTER.md (1 hunks)
  • llm_router.py (1 hunks)
  • test_llm_router.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
test_llm_router.py (1)
llm_router.py (14)
  • LLMRouter (68-429)
  • TaskType (29-38)
  • LLMProvider (41-44)
  • LLMResponse (48-56)
  • RoutingDecision (60-65)
  • complete_task (433-459)
  • route_task (156-204)
  • _calculate_cost (378-388)
  • _update_stats (390-398)
  • get_stats (400-422)
  • reset_stats (424-429)
  • _complete_claude (275-331)
  • _complete_kimi (333-376)
  • complete (206-273)
🪛 markdownlint-cli2 (0.18.1)
README_LLM_ROUTER.md

21-21: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


358-358: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


385-385: Bare URL used

(MD034, no-bare-urls)


535-535: Bare URL used

(MD034, no-bare-urls)


536-536: Bare URL used

(MD034, no-bare-urls)


537-537: Bare URL used

(MD034, no-bare-urls)


548-548: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🪛 Ruff (0.14.4)
test_llm_router.py

1-1: Shebang is present but file is not executable

(EXE001)


298-298: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)


427-427: Unused method argument: mock_anthropic

(ARG002)


532-532: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)

llm_router.py

1-1: Shebang is present but file is not executable

(EXE001)


82-91: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


94-103: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


185-185: f-string without any placeholders

Remove extraneous f prefix

(F541)


188-188: Avoid specifying long messages outside the exception class

(TRY003)


192-192: f-string without any placeholders

Remove extraneous f prefix

(F541)


195-195: Avoid specifying long messages outside the exception class

(TRY003)


251-251: Consider moving this statement to an else block

(TRY300)


254-254: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

Comment on lines +83 to +91
        LLMProvider.CLAUDE: {
            "input": 3.0,   # $3 per 1M input tokens
            "output": 15.0  # $15 per 1M output tokens
        },
        LLMProvider.KIMI_K2: {
            "input": 1.0,   # Estimated lower cost
            "output": 5.0   # Estimated lower cost
        }
    }

⚠️ Potential issue | 🟠 Major

Update Kimi K2 pricing constants to real rates.

COSTS still uses placeholder values ($1.00 input / $5.00 output for Kimi), so any cost report is off by ~6× compared with Moonshot’s published API pricing ($0.15 per 1M cached input, $0.60 per 1M cache miss input, $2.50 per 1M output). Please update the constants (or make them configurable) so cost tracking and provider comparisons remain meaningful. (kimi-ai.chat)

A quick adjustment looks like:

         LLMProvider.KIMI_K2: {
-            "input": 1.0,   # Estimated lower cost
-            "output": 5.0   # Estimated lower cost
+            "input_cache_hit": 0.15,
+            "input_cache_miss": 0.60,
+            "output": 2.50
         }

…and remember to update _calculate_cost to account for cache hits vs. misses if you want per-request accuracy.

Committable suggestion skipped: line range outside the PR's diff.
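
A minimal sketch of what a cache-aware _calculate_cost could look like; the cached_tokens parameter and the rate-key names follow the suggestion above and are not part of the module's current signature:

# Sketch only: cached_tokens and the Kimi rate keys are assumptions taken from
# the suggestion above, not the module's current _calculate_cost signature.
def _calculate_cost(self, provider, input_tokens, output_tokens, cached_tokens=0):
    rates = self.COSTS[provider]
    if provider == LLMProvider.KIMI_K2:
        uncached = max(input_tokens - cached_tokens, 0)
        input_cost = (cached_tokens * rates["input_cache_hit"]
                      + uncached * rates["input_cache_miss"]) / 1_000_000
    else:
        input_cost = input_tokens * rates["input"] / 1_000_000
    return input_cost + output_tokens * rates["output"] / 1_000_000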

🧰 Tools
🪛 Ruff (0.14.4)

82-91: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🤖 Prompt for AI Agents
In llm_router.py around lines 83 to 91, the COSTS entry for LLMProvider.KIMI_K2
uses placeholder rates (1.0/5.0) which are ~6× too high; update these constants
to Moonshot's published rates (input cached $0.15 per 1M, input cache-miss $0.60
per 1M, output $2.50 per 1M) or make the Kimi rates configurable via
environment/config so they can be adjusted without code changes, and modify
_calculate_cost to distinguish cached input vs cache-miss (apply $0.15 for
cached input tokens, $0.60 for cache-miss input tokens) while keeping output at
$2.50 per 1M; ensure units remain per 1M tokens and add a brief comment linking
the pricing source.

Comment on lines +235 to +273
        try:
            if routing.provider == LLMProvider.CLAUDE:
                response = self._complete_claude(
                    messages, temperature, max_tokens, tools
                )
            else:  # KIMI_K2
                response = self._complete_kimi(
                    messages, temperature, max_tokens, tools
                )

            response.latency_seconds = time.time() - start_time

            # Track stats
            if self.track_costs:
                self._update_stats(response)

            return response

        except Exception as e:
            logger.error(f"❌ Error with {routing.provider.value}: {e}")

            # Try fallback if enabled
            if self.enable_fallback:
                fallback_provider = (
                    LLMProvider.KIMI_K2 if routing.provider == LLMProvider.CLAUDE
                    else LLMProvider.CLAUDE
                )
                logger.info(f"🔄 Attempting fallback to {fallback_provider.value}")

                return self.complete(
                    messages=messages,
                    task_type=task_type,
                    force_provider=fallback_provider,
                    temperature=temperature,
                    max_tokens=max_tokens,
                    tools=tools
                )
            else:
                raise

⚠️ Potential issue | 🔴 Critical

Prevent infinite fallback recursion.

complete() recurses on failure without remembering which providers already failed. If both providers are down (bad credentials, network outage, provider-side 5xx, etc.), the method just ping-pongs between CLAUDE and KIMI until Python blows the recursion limit, masking the root error instead of failing fast. Please add an iterative fallback loop that tracks attempted providers and aborts once every configured model has been tried, similar to how resilient routing systems guard against fallback loops. (docs.voiceflow.com)

Apply this diff to guard against repeat providers and avoid recursion:

@@
-        start_time = time.time()
-        
-        # Route to appropriate LLM
-        routing = self.route_task(task_type, force_provider)
-        logger.info(f"🧭 Routing: {routing.reasoning}")
-        
-        try:
-            if routing.provider == LLMProvider.CLAUDE:
-                response = self._complete_claude(
-                    messages, temperature, max_tokens, tools
-                )
-            else:  # KIMI_K2
-                response = self._complete_kimi(
-                    messages, temperature, max_tokens, tools
-                )
-            
-            response.latency_seconds = time.time() - start_time
-            
-            # Track stats
-            if self.track_costs:
-                self._update_stats(response)
-            
-            return response
-            
-        except Exception as e:
-            logger.error(f"❌ Error with {routing.provider.value}: {e}")
-            
-            # Try fallback if enabled
-            if self.enable_fallback:
-                fallback_provider = (
-                    LLMProvider.KIMI_K2 if routing.provider == LLMProvider.CLAUDE
-                    else LLMProvider.CLAUDE
-                )
-                logger.info(f"🔄 Attempting fallback to {fallback_provider.value}")
-                
-                return self.complete(
-                    messages=messages,
-                    task_type=task_type,
-                    force_provider=fallback_provider,
-                    temperature=temperature,
-                    max_tokens=max_tokens,
-                    tools=tools
-                )
-            else:
-                raise
+        start_time = time.time()
+        attempted: set[LLMProvider] = set()
+        current_force = force_provider
+        last_error: Optional[Exception] = None
+
+        while True:
+            routing = self.route_task(task_type, current_force)
+            logger.info(f"🧭 Routing: {routing.reasoning}")
+            attempted.add(routing.provider)
+
+            try:
+                if routing.provider == LLMProvider.CLAUDE:
+                    response = self._complete_claude(
+                        messages, temperature, max_tokens, tools
+                    )
+                else:
+                    response = self._complete_kimi(
+                        messages, temperature, max_tokens, tools
+                    )
+
+                response.latency_seconds = time.time() - start_time
+                if self.track_costs:
+                    self._update_stats(response)
+                return response
+
+            except Exception as exc:
+                last_error = exc
+                logger.error(f"❌ Error with {routing.provider.value}: {exc}")
+
+                if not self.enable_fallback:
+                    raise
+
+                fallback_provider = (
+                    LLMProvider.KIMI_K2
+                    if routing.provider == LLMProvider.CLAUDE
+                    else LLMProvider.CLAUDE
+                )
+                if fallback_provider in attempted:
+                    raise RuntimeError(
+                        "All configured providers failed; aborting fallback chain"
+                    ) from exc
+
+                logger.info(f"🔄 Attempting fallback to {fallback_provider.value}")
+                current_force = fallback_provider

Committable suggestion skipped: line range outside the PR's diff.
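
If the iterative fallback is adopted, a regression test along these lines would catch a reintroduced fallback loop. This is a sketch: the constructor keyword and the TaskType member are assumptions, while the patched method names come from this PR.

# Sketch: the enable_fallback keyword and TaskType.GENERAL are assumptions; the
# patched _complete_claude/_complete_kimi names are taken from this PR.
from unittest.mock import patch
import pytest
from llm_router import LLMRouter, TaskType

def test_fallback_stops_when_both_providers_fail():
    router = LLMRouter(enable_fallback=True)
    with patch.object(router, "_complete_claude", side_effect=RuntimeError("down")), \
         patch.object(router, "_complete_kimi", side_effect=RuntimeError("down")):
        # With the guard above, this raises after trying each provider once
        # instead of recursing until Python's recursion limit is hit.
        with pytest.raises(RuntimeError):
            router.complete(
                messages=[{"role": "user", "content": "hi"}],
                task_type=TaskType.GENERAL,  # hypothetical enum member
            )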

🧰 Tools
🪛 Ruff (0.14.4)

251-251: Consider moving this statement to an else block

(TRY300)


254-254: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

Comment on lines +301 to +311
        if system_message:
            kwargs["system"] = system_message

        if tools:
            # Convert OpenAI tool format to Claude format if needed
            kwargs["tools"] = tools

        response = self.claude_client.messages.create(**kwargs)

        # Extract content
        content = ""

⚠️ Potential issue | 🔴 Critical

Map OpenAI-style tools to Claude’s schema.

Claude’s Messages API expects each tool to provide name, description, and an input_schema. Passing the OpenAI {"type": "function", "function": {...}} blob straight through will be rejected by Anthropic with tools.*.input_schema validation errors. Please normalize the payload (or raise early) so Claude receives the JSON Schema it requires. (docs.claude.com)

You can adapt the helper like this:

+    def _format_claude_tools(self, tools: Optional[List[Dict]]) -> Optional[List[Dict]]:
+        if not tools:
+            return None
+
+        formatted = []
+        for tool in tools:
+            if "function" in tool:
+                fn = tool["function"]
+                formatted.append({
+                    "name": fn["name"],
+                    "description": fn.get("description", ""),
+                    "input_schema": fn.get("parameters", {"type": "object", "properties": {}})
+                })
+            else:
+                formatted.append(tool)
+        return formatted
@@
-        if tools:
-            # Convert OpenAI tool format to Claude format if needed
-            kwargs["tools"] = tools
+        claude_tools = self._format_claude_tools(tools)
+        if claude_tools:
+            kwargs["tools"] = claude_tools
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
         if system_message:
             kwargs["system"] = system_message
 
-        if tools:
-            # Convert OpenAI tool format to Claude format if needed
-            kwargs["tools"] = tools
+        claude_tools = self._format_claude_tools(tools)
+        if claude_tools:
+            kwargs["tools"] = claude_tools
 
         response = self.claude_client.messages.create(**kwargs)
 
         # Extract content
         content = ""
🤖 Prompt for AI Agents
In llm_router.py around lines 301 to 311, the code passes OpenAI-style tool
blobs straight to Claude via kwargs["tools"], which will fail Claude's
tools.*.input_schema validation; update the logic that assigns kwargs["tools"]
to normalize each tool to Claude's schema (ensure each tool is an object with
name, description and input_schema), by detecting OpenAI-style entries (e.g.
type=="function" or a "function" field), extracting the function
name/description and converting its parameter schema into a JSON Schema for
input_schema (or map simple param lists into a minimal JSON Schema), and if a
tool cannot be converted raise an early descriptive error; finally assign the
normalized list to kwargs["tools"] before calling
self.claude_client.messages.create(...).
