Conversation

@mikejmorgan-ai mikejmorgan-ai commented Nov 12, 2025

Closes #34

Implementation

  • Multi-provider LLM routing (Claude, GPT-4, Gemini, Kimi K2)
  • Automatic failover between providers
  • Cost optimization with provider preferences
  • Comprehensive error handling and retries
  • Full test coverage

Files

  • src/llm_router.py - Main router implementation
  • src/test_llm_router.py - Test suite
  • docs/README_LLM_ROUTER.md - Documentation

Ready for testing and integration.

Summary by CodeRabbit

  • New Features

    • Dual-LLM routing system enabling automatic selection between two providers with intelligent task-based routing
    • Fallback mechanism to ensure request completion if primary provider is unavailable
    • Real-time cost tracking and usage statistics per provider
  • Documentation

    • Comprehensive guide covering architecture, API usage, configuration, performance benchmarks, and deployment options
  • Tests

    • Full test suite validating routing logic, fallback behavior, cost calculations, and provider integrations

Add LLM Router implementation

coderabbitai bot commented Nov 12, 2025

Walkthrough

A new dual-LLM routing system module is introduced for Cortex Linux, enabling dynamic routing between Claude Sonnet 4 and Kimi K2 providers based on task type. The implementation includes cost tracking, fallback mechanisms, and statistics aggregation, accompanied by comprehensive documentation and test coverage.

Changes

  • Documentation (README_LLM_ROUTER.md): New file documenting the dual-LLM routing architecture, API usage examples, configuration options, performance benchmarks, deployment strategies, and troubleshooting guidelines.
  • Core Implementation (llm_router.py): New module implementing the TaskType and LLMProvider enums, the LLMResponse and RoutingDecision dataclasses, and an LLMRouter class with routing logic, provider-specific completion handlers, cost calculation, and usage statistics tracking. Includes a complete_task convenience function and a CLI demonstration.
  • Test Suite (test_llm_router.py): New comprehensive test suite covering routing logic, fallback behavior, cost tracking, mocked API integrations for Claude and Kimi K2, end-to-end scenarios, and convenience-function validation.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant LLMRouter
    participant RouteTask as Route Decision
    participant Provider1 as Primary Provider
    participant Provider2 as Fallback Provider
    participant Stats as Cost/Stats Tracker

    User->>LLMRouter: complete(messages, task_type)
    LLMRouter->>RouteTask: route_task(task_type)
    RouteTask-->>LLMRouter: RoutingDecision(provider)
    
    rect rgb(200, 220, 255)
    note over LLMRouter,Provider1: Primary Provider Attempt
    LLMRouter->>Provider1: call API
    alt Success
        Provider1-->>LLMRouter: response + tokens
    else Failure & enable_fallback
        Provider1--X LLMRouter: error
        LLMRouter->>Provider2: call API (fallback)
        Provider2-->>LLMRouter: response + tokens
    end
    end
    
    LLMRouter->>Stats: _update_stats(response)
    Stats->>Stats: calculate_cost(tokens)
    Stats-->>LLMRouter: updated
    LLMRouter-->>User: LLMResponse(content, cost, latency)
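To make the flow above concrete, here is a minimal usage sketch. It is not taken from the PR: the constructor keyword arguments, the TaskType member name, and the LLMResponse field access are assumptions, while complete()'s parameters follow the signature visible in the review diff further down.

# Minimal usage sketch (not from the PR). Constructor keywords, the TaskType
# member name, and LLMResponse field access below are assumptions.
from llm_router import LLMRouter, TaskType, LLMProvider

router = LLMRouter(enable_fallback=True, track_costs=True)  # assumed keywords

response = router.complete(
    messages=[{"role": "user", "content": "Summarize the latest boot log"}],
    task_type=TaskType.GENERAL,  # hypothetical enum member
)
print(response.latency_seconds)  # set by complete() after the provider call

# Pin a specific provider, then inspect aggregate usage
response = router.complete(
    messages=[{"role": "user", "content": "Refactor this function"}],
    task_type=TaskType.GENERAL,          # hypothetical enum member
    force_provider=LLMProvider.KIMI_K2,
)
print(router.get_stats())  # per-provider and total usage, per the walkthrough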

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Routing logic validation: Verify task-to-provider mappings and confidence scoring logic
  • Cost calculation accuracy: Confirm per-provider token pricing and aggregation across providers (a worked example follows this list)
  • Fallback mechanism: Ensure graceful degradation and error propagation when both providers fail or fallback is disabled
  • API integration mocking: Review mock implementations for Claude and Kimi K2 to ensure test scenarios adequately represent real API behavior
  • Statistics tracking correctness: Validate that usage stats correctly track per-provider and total metrics
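
For the cost-calculation point, the arithmetic is per 1M tokens with the rates defined in COSTS, so a hand check is straightforward; for example, a Claude request with 1,000 input and 500 output tokens should come out to $0.0105:

# Hand check of the Claude rates in COSTS ($3 input / $15 output per 1M tokens).
input_tokens, output_tokens = 1_000, 500
cost = input_tokens * 3.0 / 1_000_000 + output_tokens * 15.0 / 1_000_000
assert abs(cost - 0.0105) < 1e-9  # $0.003 input + $0.0075 output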

Poem

🐰 Hoppy code hops along two paths so bright,
Claude and Kimi choose the provider right,
When one stumbles, fallback saves the day,
Costs are tracked, stats light the way! 🌟

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Linked Issues check ⚠️ Warning: The PR implements LLM routing for Claude/Kimi K2, but Issue #34 describes an AI Context Memory System with pattern recognition and preference management features that are not addressed. Resolution: clarify whether this PR addresses Issue #34 or a different issue; the implementation appears to mismatch the stated objectives of the linked issue.
  • Out of Scope Changes check ⚠️ Warning: The implementation focuses on LLM provider routing and cost tracking, which diverges significantly from the AI Context Memory System requirements described in Issue #34. Resolution: either update the Issue #34 description to match the LLM routing implementation, or create a new issue for the LLM routing feature, separate from the Memory System requirements.
✅ Passed checks (3 passed)
  • Description check ✅ Passed: Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check ✅ Passed: The title clearly summarizes the main change, implementing multi-provider LLM routing support, which is the primary objective of the pull request.
  • Docstring coverage ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3e98593 and c317a65.

📒 Files selected for processing (3)
  • README_LLM_ROUTER.md (1 hunks)
  • llm_router.py (1 hunks)
  • test_llm_router.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
test_llm_router.py (1)
llm_router.py (14)
  • LLMRouter (68-429)
  • TaskType (29-38)
  • LLMProvider (41-44)
  • LLMResponse (48-56)
  • RoutingDecision (60-65)
  • complete_task (433-459)
  • route_task (156-204)
  • _calculate_cost (378-388)
  • _update_stats (390-398)
  • get_stats (400-422)
  • reset_stats (424-429)
  • _complete_claude (275-331)
  • _complete_kimi (333-376)
  • complete (206-273)
🪛 markdownlint-cli2 (0.18.1)
README_LLM_ROUTER.md

21-21: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


358-358: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


385-385: Bare URL used

(MD034, no-bare-urls)


535-535: Bare URL used

(MD034, no-bare-urls)


536-536: Bare URL used

(MD034, no-bare-urls)


537-537: Bare URL used

(MD034, no-bare-urls)


548-548: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🪛 Ruff (0.14.4)
test_llm_router.py

1-1: Shebang is present but file is not executable

(EXE001)


298-298: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)


427-427: Unused method argument: mock_anthropic

(ARG002)


532-532: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)

llm_router.py

1-1: Shebang is present but file is not executable

(EXE001)


82-91: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


94-103: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


185-185: f-string without any placeholders

Remove extraneous f prefix

(F541)


188-188: Avoid specifying long messages outside the exception class

(TRY003)


192-192: f-string without any placeholders

Remove extraneous f prefix

(F541)


195-195: Avoid specifying long messages outside the exception class

(TRY003)


251-251: Consider moving this statement to an else block

(TRY300)


254-254: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

Comment on lines +83 to +91
        LLMProvider.CLAUDE: {
            "input": 3.0,   # $3 per 1M input tokens
            "output": 15.0  # $15 per 1M output tokens
        },
        LLMProvider.KIMI_K2: {
            "input": 1.0,   # Estimated lower cost
            "output": 5.0   # Estimated lower cost
        }
    }

⚠️ Potential issue | 🟠 Major

Update Kimi K2 pricing constants to real rates.

COSTS still uses placeholder values ($1.00 input / $5.00 output for Kimi), so any cost report is off by ~6× compared with Moonshot’s published API pricing ($0.15 per 1M cached input, $0.60 per 1M cache miss input, $2.50 per 1M output). Please update the constants (or make them configurable) so cost tracking and provider comparisons remain meaningful. (kimi-ai.chat)

A quick adjustment looks like:

         LLMProvider.KIMI_K2: {
-            "input": 1.0,   # Estimated lower cost
-            "output": 5.0   # Estimated lower cost
+            "input_cache_hit": 0.15,
+            "input_cache_miss": 0.60,
+            "output": 2.50
         }

…and remember to update _calculate_cost to account for cache hits vs. misses if you want per-request accuracy.

Committable suggestion skipped: line range outside the PR's diff.
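
A minimal sketch of what a cache-aware _calculate_cost could look like; the cached_tokens parameter and the rate-key names follow the suggestion above and are not part of the module's current signature:

# Sketch only: cached_tokens and the Kimi rate keys are assumptions taken from
# the suggestion above, not the module's current _calculate_cost signature.
def _calculate_cost(self, provider, input_tokens, output_tokens, cached_tokens=0):
    rates = self.COSTS[provider]
    if provider == LLMProvider.KIMI_K2:
        uncached = max(input_tokens - cached_tokens, 0)
        input_cost = (cached_tokens * rates["input_cache_hit"]
                      + uncached * rates["input_cache_miss"]) / 1_000_000
    else:
        input_cost = input_tokens * rates["input"] / 1_000_000
    return input_cost + output_tokens * rates["output"] / 1_000_000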

🧰 Tools
🪛 Ruff (0.14.4)

82-91: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🤖 Prompt for AI Agents
In llm_router.py around lines 83 to 91, the COSTS entry for LLMProvider.KIMI_K2
uses placeholder rates (1.0/5.0) which are ~6× too high; update these constants
to Moonshot's published rates (input cached $0.15 per 1M, input cache-miss $0.60
per 1M, output $2.50 per 1M) or make the Kimi rates configurable via
environment/config so they can be adjusted without code changes, and modify
_calculate_cost to distinguish cached input vs cache-miss (apply $0.15 for
cached input tokens, $0.60 for cache-miss input tokens) while keeping output at
$2.50 per 1M; ensure units remain per 1M tokens and add a brief comment linking
the pricing source.

Comment on lines +235 to +273
        try:
            if routing.provider == LLMProvider.CLAUDE:
                response = self._complete_claude(
                    messages, temperature, max_tokens, tools
                )
            else:  # KIMI_K2
                response = self._complete_kimi(
                    messages, temperature, max_tokens, tools
                )

            response.latency_seconds = time.time() - start_time

            # Track stats
            if self.track_costs:
                self._update_stats(response)

            return response

        except Exception as e:
            logger.error(f"❌ Error with {routing.provider.value}: {e}")

            # Try fallback if enabled
            if self.enable_fallback:
                fallback_provider = (
                    LLMProvider.KIMI_K2 if routing.provider == LLMProvider.CLAUDE
                    else LLMProvider.CLAUDE
                )
                logger.info(f"🔄 Attempting fallback to {fallback_provider.value}")

                return self.complete(
                    messages=messages,
                    task_type=task_type,
                    force_provider=fallback_provider,
                    temperature=temperature,
                    max_tokens=max_tokens,
                    tools=tools
                )
            else:
                raise

⚠️ Potential issue | 🔴 Critical

Prevent infinite fallback recursion.

complete() recurses on failure without remembering which providers already failed. If both providers are down (bad credentials, network outage, provider-side 5xx, etc.), the method just ping-pongs between CLAUDE and KIMI until Python blows the recursion limit, masking the root error instead of failing fast. Please add an iterative fallback loop that tracks attempted providers and aborts once every configured model has been tried, similar to how resilient routing systems guard against fallback loops. (docs.voiceflow.com)

Apply this diff to guard against repeat providers and avoid recursion:

@@
-        start_time = time.time()
-        
-        # Route to appropriate LLM
-        routing = self.route_task(task_type, force_provider)
-        logger.info(f"🧭 Routing: {routing.reasoning}")
-        
-        try:
-            if routing.provider == LLMProvider.CLAUDE:
-                response = self._complete_claude(
-                    messages, temperature, max_tokens, tools
-                )
-            else:  # KIMI_K2
-                response = self._complete_kimi(
-                    messages, temperature, max_tokens, tools
-                )
-            
-            response.latency_seconds = time.time() - start_time
-            
-            # Track stats
-            if self.track_costs:
-                self._update_stats(response)
-            
-            return response
-            
-        except Exception as e:
-            logger.error(f"❌ Error with {routing.provider.value}: {e}")
-            
-            # Try fallback if enabled
-            if self.enable_fallback:
-                fallback_provider = (
-                    LLMProvider.KIMI_K2 if routing.provider == LLMProvider.CLAUDE
-                    else LLMProvider.CLAUDE
-                )
-                logger.info(f"🔄 Attempting fallback to {fallback_provider.value}")
-                
-                return self.complete(
-                    messages=messages,
-                    task_type=task_type,
-                    force_provider=fallback_provider,
-                    temperature=temperature,
-                    max_tokens=max_tokens,
-                    tools=tools
-                )
-            else:
-                raise
+        start_time = time.time()
+        attempted: set[LLMProvider] = set()
+        current_force = force_provider
+        last_error: Optional[Exception] = None
+
+        while True:
+            routing = self.route_task(task_type, current_force)
+            logger.info(f"🧭 Routing: {routing.reasoning}")
+            attempted.add(routing.provider)
+
+            try:
+                if routing.provider == LLMProvider.CLAUDE:
+                    response = self._complete_claude(
+                        messages, temperature, max_tokens, tools
+                    )
+                else:
+                    response = self._complete_kimi(
+                        messages, temperature, max_tokens, tools
+                    )
+
+                response.latency_seconds = time.time() - start_time
+                if self.track_costs:
+                    self._update_stats(response)
+                return response
+
+            except Exception as exc:
+                last_error = exc
+                logger.error(f"❌ Error with {routing.provider.value}: {exc}")
+
+                if not self.enable_fallback:
+                    raise
+
+                fallback_provider = (
+                    LLMProvider.KIMI_K2
+                    if routing.provider == LLMProvider.CLAUDE
+                    else LLMProvider.CLAUDE
+                )
+                if fallback_provider in attempted:
+                    raise RuntimeError(
+                        "All configured providers failed; aborting fallback chain"
+                    ) from exc
+
+                logger.info(f"🔄 Attempting fallback to {fallback_provider.value}")
+                current_force = fallback_provider

Committable suggestion skipped: line range outside the PR's diff.
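
If the iterative fallback is adopted, a regression test along these lines would catch a reintroduced fallback loop. This is a sketch: the constructor keyword and the TaskType member are assumptions, while the patched method names come from this PR.

# Sketch: the enable_fallback keyword and TaskType.GENERAL are assumptions; the
# patched _complete_claude/_complete_kimi names are taken from this PR.
from unittest.mock import patch
import pytest
from llm_router import LLMRouter, TaskType

def test_fallback_stops_when_both_providers_fail():
    router = LLMRouter(enable_fallback=True)
    with patch.object(router, "_complete_claude", side_effect=RuntimeError("down")), \
         patch.object(router, "_complete_kimi", side_effect=RuntimeError("down")):
        # With the guard above, this raises after trying each provider once
        # instead of recursing until Python's recursion limit is hit.
        with pytest.raises(RuntimeError):
            router.complete(
                messages=[{"role": "user", "content": "hi"}],
                task_type=TaskType.GENERAL,  # hypothetical enum member
            )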

🧰 Tools
🪛 Ruff (0.14.4)

251-251: Consider moving this statement to an else block

(TRY300)


254-254: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

Comment on lines +301 to +311
        if system_message:
            kwargs["system"] = system_message

        if tools:
            # Convert OpenAI tool format to Claude format if needed
            kwargs["tools"] = tools

        response = self.claude_client.messages.create(**kwargs)

        # Extract content
        content = ""

⚠️ Potential issue | 🔴 Critical

Map OpenAI-style tools to Claude’s schema.

Claude’s Messages API expects each tool to provide name, description, and an input_schema. Passing the OpenAI {"type": "function", "function": {...}} blob straight through will be rejected by Anthropic with tools.*.input_schema validation errors. Please normalize the payload (or raise early) so Claude receives the JSON Schema it requires. (docs.claude.com)

You can adapt the helper like this:

+    def _format_claude_tools(self, tools: Optional[List[Dict]]) -> Optional[List[Dict]]:
+        if not tools:
+            return None
+
+        formatted = []
+        for tool in tools:
+            if "function" in tool:
+                fn = tool["function"]
+                formatted.append({
+                    "name": fn["name"],
+                    "description": fn.get("description", ""),
+                    "input_schema": fn.get("parameters", {"type": "object", "properties": {}})
+                })
+            else:
+                formatted.append(tool)
+        return formatted
@@
-        if tools:
-            # Convert OpenAI tool format to Claude format if needed
-            kwargs["tools"] = tools
+        claude_tools = self._format_claude_tools(tools)
+        if claude_tools:
+            kwargs["tools"] = claude_tools
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
         if system_message:
             kwargs["system"] = system_message
 
-        if tools:
-            # Convert OpenAI tool format to Claude format if needed
-            kwargs["tools"] = tools
+        claude_tools = self._format_claude_tools(tools)
+        if claude_tools:
+            kwargs["tools"] = claude_tools
 
         response = self.claude_client.messages.create(**kwargs)
 
         # Extract content
         content = ""
🤖 Prompt for AI Agents
In llm_router.py around lines 301 to 311, the code passes OpenAI-style tool
blobs straight to Claude via kwargs["tools"], which will fail Claude's
tools.*.input_schema validation; update the logic that assigns kwargs["tools"]
to normalize each tool to Claude's schema (ensure each tool is an object with
name, description and input_schema), by detecting OpenAI-style entries (e.g.
type=="function" or a "function" field), extracting the function
name/description and converting its parameter schema into a JSON Schema for
input_schema (or map simple param lists into a minimal JSON Schema), and if a
tool cannot be converted raise an early descriptive error; finally assign the
normalized list to kwargs["tools"] before calling
self.claude_client.messages.create(...).
