
Conversation

@dhvll (Collaborator) commented Dec 19, 2025

  • Introduced a new test suite for parallel LLM call functionality, including tests for async completion, batch processing, error diagnosis, and hardware configuration checks.
  • Added a demo script showcasing the usage of parallel LLM calls for multi-package queries, error diagnosis, and hardware config checks, demonstrating expected performance improvements.
  • Enhanced the LLMRouter class with async capabilities and rate limiting for concurrent API calls.
  • Updated existing tests to support new async methods and ensure comprehensive coverage of parallel processing features.

Closes #276

Summary by CodeRabbit

  • New Features

    • Added asynchronous LLM request support enabling concurrent operations
    • Introduced batch processing with configurable rate limiting for parallel requests
    • Added helper utilities for parallel package queries, error diagnosis, and hardware configuration checks
    • Exposed rate-limit configuration to control concurrent call limits
  • Tests

    • New comprehensive test suite for parallel operations, batch processing, and rate limiting
  • Documentation

    • Added example demonstrations showing parallel LLM usage patterns


@coderabbitai bot (Contributor) commented Dec 19, 2025

Walkthrough

Added asynchronous LLM request handling with concurrent batch processing and rate limiting to the LLMRouter. Introduced async client support for Claude and Kimi APIs, parallel execution helpers for multi-package queries and error diagnosis, and comprehensive test coverage. Includes example demonstrations of parallelized workflows.
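For orientation, here is a minimal usage sketch of the new surface. Signatures are inferred from this summary and the changes below, and TaskType.SIMPLE_QUERY is a hypothetical enum member; treat the details as assumptions until checked against cortex/llm_router.py.

import asyncio

from cortex.llm_router import LLMRouter, TaskType

async def main() -> None:
    router = LLMRouter()      # assumes API keys are picked up from the environment
    router.set_rate_limit(5)  # cap concurrent API calls via the internal semaphore

    # Single async completion, routed to Claude or Kimi by task type
    response = await router.acomplete(
        prompt="Explain what the 'curl' package does.",
        task_type=TaskType.SIMPLE_QUERY,
    )

    # Parallel batch: one dict per request, processed concurrently
    requests = [
        {"prompt": f"Summarize package {name}", "task_type": TaskType.SIMPLE_QUERY}
        for name in ("git", "vim", "htop")
    ]
    responses = await router.complete_batch(requests, max_concurrent=3)

asyncio.run(main())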

Changes

  • Core async and rate-limiting implementation (cortex/llm_router.py): Added async client initialization (AsyncAnthropic, AsyncOpenAI), a set_rate_limit method for semaphore configuration, an acomplete async workflow with provider routing, _acomplete_claude and _acomplete_kimi async helpers, complete_batch for parallel request processing, and module-level async helper functions (query_multiple_packages, diagnose_errors_parallel, check_hardware_configs_parallel). Enhanced the complete_task signature.
  • Demonstration scripts (examples/parallel_llm_demo.py): New demo file showcasing async parallel operations: demo_multi_package_queries, demo_parallel_error_diagnosis, demo_hardware_config_checks, demo_batch_completion, demo_sequential_vs_parallel, and a main orchestrator.
  • Test coverage (test_parallel_llm.py): New test module with async test functions: test_async_completion, test_batch_processing, test_rate_limiting, test_helper_functions, test_performance_comparison, and a main test orchestrator.
  • Existing test updates (tests/test_llm_router.py): Added a TestParallelProcessing class with async test methods for parallel provider selection, batch processing, and helper function validation. Updated imports for AsyncMock and the new public APIs.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Router as LLMRouter
    participant RateLimit as Rate Limiter<br/>(Semaphore)
    participant Router_Internal as Router:<br/>route_task
    participant Claude as Async<br/>Claude API
    participant Kimi as Async<br/>Kimi API
    
    Caller->>Router: complete_batch(requests)
    Router->>RateLimit: acquire semaphore
    Note over Router: For each request in parallel
    Router->>Router_Internal: route_task(request)
    Router_Internal-->>Router: provider choice<br/>(Claude or Kimi)
    
    par Parallel Execution
        Router->>Claude: _acomplete_claude(request)
        Claude-->>Router: response (latency, tokens)
    and
        Router->>Kimi: _acomplete_kimi(request)
        Kimi-->>Router: response (latency, tokens)
    end
    
    Router->>RateLimit: release semaphore
    Router->>Router: aggregate responses
    Router-->>Caller: list[LLMResponse]<br/>with costs & timings

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Multiple async implementations: Parallel execution paths for Claude and Kimi providers require careful validation of async/await patterns and error handling.
  • Concurrency control: Rate-limiting semaphore logic, concurrent request batching, and exception handling in parallel contexts need thorough review.
  • Public API expansion: Seven new public methods/functions significantly broaden the surface area; ensure backward compatibility and clear documentation.
  • Test coverage variance: Three test files with different testing approaches (integration demo, unit tests, existing suite extensions) create multiple validation touchpoints.
  • Helper function dependencies: Three new parallel helper functions (query_multiple_packages, diagnose_errors_parallel, check_hardware_configs_parallel) rely on the core async infrastructure and require cross-validation.


Poem

🐰 With whiskers twitching and paws all a-blur,
We've harnessed the async, no more sequential stir!
Rate limits keep order, while Claude and Kimi race,
Parallel queries at lightning-fast pace!
Three-fold speedups, what a glorious day,
For batched LLM calls the concurrent way! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): The description covers the main changes and closes issue #276, but it does not follow the provided template structure with the required sections. Resolution: reorganize the description to match the template by adding 'Related Issue' and 'Summary' headers and a 'Checklist' section, and include the required PR title format [#XX] in the checklist validation.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): The title clearly and specifically summarizes the main changes: adding parallel LLM call tests and demo scripts.
  • Linked Issues check (✅ Passed): All coding requirements from issue #276 are implemented: async completion, batch processing, rate limiting, and parallel helper functions for multi-package queries, error diagnosis, and hardware checks.
  • Out of Scope Changes check (✅ Passed): All changes align with issue #276 requirements: LLMRouter async enhancements, parallel processing tests, demo scripts, and rate limiting for concurrent calls.
  • Docstring Coverage (✅ Passed): Docstring coverage is 85.00%, which meets the required threshold of 80.00%.

@sonarqubecloud

@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 6

🧹 Nitpick comments (9)
tests/test_llm_router.py (2)

515-516: Unused mock variable.

mock_message is created but never used; mock_content is used instead. Consider removing the unused variable.

🔎 Proposed fix
     @patch("cortex.llm_router.AsyncAnthropic")
     @patch("cortex.llm_router.AsyncOpenAI")
     def test_acomplete_claude(self, mock_async_openai, mock_async_anthropic):
         """Test async completion with Claude."""
         # Mock async Claude client
-        mock_message = Mock()
-        mock_message.text = "Async Claude response"
-
         mock_content = Mock()
         mock_content.text = "Async Claude response"

715-720: Accessing private semaphore attribute.

Line 720 accesses _rate_limit_semaphore._value, which is a private implementation detail of asyncio.Semaphore. While acceptable for testing, this could break if the internal implementation changes.

Consider verifying the semaphore's behavior through its public interface instead, or document this as a known test fragility.
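A behavior-based alternative is sketched below. Only complete_batch, _acomplete_claude, and TaskType come from this PR; the enum member, router construction, request shape, and response stand-in are assumptions, and depending on routing you may need to patch _acomplete_kimi the same way.

import asyncio
from unittest.mock import patch

async def check_rate_limit_via_behavior() -> None:
    peak = 0
    active = 0

    async def tracked(*args, **kwargs):
        nonlocal peak, active
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # yield so queued requests can start
        active -= 1
        return object()            # stand-in for an LLMResponse

    router = LLMRouter()           # construct as elsewhere in this suite
    requests = [{"prompt": f"q{i}", "task_type": TaskType.CODE_GENERATION} for i in range(6)]
    with patch.object(router, "_acomplete_claude", side_effect=tracked):
        await router.complete_batch(requests, max_concurrent=2)
    assert peak <= 2               # limit enforced, verified without touching _value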

test_parallel_llm.py (2)

1-18: Consider moving test file to tests/ directory.

This test file is located at the repository root rather than in the tests/ directory. For consistency with the project structure and test discovery, consider moving it to tests/test_parallel_llm.py.


29-60: Consider adding type hints to async test functions.

Per coding guidelines, type hints are required. The async test functions lack return type annotations.

🔎 Example fix for test_async_completion
-async def test_async_completion():
+async def test_async_completion() -> bool:
     """Test basic async completion."""
examples/parallel_llm_demo.py (1)

185-214: Consider clarifying that the performance comparison is simulated.

The demo_sequential_vs_parallel function uses asyncio.sleep(0.1) to simulate API calls rather than making real requests. While this makes the demo runnable without API keys, the output might be misleading to users who expect real performance metrics. Consider adding an explicit note in the output.
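If such a note is added, even something this small (wording hypothetical) keeps the output honest:

# Hypothetical wording; the point is to flag the simulated timing up front.
print("NOTE: this comparison simulates API calls with asyncio.sleep(0.1); "
      "real speedups depend on provider latency and rate limits.")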

cortex/llm_router.py (4)

432-439: Consider adding input validation.

The set_rate_limit method doesn't validate that max_concurrent is positive. A value ≤ 0 would cause unexpected behavior.

🔎 Proposed fix
     def set_rate_limit(self, max_concurrent: int = 10):
         """
         Set rate limit for parallel API calls.

         Args:
             max_concurrent: Maximum number of concurrent API calls
         """
+        if max_concurrent <= 0:
+            raise ValueError("max_concurrent must be positive")
         self._rate_limit_semaphore = asyncio.Semaphore(max_concurrent)

648-655: Semaphore handling could be simplified.

The code accesses _rate_limit_semaphore._value (line 650), which is a private attribute. Additionally, a new semaphore is created (line 655) even if set_rate_limit was called, making the stored semaphore unused in complete_batch.

Consider using the stored semaphore directly if it exists, or document that complete_batch uses its own semaphore.

🔎 Proposed fix
         # Use provided max_concurrent or semaphore limit or default
         if max_concurrent is None:
             if self._rate_limit_semaphore:
-                max_concurrent = self._rate_limit_semaphore._value
+                semaphore = self._rate_limit_semaphore
             else:
                 max_concurrent = 10
                 self.set_rate_limit(max_concurrent)
+                semaphore = self._rate_limit_semaphore
+        else:
+            semaphore = asyncio.Semaphore(max_concurrent)

-        semaphore = asyncio.Semaphore(max_concurrent)

679-686: Hardcoded provider in error response.

Error responses use provider=LLMProvider.CLAUDE regardless of which provider was intended. Consider extracting the intended provider from the request, or use a more explicit sentinel.
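A self-contained sketch of the pattern (names simplified here, not the actual router code): resolve the intended provider before the call, then reuse it when building the error response.

from dataclasses import dataclass
from enum import Enum

class Provider(Enum):
    CLAUDE = "claude"
    KIMI = "kimi"

@dataclass
class ErrorResult:
    provider: Provider
    message: str

def route(task_type: str) -> Provider:
    # Simplified stand-in for LLMRouter.route_task
    return Provider.KIMI if task_type == "simple_query" else Provider.CLAUDE

def complete_with_attribution(request: dict) -> ErrorResult:
    provider = route(request["task_type"])  # resolved before the API call
    try:
        raise TimeoutError("simulated API failure")  # placeholder for the real call
    except Exception as exc:
        # The error is attributed to the provider that was actually targeted,
        # not a hardcoded constant.
        return ErrorResult(provider=provider, message=f"Error: {exc}")

print(complete_with_attribution({"task_type": "simple_query"}).provider)  # Provider.KIMI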


391-399: Stats tracking is not thread-safe.

_update_stats modifies shared state (total_cost_usd, request_count, provider_stats) without synchronization. While safe for single-threaded async usage, this could cause data races if the router is used across multiple threads (as mentioned in PR objectives for "future free-threading" support).

Consider adding a lock if multi-threaded usage is anticipated.
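A minimal sketch of the locking approach, using the attribute names from this comment (total_cost_usd, request_count); the cost parameter is illustrative:

import threading

class StatsTracker:
    def __init__(self) -> None:
        self.total_cost_usd = 0.0
        self.request_count = 0
        self._stats_lock = threading.Lock()

    def _update_stats(self, cost_usd: float) -> None:
        # Serialize the read-modify-write so threads cannot interleave updates.
        with self._stats_lock:
            self.total_cost_usd += cost_usd
            self.request_count += 1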

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c482df9 and e335c76.

📒 Files selected for processing (4)
  • cortex/llm_router.py (6 hunks)
  • examples/parallel_llm_demo.py (1 hunks)
  • test_parallel_llm.py (1 hunks)
  • tests/test_llm_router.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • test_parallel_llm.py
  • examples/parallel_llm_demo.py
  • tests/test_llm_router.py
  • cortex/llm_router.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_llm_router.py
🧬 Code graph analysis (3)
test_parallel_llm.py (4)
cortex/llm_router.py (8)
  • LLMRouter (74-691)
  • TaskType (31-41)
  • check_hardware_configs_parallel (822-867)
  • diagnose_errors_parallel (772-819)
  • query_multiple_packages (725-769)
  • acomplete (441-505)
  • complete_batch (610-691)
  • set_rate_limit (432-439)
examples/parallel_llm_demo.py (1)
  • main (217-249)
cortex/semantic_cache.py (1)
  • total (30-32)
cortex/sandbox/sandbox_executor.py (1)
  • success (60-62)
examples/parallel_llm_demo.py (1)
cortex/llm_router.py (8)
  • LLMRouter (74-691)
  • TaskType (31-41)
  • check_hardware_configs_parallel (822-867)
  • diagnose_errors_parallel (772-819)
  • query_multiple_packages (725-769)
  • set_rate_limit (432-439)
  • complete_batch (610-691)
  • get_stats (401-423)
tests/test_llm_router.py (1)
cortex/llm_router.py (10)
  • LLMProvider (44-48)
  • LLMResponse (52-61)
  • LLMRouter (74-691)
  • TaskType (31-41)
  • check_hardware_configs_parallel (822-867)
  • complete_task (695-721)
  • diagnose_errors_parallel (772-819)
  • query_multiple_packages (725-769)
  • complete_batch (610-691)
  • set_rate_limit (432-439)
🪛 GitHub Actions: CI
examples/parallel_llm_demo.py

[error] 13-13: I001 Import block is un-sorted or un-formatted

🪛 GitHub Check: Lint
test_parallel_llm.py

[failure] 247-247: Ruff (F541)
test_parallel_llm.py:247:15: F541 f-string without any placeholders


[failure] 236-236: Ruff (W291)
test_parallel_llm.py:236:89: W291 Trailing whitespace


[failure] 145-145: Ruff (F541)
test_parallel_llm.py:145:15: F541 f-string without any placeholders


[failure] 98-98: Ruff (F541)
test_parallel_llm.py:98:15: F541 f-string without any placeholders


[failure] 51-51: Ruff (F541)
test_parallel_llm.py:51:15: F541 f-string without any placeholders

examples/parallel_llm_demo.py

[failure] 13-21: Ruff (I001)
examples/parallel_llm_demo.py:13:1: I001 Import block is un-sorted or un-formatted

🔇 Additional comments (11)
tests/test_llm_router.py (4)

503-509: LGTM! New parallel processing test suite added.

The test class structure is well-organized and covers key parallel features: async completion, batch processing, helper functions, and rate limiting.


544-575: LGTM!

The async Kimi completion test correctly mocks the OpenAI-compatible response structure and verifies proper routing.


577-628: LGTM!

Good test coverage for batch completion with mixed providers. The test correctly verifies that different task types route to their expected providers.


630-713: LGTM! Helper function tests cover essential functionality.

The tests for query_multiple_packages, diagnose_errors_parallel, and check_hardware_configs_parallel verify correct response structure and dictionary key mapping.

examples/parallel_llm_demo.py (2)

24-126: LGTM! Well-structured demo functions.

The demo functions follow a consistent pattern with clear docstrings, timing measurements, and informative output. Good examples for users.


217-253: LGTM! Good main function structure.

The main function properly orchestrates demos with error handling and clear guidance when API keys are missing.

cortex/llm_router.py (5)

14-24: LGTM! Async client imports added.

The new imports for async clients (AsyncAnthropic, AsyncOpenAI) and asyncio properly support the new parallel processing capabilities.


139-162: LGTM! Async client initialization follows sync pattern.

The async clients are properly initialized alongside their synchronous counterparts, maintaining consistency in the initialization logic.


441-505: LGTM! Well-implemented async completion method.

The acomplete method correctly mirrors the synchronous complete() method with proper async/await usage, routing, latency tracking, and fallback logic.


507-608: LGTM! Async provider methods correctly implemented.

Both _acomplete_claude and _acomplete_kimi properly check for client initialization, handle parameters, and return properly structured LLMResponse objects.


724-867: LGTM! Well-documented parallel helper functions.

The helper functions query_multiple_packages, diagnose_errors_parallel, and check_hardware_configs_parallel are well-documented with comprehensive docstrings, type hints, and usage examples. The implementation correctly leverages complete_batch for parallel execution.

Comment on lines +13 to +21
import asyncio
import time
from cortex.llm_router import (
LLMRouter,
TaskType,
check_hardware_configs_parallel,
diagnose_errors_parallel,
query_multiple_packages,
)

⚠️ Potential issue | 🟡 Minor

Import block is un-sorted (Ruff I001) - blocking CI.

The imports are not properly sorted, causing the CI pipeline to fail. Standard library imports should come before third-party imports.

🔎 Proposed fix
 import asyncio
 import time
+
 from cortex.llm_router import (
     LLMRouter,
     TaskType,
     check_hardware_configs_parallel,
     diagnose_errors_parallel,
     query_multiple_packages,
 )
🧰 Tools
🪛 GitHub Actions: CI

[error] 13-13: I001 Import block is un-sorted or un-formatted

🪛 GitHub Check: Lint

[failure] 13-21: Ruff (I001)
examples/parallel_llm_demo.py:13:1: I001 Import block is un-sorted or un-formatted

🤖 Prompt for AI Agents
In examples/parallel_llm_demo.py around lines 13 to 21, the import block is not
sorted which triggers Ruff I001 and fails CI; reorder imports so standard
library imports (asyncio, time) appear first, then third-party/third-party-like
imports (cortex.llm_router) grouped separately, and sort names alphabetically
within each group; ensure blank line between stdlib and third-party groups and
keep the existing multi-line from-import formatting.

)
elapsed = time.time() - start

print(f"✅ Async completion successful!")

⚠️ Potential issue | 🟡 Minor

f-string without placeholders (Ruff F541).

This f-string has no interpolated values. Use a regular string instead.

🔎 Proposed fix
-        print(f"✅ Async completion successful!")
+        print("✅ Async completion successful!")
🧰 Tools
🪛 GitHub Check: Lint

[failure] 51-51: Ruff (F541)
test_parallel_llm.py:51:15: F541 f-string without any placeholders

🤖 Prompt for AI Agents
In test_parallel_llm.py around line 51, the print call uses an f-string with no
placeholders ("print(f\"✅ Async completion successful!\")"); replace the
f-string with a regular string literal by removing the leading "f" so it becomes
print("✅ Async completion successful!"), eliminating the Ruff F541 warning.

responses = await router.complete_batch(requests, max_concurrent=3)
elapsed = time.time() - start

print(f"✅ Batch processing successful!")

⚠️ Potential issue | 🟡 Minor

f-string without placeholders (Ruff F541).

Use a regular string instead.

🔎 Proposed fix
-        print(f"✅ Batch processing successful!")
+        print("✅ Batch processing successful!")
🧰 Tools
🪛 GitHub Check: Lint

[failure] 98-98: Ruff (F541)
test_parallel_llm.py:98:15: F541 f-string without any placeholders

🤖 Prompt for AI Agents
In test_parallel_llm.py around line 98, the print call uses an f-string with no
placeholders which triggers Ruff F541; replace the f-string with a plain string
literal (change print(f"✅ Batch processing successful!") to print("✅ Batch
processing successful!")) to remove the unnecessary f-prefix and satisfy the
linter.

responses = await router.complete_batch(requests, max_concurrent=2)
elapsed = time.time() - start

print(f"✅ Rate limiting working!")

⚠️ Potential issue | 🟡 Minor

f-string without placeholders (Ruff F541).

Use a regular string instead.

🔎 Proposed fix
-        print(f"✅ Rate limiting working!")
+        print("✅ Rate limiting working!")
🧰 Tools
🪛 GitHub Check: Lint

[failure] 145-145: Ruff (F541)
test_parallel_llm.py:145:15: F541 f-string without any placeholders

🤖 Prompt for AI Agents
In test_parallel_llm.py around line 145, there's an f-string with no
placeholders (print(f"✅ Rate limiting working!")) which triggers Ruff F541;
replace the f-string with a plain string literal (remove the leading "f") so the
call becomes print("✅ Rate limiting working!"), keeping the message unchanged.

Comment on lines +236 to +237
await router.acomplete(**{k: v for k, v in req.items() if k != "task_type"},
task_type=req["task_type"])

⚠️ Potential issue | 🟡 Minor

Trailing whitespace (Ruff W291).

Remove the trailing whitespace on line 236.

🔎 Proposed fix
-        for req in requests:
-            await router.acomplete(**{k: v for k, v in req.items() if k != "task_type"}, 
-                                 task_type=req["task_type"])
+        for req in requests:
+            await router.acomplete(**{k: v for k, v in req.items() if k != "task_type"},
+                                   task_type=req["task_type"])
🧰 Tools
🪛 GitHub Check: Lint

[failure] 236-236: Ruff (W291)
test_parallel_llm.py:236:89: W291 Trailing whitespace

🤖 Prompt for AI Agents
In test_parallel_llm.py around lines 236 to 237, line 236 contains trailing
whitespace; remove the trailing space(s) at the end of that line so it no longer
triggers Ruff W291 (ensure the line ends immediately after the closing
parenthesis).

elapsed_par = time.time() - start_par

speedup = elapsed_seq / elapsed_par if elapsed_par > 0 else 1.0
print(f"\n✅ Performance comparison:")

⚠️ Potential issue | 🟡 Minor

f-string without placeholders (Ruff F541).

Use a regular string instead.

🔎 Proposed fix
-        print(f"\n✅ Performance comparison:")
+        print("\n✅ Performance comparison:")
🧰 Tools
🪛 GitHub Check: Lint

[failure] 247-247: Ruff (F541)
test_parallel_llm.py:247:15: F541 f-string without any placeholders

🤖 Prompt for AI Agents
In test_parallel_llm.py around line 247, the print uses an f-string with no
placeholders (Ruff F541); replace the f-string with a regular string literal by
removing the leading f (i.e., change print(f"\n✅ Performance comparison:") to
print("\n✅ Performance comparison:")).

@mikejmorgan-ai merged commit f6bfa49 into cortexlinux:main on Dec 20, 2025
10 of 11 checks passed

Linked issue: Parallel LLM Calls Implementation (#276)