feat: add parallel LLM call tests and demo scripts #311
Conversation
- Introduced a new test suite for parallel LLM call functionality, including tests for async completion, batch processing, error diagnosis, and hardware configuration checks.
- Added a demo script showcasing the usage of parallel LLM calls for multi-package queries, error diagnosis, and hardware config checks, demonstrating expected performance improvements.
- Enhanced the LLMRouter class with async capabilities and rate limiting for concurrent API calls.
- Updated existing tests to support new async methods and ensure comprehensive coverage of parallel processing features.
Walkthrough

Added asynchronous LLM request handling with concurrent batch processing and rate limiting to the LLMRouter. Introduced async client support for Claude and Kimi APIs, parallel execution helpers for multi-package queries and error diagnosis, and comprehensive test coverage. Includes example demonstrations of parallelized workflows.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant Router as LLMRouter
    participant RateLimit as Rate Limiter<br/>(Semaphore)
    participant Router_Internal as Router:<br/>route_task
    participant Claude as Async<br/>Claude API
    participant Kimi as Async<br/>Kimi API

    Caller->>Router: complete_batch(requests)
    Router->>RateLimit: acquire semaphore
    Note over Router: For each request in parallel
    Router->>Router_Internal: route_task(request)
    Router_Internal-->>Router: provider choice<br/>(Claude or Kimi)
    par Parallel Execution
        Router->>Claude: _acomplete_claude(request)
        Claude-->>Router: response (latency, tokens)
    and
        Router->>Kimi: _acomplete_kimi(request)
        Kimi-->>Router: response (latency, tokens)
    end
    Router->>RateLimit: release semaphore
    Router->>Router: aggregate responses
    Router-->>Caller: list[LLMResponse]<br/>with costs & timings
```
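The batch flow in the diagram can be sketched as a semaphore-bounded `asyncio.gather`. This is a minimal illustration of the pattern, not the router's actual implementation; `fake_api_call` is a hypothetical stand-in for the routing-plus-provider call:

```python
import asyncio


async def fake_api_call(req: str) -> str:
    """Hypothetical stand-in for route_task + provider call; simulates latency."""
    await asyncio.sleep(0.01)
    return f"response:{req}"


async def complete_batch(requests: list[str], max_concurrent: int = 10) -> list[str]:
    """Run all requests concurrently, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(req: str) -> str:
        async with semaphore:  # acquire before the call, release after
            return await fake_api_call(req)

    # gather preserves input order even though completion order may vary
    return await asyncio.gather(*(bounded(r) for r in requests))


results = asyncio.run(complete_batch(["a", "b", "c"], max_concurrent=2))
```

With `max_concurrent=2`, the third request only starts once one of the first two releases its semaphore slot, which is exactly the acquire/release handshake shown above.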
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 6
🧹 Nitpick comments (9)
tests/test_llm_router.py (2)
515-516: Unused mock variable.

`mock_message` is created but never used; `mock_content` is used instead. Consider removing the unused variable.

🔎 Proposed fix

```diff
 @patch("cortex.llm_router.AsyncAnthropic")
 @patch("cortex.llm_router.AsyncOpenAI")
 def test_acomplete_claude(self, mock_async_openai, mock_async_anthropic):
     """Test async completion with Claude."""
     # Mock async Claude client
-    mock_message = Mock()
-    mock_message.text = "Async Claude response"
-
     mock_content = Mock()
     mock_content.text = "Async Claude response"
```
715-720: Accessing private semaphore attribute.

Line 720 accesses `_rate_limit_semaphore._value`, which is a private implementation detail of `asyncio.Semaphore`. While acceptable for testing, this could break if the internal implementation changes. Consider verifying the semaphore's behavior through its public interface instead, or document this as a known test fragility.
test_parallel_llm.py (2)

1-18: Consider moving test file to `tests/` directory.

This test file is located at the repository root rather than in the `tests/` directory. For consistency with the project structure and test discovery, consider moving it to `tests/test_parallel_llm.py`.
29-60: Consider adding type hints to async test functions.

Per coding guidelines, type hints are required. The async test functions lack return type annotations.

🔎 Example fix for `test_async_completion`

```diff
-async def test_async_completion():
+async def test_async_completion() -> bool:
     """Test basic async completion."""
```

examples/parallel_llm_demo.py (1)
185-214: Consider clarifying that the performance comparison is simulated.

The `demo_sequential_vs_parallel` function uses `asyncio.sleep(0.1)` to simulate API calls rather than making real requests. While this makes the demo runnable without API keys, the output might be misleading to users who expect real performance metrics. Consider adding an explicit note in the output.

cortex/llm_router.py (4)
432-439: Consider adding input validation.

The `set_rate_limit` method doesn't validate that `max_concurrent` is positive. A value ≤ 0 would cause unexpected behavior.

🔎 Proposed fix

```diff
 def set_rate_limit(self, max_concurrent: int = 10):
     """
     Set rate limit for parallel API calls.

     Args:
         max_concurrent: Maximum number of concurrent API calls
     """
+    if max_concurrent <= 0:
+        raise ValueError("max_concurrent must be positive")
     self._rate_limit_semaphore = asyncio.Semaphore(max_concurrent)
```
648-655: Semaphore handling could be simplified.

The code accesses `_rate_limit_semaphore._value` (line 650), which is a private attribute. Additionally, a new semaphore is created (line 655) even if `set_rate_limit` was called, making the stored semaphore unused in `complete_batch`. Consider using the stored semaphore directly if it exists, or document that `complete_batch` uses its own semaphore.

🔎 Proposed fix

```diff
 # Use provided max_concurrent or semaphore limit or default
 if max_concurrent is None:
     if self._rate_limit_semaphore:
-        max_concurrent = self._rate_limit_semaphore._value
+        semaphore = self._rate_limit_semaphore
     else:
         max_concurrent = 10
         self.set_rate_limit(max_concurrent)
+        semaphore = self._rate_limit_semaphore
+else:
+    semaphore = asyncio.Semaphore(max_concurrent)

-semaphore = asyncio.Semaphore(max_concurrent)
```
679-686: Hardcoded provider in error response.

Error responses use `provider=LLMProvider.CLAUDE` regardless of which provider was intended. Consider extracting the intended provider from the request, or use a more explicit sentinel.
391-399: Stats tracking is not thread-safe.

`_update_stats` modifies shared state (`total_cost_usd`, `request_count`, `provider_stats`) without synchronization. While safe for single-threaded async usage, this could cause data races if the router is used across multiple threads (as mentioned in PR objectives for "future free-threading" support). Consider adding a lock if multi-threaded usage is anticipated.
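If multi-threaded use is anticipated, guarding the counters with a lock is one option. This is a hedged sketch of the idea, not the router's actual `_update_stats`:

```python
import threading


class Stats:
    """Sketch: shared counters guarded by a lock for multi-threaded callers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.total_cost_usd = 0.0
        self.request_count = 0

    def update(self, cost_usd: float) -> None:
        # A single lock acquisition covers both fields, so readers never
        # observe a cost that has been added without its matching count.
        with self._lock:
            self.total_cost_usd += cost_usd
            self.request_count += 1


stats = Stats()
threads = [threading.Thread(target=stats.update, args=(0.01,)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# After 100 updates of $0.01 each: request_count == 100, total ≈ $1.00
```

Under the current single-threaded asyncio usage the lock is redundant (await points, not threads, interleave the coroutines), so this only matters if the free-threading plans materialize.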
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- `cortex/llm_router.py` (6 hunks)
- `examples/parallel_llm_demo.py` (1 hunks)
- `test_parallel_llm.py` (1 hunks)
- `tests/test_llm_router.py` (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs
Files:

- `test_parallel_llm.py`
- `examples/parallel_llm_demo.py`
- `tests/test_llm_router.py`
- `cortex/llm_router.py`
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Maintain >80% test coverage for pull requests
Files:
tests/test_llm_router.py
🧬 Code graph analysis (3)
test_parallel_llm.py (4)

- cortex/llm_router.py (8): `LLMRouter` (74-691), `TaskType` (31-41), `check_hardware_configs_parallel` (822-867), `diagnose_errors_parallel` (772-819), `query_multiple_packages` (725-769), `acomplete` (441-505), `complete_batch` (610-691), `set_rate_limit` (432-439)
- examples/parallel_llm_demo.py (1): `main` (217-249)
- cortex/semantic_cache.py (1): `total` (30-32)
- cortex/sandbox/sandbox_executor.py (1): `success` (60-62)
examples/parallel_llm_demo.py (1)

- cortex/llm_router.py (8): `LLMRouter` (74-691), `TaskType` (31-41), `check_hardware_configs_parallel` (822-867), `diagnose_errors_parallel` (772-819), `query_multiple_packages` (725-769), `set_rate_limit` (432-439), `complete_batch` (610-691), `get_stats` (401-423)
tests/test_llm_router.py (1)

- cortex/llm_router.py (10): `LLMProvider` (44-48), `LLMResponse` (52-61), `LLMRouter` (74-691), `TaskType` (31-41), `check_hardware_configs_parallel` (822-867), `complete_task` (695-721), `diagnose_errors_parallel` (772-819), `query_multiple_packages` (725-769), `complete_batch` (610-691), `set_rate_limit` (432-439)
🪛 GitHub Actions: CI
examples/parallel_llm_demo.py
[error] 13-13: I001 Import block is un-sorted or un-formatted
🪛 GitHub Check: Lint
test_parallel_llm.py
[failure] 247-247: Ruff (F541)
test_parallel_llm.py:247:15: F541 f-string without any placeholders
[failure] 236-236: Ruff (W291)
test_parallel_llm.py:236:89: W291 Trailing whitespace
[failure] 145-145: Ruff (F541)
test_parallel_llm.py:145:15: F541 f-string without any placeholders
[failure] 98-98: Ruff (F541)
test_parallel_llm.py:98:15: F541 f-string without any placeholders
[failure] 51-51: Ruff (F541)
test_parallel_llm.py:51:15: F541 f-string without any placeholders
examples/parallel_llm_demo.py
[failure] 13-21: Ruff (I001)
examples/parallel_llm_demo.py:13:1: I001 Import block is un-sorted or un-formatted
🔇 Additional comments (11)
tests/test_llm_router.py (4)
503-509: LGTM! New parallel processing test suite added.

The test class structure is well-organized and covers key parallel features: async completion, batch processing, helper functions, and rate limiting.

544-575: LGTM!

The async Kimi completion test correctly mocks the OpenAI-compatible response structure and verifies proper routing.

577-628: LGTM!

Good test coverage for batch completion with mixed providers. The test correctly verifies that different task types route to their expected providers.

630-713: LGTM! Helper function tests cover essential functionality.

The tests for `query_multiple_packages`, `diagnose_errors_parallel`, and `check_hardware_configs_parallel` verify correct response structure and dictionary key mapping.

examples/parallel_llm_demo.py (2)
24-126: LGTM! Well-structured demo functions.

The demo functions follow a consistent pattern with clear docstrings, timing measurements, and informative output. Good examples for users.

217-253: LGTM! Good main function structure.

The main function properly orchestrates demos with error handling and clear guidance when API keys are missing.
cortex/llm_router.py (5)
14-24: LGTM! Async client imports added.

The new imports for async clients (`AsyncAnthropic`, `AsyncOpenAI`) and `asyncio` properly support the new parallel processing capabilities.
139-162: LGTM! Async client initialization follows sync pattern.

The async clients are properly initialized alongside their synchronous counterparts, maintaining consistency in the initialization logic.
441-505: LGTM! Well-implemented async completion method.

The `acomplete` method correctly mirrors the synchronous `complete()` method with proper async/await usage, routing, latency tracking, and fallback logic.
507-608: LGTM! Async provider methods correctly implemented.

Both `_acomplete_claude` and `_acomplete_kimi` properly check for client initialization, handle parameters, and return properly structured `LLMResponse` objects.
724-867: LGTM! Well-documented parallel helper functions.

The helper functions `query_multiple_packages`, `diagnose_errors_parallel`, and `check_hardware_configs_parallel` are well-documented with comprehensive docstrings, type hints, and usage examples. The implementation correctly leverages `complete_batch` for parallel execution.
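The pattern these helpers share (fan one request out per input item, then key the responses back by that item) can be sketched independently of the router. Everything below is hypothetical: `fake_batch` stands in for `LLMRouter.complete_batch`, and the real helpers' signatures may differ:

```python
import asyncio


async def query_multiple_packages(packages: list[str], complete_batch) -> dict[str, str]:
    """Hypothetical sketch: one query per package, results keyed by name."""
    requests = [f"Describe the {pkg} package" for pkg in packages]
    responses = await complete_batch(requests)  # all queries run concurrently
    # Batch results come back in input order, so zip recovers the mapping.
    return dict(zip(packages, responses))


async def fake_batch(requests: list[str]) -> list[str]:
    """Stand-in for the real batch API; echoes each request."""
    await asyncio.sleep(0)
    return [f"answer to: {r}" for r in requests]


result = asyncio.run(query_multiple_packages(["numpy", "scipy"], fake_batch))
```

The dictionary return is what lets callers look up a specific package's answer without caring which request finished first.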
```python
import asyncio
import time
from cortex.llm_router import (
    LLMRouter,
    TaskType,
    check_hardware_configs_parallel,
    diagnose_errors_parallel,
    query_multiple_packages,
)
```
Import block is un-sorted (Ruff I001) - blocking CI.
The imports are not properly sorted, causing the CI pipeline to fail. Standard library imports should come before third-party imports.
🔎 Proposed fix

```diff
 import asyncio
 import time
+
 from cortex.llm_router import (
     LLMRouter,
     TaskType,
     check_hardware_configs_parallel,
     diagnose_errors_parallel,
     query_multiple_packages,
 )
```

🧰 Tools
🪛 GitHub Actions: CI
[error] 13-13: I001 Import block is un-sorted or un-formatted
🪛 GitHub Check: Lint
[failure] 13-21: Ruff (I001)
examples/parallel_llm_demo.py:13:1: I001 Import block is un-sorted or un-formatted
🤖 Prompt for AI Agents
In examples/parallel_llm_demo.py around lines 13 to 21, the import block is not
sorted which triggers Ruff I001 and fails CI; reorder imports so standard
library imports (asyncio, time) appear first, then third-party/third-party-like
imports (cortex.llm_router) grouped separately, and sort names alphabetically
within each group; ensure blank line between stdlib and third-party groups and
keep the existing multi-line from-import formatting.
```python
)
elapsed = time.time() - start

print(f"✅ Async completion successful!")
```
f-string without placeholders (Ruff F541).
This f-string has no interpolated values. Use a regular string instead.
🔎 Proposed fix
- print(f"✅ Async completion successful!")
+ print("✅ Async completion successful!")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print(f"✅ Async completion successful!") | |
| print("✅ Async completion successful!") |
🧰 Tools
🪛 GitHub Check: Lint
[failure] 51-51: Ruff (F541)
test_parallel_llm.py:51:15: F541 f-string without any placeholders
🤖 Prompt for AI Agents
In test_parallel_llm.py around line 51, the print call uses an f-string with no
placeholders ("print(f\"✅ Async completion successful!\")"); replace the
f-string with a regular string literal by removing the leading "f" so it becomes
print("✅ Async completion successful!"), eliminating the Ruff F541 warning.
```python
responses = await router.complete_batch(requests, max_concurrent=3)
elapsed = time.time() - start

print(f"✅ Batch processing successful!")
```
f-string without placeholders (Ruff F541).
Use a regular string instead.
🔎 Proposed fix
- print(f"✅ Batch processing successful!")
+ print("✅ Batch processing successful!")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print(f"✅ Batch processing successful!") | |
| print("✅ Batch processing successful!") |
🧰 Tools
🪛 GitHub Check: Lint
[failure] 98-98: Ruff (F541)
test_parallel_llm.py:98:15: F541 f-string without any placeholders
🤖 Prompt for AI Agents
In test_parallel_llm.py around line 98, the print call uses an f-string with no
placeholders which triggers Ruff F541; replace the f-string with a plain string
literal (change print(f"✅ Batch processing successful!") to print("✅ Batch
processing successful!")) to remove the unnecessary f-prefix and satisfy the
linter.
```python
responses = await router.complete_batch(requests, max_concurrent=2)
elapsed = time.time() - start

print(f"✅ Rate limiting working!")
```
f-string without placeholders (Ruff F541).
Use a regular string instead.
🔎 Proposed fix
- print(f"✅ Rate limiting working!")
+ print("✅ Rate limiting working!")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print(f"✅ Rate limiting working!") | |
| print("✅ Rate limiting working!") |
🧰 Tools
🪛 GitHub Check: Lint
[failure] 145-145: Ruff (F541)
test_parallel_llm.py:145:15: F541 f-string without any placeholders
🤖 Prompt for AI Agents
In test_parallel_llm.py around line 145, there's an f-string with no
placeholders (print(f"✅ Rate limiting working!")) which triggers Ruff F541;
replace the f-string with a plain string literal (remove the leading "f") so the
call becomes print("✅ Rate limiting working!"), keeping the message unchanged.
```python
await router.acomplete(**{k: v for k, v in req.items() if k != "task_type"},
                       task_type=req["task_type"])
```
Trailing whitespace (Ruff W291).
Remove the trailing whitespace on line 236.
🔎 Proposed fix (the removed line ends with trailing whitespace)

```diff
-for req in requests:
-    await router.acomplete(**{k: v for k, v in req.items() if k != "task_type"}, 
-        task_type=req["task_type"])
+for req in requests:
+    await router.acomplete(**{k: v for k, v in req.items() if k != "task_type"},
+        task_type=req["task_type"])
```
🧰 Tools
🪛 GitHub Check: Lint
[failure] 236-236: Ruff (W291)
test_parallel_llm.py:236:89: W291 Trailing whitespace
🤖 Prompt for AI Agents
In test_parallel_llm.py around lines 236 to 237, line 236 contains trailing
whitespace; remove the trailing space(s) at the end of that line so it no longer
triggers Ruff W291 (ensure the line ends immediately after the closing
parenthesis).
```python
elapsed_par = time.time() - start_par

speedup = elapsed_seq / elapsed_par if elapsed_par > 0 else 1.0
print(f"\n✅ Performance comparison:")
```
f-string without placeholders (Ruff F541).
Use a regular string instead.
🔎 Proposed fix
- print(f"\n✅ Performance comparison:")
+ print("\n✅ Performance comparison:")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print(f"\n✅ Performance comparison:") | |
| print("\n✅ Performance comparison:") |
🧰 Tools
🪛 GitHub Check: Lint
[failure] 247-247: Ruff (F541)
test_parallel_llm.py:247:15: F541 f-string without any placeholders
🤖 Prompt for AI Agents
In test_parallel_llm.py around line 247, the print uses an f-string with no
placeholders (Ruff F541); replace the f-string with a regular string literal by
removing the leading f (i.e., change print(f"\n✅ Performance comparison:") to
print("\n✅ Performance comparison:")).



Closes #276
Summary by CodeRabbit
New Features
Tests
Documentation