fix(backend): implement weighted load balancing in selectModel#49
Conversation
The selectModel function in api-helpers.ts had a TODO comment but was only returning the first candidate, ignoring the weight configuration. This fix implements proper weighted random selection algorithm that distributes requests across multiple providers based on their configured weights. The same algorithm was already implemented in utils/model.ts (used by embeddings endpoint) but not in api-helpers.ts (used by chat completions, messages, and responses endpoints). Affected endpoints: - /v1/chat/completions - /v1/messages - /v1/responses Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary of ChangesHello @pescn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a crucial fix to the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
📝 Walkthrough总览在 变更
估计代码审查工作量🎯 2 (简单) | ⏱️ ~12 分钟 诗
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🔇 Additional comments (2)
✏️ Tip: You can disable this entire section by setting Comment |
There was a problem hiding this comment.
Code Review
This pull request correctly implements weighted load balancing in the selectModel function within api-helpers.ts, resolving an issue where model weights were ignored. The implementation now aligns with the existing logic in utils/model.ts, ensuring consistent behavior. The change is well-described and tested. My review includes one suggestion to improve code clarity.
| // oxlint-disable-next-line no-unnecessary-type-assertion | ||
| return candidates[0]!; // TypeScript needs assertion here |
There was a problem hiding this comment.
The non-null assertion ! appears to be unnecessary here. The if (candidates.length === 1) check on line 110 guarantees that candidates[0] is defined. The oxlint-disable-next-line comment also suggests that a linter agrees the assertion is not needed. You can simplify the code by removing the assertion and the related comments.
| // oxlint-disable-next-line no-unnecessary-type-assertion | |
| return candidates[0]!; // TypeScript needs assertion here | |
| return candidates[0]; |
…quests Implement automatic failover capability that switches to alternate providers when a request fails with retriable errors (5xx, 429, network timeouts). Key features: - Failover service with configurable retry behavior - Exponential backoff with jitter for retries - Same-provider retries for transient errors (429) - Cross-provider failover for persistent failures - Weighted random selection for failover candidates - Detailed error logging with all attempted providers Also includes the weighted load balancing fix from PR #49: - Implement proper weighted random selection in selectModel() - Add filterCandidates() helper for failover use Affected endpoints: - /v1/chat/completions - /v1/messages - /v1/responses Configuration defaults: - maxProviderAttempts: 3 - sameProviderRetries: 1 - retriableStatusCodes: [429, 500, 502, 503, 504] - timeoutMs: 120000 (2 minutes) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…quests Implement automatic failover capability that switches to alternate providers when a request fails with retriable errors (5xx, 429, network timeouts). Key features: - Failover service with configurable retry behavior - Exponential backoff with jitter for retries - Same-provider retries for transient errors (429) - Cross-provider failover for persistent failures - Weighted random selection for failover candidates - Detailed error logging with all attempted providers Also includes the weighted load balancing fix from PR #49: - Implement proper weighted random selection in selectModel() - Add filterCandidates() helper for failover use Affected endpoints: - /v1/chat/completions - /v1/messages - /v1/responses Configuration defaults: - maxProviderAttempts: 3 - sameProviderRetries: 1 - retriableStatusCodes: [429, 500, 502, 503, 504] - timeoutMs: 120000 (2 minutes) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…quests (#50) * feat(backend): add automatic failover and retry for multi-provider requests Implement automatic failover capability that switches to alternate providers when a request fails with retriable errors (5xx, 429, network timeouts). Key features: - Failover service with configurable retry behavior - Exponential backoff with jitter for retries - Same-provider retries for transient errors (429) - Cross-provider failover for persistent failures - Weighted random selection for failover candidates - Detailed error logging with all attempted providers Also includes the weighted load balancing fix from PR #49: - Implement proper weighted random selection in selectModel() - Add filterCandidates() helper for failover use Affected endpoints: - /v1/chat/completions - /v1/messages - /v1/responses Configuration defaults: - maxProviderAttempts: 3 - sameProviderRetries: 1 - retriableStatusCodes: [429, 500, 502, 503, 504] - timeoutMs: 120000 (2 minutes) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: address PR review comments for failover implementation - Change modelId fallback from ?? 0 to undefined to avoid database integrity issues - Add AbortError handling in isRetriableNetworkError for timeout scenarios - Fix SSE error format in responses.ts to match OpenAI Responses API spec - Add error capture to catch blocks for better logging - Optimize selectMultipleCandidates by calculating totalWeight once Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: extract shared failover error handling to reduce code duplication - Add processFailoverError helper function in api-helpers.ts - Refactor completions.ts, messages.ts, responses.ts to use shared helper - Reduces ~120 lines of duplicated error handling code - Maintains same behavior: forward non-retriable errors, return 502 on exhaustion Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: address code review feedback for failover service - Use more specific error message patterns in isRetriableNetworkError to avoid false positives (e.g., "network error" instead of "network") - Simplify provider iteration loop using for...of with entries() since candidates are already unique and ordered from selectMultipleCandidates - Remove redundant triedProviders Set Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: address additional code review feedback - Fix SSE error type in messages.ts to use "server_error" per Anthropic spec - Remove unused "internal_error" case from FailoverErrorResult type Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Summary
selectModelfunction inapi-helpers.tsthat was ignoring weight configurationTODO: implement weighted load balancingcomment but only returned the first candidateutils/model.tsProblem
When multiple providers offered the same model with different weights (e.g., provider-A weight=1, provider-B weight=2, provider-C weight=3), all requests were routed to provider-A (100%) instead of being distributed according to weights (~16.7%, ~33.3%, ~50.0%).
Test Results
Affected Endpoints
/v1/chat/completions/v1/messages/v1/responsesNote:
/v1/embeddingswas already using the correct weighted selection fromutils/model.ts.Test plan
scripts/test-selectmodel-comparison.ts)bun run check)bun run lint)🤖 Generated with Claude Code
Summary by CodeRabbit
改进
✏️ Tip: You can customize this high-level summary in your review settings.