fix(backend): implement weighted load balancing in selectModel #49

Merged: pescn merged 1 commit into main from fix/weighted-load-balancing on Jan 19, 2026

@pescn pescn commented Jan 19, 2026

Summary

  • Fixed the selectModel function in api-helpers.ts, which was ignoring the weight configuration
  • Previously, the function carried a "TODO: implement weighted load balancing" comment and always returned the first candidate
  • It now implements a proper weighted random selection algorithm, matching the implementation in utils/model.ts

Problem

When multiple providers offered the same model with different weights (e.g., provider-A weight=1, provider-B weight=2, provider-C weight=3), all requests were routed to provider-A (100%) instead of being distributed according to weights (~16.7%, ~33.3%, ~50.0%).
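
The intended behavior can be sketched as a cumulative-weight walk. This is a minimal illustration, not the code merged in this PR: the Candidate type, the pickWeighted name, and the injectable rng parameter are assumptions made for the example. Only the idea of a per-candidate weight, the single-candidate shortcut, and the first-candidate fallback come from the discussion on this page.

```typescript
// Hypothetical shape; the real candidate type in api-helpers.ts is not shown here.
interface Candidate {
  provider: string;
  weight: number;
}

// Pick one candidate with probability weight / totalWeight.
// `rng` is injectable (assumption) so the behavior can be checked deterministically.
function pickWeighted(
  candidates: Candidate[],
  rng: () => number = Math.random,
): Candidate | null {
  if (candidates.length === 0) return null;
  if (candidates.length === 1) return candidates[0] ?? null;

  const totalWeight = candidates.reduce((sum, c) => sum + c.weight, 0);

  // Scale the random value into [0, totalWeight) and walk the candidates,
  // subtracting each weight until the value drops below zero.
  let remaining = rng() * totalWeight;
  for (const candidate of candidates) {
    remaining -= candidate.weight;
    if (remaining < 0) return candidate;
  }

  // Defensive fallback (for example, when every weight is zero).
  return candidates[0] ?? null;
}
```

With weights 1, 2 and 3, a random value of 0.2 scales to 1.2, which lies in provider-B's interval [1, 3), so provider-B is returned.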

Test Results

| Scenario | provider-A (w=1) | provider-B (w=2) | provider-C (w=3) |
| --- | --- | --- | --- |
| Expected | ~16.7% | ~33.3% | ~50.0% |
| Before fix | 100% ❌ | 0% | 0% |
| After fix | 16.9% ✅ | 32.4% ✅ | 50.7% ✅ |
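
A distribution like the one in the table can be reproduced with a small tally loop. The sketch below is not the scripts/test-selectmodel-comparison.ts script referenced in the test plan, whose contents are not shown on this page. The names pickIndex and tallyShares are hypothetical, a non-empty weight list is assumed, and the random source is injectable so the result can be checked without relying on chance.

```typescript
// Hypothetical helper: weighted index selection via a cumulative-weight walk.
function pickIndex(weights: number[], rng: () => number): number {
  const total = weights.reduce((sum, w) => sum + w, 0);
  let remaining = rng() * total;
  for (let i = 0; i < weights.length; i++) {
    remaining -= weights[i] ?? 0;
    if (remaining < 0) return i;
  }
  return 0; // defensive fallback to the first entry
}

// Run `draws` selections and return each index's share of the total.
function tallyShares(
  weights: number[],
  draws: number,
  rng: () => number = Math.random,
): number[] {
  const counts = weights.map(() => 0);
  for (let i = 0; i < draws; i++) {
    const picked = pickIndex(weights, rng);
    counts[picked] = (counts[picked] ?? 0) + 1;
  }
  return counts.map((count) => count / draws);
}

// With Math.random, the shares for weights [1, 2, 3] should hover around
// 1/6, 2/6 and 3/6; the exact values vary from run to run.
const shares = tallyShares([1, 2, 3], 60_000);
console.log(shares.map((s) => `${(s * 100).toFixed(1)}%`).join("  "));
```

The expected shares follow directly from weight / total weight: 1/6 ≈ 16.7%, 2/6 ≈ 33.3%, 3/6 = 50.0%.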

Affected Endpoints

  • /v1/chat/completions
  • /v1/messages
  • /v1/responses

Note: /v1/embeddings was already using the correct weighted selection from utils/model.ts.

Test plan

  • Verified with simulation test script (scripts/test-selectmodel-comparison.ts)
  • Type check passes (bun run check)
  • Lint passes (bun run lint)

🤖 Generated with Claude Code

Summary by CodeRabbit

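Improvements

  • Upgraded the model selection mechanism to use a weighted random load-balancing algorithm, distributing requests more intelligently across multiple available models and improving system stability and reliability.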


The selectModel function in api-helpers.ts had a TODO comment but was
only returning the first candidate, ignoring the weight configuration.

This fix implements a proper weighted random selection algorithm that
distributes requests across multiple providers based on their configured
weights. The same algorithm was already implemented in utils/model.ts
(used by embeddings endpoint) but not in api-helpers.ts (used by chat
completions, messages, and responses endpoints).

Affected endpoints:
- /v1/chat/completions
- /v1/messages
- /v1/responses

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@gemini-code-assist

Summary of Changes

Hello @pescn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial fix to the selectModel function, which was previously failing to apply weighted load balancing for model selection. The change ensures that API requests are now distributed across available models based on their configured weights, preventing disproportionate routing to a single model and improving the overall efficiency and fairness of resource utilization for several key endpoints.

Highlights

  • Weighted Load Balancing Implementation: The selectModel function in api-helpers.ts has been updated to correctly implement weighted load balancing, addressing a previous TODO comment and ensuring model selection respects configured weights.
  • Corrected Model Distribution: This fix resolves an issue where all requests were routed to the first candidate model regardless of weight, now distributing traffic according to the specified weights for /v1/chat/completions, /v1/messages, and /v1/responses endpoints.
  • Alignment with Existing Logic: The newly implemented weighted random selection algorithm in api-helpers.ts now matches the logic already present in utils/model.ts, ensuring consistency across the codebase.

coderabbitai Bot commented Jan 19, 2026

📝 Walkthrough

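Overview

Weighted random load balancing is implemented in the selectModel function, adding single-candidate handling, weighted random selection across multiple candidates based on model.weight, selection logging, and a fallback mechanism. The corresponding doc comments are updated as well.

Changes

| Category / File | Change summary |
| --- | --- |
| Weighted load balancing: backend/src/utils/api-helpers.ts | Implements weighted random selection logic in selectModel: direct return for a single candidate, weight-based random selection across multiple candidates, logging of selection events, and a fallback to the first candidate |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🐰 Weights dance among the numbers,
Load balancing breaks into a smile,
A random walk finds the chosen path,
Logs record every choice,
A steady fallback keeps things safe!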

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and accurately describes the main change (implementing weighted load balancing in the selectModel function) and is fully relevant to the code changes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae4895b and 4003963.

📒 Files selected for processing (1)
  • backend/src/utils/api-helpers.ts
🔇 Additional comments (2)
backend/src/utils/api-helpers.ts (2)

109-113: LGTM!

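The single-candidate fast path is handled correctly. Returning early avoids unnecessary weight calculation, and the ! non-null assertion is safe after the length === 1 check.


134-135: The fallback handling is reasonable.

As defensive programming, providing a fallback on a code path that should in theory be unreachable is good practice. Using ?? null keeps the return type consistent.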


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request correctly implements weighted load balancing in the selectModel function within api-helpers.ts, resolving an issue where model weights were ignored. The implementation now aligns with the existing logic in utils/model.ts, ensuring consistent behavior. The change is well-described and tested. My review includes one suggestion to improve code clarity.

Comment on lines +111 to +112
// oxlint-disable-next-line no-unnecessary-type-assertion
return candidates[0]!; // TypeScript needs assertion here

medium

The non-null assertion ! appears to be unnecessary here. The if (candidates.length === 1) check on line 110 guarantees that candidates[0] is defined. The oxlint-disable-next-line comment also suggests that a linter agrees the assertion is not needed. You can simplify the code by removing the assertion and the related comments.

Suggested change
- // oxlint-disable-next-line no-unnecessary-type-assertion
- return candidates[0]!; // TypeScript needs assertion here
+ return candidates[0];
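
Whether the assertion can simply be dropped depends on compiler settings that are not visible on this page. When the TypeScript option noUncheckedIndexedAccess is enabled, candidates[0] is typed as T | undefined, and a length check does not narrow it, which would explain the "TypeScript needs assertion here" comment. The sketch below shows one assertion-free alternative for that case; soleCandidate is a hypothetical helper, not a function from this repository.

```typescript
// Hypothetical helper: return the only candidate without a non-null assertion.
function soleCandidate<T>(candidates: readonly T[]): T | null {
  if (candidates.length !== 1) return null;
  // Under noUncheckedIndexedAccess this is `T | undefined`;
  // `?? null` narrows it to `T | null` without using `!`.
  const only = candidates[0];
  return only ?? null;
}
```

This compiles with or without the flag, at the cost of a nullish check that can never actually trigger after the length test.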

pescn added a commit that referenced this pull request Jan 19, 2026
…quests

Implement automatic failover capability that switches to alternate providers
when a request fails with retriable errors (5xx, 429, network timeouts).

Key features:
- Failover service with configurable retry behavior
- Exponential backoff with jitter for retries
- Same-provider retries for transient errors (429)
- Cross-provider failover for persistent failures
- Weighted random selection for failover candidates
- Detailed error logging with all attempted providers

Also includes the weighted load balancing fix from PR #49:
- Implement proper weighted random selection in selectModel()
- Add filterCandidates() helper for failover use

Affected endpoints:
- /v1/chat/completions
- /v1/messages
- /v1/responses

Configuration defaults:
- maxProviderAttempts: 3
- sameProviderRetries: 1
- retriableStatusCodes: [429, 500, 502, 503, 504]
- timeoutMs: 120000 (2 minutes)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
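
The retry settings listed above can be sketched as follows. The default values are copied from the commit message; everything else is an assumption for illustration, since the failover service's code is not shown on this page: the FailoverConfig shape, the helper names, the base and cap delays, and the choice of "full jitter" (a uniformly random delay up to the exponential ceiling).

```typescript
// Field names and values taken from the "Configuration defaults" list above.
interface FailoverConfig {
  maxProviderAttempts: number;
  sameProviderRetries: number;
  retriableStatusCodes: number[];
  timeoutMs: number;
}

const defaultFailoverConfig: FailoverConfig = {
  maxProviderAttempts: 3,
  sameProviderRetries: 1,
  retriableStatusCodes: [429, 500, 502, 503, 504],
  timeoutMs: 120_000, // 2 minutes
};

// Hypothetical helper: is this HTTP status worth retrying?
function isRetriableStatus(
  status: number,
  config: FailoverConfig = defaultFailoverConfig,
): boolean {
  return config.retriableStatusCodes.includes(status);
}

// Hypothetical helper: exponential backoff with full jitter.
// baseMs and capMs are illustrative values, not taken from the PR.
function backoffDelayMs(
  attempt: number,
  baseMs = 250,
  capMs = 8_000,
  rng: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rng() * ceiling);
}
```

Randomizing the delay keeps clients that failed at the same moment from retrying in lockstep against a provider that is already rate limiting.
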
@pescn pescn merged commit d1c02d9 into main Jan 19, 2026
2 checks passed
pescn added a commit that referenced this pull request Jan 19, 2026
…quests (#50)

* feat(backend): add automatic failover and retry for multi-provider requests

Implement automatic failover capability that switches to alternate providers
when a request fails with retriable errors (5xx, 429, network timeouts).

Key features:
- Failover service with configurable retry behavior
- Exponential backoff with jitter for retries
- Same-provider retries for transient errors (429)
- Cross-provider failover for persistent failures
- Weighted random selection for failover candidates
- Detailed error logging with all attempted providers

Also includes the weighted load balancing fix from PR #49:
- Implement proper weighted random selection in selectModel()
- Add filterCandidates() helper for failover use

Affected endpoints:
- /v1/chat/completions
- /v1/messages
- /v1/responses

Configuration defaults:
- maxProviderAttempts: 3
- sameProviderRetries: 1
- retriableStatusCodes: [429, 500, 502, 503, 504]
- timeoutMs: 120000 (2 minutes)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: address PR review comments for failover implementation

- Change modelId fallback from ?? 0 to undefined to avoid database integrity issues
- Add AbortError handling in isRetriableNetworkError for timeout scenarios
- Fix SSE error format in responses.ts to match OpenAI Responses API spec
- Add error capture to catch blocks for better logging
- Optimize selectMultipleCandidates by calculating totalWeight once

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: extract shared failover error handling to reduce code duplication

- Add processFailoverError helper function in api-helpers.ts
- Refactor completions.ts, messages.ts, responses.ts to use shared helper
- Reduces ~120 lines of duplicated error handling code
- Maintains same behavior: forward non-retriable errors, return 502 on exhaustion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: address code review feedback for failover service

- Use more specific error message patterns in isRetriableNetworkError
  to avoid false positives (e.g., "network error" instead of "network")
- Simplify provider iteration loop using for...of with entries()
  since candidates are already unique and ordered from selectMultipleCandidates
- Remove redundant triedProviders Set

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: address additional code review feedback

- Fix SSE error type in messages.ts to use "server_error" per Anthropic spec
- Remove unused "internal_error" case from FailoverErrorResult type

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
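
One way to produce an ordered list of distinct failover candidates is weighted sampling without replacement. The sketch below is an assumption about what a function like selectMultipleCandidates might do, not its actual code: the WeightedCandidate type and the orderByWeight name are hypothetical. It computes the total weight once up front and subtracts each pick's weight afterwards, in the spirit of the "calculating totalWeight once" item above.

```typescript
// Hypothetical shape; the real candidate type is not shown on this page.
interface WeightedCandidate {
  provider: string;
  weight: number;
}

// Return up to `maxCount` distinct candidates, ordered by repeated weighted picks.
function orderByWeight(
  candidates: WeightedCandidate[],
  maxCount: number,
  rng: () => number = Math.random,
): WeightedCandidate[] {
  const pool = [...candidates]; // copy so the caller's array is untouched
  const ordered: WeightedCandidate[] = [];
  // Summed once; adjusted incrementally as candidates leave the pool.
  let totalWeight = pool.reduce((sum, c) => sum + c.weight, 0);

  while (pool.length > 0 && ordered.length < maxCount) {
    let remaining = rng() * totalWeight;
    let pickedIndex = 0; // falls back to the first remaining candidate
    for (let i = 0; i < pool.length; i++) {
      remaining -= pool[i]?.weight ?? 0;
      if (remaining < 0) {
        pickedIndex = i;
        break;
      }
    }
    const [picked] = pool.splice(pickedIndex, 1);
    if (picked === undefined) break;
    ordered.push(picked);
    totalWeight -= picked.weight;
  }
  return ordered;
}
```

Because each pick is removed from the pool, the result contains no duplicates, so a caller iterating over it would not need a separate set of already-tried providers.
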
@pescn pescn deleted the fix/weighted-load-balancing branch January 24, 2026 22:54
@pescn pescn restored the fix/weighted-load-balancing branch January 24, 2026 22:57
@pescn pescn deleted the fix/weighted-load-balancing branch January 24, 2026 23:02