feat(qwen): add rate limiting and quota error handling #1676

Merged
luispater merged 1 commit into router-for-me:dev from piexian:feat/qwen-quota-handling-clean
Feb 24, 2026

@piexian piexian commented Feb 22, 2026

  • Add 60 requests/minute rate limiting per credential using sliding window
  • Detect insufficient_quota errors and set cooldown until next day (Beijing time)
  • Map quota errors (HTTP 403/429) to 429 with retryAfter for conductor integration
  • Cache Beijing timezone at package level to avoid repeated syscalls
  • Add redactAuthID function to protect credentials in logs
  • Extract wrapQwenError helper to consolidate error handling

@gemini-code-assist

Summary of Changes

Hello @piexian, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness and reliability of the Qwen API executor by introducing comprehensive rate limiting and intelligent quota error handling. It ensures that API calls respect Qwen's usage policies, preventing excessive requests and gracefully managing daily quota exhaustion by applying appropriate cooldown periods. These changes improve the system's resilience and provide better feedback for upstream services like conductor.

Highlights

  • Rate Limiting Implemented: Added a 60 requests/minute sliding window rate limit per credential for Qwen API calls to prevent abuse and adhere to service limits.
  • Quota Error Handling: Introduced detection for 'insufficient_quota' errors and applied a cooldown until the next day (Beijing time) to manage daily quota exhaustion gracefully.
  • Error Mapping for Conductor: Mapped Qwen quota errors (HTTP 403/429) to a generic 429 status with 'retryAfter' for better integration with upstream conductor services.
  • Performance Optimization: Cached the Beijing timezone at the package level to reduce repeated 'LoadLocation' syscalls and improve efficiency.
  • Security & Logging: Added a 'redactAuthID' function to safely log credential IDs by redacting sensitive parts, enhancing security and log readability.
  • Error Handling Refinement: Extracted a 'wrapQwenError' helper to centralize and consolidate Qwen-specific error processing logic.
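
The redaction rule used by 'redactAuthID' is not shown in this conversation. As a rough sketch under an assumed rule (keep the first and last four characters, mask the middle, and fully mask short values), a helper with the hypothetical name `redactID` might look like this:

```go
package main

import "strings"

// redactID masks the middle of an identifier before it is written to a log.
// The "keep four characters at each end" rule is an assumption made for this
// sketch, not the rule implemented in the PR.
func redactID(id string) string {
	const keep = 4
	r := []rune(id)
	if len(r) <= 2*keep {
		// Too short to reveal anything safely: mask every character.
		return strings.Repeat("*", len(r))
	}
	return string(r[:keep]) + "***" + string(r[len(r)-keep:])
}
```

Working on runes rather than bytes keeps the output valid UTF-8 if an identifier contains non-ASCII characters.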


Changelog
  • internal/runtime/executor/qwen_executor.go
    • Added 'sync' package import for concurrency control.
    • Defined constants for Qwen rate limiting parameters.
    • Introduced 'qwenBeijingLoc' to cache the "Asia/Shanghai" timezone for consistent time calculations.
    • Created 'qwenQuotaCodes' to identify specific Qwen quota exhaustion error codes.
    • Implemented 'qwenRateLimiter' with a mutex and map to track request timestamps per credential.
    • Added 'redactAuthID' function to mask sensitive parts of authentication IDs for logging.
    • Developed 'checkQwenRateLimit' function to enforce the 60 requests/minute sliding window limit.
    • Created 'isQwenQuotaError' function to parse Qwen API responses for quota exhaustion indicators.
    • Implemented 'wrapQwenError' to transform Qwen HTTP error responses, specifically mapping quota errors to HTTP 429 with a calculated 'retryAfter' duration.
    • Added 'timeUntilNextDay' function to determine the duration until the next day's midnight in Beijing time.
    • Integrated 'checkQwenRateLimit' into both 'Execute' and 'ExecuteStream' methods to apply rate limiting before making API calls.
    • Modified error handling in 'Execute' and 'ExecuteStream' to utilize 'wrapQwenError' for consistent error processing and 'retryAfter' calculation.
    • Removed redundant 'authID' variable declarations in 'Execute' and 'ExecuteStream' where it was no longer directly used after refactoring.
Activity
  • No human activity (comments, reviews, etc.) has been recorded for this pull request yet.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces rate limiting and quota error handling for Qwen API calls. It includes a sliding window rate limiter, detection of insufficient_quota errors to set a cooldown until the next day (Beijing time), and mapping of Qwen quota errors to HTTP 429 with retryAfter. Additionally, it caches the Beijing timezone, redacts authentication IDs for logging, and consolidates error handling into a helper function. The changes are well-structured and address important aspects of API interaction, such as robustness and logging.

"io"
"net/http"
"strings"
"sync"

medium

The sync package is imported but not used in this diff hunk. While it's used later in the file, it's good practice to only import packages when they are immediately needed to keep the import list clean and avoid potential circular dependencies or unnecessary compilation.

Comment on lines +95 to +97
if len(validTimestamps) == 0 {
	delete(qwenRateLimiter.requests, authID)
}

medium

The comment "Delete empty entries, otherwise update with pruned slice" is slightly misleading. The if len(validTimestamps) == 0 block deletes the entry, but if validTimestamps is not empty, the qwenRateLimiter.requests[authID] = validTimestamps line (line 117) updates the entry, effectively pruning it. Consider rephrasing for clarity.

Suggested change
- if len(validTimestamps) == 0 {
- 	delete(qwenRateLimiter.requests, authID)
- }
+ // Delete entry if no valid timestamps remain, otherwise update with pruned slice
+ if len(validTimestamps) == 0 {
+ 	delete(qwenRateLimiter.requests, authID)
+ }

Comment on lines +104 to +106
if retryAfter < time.Second {
	retryAfter = time.Second
}

medium

The retryAfter calculation ensures a minimum of 1 second. This is a good defensive measure, but it might be clearer to define time.Second as a named constant if it's used in multiple places or if the value might change in the future.

Comment on lines +137 to +140
msg := strings.ToLower(gjson.GetBytes(body, "error.message").String())
if strings.Contains(msg, "insufficient_quota") || strings.Contains(msg, "quota exceeded") ||
	strings.Contains(msg, "free allocated quota exceeded") {
	return true

medium

The fallback check for error.message is less reliable as noted in the comment. While necessary, ensure that the qwenQuotaCodes map is kept up-to-date with any new official error codes from Qwen to minimize reliance on string matching in messages.
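
The "codes first, message text as fallback" order discussed here can be sketched with the standard library alone. Everything below is illustrative: `quotaCodes` stands in for 'qwenQuotaCodes' (whose real contents are not shown), the `error.code` field is an assumption about the response shape, and `encoding/json` replaces the gjson lookup used in the reviewed hunk.

```go
package main

import (
	"encoding/json"
	"strings"
)

// quotaCodes is a hypothetical stand-in for the PR's qwenQuotaCodes.
var quotaCodes = map[string]struct{}{
	"insufficient_quota": {},
}

// quotaPhrases are message substrings taken from the reviewed hunk. The third
// phrase there, "free allocated quota exceeded", already contains
// "quota exceeded", so it needs no separate entry.
var quotaPhrases = []string{"insufficient_quota", "quota exceeded"}

// apiErrorBody models the assumed shape {"error": {"code": ..., "message": ...}}.
type apiErrorBody struct {
	Error struct {
		Code    string `json:"code"`
		Message string `json:"message"`
	} `json:"error"`
}

// isQuotaError checks the structured code first and only then falls back to
// substring matching on the human-readable message.
func isQuotaError(body []byte) bool {
	var parsed apiErrorBody
	if err := json.Unmarshal(body, &parsed); err != nil {
		return false
	}
	if _, ok := quotaCodes[strings.ToLower(parsed.Error.Code)]; ok {
		return true
	}
	msg := strings.ToLower(parsed.Error.Message)
	for _, phrase := range quotaPhrases {
		if strings.Contains(msg, phrase) {
			return true
		}
	}
	return false
}
```

Keeping the code set in a map makes it cheap to extend when new official codes appear, which reduces how often the message fallback is the deciding check.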

Comment on lines +259 to 260
var authLabel, authType, authValue string
if auth != nil {

medium

The authID variable is declared and used in the checkQwenRateLimit call, but then it's removed from the var authLabel, authType, authValue string declaration. This is fine, but ensure that authID is still correctly passed to recordAPIRequest (line 270) and that its removal from the var declaration doesn't cause any issues with other parts of the code that might expect it there.

Comment on lines +369 to 370
var authLabel, authType, authValue string
if auth != nil {

medium

Similar to the non-streaming Execute method, the authID variable is removed from this var declaration. Ensure its usage in recordAPIRequest (line 380) is still correct.

@luispater luispater changed the base branch from main to dev February 24, 2026 17:07
@luispater luispater merged commit f1e9a78 into router-for-me:dev Feb 24, 2026
2 checks passed
AoaoMH pushed a commit to AoaoMH/CLIProxyAPI-Aoao that referenced this pull request Mar 3, 2026
…ndling-clean

feat(qwen): add rate limiting and quota error handling
@piexian piexian deleted the feat/qwen-quota-handling-clean branch March 13, 2026 09:48
xiaojiou176 pushed a commit to xiaojiou176/CLIProxyAPI that referenced this pull request Mar 27, 2026
…ndling-clean

feat(qwen): add rate limiting and quota error handling
xiaojiou176 pushed a commit to xiaojiou176/CLIProxyAPI that referenced this pull request Mar 29, 2026
…ndling-clean

feat(qwen): add rate limiting and quota error handling