feat: support quota mode for BackendTrafficPolicy #7999

Closed
yuzisun wants to merge 7 commits into envoyproxy:main from yuzisun:quota_mode

@yuzisun yuzisun commented Jan 20, 2026

What this PR does / why we need it:
Add quota mode API support to BackendTrafficPolicy. Quota mode itself is implemented in the Envoy rate limit service by envoyproxy/ratelimit#1045.

  • Add a QuotaMode field at the rate limit rule level
  • Validation prevents quota mode from being used with local rate limits (only global is supported)
  • xDS translation propagates quota_mode to all descriptor types in the rate limit service configuration
  • Regenerate the API documentation so that it shows the new field

Release Notes: Yes

@yuzisun yuzisun requested a review from a team as a code owner January 20, 2026 12:56

netlify Bot commented Jan 20, 2026

Deploy Preview for cerulean-figolla-1f9435 ready!

Name Link
🔨 Latest commit 0ca3b7d
🔍 Latest deploy log https://app.netlify.com/projects/cerulean-figolla-1f9435/deploys/69828ba90aa78d000804fd3a
😎 Deploy Preview https://deploy-preview-7999--cerulean-figolla-1f9435.netlify.app

// Only supported for Global Rate Limits.
//
// +optional
QuotaMode *bool `json:"quotaMode,omitempty"`
Member

qq: can you use shadow and quota at the same time?

Author

Yes, see the test at https://github.com/envoyproxy/ratelimit/pull/1045/files#diff-2491de0d8f7753e12d204b647aa71c2e8ab961dd656e76a475966d72e82bd2d4R878.

Global shadow mode overrides the overall response code to OK, while individual descriptor statuses remain accurate (showing which descriptors would be over limit).

Member

Sorry, what I want to know is: what happens if we set both on one descriptor?

Author

With both modes turned on, only quota violations are recorded; the routing decision is not changed. For example, if the quota is over limit on backend A, Envoy won't reroute to backend B.
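
One way to read the behavior described in this thread is sketched below. It is a simplified model under stated assumptions, not the ratelimit service's actual code: `descriptorResult`, `decision`, and `decide` are hypothetical names, and the assumption is that an over-limit descriptor in quota mode is recorded by index instead of triggering a rejection, while an over-limit descriptor in shadow mode is only observed.

```go
package main

import "fmt"

// descriptorResult is a hypothetical per-descriptor outcome.
type descriptorResult struct {
	overLimit  bool
	quotaMode  bool
	shadowMode bool
}

// decision is a hypothetical summary of the overall outcome.
type decision struct {
	reject          bool  // true means the request would be answered with a 429
	quotaViolations []int // indices of over-limit descriptors in quota mode
}

// decide applies the assumed precedence: quota mode records the violation,
// shadow mode only observes, and anything else rejects the request.
func decide(results []descriptorResult) decision {
	var d decision
	for i, r := range results {
		if !r.overLimit {
			continue
		}
		switch {
		case r.quotaMode:
			d.quotaViolations = append(d.quotaViolations, i)
		case r.shadowMode:
			// Observed only; the overall response stays OK.
		default:
			d.reject = true
		}
	}
	return d
}

func main() {
	d := decide([]descriptorResult{
		{overLimit: true, quotaMode: true, shadowMode: true},
		{overLimit: true, shadowMode: true},
		{overLimit: false},
	})
	fmt.Printf("reject=%v quotaViolations=%v\n", d.reject, d.quotaViolations)
}
```

In this model the consumer of the recorded indices decides what to do with them, which matches the point that recording a violation does not by itself change routing.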

for mIdx, match := range rule.HeaderMatches {
pbDesc := new(rlsconfv3.RateLimitDescriptor)
pbDesc.ShadowMode = isRuleShadowMode(rule)
pbDesc.QuotaMode = isRuleQuotaMode(rule)
Member

need to bump go-control-plane?

Author

Yes, looks like we need to update go-control-plane first.
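
The `isRuleQuotaMode(rule)` call in the hunk above is not shown in this conversation. A plausible shape, assuming it mirrors a nil-safe dereference of the optional `*bool` field, is sketched here. `RateLimitRule` and `descriptor` are simplified stand-ins: the real descriptor type is `rlsconfv3.RateLimitDescriptor` from go-control-plane, which per this thread needs a version bump before it carries a `QuotaMode` field, so it is not imported.

```go
package main

import "fmt"

// RateLimitRule is a simplified stand-in for the rule type used during
// xDS translation; only the two mode flags are modeled.
type RateLimitRule struct {
	ShadowMode *bool
	QuotaMode  *bool
}

// descriptor is a stand-in for rlsconfv3.RateLimitDescriptor.
type descriptor struct {
	ShadowMode bool
	QuotaMode  bool
}

// isRuleShadowMode reports whether shadow mode is explicitly enabled.
// An unset (nil) field counts as disabled. The body is an assumption.
func isRuleShadowMode(rule *RateLimitRule) bool {
	return rule != nil && rule.ShadowMode != nil && *rule.ShadowMode
}

// isRuleQuotaMode reports whether quota mode is explicitly enabled.
// An unset (nil) field counts as disabled. The body is an assumption.
func isRuleQuotaMode(rule *RateLimitRule) bool {
	return rule != nil && rule.QuotaMode != nil && *rule.QuotaMode
}

func main() {
	on := true
	rule := &RateLimitRule{QuotaMode: &on}
	d := descriptor{
		ShadowMode: isRuleShadowMode(rule),
		QuotaMode:  isRuleQuotaMode(rule),
	}
	fmt.Printf("%+v\n", d)
}
```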

//
// +optional
ShadowMode *bool `json:"shadowMode,omitempty"`
// QuotaMode indicates whether this rate-limit rule runs in quota mode.
Contributor

How will this be used in Envoy Gateway? Which metadata will be populated? Does this replace shadow mode?

Author

@yuzisun yuzisun Jan 23, 2026

quotaModeViolations is populated in the metadata with the indices of the descriptors that violated their quotas. It is not going to replace shadow mode: the key difference is that quota mode affects routing rather than simply observing. Envoy is going to use this metadata to do quota-aware routing, which @yanavlasov is implementing on the Envoy side.

Contributor

Cool, thanks for highlighting this. This feels like a new feature, and piggybacking off the ratelimit API doesn't feel right. Can we rethink what a quota-based routing API would look like?

Author

@arkodg this is the prerequisite: in order to do quota-based routing, we need the rate limit service to populate the dynamic metadata and not reject with a 429 when over the limit.

Contributor

Leveraging the ratelimit service to generate the quota decision is an implementation detail. The feature here is quota-based routing, so the APIs in Envoy Gateway should be geared towards that, imo.

Author

@arkodg see the proposal in envoyproxy/ai-gateway#1813: we use the rate limit dynamic metadata in the response to set a routing header, so that requests can be routed to a different endpoint pool when the quota is over limit. So this is something between shadow mode and normal mode that lets extproc or Envoy make the routing decision by reading the metadata. It could become a load balancing type on BackendTrafficPolicy in the future, but for now we can use a header to select the endpoint pool. What APIs do you have in mind?
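
The header-based pool selection described in this comment can be sketched as follows. This is an illustration only: the header name `x-quota-pool`, the pool names `primary` and `overflow`, and the `poolFor` function are all hypothetical, and the only input assumed from the thread is the list of violated descriptor indices carried in the metadata.

```go
package main

import "fmt"

// routingHeader is a hypothetical header name used to select an endpoint pool.
const routingHeader = "x-quota-pool"

// poolFor picks a pool from the violated descriptor indices reported in the
// rate limit metadata. The pool names are placeholders for this sketch.
func poolFor(quotaViolations []int) string {
	if len(quotaViolations) > 0 {
		return "overflow"
	}
	return "primary"
}

func main() {
	headers := map[string]string{}
	// Descriptor 0 reported a quota violation, so the overflow pool is chosen.
	headers[routingHeader] = poolFor([]int{0})
	fmt.Println(headers[routingHeader])
}
```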

@arkodg are you suggesting making quota configuration independent from the rate limit config?

Contributor

Reading the doc Dan linked, here's an example of the user-facing API in AI Gateway:

perModelQuota:
  - modelName: claude-4-sonnet
    costExpression: input_tokens + 3 * output_tokens + 0.1 * cached_input_tokens + 1.25 * cache_creation_input_tokens
    rules:
      - clientSelectors:
          - headers:
              - name: service_tier
                value: reserved
        quotaValue:
          limit: 1M
          duration: 30s
      - clientSelectors:
          - headers:
              - name: service_tier
                value: default
        quotaValue:
          limit: 2M
          duration: 60s

To achieve this, here's the plumbing Envoy AI Gateway needs to do:

  1. Use the EG API to configure its Global RateLimit, which in turn configures Envoy Proxy as well as Envoy RLS, and for this case set quotaMode in the RLS entry (envoyproxy/ratelimit@a28b84d)
  2. Edit the xDS Cluster to enable this feature

From an Envoy Gateway perspective, Quota Mode is vague because it doesn't provide an end-to-end solution like the one AI Gateway provides; it only sets some fields in the RLS entry, which generates the metadata.

One solution could be to piggyback off ShadowMode and also emit this metadata for that case, and document this
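
The `costExpression` in the example above is a weighted sum of token counts. A worked instance, with made-up token counts and a hypothetical `usage` struct and `cost` function, shows how a single request would be charged against the quota:

```go
package main

import "fmt"

// usage holds per-request token counts. The field names follow the
// variables in the costExpression; the struct itself is hypothetical.
type usage struct {
	InputTokens              float64
	OutputTokens             float64
	CachedInputTokens        float64
	CacheCreationInputTokens float64
}

// cost evaluates
// input_tokens + 3*output_tokens + 0.1*cached_input_tokens + 1.25*cache_creation_input_tokens.
func cost(u usage) float64 {
	return u.InputTokens +
		3*u.OutputTokens +
		0.1*u.CachedInputTokens +
		1.25*u.CacheCreationInputTokens
}

func main() {
	// Illustrative numbers only: 1000 + 3*200 + 0.1*500 + 1.25*80.
	u := usage{
		InputTokens:              1000,
		OutputTokens:             200,
		CachedInputTokens:        500,
		CacheCreationInputTokens: 80,
	}
	fmt.Println(cost(u))
}
```

With these counts the terms are 1000, 600, 50, and 100, so the request costs 1750 units against whichever `quotaValue` rule matches its `service_tier` header.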

Contributor

cc @nacx

Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>

codecov Bot commented Feb 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.71%. Comparing base (c3f2982) to head (1c28823).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7999      +/-   ##
==========================================
- Coverage   73.71%   73.71%   -0.01%     
==========================================
  Files         241      241              
  Lines       36552    36561       +9     
==========================================
+ Hits        26944    26950       +6     
- Misses       7703     7704       +1     
- Partials     1905     1907       +2     

@yanavlasov yanavlasov left a comment

/wait-any

Contributor

github-actions Bot commented Mar 7, 2026

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. Please feel free to give a status update now, ping for review, when it's ready. Thank you for your contributions!

@github-actions github-actions Bot added the stale label Mar 7, 2026
@github-actions github-actions Bot closed this Mar 15, 2026