feat: support quota mode for BackendTrafficPolicy #7999
yuzisun wants to merge 7 commits into envoyproxy:main
Conversation
// Only supported for Global Rate Limits.
//
// +optional
QuotaMode *bool `json:"quotaMode,omitempty"`
qq: can you use shadow and quota at the same time?
Global shadow mode overrides the overall response code to OK; individual descriptor statuses remain accurate (showing which would be over limit).
Sorry, what I want to know is: what happens if we set both on one descriptor?
With both modes turned on, only quota violations are recorded; the routing decision is not changed. For example, if the quota is over limit on backend A, Envoy won't reroute to backend B.
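The interaction described in this thread can be sketched in a few lines of Go. This is a simplified model built only from the comments above, not the actual envoyproxy/ratelimit code: the `DescriptorResult` and `Decision` types and the `Decide` function are hypothetical names. It assumes that an over-limit descriptor in quota mode is recorded as a violation without changing the overall code, that a shadow-mode descriptor is only observed, and that any other over-limit descriptor rejects the request. Whether Envoy then reroutes based on the recorded violations is outside this sketch.

```go
package main

import "fmt"

// Code is a simplified stand-in for the rate limit response code.
type Code string

const (
	CodeOK        Code = "OK"
	CodeOverLimit Code = "OVER_LIMIT"
)

// DescriptorResult is a hypothetical per-descriptor outcome.
type DescriptorResult struct {
	OverLimit  bool
	ShadowMode bool
	QuotaMode  bool
}

// Decision is a hypothetical summary of the overall response.
type Decision struct {
	Overall             Code
	QuotaModeViolations []int // indices of descriptors that violated a quota
}

// Decide models the behavior described in the thread (an assumption,
// not the real service logic).
func Decide(results []DescriptorResult) Decision {
	d := Decision{Overall: CodeOK}
	for i, r := range results {
		if !r.OverLimit {
			continue
		}
		switch {
		case r.QuotaMode:
			// Record the violation; do not reject the request.
			d.QuotaModeViolations = append(d.QuotaModeViolations, i)
		case r.ShadowMode:
			// Observed only; the overall code stays OK.
		default:
			d.Overall = CodeOverLimit
		}
	}
	return d
}

func main() {
	d := Decide([]DescriptorResult{
		{OverLimit: true, ShadowMode: true, QuotaMode: true},
		{OverLimit: false},
	})
	fmt.Println(d.Overall, d.QuotaModeViolations) // prints "OK [0]"
}
```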
for mIdx, match := range rule.HeaderMatches {
	pbDesc := new(rlsconfv3.RateLimitDescriptor)
	pbDesc.ShadowMode = isRuleShadowMode(rule)
	pbDesc.QuotaMode = isRuleQuotaMode(rule)
Yes, looks like we need to update go-control-plane first.
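The diff calls `isRuleQuotaMode` without showing its body. One plausible shape, assuming the rule carries the optional `QuotaMode *bool` field from the API change and that an unset field means "off", is a nil-safe dereference. The `Rule` type below is a stand-in for illustration; the real helper's signature and rule type in Envoy Gateway may differ.

```go
package main

import "fmt"

// Rule is a hypothetical stand-in for the rate limit rule type;
// only the two optional mode flags are modeled here.
type Rule struct {
	ShadowMode *bool
	QuotaMode  *bool
}

// isRuleQuotaMode reports whether quota mode is explicitly enabled.
// A nil rule or an unset field is treated as false (an assumption).
func isRuleQuotaMode(rule *Rule) bool {
	return rule != nil && rule.QuotaMode != nil && *rule.QuotaMode
}

func main() {
	on := true
	fmt.Println(isRuleQuotaMode(&Rule{}))               // false
	fmt.Println(isRuleQuotaMode(&Rule{QuotaMode: &on})) // true
}
```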
//
// +optional
ShadowMode *bool `json:"shadowMode,omitempty"`
// QuotaMode indicates whether this rate-limit rule runs in quota mode.
How will this be used in Envoy Gateway? Which metadata will be populated? Does this replace shadow mode?
`quotaModeViolations` is populated in the metadata with the indices of the descriptors that violated their quotas. It is not going to replace shadow mode: the key difference is that this mode affects routing rather than simply observing. Envoy will use this metadata to do quota-aware routing, which @yanavlasov is implementing on the Envoy side.
Cool, thanks for highlighting this. This feels like a new feature, and piggybacking off the ratelimit API doesn't feel right. Can we rethink what a quota-based routing API would look like?
@arkodg this is the prerequisite: in order to do quota-based routing, we need the rate limit service to populate the dynamic metadata and not reject with 429 when over the limit.
Leveraging the ratelimit service to generate the quota decision is an implementation detail. The feature here is quota-based routing, so the APIs in Envoy Gateway should be geared towards that, imo.
@arkodg see the proposal in envoyproxy/ai-gateway#1813. We use the rate limit dynamic metadata in the response to set the routing header, so we can route to different endpoint pools when the quota limit is exceeded. So this is something between shadow mode and normal mode that lets extproc or Envoy make the routing decision by accessing the metadata. It could become a load balancing type on BackendTrafficPolicy in the future, but for now we can use a header to select the endpoint pools. What APIs do you have in mind?
@arkodg are you suggesting making quota configuration independent from the rate limit config?
Reading the doc Dan linked, here's an example of the user-facing API in AI Gateway:
perModelQuota:
- modelName: claude-4-sonnet
  costExpression: input_tokens + 3 * output_tokens + 0.1 * cached_input_tokens + 1.25 * cache_creation_input_tokens
  rules:
  - clientSelectors:
    - headers:
      - name: service_tier
        value: reserved
    quotaValue:
      limit: 1M
      duration: 30s
  - clientSelectors:
    - headers:
      - name: service_tier
        value: default
    quotaValue:
      limit: 2M
      duration: 60s
To achieve this, here's the plumbing Envoy AI Gateway needs to do:
- Use the EG API to configure its Global RateLimit, which in turn configures Envoy Proxy as well as Envoy RLS, and for this case set `quotaMode` in the RLS entry (envoyproxy/ratelimit@a28b84d)
- Edit the xDS Cluster to enable this feature
From an Envoy Gateway perspective, Quota Mode is vague because it doesn't provide an end-to-end solution like the one AI Gateway provides; it only sets some fields in the RLS entry, which generates the metadata.
One solution could be to piggyback off ShadowMode, also emit this metadata for that case, and document this.
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage diff, main vs. #7999:
Coverage: 73.71% -> 73.71% (-0.01%)
Files: 241 -> 241
Lines: 36552 -> 36561 (+9)
Hits: 26944 -> 26950 (+6)
Misses: 7703 -> 7704 (+1)
Partials: 1905 -> 1907 (+2)
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. Please feel free to give a status update now, ping for review, when it's ready. Thank you for your contributions!
What this PR does / why we need it:
Add quota mode API support for BackendTrafficPolicy. Quota mode itself is implemented in the Envoy ratelimit service in envoyproxy/ratelimit#1045.
Release Notes: Yes