local rate limit: add cross local cluster rate limit support #34276
wbpcode merged 16 commits into envoyproxy:main
Conversation
Signed-off-by: wbpcode <wbphub@live.com>
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to the API.
// Configuration that used to enable local cluster level rate limiting where the token buckets
// will be shared across all the Envoy instances in the local cluster.
// A share will be calculated based on the membership of the local cluster dynamically
// and the configuration. When the limiter refilling the token bucket, the share will be
What configuration?
The configuration from LocalClusterRateLimit. It's empty now because only even sharing is supported.
@wbpcode could you add something like your reply above ("It's empty now because only even sharing is supported") to the comment? It is not obvious to a future reader. We can do it in the next PR.
Got it. Will do it with a new PR.
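For future readers, here is a minimal sketch (in Python, illustrative only; the function name and rounding policy are assumptions, not the actual Envoy C++ implementation) of the even-share refill the quoted proto comment describes:

```python
def tokens_per_fill(configured_tokens: int, membership_total: int) -> int:
    """Evenly divide the configured fill amount across the local cluster:
    each Envoy instance refills only its share of the token bucket."""
    # Guard against an empty or not-yet-initialized cluster membership.
    share = 1.0 / max(membership_total, 1)
    # Round, but never refill fewer than one token per fill.
    return max(1, round(configured_tokens * share))

# With 100 tokens per fill and 4 instances, each instance refills 25 tokens.
```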
Can we have Tianyu take a pass, and then I will send it to a maintainer?
It seems @markdroth is busy recently; re-assigning this to @adisuissa for API review.
adisuissa
left a comment
Interesting idea, thanks!
Left high-level comments/questions.
/retest
@adisuissa looks like this is ready for review - the main merge is just for review notes
adisuissa
left a comment
Thanks!
I've left a few high-level comments.
One other idea: would it be possible to redesign this so that each instance owns a reference to the endpointStats() object or the membership_total object of the static local cluster, and when tokensPerFill is called, it fetches the current number of endpoints?
I think this would still need to be guarded (by a mutex) if membership_total is not atomic. However, it would remove the additional share-monitor and the member-update callback (which may simplify the solution).
That was my initial implementation when doing the POC. It's actually simpler, but it makes it hard to use different algorithms to calculate the ratio. For example, we may want to take instance weight into account in the future. The current design provides a well-defined interface and abstraction for more complex share calculation.
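The abstraction being described could be sketched roughly like this (Python, hypothetical names; the real implementation is C++ inside Envoy, and only the even-share algorithm exists today):

```python
from abc import ABC, abstractmethod

class ShareProvider(ABC):
    """Decouples the rate limiter from the algorithm that computes its share."""

    @abstractmethod
    def get_share(self) -> float:
        """Fraction of the cluster-wide limit owned by this instance."""

class EvenShareProvider(ShareProvider):
    """The only supported algorithm today: every instance gets an equal share."""

    def __init__(self, membership_total: int):
        self.membership_total = membership_total

    def get_share(self) -> float:
        return 1.0 / max(self.membership_total, 1)

class WeightedShareProvider(ShareProvider):
    """A possible future algorithm: share proportional to instance weight."""

    def __init__(self, weight: float, total_weight: float):
        self.weight = weight
        self.total_weight = total_weight

    def get_share(self) -> float:
        return self.weight / max(self.total_weight, 1.0)
```

The limiter only ever calls `get_share()`, so swapping in a new algorithm does not touch the token-bucket code.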
adisuissa
left a comment
Thanks, overall LGTM.
Left some minor comments.
/retest
Interesting and nice idea!
We have worked on sharing the token bucket across threads within an Envoy instance. This PR shares the token bucket across instances within a cluster.
AFAIU, the key points are: the local cluster contains the Envoy instance running this configuration, and each instance computes its share of the limit from the cluster membership (there is no actual data sharing between instances).
Commit Message: local rate limit: add cross local cluster rate limit support
Additional Description:
Envoy provides many rate limit related filters/features. The global rate limit provides the most powerful feature, but it also introduces additional dependencies (a rate limit server, Redis, etc.) and latency (an additional call to the rate limit server).
The local rate limit is more stable, has better performance, and has no dependency on an external server. But the local rate limit works at single-instance or connection scope. That means that if the local rate limit is used, we cannot get a stable total limit for an Envoy cluster, because the total limit of the local rate limit filter is the single-instance limit multiplied by the number of Envoy instances, and the instance number may change when the cluster scales.
This PR adds a new, interesting feature. It makes the local rate limit filter aware of the membership of the local cluster (the cluster containing the current Envoy instance). That means the local rate limit can compute its tokens based on the membership of the local cluster, achieving the goal of sharing the total limit among multiple Envoy instances (an Envoy cluster).
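The scaling problem described above can be illustrated numerically (a sketch with assumed numbers; not code from the PR):

```python
def plain_local_total(per_instance_limit: int, instances: int) -> int:
    """Plain local rate limit: the effective cluster-wide total grows
    and shrinks as the cluster scales."""
    return per_instance_limit * instances

def cluster_aware_per_instance(cluster_limit: int, instances: int) -> float:
    """Cross-instance sharing: each instance takes an even share, so the
    cluster-wide total stays stable while the cluster scales."""
    return cluster_limit / max(instances, 1)

# Plain local limit of 100 rps per instance: 4 instances allow 400 rps
# total, but scaling to 8 instances allows 800 rps total.
# With sharing and a 400 rps cluster-wide limit: 4 instances get 100 rps
# each, 8 instances get 50 rps each; the total stays 400 rps.
```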
See #34230 for more discussion.
Risk Level: low (nothing changes unless the feature is explicitly enabled).
Testing: unit.
Docs Changes: n/a.
Release Notes: n/a.
Platform Specific Features: n/a.