redis: Add zone-aware routing support for Redis Cluster proxy#43012
redis: Add zone-aware routing support for Redis Cluster proxy#43012nezdolik merged 16 commits intoenvoyproxy:mainfrom
Conversation
|
Hi @bellatoris, welcome and thank you for your contribution. We will try to review your Pull Request as quickly as possible. In the meantime, please take a look at the contribution guidelines if you have not done so already. |
|
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
84f9f19 to
b7b0088
Compare
This change implements zone-aware routing for Redis Cluster, allowing read requests to be routed to replicas in the same availability zone as the client. Key changes: - Add enable_zone_discovery config option to redis_cluster.proto - Add az_affinity and az_affinity_replicas_and_primary read policies - Implement INFO command-based zone discovery during cluster slot updates - Store zone info in host locality for standard Envoy locality handling - RedisShard groups replicas by zone for efficient zone-aware routing Zone Discovery Flow: 1. CLUSTER SLOTS response triggers zone discovery when enabled 2. INFO command sent to each unique node to get availability_zone 3. Zones stored in host->locality().zone() when hosts are created 4. RedisShard reads zone from host locality, groups replicas by zone Read Policies: - AzAffinity: local replicas -> any replica -> primary - AzAffinityReplicasAndPrimary: local replicas -> local primary -> any replica -> primary Note: This feature currently works with Valkey only. Valkey exposes availability_zone in its INFO response. Standard Redis does not support this field. Signed-off-by: Doogie Min <doogie.min@sendbird.com>
b7b0088 to
085d27f
Compare
nezdolik
left a comment
There was a problem hiding this comment.
Here is partial review, will need to do few more rounds
|
And looks like tests need to be added/improved, as coverage check is failing: |
Rename the read policy enum values for clarity: - AZ_AFFINITY -> LOCAL_ZONE_AFFINITY - AZ_AFFINITY_REPLICAS_AND_PRIMARY -> LOCAL_ZONE_AFFINITY_REPLICAS_AND_PRIMARY Also includes minor code cleanup: - Remove redundant unique_addresses set (use address_is_primary map keys instead) - Use try_emplace instead of find+insert pattern - Use structured bindings for map iteration - Add clarifying comments for fetch_sub return value semantics - Use static constexpr for string literal constant Signed-off-by: Doogie Min <doogie.min@sendbird.com>
|
/lgtm api |
fd233b4 to
2b3aa82
Compare
Add tests to improve code coverage for the Redis cluster zone discovery feature and new read policies: - Add parseAvailabilityZone tests for various INFO response formats - Add ZoneDiscoveryConfig test fixture for zone discovery testing - Add tests for LOCAL_ZONE_AFFINITY and LOCAL_ZONE_AFFINITY_REPLICAS_AND_PRIMARY read policies in client_impl_test - Add friend declaration to RedisDiscoverySession for test access These tests cover previously untested code paths in redis_cluster.cc and client_impl.cc to help meet coverage thresholds. Signed-off-by: Doogie Min <doogie.min@sendbird.com>
2db9fa9 to
186289b
Compare
5cd4e9b to
13f2bef
Compare
Done, could you review again @nezdolik @markdroth ? |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces zone-aware routing for the Redis Cluster proxy, a valuable feature for reducing cross-zone traffic. The changes are well-structured, touching the API, core discovery and load balancing logic, and connection pooling. The implementation includes a new zone discovery mechanism using the INFO command, which is cleanly integrated into the existing cluster discovery session. The load balancing logic is extended to support two new read policies with clear fallback semantics. The addition of comprehensive unit and integration tests covering both the discovery and load balancing aspects is commendable. I found one high-severity memory safety issue that should be addressed.
|
@markdroth ptal. |
…ing hazard onZoneResponse and onZoneDiscoveryFailure receive address as const ref aliasing ZoneDiscoveryCallback::address_. When zone_callbacks_.erase(address) destroys the callback, the reference becomes dangling during the erase call. Pass by value to ensure the string outlives the erase. Signed-off-by: Doogie Min <doogie.min@sendbird.com>
Signed-off-by: Doogie Min <doogie.min@sendbird.com>
448cfc5 to
94f0de7
Compare
Signed-off-by: Doogie Min <doogie.min@sendbird.com>
|
/retest |
|
@wbpcode could you do an api shepherd review? |
|
Hey @markdroth, would you mind taking another look at this? Looks like your earlier approval was dismissed by a new commit. |
|
/lgtm api |
…roxy#43012) Summary Zone-aware routing reduces cross-zone network traffic and latency by preferring replicas in the same availability zone as the client. This is particularly valuable in cloud environments where cross-zone data transfer incurs additional costs. New Configuration redis_cluster.proto: ``` // Enable zone discovery via INFO command google.protobuf.BoolValue enable_zone_discovery = 7; ``` redis_proxy.proto - New ReadPolicy values: ``` - AZ_AFFINITY: Prefer same-zone replicas → any replica → primary - AZ_AFFINITY_REPLICAS_AND_PRIMARY: Prefer same-zone replicas → same-zone primary → any replica → primary ``` How It Works ``` ┌─────────────────────────────────────────────────────────┐ │ CLUSTER SLOTS Response │ └─────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Zone Discovery (if enable_zone_discovery=true) │ │ Send INFO to each node → parse availability_zone │ └─────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Create hosts with zone in locality.zone() │ │ RedisShard groups replicas by zone │ └─────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Request Routing │ │ Client zone from node.locality.zone (bootstrap) │ │ AZ_AFFINITY: local replicas → any replica → primary │ └─────────────────────────────────────────────────────────┘ ``` Limitations - Valkey only: Zone discovery relies on availability_zone field in INFO response, which is exposed by Valkey but not standard Redis. - Client zone must be configured via node.locality.zone in Envoy bootstrap config. Risk Level: Low This is an opt-in feature that requires explicit configuration (enable_zone_discovery: true and read_policy: AZ_AFFINITY). Existing deployments are unaffected. Testing - Added comprehensive unit tests for zone-aware load balancing Related Issues Closes envoyproxy#43011 --------- Signed-off-by: Doogie Min <doogie.min@sendbird.com>
…roxy#43012) Summary Zone-aware routing reduces cross-zone network traffic and latency by preferring replicas in the same availability zone as the client. This is particularly valuable in cloud environments where cross-zone data transfer incurs additional costs. New Configuration redis_cluster.proto: ``` // Enable zone discovery via INFO command google.protobuf.BoolValue enable_zone_discovery = 7; ``` redis_proxy.proto - New ReadPolicy values: ``` - AZ_AFFINITY: Prefer same-zone replicas → any replica → primary - AZ_AFFINITY_REPLICAS_AND_PRIMARY: Prefer same-zone replicas → same-zone primary → any replica → primary ``` How It Works ``` ┌─────────────────────────────────────────────────────────┐ │ CLUSTER SLOTS Response │ └─────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Zone Discovery (if enable_zone_discovery=true) │ │ Send INFO to each node → parse availability_zone │ └─────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Create hosts with zone in locality.zone() │ │ RedisShard groups replicas by zone │ └─────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Request Routing │ │ Client zone from node.locality.zone (bootstrap) │ │ AZ_AFFINITY: local replicas → any replica → primary │ └─────────────────────────────────────────────────────────┘ ``` Limitations - Valkey only: Zone discovery relies on availability_zone field in INFO response, which is exposed by Valkey but not standard Redis. - Client zone must be configured via node.locality.zone in Envoy bootstrap config. Risk Level: Low This is an opt-in feature that requires explicit configuration (enable_zone_discovery: true and read_policy: AZ_AFFINITY). Existing deployments are unaffected. Testing - Added comprehensive unit tests for zone-aware load balancing Related Issues Closes envoyproxy#43011 --------- Signed-off-by: Doogie Min <doogie.min@sendbird.com> Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
…roxy#43012) Summary Zone-aware routing reduces cross-zone network traffic and latency by preferring replicas in the same availability zone as the client. This is particularly valuable in cloud environments where cross-zone data transfer incurs additional costs. New Configuration redis_cluster.proto: ``` // Enable zone discovery via INFO command google.protobuf.BoolValue enable_zone_discovery = 7; ``` redis_proxy.proto - New ReadPolicy values: ``` - AZ_AFFINITY: Prefer same-zone replicas → any replica → primary - AZ_AFFINITY_REPLICAS_AND_PRIMARY: Prefer same-zone replicas → same-zone primary → any replica → primary ``` How It Works ``` ┌─────────────────────────────────────────────────────────┐ │ CLUSTER SLOTS Response │ └─────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Zone Discovery (if enable_zone_discovery=true) │ │ Send INFO to each node → parse availability_zone │ └─────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Create hosts with zone in locality.zone() │ │ RedisShard groups replicas by zone │ └─────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Request Routing │ │ Client zone from node.locality.zone (bootstrap) │ │ AZ_AFFINITY: local replicas → any replica → primary │ └─────────────────────────────────────────────────────────┘ ``` Limitations - Valkey only: Zone discovery relies on availability_zone field in INFO response, which is exposed by Valkey but not standard Redis. - Client zone must be configured via node.locality.zone in Envoy bootstrap config. Risk Level: Low This is an opt-in feature that requires explicit configuration (enable_zone_discovery: true and read_policy: AZ_AFFINITY). Existing deployments are unaffected. Testing - Added comprehensive unit tests for zone-aware load balancing Related Issues Closes envoyproxy#43011 --------- Signed-off-by: Doogie Min <doogie.min@sendbird.com>
…roxy#43012) Summary Zone-aware routing reduces cross-zone network traffic and latency by preferring replicas in the same availability zone as the client. This is particularly valuable in cloud environments where cross-zone data transfer incurs additional costs. New Configuration redis_cluster.proto: ``` // Enable zone discovery via INFO command google.protobuf.BoolValue enable_zone_discovery = 7; ``` redis_proxy.proto - New ReadPolicy values: ``` - AZ_AFFINITY: Prefer same-zone replicas → any replica → primary - AZ_AFFINITY_REPLICAS_AND_PRIMARY: Prefer same-zone replicas → same-zone primary → any replica → primary ``` How It Works ``` ┌─────────────────────────────────────────────────────────┐ │ CLUSTER SLOTS Response │ └─────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Zone Discovery (if enable_zone_discovery=true) │ │ Send INFO to each node → parse availability_zone │ └─────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Create hosts with zone in locality.zone() │ │ RedisShard groups replicas by zone │ └─────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Request Routing │ │ Client zone from node.locality.zone (bootstrap) │ │ AZ_AFFINITY: local replicas → any replica → primary │ └─────────────────────────────────────────────────────────┘ ``` Limitations - Valkey only: Zone discovery relies on availability_zone field in INFO response, which is exposed by Valkey but not standard Redis. - Client zone must be configured via node.locality.zone in Envoy bootstrap config. Risk Level: Low This is an opt-in feature that requires explicit configuration (enable_zone_discovery: true and read_policy: AZ_AFFINITY). Existing deployments are unaffected. Testing - Added comprehensive unit tests for zone-aware load balancing Related Issues Closes envoyproxy#43011 --------- Signed-off-by: Doogie Min <doogie.min@sendbird.com>
Upstream-merged zone-aware patch (envoyproxy#43012) was authored against envoy main where HostImpl was already updated to take std::shared_ptr<const Locality>. On our v1.36-based release/9b72caf-sendbird-custom branch, HostImpl still takes a const Locality& reference, so passing makeLocalityWithZone(...) directly fails to compile. Dereference the shared_ptr to match HostImpl's expected signature on this branch.
Summary
Zone-aware routing reduces cross-zone network traffic and latency by preferring replicas in the same availability zone as the client. This is particularly valuable in cloud environments where cross-zone data transfer incurs additional costs.
New Configuration
redis_cluster.proto:
redis_proxy.proto - New ReadPolicy values:
How It Works
Limitations
Risk Level: Low
This is an opt-in feature that requires explicit configuration (enable_zone_discovery: true and read_policy: AZ_AFFINITY). Existing deployments are unaffected.
Testing
Related Issues
Closes #43011