
redis: Add zone-aware routing support for Redis Cluster proxy#43012

Merged
nezdolik merged 16 commits into envoyproxy:main from bellatoris:doogie/redis-zone-aware-routing
Apr 7, 2026

@bellatoris
Contributor

Summary

Zone-aware routing reduces cross-zone network traffic and latency by preferring replicas in the same availability zone as the client. This is particularly valuable in cloud environments where cross-zone data transfer incurs additional costs.

New Configuration

redis_cluster.proto:

  // Enable zone discovery via INFO command
  google.protobuf.BoolValue enable_zone_discovery = 7;

redis_proxy.proto - New ReadPolicy values:

  - AZ_AFFINITY: Prefer same-zone replicas → any replica → primary
  - AZ_AFFINITY_REPLICAS_AND_PRIMARY: Prefer same-zone replicas → same-zone primary → any replica → primary
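
The two fallback chains above can be sketched in a few lines of C++. This is only an illustration of the ordering, not the code in this PR: `Node`, `firstMatch`, and `pickReadTarget` are hypothetical names, and the sketch returns the first matching candidate for determinism, whereas a real load balancer would presumably spread requests across all candidates in a tier.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for illustration; not Envoy's real types.
enum class ReadPolicy { AzAffinity, AzAffinityReplicasAndPrimary };

struct Node {
  std::string address;
  std::string zone;
  bool is_primary;
};

// Returns the first node that satisfies the predicate, if any.
template <typename Pred>
std::optional<Node> firstMatch(const std::vector<Node>& nodes, Pred pred) {
  for (const Node& node : nodes) {
    if (pred(node)) {
      return node;
    }
  }
  return std::nullopt;
}

// Picks a read target for one shard by walking the fallback chain in order.
std::optional<Node> pickReadTarget(const std::vector<Node>& shard,
                                   const std::string& client_zone,
                                   ReadPolicy policy) {
  // 1. Replica in the client's zone.
  if (auto n = firstMatch(shard, [&](const Node& x) {
        return !x.is_primary && x.zone == client_zone;
      })) {
    return n;
  }
  // 2. Primary in the client's zone (only for the replicas-and-primary policy).
  if (policy == ReadPolicy::AzAffinityReplicasAndPrimary) {
    if (auto n = firstMatch(shard, [&](const Node& x) {
          return x.is_primary && x.zone == client_zone;
        })) {
      return n;
    }
  }
  // 3. Any replica, regardless of zone.
  if (auto n = firstMatch(shard, [](const Node& x) { return !x.is_primary; })) {
    return n;
  }
  // 4. The primary as a last resort.
  return firstMatch(shard, [](const Node& x) { return x.is_primary; });
}
```

With a shard whose primary is in the client's zone and whose replicas are elsewhere, the first policy still reads from a remote replica, while the second reads from the local primary.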

How It Works

  ┌─────────────────────────────────────────────────────────┐
  │              CLUSTER SLOTS Response                     │
  └─────────────────────────────────────────────────────────┘
                           │
                           ▼
  ┌─────────────────────────────────────────────────────────┐
  │     Zone Discovery (if enable_zone_discovery=true)      │
  │     Send INFO to each node → parse availability_zone    │
  └─────────────────────────────────────────────────────────┘
                           │
                           ▼
  ┌─────────────────────────────────────────────────────────┐
  │     Create hosts with zone in locality.zone()           │
  │     RedisShard groups replicas by zone                  │
  └─────────────────────────────────────────────────────────┘
                           │
                           ▼
  ┌─────────────────────────────────────────────────────────┐
  │     Request Routing                                     │
  │     Client zone from node.locality.zone (bootstrap)     │
  │     AZ_AFFINITY: local replicas → any replica → primary │
  └─────────────────────────────────────────────────────────┘

Limitations

  • Valkey only: Zone discovery relies on the availability_zone field in the INFO response, which Valkey exposes but standard Redis does not.
  • Client zone must be configured via node.locality.zone in Envoy bootstrap config.
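
To make the first limitation concrete, here is a rough sketch of extracting availability_zone from INFO text. It assumes the usual INFO layout of `key:value` lines separated by CRLF. The function body is illustrative and not taken from this PR, and treating an empty value as "no zone" is an assumption of the sketch.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Illustrative parser: returns the value of the "availability_zone:" line in
// an INFO payload, or nullopt when the field is missing or empty (for
// example, a server that does not expose the field).
std::optional<std::string> parseAvailabilityZone(std::string_view info) {
  constexpr std::string_view kKey = "availability_zone:";
  std::size_t pos = 0;
  while (pos < info.size()) {
    std::size_t eol = info.find('\n', pos);
    if (eol == std::string_view::npos) {
      eol = info.size();
    }
    std::string_view line = info.substr(pos, eol - pos);
    // INFO lines normally end in CRLF; drop the trailing '\r' if present.
    if (!line.empty() && line.back() == '\r') {
      line.remove_suffix(1);
    }
    if (line.substr(0, kKey.size()) == kKey) {
      std::string_view value = line.substr(kKey.size());
      if (value.empty()) {
        return std::nullopt;  // Assumption: an empty value means "unknown zone".
      }
      return std::string(value);
    }
    pos = eol + 1;
  }
  return std::nullopt;
}
```

A node whose INFO output lacks the field simply yields no zone, so under this sketch it would never count as "local" for any client.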

Risk Level: Low

This is an opt-in feature that requires explicit configuration (enable_zone_discovery: true and read_policy: AZ_AFFINITY). Existing deployments are unaffected.

Testing

  • Added comprehensive unit tests for zone-aware load balancing

Related Issues

Closes #43011

@repokitteh-read-only

Hi @bellatoris, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #43012 was opened by bellatoris.

@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @markdroth
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #43012 was opened by bellatoris.

@bellatoris bellatoris force-pushed the doogie/redis-zone-aware-routing branch 2 times, most recently from 84f9f19 to b7b0088 Compare January 15, 2026 07:57
This change implements zone-aware routing for Redis Cluster, allowing read
requests to be routed to replicas in the same availability zone as the client.

Key changes:
- Add enable_zone_discovery config option to redis_cluster.proto
- Add az_affinity and az_affinity_replicas_and_primary read policies
- Implement INFO command-based zone discovery during cluster slot updates
- Store zone info in host locality for standard Envoy locality handling
- RedisShard groups replicas by zone for efficient zone-aware routing

Zone Discovery Flow:
1. CLUSTER SLOTS response triggers zone discovery when enabled
2. INFO command sent to each unique node to get availability_zone
3. Zones stored in host->locality().zone() when hosts are created
4. RedisShard reads zone from host locality, groups replicas by zone
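
Step 4 of the flow above might look roughly like the following. `HostInfo`, `ShardView`, and `groupReplicasByZone` are hypothetical names for this sketch, not the RedisShard code from this commit. Leaving replicas with an empty zone out of the per-zone map is also an assumption.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified host record. In Envoy the zone would be read from
// host->locality().zone() rather than stored as a plain field.
struct HostInfo {
  std::string address;
  std::string zone;  // Empty when zone discovery found nothing for this node.
  bool is_primary;
};

// Hypothetical per-shard view: the primary, every replica, and the replicas
// indexed by zone so a same-zone lookup is a single map access.
struct ShardView {
  std::string primary;
  std::vector<std::string> replicas;
  std::map<std::string, std::vector<std::string>> replicas_by_zone;
};

ShardView groupReplicasByZone(const std::vector<HostInfo>& hosts) {
  ShardView shard;
  for (const HostInfo& host : hosts) {
    if (host.is_primary) {
      shard.primary = host.address;
      continue;
    }
    shard.replicas.push_back(host.address);
    // Assumption: replicas without a known zone stay out of the zone index
    // but remain reachable through the "any replica" list.
    if (!host.zone.empty()) {
      shard.replicas_by_zone[host.zone].push_back(host.address);
    }
  }
  return shard;
}
```

Building the index once per topology update keeps the per-request path down to a lookup by the client's zone.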

Read Policies:
- AzAffinity: local replicas -> any replica -> primary
- AzAffinityReplicasAndPrimary: local replicas -> local primary -> any replica -> primary

Note: This feature currently works with Valkey only. Valkey exposes
availability_zone in its INFO response. Standard Redis does not support this field.

Signed-off-by: Doogie Min <doogie.min@sendbird.com>
@bellatoris bellatoris force-pushed the doogie/redis-zone-aware-routing branch from b7b0088 to 085d27f Compare January 15, 2026 07:59
@nezdolik nezdolik self-assigned this Jan 16, 2026
Member

@nezdolik nezdolik left a comment


Here is a partial review; I will need to do a few more rounds.

Comment thread source/extensions/clusters/redis/redis_cluster.cc Outdated
Comment thread source/extensions/clusters/redis/redis_cluster.cc Outdated
Comment thread source/extensions/clusters/redis/redis_cluster.cc Outdated
Comment thread source/extensions/clusters/redis/redis_cluster.cc Outdated
Comment thread source/extensions/clusters/redis/redis_cluster.cc Outdated
@nezdolik
Member

And looks like tests need to be added/improved, as coverage check is failing:

FAILED: Directories not meeting coverage thresholds:
  ✗ source/extensions/clusters/redis: 84.7% (threshold: 96.6%)
  ✗ source/extensions/filters/network/common/redis: 96.4% (threshold: 96.6%)

Comment thread api/envoy/extensions/filters/network/redis_proxy/v3/redis_proxy.proto Outdated
Rename the read policy enum values for clarity:
- AZ_AFFINITY -> LOCAL_ZONE_AFFINITY
- AZ_AFFINITY_REPLICAS_AND_PRIMARY -> LOCAL_ZONE_AFFINITY_REPLICAS_AND_PRIMARY

Also includes minor code cleanup:
- Remove redundant unique_addresses set (use address_is_primary map keys instead)
- Use try_emplace instead of find+insert pattern
- Use structured bindings for map iteration
- Add clarifying comments for fetch_sub return value semantics
- Use static constexpr for string literal constant
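
For readers unfamiliar with the idioms named in this cleanup list, here is a small standalone sketch. `collectAddresses` and `isLastResponse` are hypothetical helpers; only the `address_is_primary` map name comes from the commit message. The code is not from redis_cluster.cc.

```cpp
#include <atomic>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: records whether each address is a primary. The map
// keys double as the set of unique addresses, so no separate set is needed.
std::map<std::string, bool> collectAddresses(
    const std::vector<std::pair<std::string, bool>>& nodes) {
  std::map<std::string, bool> address_is_primary;
  // Structured bindings name both halves of each pair.
  for (const auto& [address, is_primary] : nodes) {
    // try_emplace inserts only when the key is absent, replacing the
    // find-then-insert pattern with one lookup. The first entry wins.
    address_is_primary.try_emplace(address, is_primary);
  }
  return address_is_primary;
}

// fetch_sub returns the value held *before* the subtraction, so a return
// value of 1 means this call moved the counter to 0 (the last response).
bool isLastResponse(std::atomic<int>& pending) {
  return pending.fetch_sub(1) == 1;
}
```

Comparing the `fetch_sub` result against 1 rather than 0 is the detail that the "clarifying comments" item refers to. Checking for 0 would never fire on the final response.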

Signed-off-by: Doogie Min <doogie.min@sendbird.com>
@markdroth
Contributor

/lgtm api

@bellatoris bellatoris force-pushed the doogie/redis-zone-aware-routing branch 9 times, most recently from fd233b4 to 2b3aa82 Compare January 29, 2026 13:11
Add tests to improve code coverage for the Redis cluster zone discovery
feature and new read policies:

- Add parseAvailabilityZone tests for various INFO response formats
- Add ZoneDiscoveryConfig test fixture for zone discovery testing
- Add tests for LOCAL_ZONE_AFFINITY and LOCAL_ZONE_AFFINITY_REPLICAS_AND_PRIMARY
  read policies in client_impl_test
- Add friend declaration to RedisDiscoverySession for test access

These tests cover previously untested code paths in redis_cluster.cc and
client_impl.cc to help meet coverage thresholds.

Signed-off-by: Doogie Min <doogie.min@sendbird.com>
@bellatoris bellatoris force-pushed the doogie/redis-zone-aware-routing branch 2 times, most recently from 2db9fa9 to 186289b Compare January 29, 2026 15:39
@bellatoris bellatoris force-pushed the doogie/redis-zone-aware-routing branch from 5cd4e9b to 13f2bef Compare March 20, 2026 17:39
@bellatoris
Contributor Author

> overall lgtm, few nits while we wait for api review

Done, could you review again @nezdolik @markdroth ?

@bellatoris bellatoris requested a review from nezdolik March 20, 2026 18:02
@nezdolik
Member

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces zone-aware routing for the Redis Cluster proxy, a valuable feature for reducing cross-zone traffic. The changes are well-structured, touching the API, core discovery and load balancing logic, and connection pooling. The implementation includes a new zone discovery mechanism using the INFO command, which is cleanly integrated into the existing cluster discovery session. The load balancing logic is extended to support two new read policies with clear fallback semantics. The addition of comprehensive unit and integration tests covering both the discovery and load balancing aspects is commendable. I found one high-severity memory safety issue that should be addressed.

Comment thread source/extensions/clusters/redis/redis_cluster.h Outdated
@nezdolik
Member

@markdroth ptal.

…ing hazard

onZoneResponse and onZoneDiscoveryFailure receive address as const ref
aliasing ZoneDiscoveryCallback::address_. When zone_callbacks_.erase(address)
destroys the callback, the reference becomes dangling during the erase call.
Pass by value to ensure the string outlives the erase.
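
A stripped-down illustration of the hazard and the fix, using hypothetical `ZoneCallback` and `Session` types rather than the real RedisDiscoverySession: the caller passes the callback's own `address_` member, and the handler erases that callback before it is done with the string.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical callback object that owns the address it was created for.
struct ZoneCallback {
  std::string address_;
};

// Hypothetical session that tracks one in-flight callback per address.
class Session {
public:
  void track(const std::string& address) {
    callbacks_.emplace(address, std::make_unique<ZoneCallback>(ZoneCallback{address}));
  }

  const ZoneCallback* find(const std::string& address) const {
    auto it = callbacks_.find(address);
    return it == callbacks_.end() ? nullptr : it->second.get();
  }

  // `address` is taken by value on purpose. If it were a const reference and
  // the caller passed callback->address_, erase() would destroy the string
  // the reference points at, and the later use would read freed memory.
  void onZoneResponse(std::string address, const std::string& zone) {
    callbacks_.erase(address);  // Destroys the ZoneCallback and its address_.
    zones_[address] = zone;     // Safe: `address` is this function's own copy.
  }

  std::size_t pending() const { return callbacks_.size(); }
  std::string zoneOf(const std::string& address) const { return zones_.at(address); }

private:
  std::map<std::string, std::unique_ptr<ZoneCallback>> callbacks_;
  std::map<std::string, std::string> zones_;
};
```

The copy costs one string allocation per response, which is small next to a network round trip, and it removes any dependence on the callback's lifetime.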

Signed-off-by: Doogie Min <doogie.min@sendbird.com>
Signed-off-by: Doogie Min <doogie.min@sendbird.com>
@bellatoris bellatoris force-pushed the doogie/redis-zone-aware-routing branch from 448cfc5 to 94f0de7 Compare March 24, 2026 02:19
Signed-off-by: Doogie Min <doogie.min@sendbird.com>
@bellatoris
Contributor Author

/retest

@nezdolik
Member

@wbpcode could you do an api shepherd review?

@bellatoris
Contributor Author

Hey @markdroth, would you mind taking another look at this? Looks like your earlier approval was dismissed by a new commit.

@markdroth
Contributor

/lgtm api

@repokitteh-read-only repokitteh-read-only Bot removed the api label Apr 3, 2026
@nezdolik nezdolik enabled auto-merge (squash) April 7, 2026 08:33
@nezdolik nezdolik merged commit f1e84f0 into envoyproxy:main Apr 7, 2026
30 checks passed
@bellatoris bellatoris deleted the doogie/redis-zone-aware-routing branch April 7, 2026 08:42
gavin-jeong pushed a commit to sendbird/envoy that referenced this pull request Apr 10, 2026
nshipilov pushed a commit to nshipilov/envoy that referenced this pull request Apr 13, 2026
Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
krinkinmu pushed a commit to grnmeira/envoy that referenced this pull request Apr 20, 2026
gavin-jeong pushed a commit to sendbird/envoy that referenced this pull request Apr 29, 2026
gavin-jeong added a commit to sendbird/envoy that referenced this pull request Apr 29, 2026
Upstream-merged zone-aware patch (envoyproxy#43012) was authored against envoy main
where HostImpl was already updated to take std::shared_ptr<const Locality>.
On our v1.36-based release/9b72caf-sendbird-custom branch, HostImpl still
takes a const Locality& reference, so passing makeLocalityWithZone(...)
directly fails to compile.

Dereference the shared_ptr to match HostImpl's expected signature on this
branch.
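
The shape of that backport fix can be sketched with stand-in types. `Locality` and `HostByRef` below are hypothetical simplifications, and only the `makeLocalityWithZone` name comes from the commit message. The sketch assumes the reference-taking constructor copies the locality, as it does here. If it kept the reference instead, dereferencing a temporary shared_ptr would not be safe.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the Locality message.
struct Locality {
  std::string zone;
};

// Returns a shared, immutable locality, as the newer signature expects.
std::shared_ptr<const Locality> makeLocalityWithZone(const std::string& zone) {
  return std::make_shared<const Locality>(Locality{zone});
}

// Hypothetical older-style host: takes the locality by const reference and
// copies it, so the argument only has to live for the constructor call.
class HostByRef {
public:
  explicit HostByRef(const Locality& locality) : locality_(locality) {}
  const std::string& zone() const { return locality_.zone; }

private:
  Locality locality_;
};

// Adapts the shared_ptr-returning helper to the reference-taking constructor
// by dereferencing the pointer at the call site.
HostByRef makeHost(const std::string& zone) {
  return HostByRef(*makeLocalityWithZone(zone));
}
```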

Development

Successfully merging this pull request may close these issues.

redis: Support zone-aware routing support for Redis Cluster proxy

4 participants