
CPLAT-9492: Update to Envoy 9b72caf (for Istio 1.28.5) #18

Closed
gavin-jeong wants to merge 13 commits into release/9b72caf-sendbird-custom from CPLAT-9492-update_to_809213a

@gavin-jeong

JIRA: https://sendbird.atlassian.net/browse/CPLAT-9492

Summary

Rebuild the Envoy release branch for Istio 1.28.2 using the actual upstream-merged commits, replacing our previous cherry-picked versions.

Upstream PRs now included as-merged:

  • f1e84f0ecd redis: Add zone-aware routing support for Redis Cluster proxy (#43012) — merged 2026-04-07
  • 3d0fc82b0e tls: Add certificate compression support (RFC 8879) (#42690) — merged 2026-03-07

Sendbird-only patches (unchanged from previous releases):

  • redis: eval_ro, evalsha_ro support
  • Custom tracing header
  • QUIC keylog (SSLKEYLOGFILE + TLS context integration)
  • Redis race condition / use-after-free fixes
  • PubSub commands
  • Trace ID / Request ID format (UUIDv4→UUIDv7) [CPLAT-8445, CPLAT-8549]
  • Per-shard Redis proxy metrics

Envoy Details

  • Base SHA: 809213ab4403f02b04521567715f97ad5a1ae597 (Envoy dev v1.36.5, Istio 1.28.2 base)
  • Sendbird release branch: release/809213a-sendbird-custom

Test Plan

  • CI passes on this PR
  • proxy-istio 1.28.2 build succeeds (see proxy-istio PR)
  • Test in dev cluster

Merge Order

⚠️ Merge this PR first, then the proxy-istio PR.


Security checklist (Infrastructure code)

  • I have checked below conditions:
    • This PR doesn't contain any SecurityGroup rule changes which are not allowed via INF Jira ticket
    • This PR doesn't contain any publicly open SecurityGroup inbound rules (0.0.0.0/0) except for the predefined service port
    • This PR doesn't contain any resources in a public subnet except those needed for a specific technical reason
    • This PR doesn't contain any credentials (AWS Secret key, password, API tokens, etc)
    • This PR doesn't contain any IAM users which are not allowed via UAC Jira approval
    • The contents of this PR follow the guides from OWASP and the SendBird Secure Software Development Lifecycle

dlunch and others added 13 commits April 10, 2026 12:50
Include code formatting improvements for consistent style in trace test files.
This commit introduces QUIC/HTTP3 keylog functionality in Envoy, enabling generation of NSS Key Log Format files for Wireshark and other debugging tools.

- Keylog callback registration in OnNewSslCtx()
- Implementation of EnvoyQuicProofSource::setupQuicKeylogCallback() and quicKeylogCallback()
- TLS context–based keylog configuration with per–filter chain caching and thread safety
- Address filtering via local/remote IP lists
- Fallback to SSLKEYLOGFILE environment variable for compatibility with existing workflows
- QuicKeylogBridge integration with Envoy’s existing TLS keylog infrastructure
- RawBufferSocket fallback fix in QuicServerTransportSocketFactory::createDownstreamTransportSocket()
- Comprehensive unit tests including edge cases
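
A minimal sketch of the two ideas in the list above: address filtering and the NSS Key Log Format line layout (`<label> <client_random hex> <secret hex>`). The function names are hypothetical and are not Envoy's C++ API. Treating an empty list as "match everything" and requiring both local and remote to match are assumptions of this sketch.

```python
import ipaddress


def address_matches(ip, cidrs):
    """Return True if ip falls in any CIDR; an empty list matches everything (assumption)."""
    if not cidrs:
        return True
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


def should_log_keys(local_ip, remote_ip, local_cidrs, remote_cidrs):
    """Log keys only when both the local and the remote address pass their filters."""
    return address_matches(local_ip, local_cidrs) and address_matches(remote_ip, remote_cidrs)


def format_keylog_line(label, client_random, secret):
    """Build one NSS Key Log Format line: label, client random, and secret, hex-encoded."""
    return f"{label} {client_random.hex()} {secret.hex()}"
```

A file of such lines can be loaded into Wireshark as the (Pre)-Master-Secret log to decrypt a matching capture.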

Signed-off-by: Chanhun Jeong <keyolk@gmail.com>
…uction

This commit combines multiple fixes for Redis cluster stability:

- Fix race conditions in cluster destruction by capturing is_destroying_ flag
- Add comprehensive null checks to prevent segfaults during cluster destruction
- Use local shared_ptr copies to prevent race conditions
- Use shared_from_this() to keep RedisDiscoverySession alive during timer callbacks
- Fix use-after-free by using session-owned flag instead of parent reference

These fixes ensure safe cleanup of Redis clusters and prevent crashes
during cluster removal and timer callback execution.
Add TraceSampledFormatter that uses Envoy's internal tracing decision
(stream_info.traceReason()).

This approach works correctly at trace origin points (e.g., Istio Ingress
Gateway) where no incoming traceparent header exists.

Usage: %TRACE_SAMPLED% in access log format
Returns: "true" if traced, "false" otherwise

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add per-shard statistics for Redis proxy to track hot shard usage:

- enable_per_shard_stats: Emits per-shard request counters
  - upstream_rq_total: Total requests to each shard
  - upstream_rq_success: Successful requests
  - upstream_rq_failure: Failed requests
  - upstream_rq_active: Active requests (gauge)

- enable_per_shard_latency_stats: Emits latency histogram
  - upstream_rq_time: Request latency in microseconds

All metrics are scoped under: cluster.<cluster_name>.shard.<host_address>.*

Per-shard command-level stats are also recorded when enable_command_stats
is enabled alongside the per-shard options.

Note: These options may significantly increase metric cardinality in
large clusters. Use with caution in production environments.
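
To make the cardinality note concrete, the sketch below lists the metric names one shard would emit under the scoping described above. `per_shard_metric_names` is a hypothetical helper, not part of Envoy, and the real stat names may be sanitized differently (for example, characters in the host address).

```python
REQUEST_STATS = (
    "upstream_rq_total",
    "upstream_rq_success",
    "upstream_rq_failure",
    "upstream_rq_active",
)


def per_shard_metric_names(cluster_name, host_address,
                           enable_per_shard_stats=False,
                           enable_per_shard_latency_stats=False):
    """Return the metric names one shard would emit for the given options."""
    prefix = f"cluster.{cluster_name}.shard.{host_address}"
    names = []
    if enable_per_shard_stats:
        names.extend(f"{prefix}.{stat}" for stat in REQUEST_STATS)
    if enable_per_shard_latency_stats:
        names.append(f"{prefix}.upstream_rq_time")
    return names


def total_series(cluster_name, shard_addresses, **options):
    """Rough series count for a cluster: every shard emits its own copy of each metric."""
    return sum(len(per_shard_metric_names(cluster_name, addr, **options))
               for addr in shard_addresses)
```

With both options enabled, each shard adds five series before any command-level stats, so the total grows linearly with the number of shards.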
…roxy#43012)

Summary

Zone-aware routing reduces cross-zone network traffic and latency by
preferring replicas in the same availability zone as the client. This is
particularly valuable in cloud environments where cross-zone data
transfer incurs additional costs.

  New Configuration

  redis_cluster.proto:
```
  // Enable zone discovery via INFO command
  google.protobuf.BoolValue enable_zone_discovery = 7;
```
  redis_proxy.proto - New ReadPolicy values:
```
  - AZ_AFFINITY: Prefer same-zone replicas → any replica → primary
  - AZ_AFFINITY_REPLICAS_AND_PRIMARY: Prefer same-zone replicas → same-zone primary → any replica → primary
```

  How It Works
```
  ┌─────────────────────────────────────────────────────────┐
  │              CLUSTER SLOTS Response                     │
  └─────────────────────────────────────────────────────────┘
                           │
                           ▼
  ┌─────────────────────────────────────────────────────────┐
  │     Zone Discovery (if enable_zone_discovery=true)      │
  │     Send INFO to each node → parse availability_zone    │
  └─────────────────────────────────────────────────────────┘
                           │
                           ▼
  ┌─────────────────────────────────────────────────────────┐
  │     Create hosts with zone in locality.zone()           │
  │     RedisShard groups replicas by zone                  │
  └─────────────────────────────────────────────────────────┘
                           │
                           ▼
  ┌─────────────────────────────────────────────────────────┐
  │     Request Routing                                     │
  │     Client zone from node.locality.zone (bootstrap)     │
  │     AZ_AFFINITY: local replicas → any replica → primary │
  └─────────────────────────────────────────────────────────┘
```
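
The fallback order of the two read policies can be sketched as follows. This is an illustrative Python model, not the C++ implementation: `Shard`, `candidate_tiers`, and `pick_host` are hypothetical names, hosts are plain `(address, zone)` tuples, and health checking and load balancing within a tier are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    primary: tuple                                  # (address, zone)
    replicas: list = field(default_factory=list)    # list of (address, zone)


def candidate_tiers(shard, client_zone, policy):
    """Return the non-empty host tiers in preference order for the given policy."""
    if policy not in ("AZ_AFFINITY", "AZ_AFFINITY_REPLICAS_AND_PRIMARY"):
        raise ValueError(f"unsupported policy: {policy}")
    # An unset client zone never counts as a local match (assumption of this sketch).
    local = [r for r in shard.replicas if client_zone and r[1] == client_zone]
    tiers = [local]
    if policy == "AZ_AFFINITY_REPLICAS_AND_PRIMARY" and client_zone and shard.primary[1] == client_zone:
        tiers.append([shard.primary])     # same-zone primary before remote replicas
    tiers.append(list(shard.replicas))    # any replica
    tiers.append([shard.primary])         # last resort
    return [tier for tier in tiers if tier]


def pick_host(shard, client_zone, policy="AZ_AFFINITY"):
    """Pick the first host of the most preferred non-empty tier."""
    return candidate_tiers(shard, client_zone, policy)[0][0]
```

The only difference between the two policies is the extra same-zone-primary tier, so a client colocated with the primary reads from it instead of crossing zones to a replica.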
  Limitations

- Valkey only: Zone discovery relies on availability_zone field in INFO
response, which is exposed by Valkey but not standard Redis.
- Client zone must be configured via node.locality.zone in Envoy
bootstrap config.
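
As an illustration of the first limitation, the sketch below extracts `availability_zone` from an INFO response and returns `None` when the field is missing, as it would be on a server that does not expose it. `parse_availability_zone` is a hypothetical helper, and treating an empty value as "no zone" is an assumption.

```python
def parse_availability_zone(info_text):
    """Return the availability_zone value from INFO output, or None if absent or empty."""
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks and "# Section" headers
            continue
        key, sep, value = line.partition(":")
        if sep and key == "availability_zone":
            return value or None
    return None
```

A node for which this returns `None` carries no zone, so it can only be reached through the "any replica" or "primary" fallbacks.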

  Risk Level: Low

This is an opt-in feature that requires explicit configuration
(enable_zone_discovery: true and read_policy: AZ_AFFINITY). Existing
deployments are unaffected.

  Testing

  - Added comprehensive unit tests for zone-aware load balancing

  Related Issues

  Closes envoyproxy#43011

---------

Signed-off-by: Doogie Min <doogie.min@sendbird.com>
Add TLS certificate compression with brotli and zlib algorithms.
This reduces TLS handshake size, especially beneficial for QUIC where
the ServerHello needs to fit in the initial response.

The existing QUIC-only certificate compression implementation has been
refactored to be shared between QUIC and TCP TLS. The QUIC wrapper now
delegates to the common TLS implementation for backward compatibility.
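
A rough sketch of the zlib variant using Python's standard `zlib` module, not the Envoy/BoringSSL code path. RFC 8879 has the sender state the uncompressed length and the receiver reject any mismatch, so the decompressor below enforces that bound. The function names are hypothetical, and real certificate chains will compress far less than repetitive test data.

```python
import zlib


def compress_certificate(cert_message):
    """Compress the certificate message bytes with zlib (RFC 1950 format)."""
    return zlib.compress(cert_message, 9)


def decompress_certificate(compressed, uncompressed_length):
    """Decompress, rejecting output that does not match the declared length."""
    decompressor = zlib.decompressobj()
    # Cap the output at one byte past the declared length so oversized
    # payloads are detected without inflating them fully.
    output = decompressor.decompress(compressed, uncompressed_length + 1)
    if len(output) != uncompressed_length or not decompressor.eof:
        raise ValueError("decompressed length does not match declared length")
    return output
```

Capping the output guards against a peer that declares a small length but sends a payload that inflates to something much larger.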

Fixes envoyproxy#42682

Signed-off-by: Doogie Min <doogie.min@sendbird.com>
Change //bazel:zlib to //bazel/foreign_cc:zlib in source/common/tls/BUILD
to match the zlib target name used in Envoy v1.36.5.
Change @brotli// to @org_brotli// in source/common/tls/BUILD.
In Envoy v1.36.5 the brotli repo is named org_brotli (renamed to
brotli in later versions).
gavin-jeong changed the title from "CPLAT-9492: Update to Envoy 809213a (for Istio 1.28.2)" to "CPLAT-9492: Update to Envoy 9b72caf (for Istio 1.28.5)" on Apr 10, 2026
gavin-jeong changed the base branch from release/809213a-sendbird-custom to release/9b72caf-sendbird-custom on April 10, 2026 06:14
@gavin-jeong
Author

Superseded by a new PR that uses the Istio 1.28.5 Envoy base (9b72caf) with tested v1.36.x-compatible patches.


6 participants