CPLAT-9492: Update to Envoy 9b72caf (for Istio 1.28.5)#19

Closed
gavin-jeong wants to merge 13 commits into release/9b72caf-sendbird-custom from CPLAT-9492-update_to_9b72caf

@gavin-jeong

JIRA: https://sendbird.atlassian.net/browse/CPLAT-9492

Summary

Rebuild the Envoy release branch for Istio 1.28.5 on the Envoy v1.36 base (9b72caf3e2), carrying over all 12 tested Sendbird patches from release/v1.36.5-sendbird-custom.

Base: 9b72caf3e2 — Envoy v1.36 with security patches (used by Istio 1.28.5)

Sendbird patches (12 commits, all tested on v1.36.x):

  • redis: eval_ro, evalsha_ro support
  • Custom tracing header
  • QUIC keylog (SSLKEYLOGFILE + TLS context integration)
  • Redis race condition / use-after-free fixes
  • PubSub commands
  • Trace ID / Request ID format (UUIDv4 -> UUIDv7) [CPLAT-8445, CPLAT-8549]
  • Per-shard Redis proxy metrics
  • Zone-aware routing for Redis Cluster proxy
  • Certificate compression (RFC 8879)
  • Locality fix for RedisHost constructor
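
To make the UUIDv4 -> UUIDv7 item above concrete, here is a minimal sketch of the UUIDv7 bit layout from RFC 9562. It is not the patch's code: `makeUuidV7` and `randomUuidV7` are hypothetical names, and the choice of random source is an assumption made only for illustration.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>
#include <string>

// Builds a UUIDv7 string. Layout (RFC 9562):
//   48-bit Unix timestamp in ms | 4-bit version (7) | 12 random bits |
//   2-bit variant (0b10) | 62 random bits.
// Only the low 12 bits of rand_a and the low 62 bits of rand_b are used.
std::string makeUuidV7(uint64_t unix_ms, uint64_t rand_a, uint64_t rand_b) {
  assert(unix_ms < (1ULL << 48));
  std::array<uint8_t, 16> b{};
  for (int i = 0; i < 6; ++i) {
    b[i] = static_cast<uint8_t>((unix_ms >> (8 * (5 - i))) & 0xff);
  }
  b[6] = static_cast<uint8_t>(0x70 | ((rand_a >> 8) & 0x0f)); // version 7
  b[7] = static_cast<uint8_t>(rand_a & 0xff);
  b[8] = static_cast<uint8_t>(0x80 | ((rand_b >> 56) & 0x3f)); // variant 0b10
  for (int i = 0; i < 7; ++i) {
    b[9 + i] = static_cast<uint8_t>((rand_b >> (8 * (6 - i))) & 0xff);
  }

  static const char* kHex = "0123456789abcdef";
  std::string out;
  for (int i = 0; i < 16; ++i) {
    if (i == 4 || i == 6 || i == 8 || i == 10) {
      out.push_back('-');
    }
    out.push_back(kHex[b[i] >> 4]);
    out.push_back(kHex[b[i] & 0x0f]);
  }
  return out;
}

// Convenience wrapper using the wall clock and a non-cryptographic PRNG
// (an assumption for this sketch; a proxy would likely use its own RNG).
std::string randomUuidV7() {
  using namespace std::chrono;
  const uint64_t ms = static_cast<uint64_t>(
      duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count());
  std::random_device rd;
  std::mt19937_64 gen(rd());
  return makeUuidV7(ms & ((1ULL << 48) - 1), gen(), gen());
}
```

Because the timestamp occupies the most significant bits, IDs generated later sort after earlier ones, which UUIDv4 does not offer. For example, `assert(randomUuidV7()[14] == '7');` checks the version nibble.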

Test Plan

  • proxy-istio build succeeds with this envoy
  • Test image in dev cluster

Merge Order

Merge this PR first, then the proxy-istio PR.


Security checklist (Infrastructure code)

  • No SecurityGroup rule changes
  • No public open SecurityGroup inbound rules
  • No resources in a public subnet
  • No credentials
  • No IAM users
  • Following OWASP and Sendbird SSDLC

dlunch and others added 13 commits April 10, 2026 15:13
Include code formatting improvements for consistent style in trace test files.
This commit introduces QUIC/HTTP3 keylog functionality in Envoy, enabling generation of NSS Key Log Format files for Wireshark and other debugging tools.

- Keylog callback registration in OnNewSslCtx()
- Implementation of EnvoyQuicProofSource::setupQuicKeylogCallback() and quicKeylogCallback()
- TLS context-based keylog configuration with per-filter-chain caching and thread safety
- Address filtering via local/remote IP lists
- Fallback to SSLKEYLOGFILE environment variable for compatibility with existing workflows
- QuicKeylogBridge integration with Envoy’s existing TLS keylog infrastructure
- RawBufferSocket fallback fix in QuicServerTransportSocketFactory::createDownstreamTransportSocket()
- Comprehensive unit tests including edge cases

Signed-off-by: Chanhun Jeong <keyolk@gmail.com>
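
As a rough illustration of two points in the commit message above, the sketch below formats one NSS Key Log Format line and resolves the output path with an SSLKEYLOGFILE fallback. The helper names (`toHex`, `formatKeylogLine`, `resolveKeylogPath`) are hypothetical, and the precedence "configured path first, environment variable second" is an assumption inferred from the word "fallback", not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Lowercase hex encoding of a byte sequence.
std::string toHex(const std::vector<uint8_t>& bytes) {
  static const char* kHex = "0123456789abcdef";
  std::string out;
  out.reserve(bytes.size() * 2);
  for (uint8_t b : bytes) {
    out.push_back(kHex[b >> 4]);
    out.push_back(kHex[b & 0x0f]);
  }
  return out;
}

// One NSS Key Log Format line: "<label> <client_random hex> <secret hex>".
// Labels such as CLIENT_TRAFFIC_SECRET_0 are what Wireshark expects for TLS 1.3.
std::string formatKeylogLine(const std::string& label,
                             const std::vector<uint8_t>& client_random,
                             const std::vector<uint8_t>& secret) {
  assert(!label.empty());
  return label + " " + toHex(client_random) + " " + toHex(secret);
}

// Assumed precedence: an explicitly configured path wins; otherwise fall back
// to SSLKEYLOGFILE; an empty result means key logging stays disabled.
std::string resolveKeylogPath(const std::string& configured_path) {
  if (!configured_path.empty()) {
    return configured_path;
  }
  const char* env = std::getenv("SSLKEYLOGFILE");
  return env != nullptr ? std::string(env) : std::string();
}
```

In a real TLS stack the library usually hands the callback an already formatted line, so only the path resolution and filtering would be custom. A quick check of the format: `assert(formatKeylogLine("X", {0x0a}, {0xff}) == "X 0a ff");`.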
…uction

This commit combines multiple fixes for Redis cluster stability:

- Fix race conditions in cluster destruction by capturing is_destroying_ flag
- Add comprehensive null checks to prevent segfaults during cluster destruction
- Use local shared_ptr copies to prevent race conditions
- Use shared_from_this() to keep RedisDiscoverySession alive during timer callbacks
- Fix use-after-free by using session-owned flag instead of parent reference

These fixes ensure safe cleanup of Redis clusters and prevent crashes
during cluster removal and timer callback execution.
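
The lifetime pattern described above can be sketched in isolation. This is a simplified stand-in, not the patch: `DiscoverySession`, `markParentDestroying`, and `makeTimerCallback` are hypothetical names, and a plain `std::function` plays the role of the timer.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>

class DiscoverySession : public std::enable_shared_from_this<DiscoverySession> {
public:
  // Called by the owning cluster when teardown starts. The flag is owned by
  // the session, so callbacks never have to dereference the parent.
  void markParentDestroying() { parent_destroying_.store(true); }

  // The returned callback captures a strong reference obtained through
  // shared_from_this(), so the session cannot be freed while the callback
  // is still pending.
  std::function<bool()> makeTimerCallback() {
    std::shared_ptr<DiscoverySession> self = shared_from_this();
    return [self]() { return self->onTimer(); };
  }

  int refreshCount() const { return refreshes_; }

private:
  // Returns true if a refresh ran, false if it was skipped during teardown.
  bool onTimer() {
    if (parent_destroying_.load()) {
      return false;
    }
    ++refreshes_;
    return true;
  }

  std::atomic<bool> parent_destroying_{false};
  int refreshes_{0};
};
```

With this shape, a callback that fires after the owner has dropped its reference still sees a live session and exits early, for example `assert(!cb());` after `markParentDestroying()`. The trade-off is that the session lives until the last pending callback is destroyed.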
Add TraceSampledFormatter, which uses Envoy's internal tracing decision
(stream_info.traceReason()).

This approach works correctly at trace origin points (e.g., Istio Ingress
Gateway) where no incoming traceparent header exists.

Usage: %TRACE_SAMPLED% in access log format
Returns: "true" if traced, "false" otherwise

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
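
A minimal sketch of what such a formatter could compute is shown below. The `TraceReason` enum is a local stand-in for Envoy's tracing decision, and treating sampled or forced requests as "traced" is an assumption; the commit message only states the "true"/"false" output. `formatTraceSampled` and `renderAccessLog` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Local stand-in for the tracing decision (assumed values, for illustration).
enum class TraceReason { NotTraceable, HealthCheck, Sampling, ServiceForced, ClientForced };

// Assumption: a request counts as traced when it was sampled or forced.
std::string formatTraceSampled(TraceReason reason) {
  switch (reason) {
  case TraceReason::Sampling:
  case TraceReason::ServiceForced:
  case TraceReason::ClientForced:
    return "true";
  case TraceReason::NotTraceable:
  case TraceReason::HealthCheck:
    return "false";
  }
  return "false";
}

// Substitutes every %TRACE_SAMPLED% token in an access log format string.
std::string renderAccessLog(std::string format, TraceReason reason) {
  static const std::string kToken = "%TRACE_SAMPLED%";
  const std::string value = formatTraceSampled(reason);
  for (std::size_t pos = format.find(kToken); pos != std::string::npos;
       pos = format.find(kToken, pos + value.size())) {
    format.replace(pos, kToken.size(), value);
  }
  return format;
}
```

Since the value comes from the proxy's own decision rather than an incoming header, it is defined even at a trace origin. For example, `assert(renderAccessLog("sampled=%TRACE_SAMPLED%", TraceReason::Sampling) == "sampled=true");`.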
Add per-shard statistics for Redis proxy to track hot shard usage:

- enable_per_shard_stats: Emits per-shard request counters
  - upstream_rq_total: Total requests to each shard
  - upstream_rq_success: Successful requests
  - upstream_rq_failure: Failed requests
  - upstream_rq_active: Active requests (gauge)

- enable_per_shard_latency_stats: Emits latency histogram
  - upstream_rq_time: Request latency in microseconds

All metrics are scoped under: cluster.<cluster_name>.shard.<host_address>.*

Per-shard command-level stats are also recorded when enable_command_stats
is enabled alongside the per-shard options.

Note: These options may significantly increase metric cardinality in
large clusters. Use with caution in production environments.
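
To illustrate the naming scheme and the cardinality warning above, the sketch below builds a per-shard stat name and estimates how many metrics the two options add. `shardStatName` and `addedMetricCount` are hypothetical helpers; real stat names may be sanitized differently, and command-level stats are left out of the estimate.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds "cluster.<cluster_name>.shard.<host_address>.<metric>".
// No character sanitization is applied here (an assumption of this sketch).
std::string shardStatName(const std::string& cluster_name, const std::string& host_address,
                          const std::string& metric) {
  assert(!cluster_name.empty() && !host_address.empty() && !metric.empty());
  return "cluster." + cluster_name + ".shard." + host_address + "." + metric;
}

// Rough count of metrics added per cluster: four request metrics per shard
// host, plus one latency histogram per shard host if enabled. A histogram
// typically expands into several exported series, so treat this as a floor.
std::size_t addedMetricCount(std::size_t shard_hosts, bool per_shard_stats,
                             bool per_shard_latency_stats) {
  std::size_t per_host = 0;
  if (per_shard_stats) {
    per_host += 4; // upstream_rq_total, _success, _failure, _active
  }
  if (per_shard_latency_stats) {
    per_host += 1; // upstream_rq_time
  }
  return shard_hosts * per_host;
}
```

For a cluster with 100 shard hosts and both options on, `assert(addedMetricCount(100, true, true) == 500);` holds, before histogram expansion and before any command-level stats.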
This change implements zone-aware routing for Redis Cluster, allowing read
requests to be routed to replicas in the same availability zone as the client.

Key changes:
- Add enable_zone_discovery config option to redis_cluster.proto
- Add az_affinity and az_affinity_replicas_and_primary read policies
- Implement INFO command-based zone discovery during cluster slot updates
- Store zone info in host locality for standard Envoy locality handling
- RedisShard groups replicas by zone for efficient zone-aware routing

Zone Discovery Flow:
1. CLUSTER SLOTS response triggers zone discovery when enabled
2. INFO command sent to each unique node to get availability_zone
3. Zones stored in host->locality().zone() when hosts are created
4. RedisShard reads zone from host locality, groups replicas by zone
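
Step 2 of the flow above amounts to pulling one field out of the INFO reply. A minimal sketch, assuming the reply is the usual `key:value` text with CRLF line endings; `parseAvailabilityZone` is a hypothetical name, and treating an empty value as "unknown" is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Returns the value of the "availability_zone" field from an INFO reply,
// or std::nullopt when the field is missing or empty.
std::optional<std::string> parseAvailabilityZone(const std::string& info_reply) {
  static const std::string kKey = "availability_zone:";
  std::istringstream in(info_reply);
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') {
      line.pop_back(); // strip the CR of a CRLF line ending
    }
    if (line.compare(0, kKey.size(), kKey) == 0) {
      std::string zone = line.substr(kKey.size());
      if (zone.empty()) {
        return std::nullopt;
      }
      return zone;
    }
  }
  return std::nullopt;
}
```

A server that does not expose the field simply yields `std::nullopt`, so the host keeps an empty zone, for example `assert(!parseAvailabilityZone("redis_version:7.0.0\r\n").has_value());`.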

Read Policies:
- AzAffinity: local replicas -> any replica -> primary
- AzAffinityReplicasAndPrimary: local replicas -> local primary -> any replica -> primary
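
The two fallback chains above can be sketched as a single selection function. This is an illustration only: `Host`, `Shard`, and `pickHost` are hypothetical stand-ins, and the sketch returns the first matching candidate where a real proxy would presumably load balance among all candidates at each step.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Host {
  std::string address;
  std::string zone; // empty when the zone is unknown
};

struct Shard {
  Host primary;
  std::vector<Host> replicas;
};

enum class ReadPolicy { AzAffinity, AzAffinityReplicasAndPrimary };

// Picks a read target following the fallback order of the given policy.
const Host& pickHost(const Shard& shard, const std::string& client_zone, ReadPolicy policy) {
  const bool zone_known = !client_zone.empty();
  // 1. A replica in the client's zone.
  if (zone_known) {
    for (const Host& replica : shard.replicas) {
      if (replica.zone == client_zone) {
        return replica;
      }
    }
  }
  // 2. The primary, if it is local (AzAffinityReplicasAndPrimary only).
  if (policy == ReadPolicy::AzAffinityReplicasAndPrimary && zone_known &&
      shard.primary.zone == client_zone) {
    return shard.primary;
  }
  // 3. Any replica.
  if (!shard.replicas.empty()) {
    return shard.replicas.front();
  }
  // 4. The primary as the last resort.
  return shard.primary;
}
```

The policies differ only when the primary is local and no replica is: AzAffinity then crosses zones to a replica, while AzAffinityReplicasAndPrimary stays in-zone on the primary. A shard with no replicas always resolves to its primary, which can be checked with `assert(&pickHost(shard, "az-z", ReadPolicy::AzAffinity) == &shard.primary);`.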

Note: This feature currently works with Valkey only. Valkey exposes
availability_zone in its INFO response. Standard Redis does not support this field.

Signed-off-by: Doogie Min <doogie.min@sendbird.com>
Add TLS certificate compression with brotli, zstd, and zlib algorithms.
This reduces TLS handshake size, especially beneficial for QUIC where
the ServerHello needs to fit in the initial response.

Key changes:
- Move cert_compression from quic/ to tls/ for shared use
- Add brotli and zstd algorithms alongside existing zlib
- Add compression stats: ssl.certificate_compression.<algo>.*
- Add runtime flag (default: disabled) for safe rollout
- Fix SSL_CTX app_data crash risk for QUIC by using SSL_CTX_get_ex_new_index()

Runtime guard: envoy.reloadable_features.tls_support_certificate_compression

Cherry-picked from upstream PR envoyproxy#42690 (not yet merged).
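
How a runtime guard and algorithm negotiation might interact can be sketched as follows. The numeric identifiers are the ones assigned in RFC 8879; the function name `chooseCertCompression`, the preference-list approach, and the example ordering are assumptions for illustration, not the cherry-picked code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Algorithm identifiers as assigned in RFC 8879.
enum class CertCompression : uint16_t { Zlib = 1, Brotli = 2, Zstd = 3 };

// Returns the first locally preferred algorithm that the peer advertised,
// or std::nullopt when the feature is disabled or there is no overlap
// (in which case the certificate is sent uncompressed).
std::optional<CertCompression>
chooseCertCompression(bool runtime_flag_enabled,
                      const std::vector<CertCompression>& local_preference,
                      const std::vector<CertCompression>& peer_advertised) {
  if (!runtime_flag_enabled) {
    return std::nullopt; // the guard is off by default, per the commit message
  }
  for (CertCompression alg : local_preference) {
    if (std::find(peer_advertised.begin(), peer_advertised.end(), alg) !=
        peer_advertised.end()) {
      return alg;
    }
  }
  return std::nullopt;
}
```

With the flag off, nothing changes on the wire regardless of what the peer offers, which is what makes a staged rollout safe: `assert(!chooseCertCompression(false, prefs, peer).has_value());`.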
The HostImpl constructor expects a const reference, not a shared_ptr.
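
The kind of mismatch described here looks roughly like the sketch below. `Locality`, `Host`, and `makeHost` are simplified hypothetical types, and the assumption that the argument in question is the locality comes only from the patch title in the summary.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Locality {
  std::string zone;
};

class Host {
public:
  // Takes the locality by const reference, not by shared_ptr.
  explicit Host(const Locality& locality) : zone_(locality.zone) {}
  const std::string& zone() const { return zone_; }

private:
  std::string zone_;
};

// The caller holds a shared_ptr, so it must dereference it at the call site.
// Passing `locality` itself would not compile against the constructor above.
Host makeHost(const std::shared_ptr<const Locality>& locality) {
  assert(locality != nullptr);
  return Host(*locality);
}
```

For example, `assert(makeHost(std::make_shared<const Locality>(Locality{"us-east-1a"})).zone() == "us-east-1a");` passes, because the host copies what it needs from the referenced locality.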
Apply 5 security patches from upstream Istio 1.28.5:
- http: ensure decode methods are blocked after a downstream reset
- json: fix off-by-one write that could corrupt memory
- network: fix crash in getAddressWithPort when called with pipe address
- rbac: fix multivalue header bypass
- ratelimit: fix response phase limit race condition
@gavin-jeong gavin-jeong deleted the CPLAT-9492-update_to_9b72caf branch April 29, 2026 04:23

6 participants