Update to 1 35 6 #9
Merged
gavin-jeong merged 11 commits into release/v1.35.6-sendbird-custom from Jan 6, 2026
Include code formatting improvements for consistent style in trace test files.
Signed-off-by: Chanhun Jeong <keyolk@gmail.com>
This commit introduces QUIC/HTTP3 keylog functionality in Envoy, enabling generation of NSS Key Log Format files for Wireshark and other debugging tools.
- Keylog callback registration in OnNewSslCtx()
- Implementation of EnvoyQuicProofSource::setupQuicKeylogCallback() and quicKeylogCallback()
- TLS context-based keylog configuration with per-filter-chain caching and thread safety
- Address filtering via local/remote IP lists
- Fallback to SSLKEYLOGFILE environment variable for compatibility with existing workflows
- QuicKeylogBridge integration with Envoy's existing TLS keylog infrastructure
- RawBufferSocket fallback fix in QuicServerTransportSocketFactory::createDownstreamTransportSocket()
- Comprehensive unit tests including edge cases
Signed-off-by: Chanhun Jeong <keyolk@gmail.com>
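The configuration fallback and address filtering listed above could look roughly like the sketch below. It is illustrative only: `resolveKeylogPath` and `addressAllowed` are hypothetical names, exact string matching stands in for whatever IP matching the real code does, and treating an empty list as "match everything" is an assumption rather than something the commit states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: prefer the path from the TLS context configuration and
// fall back to the SSLKEYLOGFILE environment variable when none is configured.
std::optional<std::string> resolveKeylogPath(const std::string& configured_path) {
  if (!configured_path.empty()) {
    return configured_path;
  }
  const char* env = std::getenv("SSLKEYLOGFILE");
  if (env != nullptr && env[0] != '\0') {
    return std::string(env);
  }
  return std::nullopt;  // Keylogging stays disabled.
}

// Hypothetical filter: an empty list is assumed to allow every address;
// otherwise the address must appear in the list verbatim.
bool addressAllowed(const std::vector<std::string>& allowed, const std::string& ip) {
  return allowed.empty() ||
         std::find(allowed.begin(), allowed.end(), ip) != allowed.end();
}
```

A caller would presumably check both the local and the remote address with `addressAllowed` before writing a key log line to the resolved path.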
Protect all async callbacks from accessing deallocated cluster members during destruction by adding is_destroying_ atomic flag checks.
Affected callbacks:
- ClusterRefreshManager callbacks
- DNS resolution callbacks
- Connection event callbacks
- Timer callbacks
- Redis client response callbacks (onResponse, onFailure, onUnexpectedResponse)
- Hostname resolution callbacks
The race condition occurred when callbacks were already queued in the event loop when cluster destruction began, causing use-after-free access to parent cluster members like info_, redis_discovery_session_, and resolve_timer_.
All callbacks now check is_destroying_ with memory_order_acquire before accessing any parent members, ensuring safe termination during destruction.
Fixes segfaults that occurred when removing Redis service entries.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
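A minimal sketch of the flag check described in this commit, assuming a simplified `Cluster` class. Only `is_destroying_` and the use of `memory_order_acquire` come from the commit message; `makeCallback`, `beginDestroy`, and `handled_` are hypothetical names for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical stand-in for the Redis cluster object.
class Cluster {
public:
  // Returns a callback that becomes a no-op once destruction has begun.
  std::function<void()> makeCallback() {
    return [this]() {
      if (is_destroying_.load(std::memory_order_acquire)) {
        return;  // Skip all member access during teardown.
      }
      ++handled_;  // Stands in for touching info_, resolve_timer_, etc.
    };
  }

  // Assumed to be called at the start of destruction.
  void beginDestroy() { is_destroying_.store(true, std::memory_order_release); }

  int handled() const { return handled_; }

private:
  std::atomic<bool> is_destroying_{false};
  int handled_{0};
};
```

The check only helps while the `Cluster` object itself is still allocated, since the lambda reads the flag through `this`.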
…ter destruction
Problem: Segmentation faults occur when accessing member pointers in async
callbacks during Redis cluster destruction, even with is_destroying_ flag
checks. This happens because there's a race window between checking the
flag and accessing the pointers.
Solution: Add defensive null checks for all pointer accesses that could
become invalid during destruction:
1. ClusterInfo pointer (info_):
- Add null checks before all configUpdateStats() calls
- Use safe access pattern for name() in log statements
- Locations: startResolveRedis(), updateDnsStats(), DNS callbacks,
onResponse(), onUnexpectedResponse(), onFailure()
2. DNS Resolver pointer (dns_resolver_):
- Add null checks in startResolveDns()
- Add checks in resolveClusterHostnames() and resolveReplicas()
- Prevents crashes when DNS resolution is initiated during teardown
3. Timer pointer (resolve_timer_):
- Add null checks before enableTimer() calls
- Locations: finishClusterHostnameResolution(), onResponse(),
onUnexpectedResponse(), onFailure()
4. Consistency fix:
- Line 714: Changed parent_.info() to parent_.info_ to match
null-checked pattern used elsewhere
The pattern applied throughout:
1. Check is_destroying_ flag with memory_order_acquire
2. Verify each pointer is non-null before dereferencing
3. This dual-check handles the race window safely
This prevents use-after-free crashes during Redis cluster teardown when
async callbacks execute after partial destruction has begun.
The previous fix with null checks still had a race condition window between
checking the pointer and using it. Even with the null check, the shared_ptr
could be reset to null by another thread between the check and use.
Solution: Make local copies of shared_ptr before use. This ensures the
pointer remains valid throughout its usage in the current scope.
Changes:
1. startResolveRedis(): Copy info_ to local variable before use
2. updateDnsStats(): Use local copy of info_
3. DNS callbacks: Use local copy for stats updates
4. onResponse(), onUnexpectedResponse(), onFailure(): Use local copies
5. client_factory_.create(): Check and use local copy of info_
The pattern applied:
auto info = parent_.info_; // Make local copy (ref count++)
if (!info) { // Check if null
return;
}
info->method(); // Safe to use - won't become null
This prevents the crash at line 376 where info_ was becoming null
between the check and the access, even with memory_order_acquire.
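A self-contained version of the pattern above, as a single-threaded simulation. `Parent`, `ClusterInfo`, and `readName` are hypothetical stand-ins, and the `reset_midway` parameter only imitates teardown happening between the check and the use.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for the cluster info and its owner.
struct ClusterInfo {
  std::string name;
};

struct Parent {
  std::shared_ptr<ClusterInfo> info_;
};

// Reads the name through a local copy of info_. When reset_midway is true, the
// member is cleared after the null check to imitate teardown in between.
std::string readName(Parent& parent, bool reset_midway) {
  auto info = parent.info_;  // Local copy: the reference count is incremented.
  if (!info) {
    return "";
  }
  if (reset_midway) {
    parent.info_.reset();  // The member becomes null...
  }
  return info->name;  // ...but the local copy still owns the object.
}
```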
…ing timer callbacks
The 5% crash rate was caused by timer callbacks executing after the
RedisDiscoverySession was destroyed. Even though we checked is_destroying_,
there was a race where:
1. Timer callback fires and enters the lambda
2. Destructor runs and deletes the session (unique_ptr reset)
3. Callback tries to access parent_.is_destroying_ → CRASH (use-after-free)
Solution:
- Move timer creation from constructor to initialize() method
- Capture shared_from_this() in timer lambda instead of raw 'this'
- Call initialize() after RedisDiscoverySession construction completes
This ensures the session object stays alive as long as any timer callback
is queued or executing, preventing the use-after-free.
Pattern changed from:
resolve_timer_ = dispatcher_.createTimer([this]() { ... });
To:
auto self = shared_from_this();
resolve_timer_ = dispatcher_.createTimer([self]() { ... });
This should eliminate the remaining 5% crash rate during Redis cluster
destruction.
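The lifetime effect of capturing `shared_from_this()` can be sketched without Envoy's dispatcher. In the sketch below, `CallbackQueue` is a hypothetical stand-in for callbacks queued in the event loop, and `Session` is a reduced version of the discovery session. `initialize()` is separate from the constructor because `shared_from_this()` is not usable until a `shared_ptr` owns the object.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for callbacks queued in an event loop.
using CallbackQueue = std::vector<std::function<void()>>;

class Session : public std::enable_shared_from_this<Session> {
public:
  explicit Session(int& fired) : fired_(fired) {}

  // Must run after construction, once a shared_ptr owns this object.
  void initialize(CallbackQueue& queue) {
    auto self = shared_from_this();
    // The lambda owns a reference, so the session outlives the queued callback.
    queue.push_back([self]() { ++self->fired_; });
  }

private:
  int& fired_;
};
```

Dropping the last external `shared_ptr` no longer destroys the session while a callback is queued. The object is released only once the queue lets go of the lambda.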
…nt reference
CRITICAL FIX: The previous approach had a fatal flaw - callbacks with shared_from_this() kept the session alive, but the session holds a reference to the parent RedisCluster. When the parent was destroyed, accessing parent_.is_destroying_ became use-after-free.
The race condition:
1. Timer callback fires with shared_ptr<Session> (session kept alive)
2. RedisCluster destructor runs and completes
3. Callback tries to check parent_.is_destroying_
4. CRASH - parent object destroyed, reference is dangling
Solution:
- Add parent_destroyed_ atomic flag IN THE SESSION
- Parent sets this flag BEFORE destroying session
- Callbacks check session-owned flag, never access parent directly
- Also simplify all safety checks into helper methods
This is the correct fix for the 5% crash rate when removing Redis services.
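A reduced sketch of the session-owned flag, under the assumption that the parent holds the session through a `shared_ptr`. Only `parent_destroyed_` is named in the commit; `markParentDestroyed`, `parentAlive`, and `onTimer` are hypothetical helper names.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

class Session {
public:
  // Called by the parent before it releases the session.
  void markParentDestroyed() { parent_destroyed_.store(true, std::memory_order_release); }

  // Helper that centralizes the safety check used by every callback.
  bool parentAlive() const { return !parent_destroyed_.load(std::memory_order_acquire); }

  // Returns true when the callback did its work, false when it bailed out.
  bool onTimer() {
    if (!parentAlive()) {
      return false;  // Never dereference the parent after this point.
    }
    // Parent members could be accessed here.
    return true;
  }

private:
  std::atomic<bool> parent_destroyed_{false};  // Owned by the session itself.
};

class RedisCluster {
public:
  RedisCluster() : session_(std::make_shared<Session>()) {}
  // Set the flag first; session_ is released afterwards by member destruction.
  ~RedisCluster() { session_->markParentDestroyed(); }

  std::shared_ptr<Session> session() const { return session_; }

private:
  std::shared_ptr<Session> session_;
};
```

Because the flag lives in the session, a callback that still holds a `shared_ptr<Session>` reads memory it co-owns rather than memory owned by the destroyed parent.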
bellatoris pushed a commit that referenced this pull request Dec 7, 2025
Commit Message: json-fuzz: prevent large size inputs
Additional Description: The original [fuzzer test-case](https://oss-fuzz.com/testcase-detail/4819251528007680) created a large input file with ~1MiB of base64 contents that was truncated in the middle and caused the fuzzer to OOM due to a test-function allocation:
```
#9 0x5ca72f569401 in testing::internal::edit_distance::CreateUnifiedDiff(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&, unsigned long) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:1352:39
```
This PR limits the input sizes for the json input to 32KiB, as most errors should be detected with this limit.
Risk Level: low - tests only
Testing: N/A
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A
Fixes fuzz issue [421951268](https://issues.oss-fuzz.com/issues/421951268).
---------
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
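The size cap amounts to an early-exit guard at the top of the fuzz target. The sketch below shows only that guard. `kMaxJsonFuzzInputSize` and `withinFuzzSizeLimit` are hypothetical names, and treating exactly 32 KiB as accepted is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// 32 KiB limit taken from the commit message; the constant name is hypothetical.
constexpr std::size_t kMaxJsonFuzzInputSize = 32 * 1024;

// Returns true when the fuzz input is small enough to be processed.
// Inputs of exactly 32 KiB are assumed to be accepted.
bool withinFuzzSizeLimit(std::size_t size) {
  return size <= kMaxJsonFuzzInputSize;
}
```

A fuzz entry point would call this first and return immediately for oversized inputs, before any parsing or comparison allocates memory.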
bellatoris pushed a commit that referenced this pull request Jan 15, 2026
…voyproxy#42554)
## Description
Today, when a filesystem watch callback returns a non-OK status or throws an exception, the error gets propagated to `FileEventImpl` which uses `THROW_IF_NOT_OK`. Since there's no exception handler in the `libevent` loop, this causes `std::terminate` to be called, which crashes Envoy.
**Stack Trace:**
```
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.119][234999][warning][misc] [source/common/protobuf/message_validator_impl.cc:23] Deprecated field: type envoy.config.core.v3.HeaderValueOption Using deprecated option 'envoy.config.core.v3.HeaderValueOption.append' from file base.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/version_history/version_history for details. If continued use of this field is absolutely necessary, see https://www.envoyproxy.io/docs/envoy/latest/configuration/operations/runtime#using-runtime-overrides-for-deprecated-features for how to apply a temporary and highly discouraged override.
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.120][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener '0_listener'
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.123][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener '1_listener'
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.126][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener '2_listener'
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.127][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener '3_listener'
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.128][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener '4_listener'
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.130][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener '5_listener'
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.132][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener '6_listener'
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.134][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener 'mtls_untrusted_regional_transparent_tunnel_listener'
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.135][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener 'mtls_app_trusted_regional_transparent_tunnel_listener'
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.097][234999][critical][main] [source/exe/terminate_handler.cc:36] std::terminate called! Uncaught unknown exception, see trace.
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.097][234999][critical][backtrace] [./source/server/backtrace.h:113] Backtrace (use tools/stack_decode.py to get line numbers):
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.097][234999][critical][backtrace] [./source/server/backtrace.h:114] Envoy version: 5eaabe0bbaad4612cb85473cd151039d8f1a2760/1.34.2-dev/Clean/RELEASE/BoringSSL
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.097][234999][critical][backtrace] [./source/server/backtrace.h:116] Address mapping: 558d8afcc000-558d8ee2f000 /usr/local/bin/envoy
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.100][234999][critical][backtrace] [./source/server/backtrace.h:123] #0: [0x558d8da5784f]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.102][234999][critical][backtrace] [./source/server/backtrace.h:123] #1: [0x558d8edd8673]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.104][234999][critical][backtrace] [./source/server/backtrace.h:123] #2: [0x558d8e3b120b]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.106][234999][critical][backtrace] [./source/server/backtrace.h:121] #3: Envoy::Filesystem::WatcherImpl::onInotifyEvent() [0x558d8e3990c3]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.108][234999][critical][backtrace] [./source/server/backtrace.h:123] #4: [0x558d8e3998d2]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.109][234999][critical][backtrace] [./source/server/backtrace.h:123] #5: [0x558d8e393de6]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.111][234999][critical][backtrace] [./source/server/backtrace.h:121] #6: Envoy::Event::FileEventImpl::mergeInjectedEventsAndRunCb() [0x558d8e394eb5]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.113][234999][critical][backtrace] [./source/server/backtrace.h:123] #7: [0x558d8e710823]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.115][234999][critical][backtrace] [./source/server/backtrace.h:121] #8: event_base_loop [0x558d8e70d4a1]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.117][234999][critical][backtrace] [./source/server/backtrace.h:121] #9: Envoy::Server::InstanceBase::run() [0x558d8daa2b99]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.119][234999][critical][backtrace] [./source/server/backtrace.h:121] #10: Envoy::MainCommonBase::run() [0x558d8da4327a]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.121][234999][critical][backtrace] [./source/server/backtrace.h:121] #11: Envoy::MainCommon::main() [0x558d8da44234]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.123][234999][critical][backtrace] [./source/server/backtrace.h:121] #12: main [0x558d8afcc11c]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.123][234999][critical][backtrace] [./source/server/backtrace.h:123] #13: [0x7f1d54073efb]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.123][234999][critical][backtrace] [./source/server/backtrace.h:121] #14: __libc_start_main [0x7f1d54073fbb]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.124][234999][critical][backtrace] [./source/server/backtrace.h:121] #15: _start [0x558d8afcc02e]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.124][234999][critical][backtrace] [./source/server/backtrace.h:129] Caught Aborted, suspect faulting address 0x395f7
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.124][234999][critical][backtrace] [./source/server/backtrace.h:113] Backtrace (use tools/stack_decode.py to get line numbers):
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.124][234999][critical][backtrace] [./source/server/backtrace.h:114] Envoy version: 5eaabe0bbaad4612cb85473cd151039d8f1a2760/1.34.2-dev/Clean/RELEASE/BoringSSL
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.124][234999][critical][backtrace] [./source/server/backtrace.h:116] Address mapping: 558d8afcc000-558d8ee2f000 /usr/local/bin/envoy
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.124][234999][critical][backtrace] [./source/server/backtrace.h:123] #0: [0x7f1d54089c90]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.124][234999][critical][backtrace] [./source/server/backtrace.h:121] #1: gsignal [0x7f1d54089bde]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.124][234999][critical][backtrace] [./source/server/backtrace.h:121] #2: abort [0x7f1d54072832]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.126][234999][critical][backtrace] [./source/server/backtrace.h:123] #3: [0x558d8da5785c]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.128][234999][critical][backtrace] [./source/server/backtrace.h:123] #4: [0x558d8edd8673]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.129][234999][critical][backtrace] [./source/server/backtrace.h:123] #5: [0x558d8e3b120b]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.129][234999][critical][backtrace] [./source/server/backtrace.h:121] #6: Envoy::Filesystem::WatcherImpl::onInotifyEvent() [0x558d8e3990c3]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.131][234999][critical][backtrace] [./source/server/backtrace.h:123] #7: [0x558d8e3998d2]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.133][234999][critical][backtrace] [./source/server/backtrace.h:123] #8: [0x558d8e393de6]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.133][234999][critical][backtrace] [./source/server/backtrace.h:121] #9: Envoy::Event::FileEventImpl::mergeInjectedEventsAndRunCb() [0x558d8e394eb5]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:123] #10: [0x558d8e710823]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:121] #11: event_base_loop [0x558d8e70d4a1]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:121] #12: Envoy::Server::InstanceBase::run() [0x558d8daa2b99]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:121] #13: Envoy::MainCommonBase::run() [0x558d8da4327a]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:121] #14: Envoy::MainCommon::main() [0x558d8da44234]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:121] #15: main [0x558d8afcc11c]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:123] #16: [0x7f1d54073efb]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:121] #17: __libc_start_main [0x7f1d54073fbb]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:121] #18: _start [0x558d8afcc02e]
```
In this change, we are making the `inotify` and `kqueue` watchers handle callback errors gracefully by catching any exceptions using `TRY_ASSERT_MAIN_THREAD`, logging errors instead of propagating them and always returning the `OkStatus` to the event loop.
---
**Commit Message:** filesystem: Fix crash when watch callback returns error or throws
**Additional Description:** Make `inotify` and `kqueue` watchers handle callback errors gracefully.
**Risk Level:** Low
**Testing:** CI
**Docs Changes:** N/A
**Release Notes:** N/A
---------
Signed-off-by: Rohit Agrawal <rohit.agrawal@salesforce.com>
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
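The behavior described in this commit (catch, log, always report OK to the event loop) can be sketched in plain C++. This is not Envoy's code: the `Status` struct is a minimal stand-in for a real status type, `runWatchCallbackSafely` is a hypothetical name, and an ordinary `try`/`catch` is used here in place of the `TRY_ASSERT_MAIN_THREAD` macro mentioned above.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

// Minimal stand-in for a status type (an assumption for this sketch).
struct Status {
  bool ok;
  std::string message;
};

// Runs a watch callback, logs any error or exception, and always returns OK
// so that nothing propagates into the event loop.
Status runWatchCallbackSafely(const std::function<Status()>& callback) {
  try {
    const Status status = callback();
    if (!status.ok) {
      std::cerr << "watch callback returned error: " << status.message << "\n";
    }
  } catch (const std::exception& e) {
    std::cerr << "watch callback threw: " << e.what() << "\n";
  } catch (...) {
    std::cerr << "watch callback threw an unknown exception\n";
  }
  return Status{true, ""};
}
```

With a wrapper of this shape, a failing callback produces a log line instead of an uncaught exception reaching `event_base_loop`.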