shared_pool: fix leak when last reference drops on worker thread during dispatcher teardown #44570
Changes from all commits: 518fed9, 7cf5d1f, 77ad80f, 7deb39c, 6eff930, 0529b02, 2f55c69, 020f461, 099720b, d0df690
```diff
@@ -402,6 +402,13 @@ bug_fixes:
 - area: http2
   change: |
     Apply nghttp2 CVE-2026-27135 patch.
+- area: shared_pool
+  change: |
+    Fixed a leak in ``ObjectSharedPool`` when the last reference to a pooled object is released
+    on a non-main thread after the main dispatcher has started teardown. The queued cross-thread
+    deletion now carries ownership, so the object is freed whether or not the dispatcher runs
+    the queued callback. Most visibly this fixes a LeakSanitizer leak in
+    ``StrictDnsClusterImpl`` involving the ``Locality`` shared pool during server shutdown.
```
Comment on lines +407 to +411 (Member): Could we please also add a regression test for the leak fix exercising the dispatcher teardown-drops-callback path?
```diff
 removed_config_or_runtime:
 # *Normally occurs at the end of the* :ref:`deprecation period <deprecated>`
```
```diff
@@ -293,7 +293,7 @@ void DnsCacheImpl::onReResolveAlarm(const std::string& host) {
   auto last_used_time = primary_host.host_info_->lastUsedTime();
   ENVOY_LOG(debug, "host='{}' TTL check: now={} last_used={} TTL {}", host, now_duration.count(),
             last_used_time.count(), host_ttl_.count());
-  if ((now_duration - last_used_time) > host_ttl_) {
+  if ((now_duration - last_used_time) >= host_ttl_) {
     ENVOY_LOG(debug, "host='{}' TTL expired, removing", host);
     removeHost(host, primary_host, true);
   } else {
```
```diff
@@ -568,19 +568,26 @@ void DnsCacheImpl::finishResolve(const std::string& host,
   // which can exceed host_ttl by a large margin.
   const auto now = main_thread_dispatcher_.timeSource().monotonicTime().time_since_epoch();
   const auto elapsed = now - primary_host_info->host_info_->lastUsedTime();
-  std::chrono::milliseconds refresh_interval(
+  const std::chrono::milliseconds raw_backoff_ms(
       primary_host_info->failure_backoff_strategy_->nextBackOffMs());
+  std::chrono::milliseconds refresh_interval(raw_backoff_ms);
   if (elapsed >= host_ttl_) {
-    refresh_interval = std::chrono::milliseconds(1);
+    refresh_interval = std::chrono::milliseconds(0);
   } else {
     const auto until_eviction =
-        std::chrono::duration_cast<std::chrono::milliseconds>(host_ttl_ - elapsed) +
-        std::chrono::milliseconds(1);
+        std::chrono::duration_cast<std::chrono::milliseconds>(host_ttl_ - elapsed);
     refresh_interval = std::min(refresh_interval, until_eviction);
   }
+  // Floor the result at min_refresh_interval_ (dns_min_refresh_rate) to prevent arming
+  // a ms-scale alarm that can kick rapid-fire resolves and race with dispatcher/resolver
+  // teardown in integration tests (observed as a LeakSanitizer leak in
+  // proxy_filter_integration_test DoubleResolution).
+  refresh_interval =
+      std::max(refresh_interval,
+               std::chrono::duration_cast<std::chrono::milliseconds>(min_refresh_interval_));
   primary_host_info->refresh_timer_->enableTimer(refresh_interval);
-  ENVOY_LOG(debug, "DNS refresh rate reset for host '{}', (failure) refresh rate {} ms", host,
-            refresh_interval.count());
+  ENVOY_LOG(debug, "DNS refresh rate reset for host '{}', (failure) raw={} ms armed={} ms",
+            host, raw_backoff_ms.count(), refresh_interval.count());
 }
 }
 }
```

Comment (Member): We should add a test for this case.

Comment on lines 574 to 580 (Member): In the previous PR we specifically added the guarantee that a failed resolve re-arms the alarm so that eviction happens no later than when the host becomes eligible for eviction. Now this PR is breaking that invariant. You should pick one:

Member (Author): I asked the bot about this; it said that both were required/a good idea for some reason. Unfortunately Copilot is now down, so I'm unable to follow up immediately.
Comment (Member): Wondering how relevant this amount of information is for users. Maybe squash all changes like this into a single changelog entry like "Fixed memory leaks"?

Member (Author): I'm reluctant to change other entries, but if you can suggest a terser changelog I'm happy to update.