Skip to content

dns_cache: add test for past-TTL failure-backoff branch floored at dns_min_refresh_rate#137

Draft
Copilot wants to merge 2 commits into
flake-dfp-nofrom
copilot/add-test-past-ttl-branch
Draft

dns_cache: add test for past-TTL failure-backoff branch floored at dns_min_refresh_rate#137
Copilot wants to merge 2 commits into
flake-dfp-nofrom
copilot/add-test-past-ttl-branch

Conversation

Copy link
Copy Markdown

Copilot AI commented Apr 23, 2026

Covers the if (elapsed >= host_ttl_) branch in DnsCacheImpl::finishResolve (PR envoyproxy#44570), which was explicitly flagged by reviewer @agrawroh as untested. When a DNS failure is observed after the host TTL has elapsed, refresh_interval is set to 0 and then unconditionally raised to min_refresh_interval_ — this test pins that behaviour.

New test: ResolveFailurePastHostTtlFlooredByMinRefresh

Placed immediately after ResolveFailureBackoffZeroFlooredByMinRefresh, mirroring ResolveFailureBackoffCappedByHostTtl in structure.

Key timing design — sim time is advanced in two stages to thread the needle between onReResolveAlarm (evicts when now - last_used >= host_ttl) and finishResolve (takes the past-TTL branch under the same condition):

Event Elapsed Effect
Alarm fires 100ms < 1000ms TTL onReResolveAlarmstartResolve (not evict)
resolve_cb(Failure) called 1000ms ≥ 1000ms TTL past-TTL branch → refresh_interval = 0 → floor to 1000ms
Follow-up alarm 2000ms ≥ 1000ms TTL host evicted

The critical assertion:

// Past-TTL branch: raw backoff is discarded and refresh_interval becomes 0;
// the unconditional floor lifts it to dns_min_refresh_rate (1000ms).
EXPECT_CALL(*resolve_timer, enableTimer(std::chrono::milliseconds(1000), _));

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • dl.google.com
    • Triggering command: /build/bazel_root/install/fb2a7f6d344d2f4e335882534df59296/embedded_tools/jdk/bin/java bazel(envoy) --add-opens=java.base/java.lang=ALL-UNNAMED -Xverify:none -Djava.util.logging.config.file=/build/bazel_root/base/javalog.properties -Dcom.google.devtools.build.lib.util.LogHandlerQuerier.class=com.google.devtools.build.lib.util.SimpleLogHandler$HandlerQuerier -XX:-MaxFDLimit -Djava.library.path=/build/bazel_root/install/fb2a7f6d344d2f4e335882534df59296/embedded_tools/jdk/lib:/build/bazel_root/install/fb2a7f6d344d2f4e335882534df59296/embedded_tools/jdk/lib/server:/build/bazel_root/install/fb2a7f6d344d2f4e335882534df59296/ -Dfile.encoding=ISO-8859-1 -Duser.country= -Duser.language= -Duser.variant= -Xmx3g -DBAZEL_TRACK_SOURCE_DIRECTORIES=1 -Djavax.net.ssl.trustStore=/tmp/custom-cacerts -Djavax.net.ssl.trustStorePassword=changeit -jar /build/bazel_root/install/fb2a7f6d344d2f4e335882534df59296/A-server.jar --max_idle_secs=10800 --noshutdown_on_low_sys_mem --connect_timeout_secs=30 (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

Add one new test to test/extensions/common/dynamic_forward_proxy/dns_cache_impl_test.cc covering the past-TTL branch of the failure-backoff logic in PR envoyproxy#44570. Reviewer @agrawroh explicitly asked for a test of this case (the if (elapsed >= host_ttl_) branch in finishResolve).

Push as a single commit on top of branch flake-dfp-no in fork phlax/envoy. Do not open a new PR. Do not touch any other files.

What the PR already does (on flake-dfp-no)

In source/extensions/common/dynamic_forward_proxy/dns_cache_impl.cc finishResolve, the failure branch:

  1. Takes the raw backoff via primary_host_info->failure_backoff_strategy_->nextBackOffMs().
  2. If elapsed >= host_ttl_ (past TTL), sets refresh_interval = 0.
  3. Else caps at host_ttl - elapsed.
  4. Unconditionally floors the result at min_refresh_interval_ (dns_min_refresh_rate).
  5. Arms the refresh timer.

And in onReResolveAlarm, (now - last_used) >= host_ttl_ (note: >=, not >) evicts the host.

So after a failure that happens past the TTL, the armed interval should equal min_refresh_interval_ (not 0, not 1ms). When that alarm fires the host is evicted because elapsed ≥ host_ttl.

Context

Two tests already exist nearby that together define the reference shape:

  • ResolveFailureBackoffCappedByHostTtl — cap kicks in because raw_backoff (60s) > until_eviction (900ms), then floor lifts 900ms → 1000ms.
  • ResolveFailureBackoffZeroFlooredByMinRefresh — raw_backoff forced to 0 via random()==0, cap doesn't kick in, floor lifts 0 → 1000ms.

The new test covers: elapsed past host_ttl at the moment of DNS failure, which takes the refresh_interval = 0 branch, then is floored to min_refresh_interval_.

What to add

A new TEST_F(DnsCacheImplTest, ResolveFailurePastHostTtlFlooredByMinRefresh) placed immediately after ResolveFailureBackoffZeroFlooredByMinRefresh.

Structure should mirror ResolveFailureBackoffCappedByHostTtl closely.

Steps:

  1. Configure:

    *config_.mutable_host_ttl() = Protobuf::util::TimeUtil::SecondsToDuration(1);
    *config_.mutable_dns_refresh_rate() = Protobuf::util::TimeUtil::SecondsToDuration(60);
    *config_.mutable_dns_min_refresh_rate() = Protobuf::util::TimeUtil::SecondsToDuration(1);
    initialize();
    InSequence s;
  2. Standard first resolve (copy from ResolveFailureBackoffCappedByHostTtl): loadDnsCacheEntry, resolver_->resolve, completion callback with "10.0.0.1", asserts for onDnsHostAddOrUpdate, onLoadDnsCacheComplete, onDnsResolutionComplete(Completed), and enableTimer(dns_ttl_, _) for the success refresh.

  3. Advance sim time past host_ttl — e.g. simTime().advanceTimeWait(std::chrono::milliseconds(1500)); (1500ms > host_ttl=1000ms so elapsed >= host_ttl_ is true when the failure handler runs).

  4. Invoke resolve_timer->invokeCallback() to trigger the re-resolve. EXPECT_CALL on *timeout_timer, enableTimer(5000ms, nullptr) and on resolve("foo.com", _, _) to capture the new resolve_cb.

  5. Trigger the DNS failure:

    • EXPECT_CALL(*timeout_timer, disableTimer());
    • EXPECT_CALL(update_callbacks_, onDnsHostAddOrUpdate(_, _)).Times(0);
    • EXPECT_CALL(callbacks, onLoadDnsCacheComplete(_)).Times(0);
    • EXPECT_CALL(update_callbacks_, onDnsResolutionComplete("foo.com:80", DnsHostInfoEquals("10.0.0.1:80", "foo.com", false), Network::DnsResolver::ResolutionStatus::Failure));
    • Key assertion: EXPECT_CALL(*resolve_timer, enableTimer(std::chrono::milliseconds(1000), _)); — 0 (past-TTL branch) floored to 1000ms (dns_min_refresh_rate). Not 0. Not 1ms.
    • resolve_cb(Network::DnsResolver::ResolutionStatus::Failure, "", TestUtility::makeDnsResponse({}));
  6. checkStats(2 attempt, 1 success, 1 failure, 1 address_changed, 1 added, 0 removed, 1 num_hosts);

  7. Eviction follow-up (mirror ResolveFailureBackoffCappedByHostTtl's tail):

    • simTime().advanceTimeWait(std::chrono::milliseconds(1000)); — advance by the floored interval
    • EXPECT_CALL(update_callbacks_, onDnsHostRemove("foo.com:80"));
    • resolve_timer->invokeCallback(); — should evict because (now - last_used) >= host_ttl_ holds (it holds well past the boundary since we advanced 1500ms + 1000ms = 2500ms of elapsed time vs 1000ms TTL).
    • checkStats(2 attempt, 1 success, 1 failure, 1 address_changed, 1 added, 1 removed, 0 num_hosts);

Add a short comment on the key assertion explaining: "Past-TTL branch: raw backoff is discarded and refresh_interval becomes 0; the unconditional floor lifts it to dns_min_refresh_rate (1000ms)."

Do NOT

  • Do NOT modify dns_cache_impl.cc, dns_cache_impl.h, the fixture class, or shared_pool*.
  • Do NOT touch the other two failure-floor tests.
  • Do NOT open a new PR. Push the commit directly to flake-dfp-no.

Build / test

  • clang-format the edited file.
  • Build/run //test/extensions/common/dynamic_forward_proxy:dns_cache_impl_test (if bazel unavailable, note that in the PR bo...

This pull request was created from Copilot chat.

Copilot AI changed the title [WIP] Add test for past-TTL branch in dns_cache_impl_test dns_cache: add test for past-TTL failure-backoff branch floored at dns_min_refresh_rate Apr 23, 2026
Copilot AI requested a review from phlax April 23, 2026 09:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants