
dns_cache: test that zero failure backoff is floored at dns_min_refresh_rate #136

Draft
Copilot wants to merge 2 commits into flake-dfp-no from copilot/add-test-nextbackoff-ms-equal-zero

Copilot AI commented Apr 23, 2026

Covers the nextBackOffMs() == 0 case of the failure-backoff floor added in envoyproxy#44570, as requested by @agrawroh.

What the test does

  • Stubs random() to return 0 before initialize() so the jittered exponential backoff yields 0 on its first call
  • Configures host_ttl=60s, dns_refresh_rate=60s, dns_min_refresh_rate=1s — the large TTL ensures the cap (host_ttl − elapsed) never engages, isolating the floor path
  • Asserts the failure callback arms the timer at exactly 1000ms (the dns_min_refresh_rate floor), not 0
ON_CALL(context_.server_context_.api_.random_, random()).WillByDefault(Return(0));
*config_.mutable_host_ttl() = Protobuf::util::TimeUtil::SecondsToDuration(60);
*config_.mutable_dns_refresh_rate() = Protobuf::util::TimeUtil::SecondsToDuration(60);
*config_.mutable_dns_min_refresh_rate() = Protobuf::util::TimeUtil::SecondsToDuration(1);
// ...
EXPECT_CALL(*resolve_timer, enableTimer(std::chrono::milliseconds(1000), _));

Placed immediately after the existing ResolveFailureBackoffCappedByHostTtl test, which covers the complementary (cap) path.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • dl.google.com
    • Triggering command: /build/bazel_root/install/fb2a7f6d344d2f4e335882534df59296/embedded_tools/jdk/bin/java bazel(envoy) --add-opens=java.base/java.lang=ALL-UNNAMED -Xverify:none -Djava.util.logging.config.file=/build/bazel_root/base/javalog.properties -Dcom.google.devtools.build.lib.util.LogHandlerQuerier.class=com.google.devtools.build.lib.util.SimpleLogHandler$HandlerQuerier -XX:-MaxFDLimit -Djava.library.path=/build/bazel_root/install/fb2a7f6d344d2f4e335882534df59296/embedded_tools/jdk/lib:/build/bazel_root/install/fb2a7f6d344d2f4e335882534df59296/embedded_tools/jdk/lib/server:/build/bazel_root/install/fb2a7f6d344d2f4e335882534df59296/ -Dfile.encoding=ISO-8859-1 -Duser.country= -Duser.language= -Duser.variant= -Xmx3g -DBAZEL_TRACK_SOURCE_DIRECTORIES=1 -Djavax.net.ssl.trustStore=/tmp/custom-cacerts -Djavax.net.ssl.trustStorePassword=changeit -jar /build/bazel_root/install/fb2a7f6d344d2f4e335882534df59296/A-server.jar --max_idle_secs=10800 --noshutdown_on_low_sys_mem --connect_timeout_secs=30 (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

Add one new test to test/extensions/common/dynamic_forward_proxy/dns_cache_impl_test.cc that covers the nextBackOffMs() == 0 case of the failure-backoff floor in PR envoyproxy#44570. Reviewer @agrawroh asked for "a test with a seeded random generator that forces nextBackOffMs() == 0".

Push as a single commit on top of branch flake-dfp-no in fork phlax/envoy. Do not open a new PR. Do not touch any other files.

What the PR already does (on flake-dfp-no)

In source/extensions/common/dynamic_forward_proxy/dns_cache_impl.cc finishResolve, the failure branch now:

  1. Takes the raw backoff via primary_host_info->failure_backoff_strategy_->nextBackOffMs().
  2. Caps it at host_ttl - elapsed (or sets to 0 if past TTL).
  3. Unconditionally floors the result at min_refresh_interval_ (dns_min_refresh_rate).
  4. Arms the refresh timer.

So when the jittered exponential backoff returns 0, the armed interval should equal min_refresh_interval_, not 0 and not 1ms.
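
The cap-then-floor arithmetic in steps 2 and 3 can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in dns_cache_impl.cc: capThenFloor and the Ms alias are hypothetical names, and the values in the checks mirror the configuration the new test uses (host_ttl=60s, dns_min_refresh_rate=1s, 100ms elapsed).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using Ms = std::chrono::milliseconds;

// Hypothetical helper (not an Envoy function): cap the raw failure backoff at
// the remaining host TTL, then floor the result at the minimum refresh interval.
constexpr Ms capThenFloor(Ms raw_backoff, Ms host_ttl, Ms elapsed, Ms min_refresh) {
  // Cap: time left before the host TTL expires, or 0 once past the TTL.
  const Ms remaining = elapsed < host_ttl ? host_ttl - elapsed : Ms(0);
  const Ms capped = std::min(raw_backoff, remaining);
  // Floor: applied unconditionally, so the armed interval is never below it.
  return std::max(capped, min_refresh);
}

// Raw backoff of 0: the cap leaves it at 0, the floor lifts it to 1000ms.
static_assert(capThenFloor(Ms(0), Ms(60000), Ms(100), Ms(1000)) == Ms(1000), "floor path");
// Raw backoff above the remaining TTL: the cap engages (60000 - 100 = 59900).
static_assert(capThenFloor(Ms(90000), Ms(60000), Ms(100), Ms(1000)) == Ms(59900), "cap path");
// Past the TTL: the cap drops the value to 0 and the floor still applies.
static_assert(capThenFloor(Ms(5000), Ms(60000), Ms(61000), Ms(1000)) == Ms(1000), "past TTL");
```

The first check corresponds to the case this test targets; the second corresponds to the complementary cap path.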

How the random generator is plumbed

In DnsCacheImpl::DnsCacheImpl the random_generator_ member is initialised from:

context.serverFactoryContext().api().randomGenerator()

The backoff strategy is built in PrimaryHostInfo::PrimaryHostInfo via Config::Utility::prepareDnsRefreshStrategy<...>(..., parent_.random_generator_).

In the existing DnsCacheImplTest fixture, context_ is a NiceMock<Server::Configuration::MockGenericFactoryContext> whose server_context already carries a NiceMock<Random::MockRandomGenerator> accessible as context_.server_context_.api_.random_. We don't need to wire anything new — just set an ON_CALL on its random() method before initialize().

When random() is stubbed to return 0, the jittered exponential backoff returns 0. This is how jittered backoff behaves throughout Envoy tests: the jitter is computed as random() % next_interval, so random() == 0 gives a jitter of 0. With the base set to the configured dns_refresh_rate, nextBackOffMs() therefore returns 0 on the first call.
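
A self-contained sketch of that folding, assuming the returned value is simply random() % next_interval with the interval doubling up to a maximum. JitteredBackoffSketch is a hypothetical stand-in written for illustration; it is not Envoy's backoff class, and the real behaviour should be checked against source/common/common/backoff_strategy.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-in for a jittered exponential backoff with an injectable
// random source. Assumes base_ms > 0 (the modulo below requires it).
class JitteredBackoffSketch {
public:
  JitteredBackoffSketch(uint64_t base_ms, uint64_t max_ms, std::function<uint64_t()> random)
      : next_interval_ms_(base_ms), max_interval_ms_(max_ms), random_(std::move(random)) {}

  uint64_t nextBackOffMs() {
    const uint64_t window = next_interval_ms_;
    // Double the window for the next call, saturating at the maximum.
    next_interval_ms_ = window < max_interval_ms_ / 2 ? window * 2 : max_interval_ms_;
    // Fold the random value into [0, window); a random value of 0 yields 0.
    return std::min(random_() % window, max_interval_ms_);
  }

private:
  uint64_t next_interval_ms_;
  const uint64_t max_interval_ms_;
  std::function<uint64_t()> random_;
};
```

With the random source fixed at 0, every call returns 0 regardless of the base, which is the situation the stubbed mock is meant to create.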

Note: if for some reason the strategy produces a non-zero value on the very first call regardless (depends on how the initial current_interval_ is seeded), the test can still work as a test of the floor semantics as long as the resulting nextBackOffMs() is strictly less than min_refresh_interval_. What matters is that the assertion enableTimer(min_refresh_interval_) holds. If the initial strategy value turns out to be >= min_refresh, please dig into source/common/common/backoff_strategy.h and Config::Utility::prepareDnsRefreshStrategy to understand the exact behaviour and, worst case, configure dns_refresh_rate to something smaller than dns_min_refresh_rate so the raw backoff is guaranteed to be less than the floor — whatever produces a reliable armed == min_refresh_interval_ result with random() stubbed to 0.
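
The fallback in the note rests on a simple bound, sketched here under the same random() % interval assumption: if the first backoff window is no larger than the floor, the raw value is always strictly below the floor, so the armed interval equals the floor for any random value. flooredFirstBackoffMs is a hypothetical name, and the sketch ignores the TTL cap, which can only lower the value before the floor is applied.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: first raw backoff folded into [0, refresh_rate_ms),
// then floored at min_refresh_ms. Assumes refresh_rate_ms > 0.
constexpr uint64_t flooredFirstBackoffMs(uint64_t random_value, uint64_t refresh_rate_ms,
                                         uint64_t min_refresh_ms) {
  const uint64_t raw = random_value % refresh_rate_ms;
  return std::max(raw, min_refresh_ms);
}

// refresh rate (500ms) below the floor (1000ms): the result is always the floor.
static_assert(flooredFirstBackoffMs(0, 500, 1000) == 1000, "random 0");
static_assert(flooredFirstBackoffMs(499, 500, 1000) == 1000, "largest raw value");
static_assert(flooredFirstBackoffMs(123456789, 500, 1000) == 1000, "arbitrary random");
// refresh rate (60s) above the floor: the result depends on the random value.
static_assert(flooredFirstBackoffMs(65000, 60000, 1000) == 5000, "floor not reached");
```

The last check shows why the primary approach still needs random() stubbed to 0: with a 60s refresh rate, a non-zero random value can produce a raw backoff above the floor.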

What to add

A new TEST_F(DnsCacheImplTest, ResolveFailureBackoffZeroFlooredByMinRefresh) placed immediately after the existing ResolveFailureBackoffCappedByHostTtl test in the same file.

Shape should mirror ResolveFailureBackoffCappedByHostTtl closely — same fixture, same InSequence s, same pattern of MockTimers, same loadDnsCacheEntry -> resolve_cb flow.

Steps:

  1. Before initialize(), stub the random generator:

    ON_CALL(context_.server_context_.api_.random_, random())
        .WillByDefault(testing::Return(0));

    (Include testing::Return via the existing using testing::Return; or qualify.)

  2. Configure:

    *config_.mutable_host_ttl() = Protobuf::util::TimeUtil::SecondsToDuration(60);
    *config_.mutable_dns_refresh_rate() = Protobuf::util::TimeUtil::SecondsToDuration(60);
    *config_.mutable_dns_min_refresh_rate() = Protobuf::util::TimeUtil::SecondsToDuration(1);
    initialize();

    A 60s host_ttl (well above dns_min_refresh_rate) ensures the cap does not kick in and the test isolates the floor behaviour: raw backoff == 0, TTL cap leaves it at 0, floor lifts it to min_refresh_interval_ (1000ms).

  3. Standard first resolve: loadDnsCacheEntry, success callback with one IP, assert onDnsHostAddOrUpdate and onLoadDnsCacheComplete, assert the first (success) enableTimer uses the successful path (this will be dns_ttl from the resolver response — copy the value from the ResolveFailureBackoffCappedByHostTtl test's first-resolve block, i.e. EXPECT_CALL(*resolve_timer, enableTimer(std::chrono::milliseconds(60000), _)); or whatever that test uses — keep it consistent).

  4. Advance sim time a small amount so elapsed is well below host_ttl (e.g. simTime().advanceTimeWait(std::chrono::milliseconds(100));).

  5. Invoke the refresh timer to trigger a re-resolve, then set up the failure callback. Assertion:

    EXPECT_CALL(*resolve_timer, enableTimer(std::chrono::milliseconds(1000), _));

    This is the floor (`dns_min_refresh_ra...

This pull request was created from Copilot chat.

Copilot AI changed the title from "[WIP] Add test for nextBackOffMs() == 0 case in DNS cache" to "dns_cache: test that zero failure backoff is floored at dns_min_refresh_rate" on Apr 23, 2026
Copilot AI requested a review from phlax April 23, 2026 09:25