This repository was archived by the owner on Nov 24, 2025. It is now read-only.

SteeringTest fails when LetsEncryptDnsChallengeWatcher hangs #5069

@zrhoffman

I'm submitting a ...

  • bug report

Traffic Control components affected ...

  • Traffic Router

Current behavior:

When running the Traffic Router tests in ExternalTestSuite, SteeringTest fails with:

Expected: a value greater than <986>
     but: <986> was equal to <986>
	at com.comcast.cdn.traffic_control.traffic_router.core.external.SteeringTest.z_itemsMigrateFromSmallerToLargerBucket(SteeringTest.java:356)

Expected behavior:

The tests in SteeringTest should pass

Minimal reproduction of the problem with instructions:

Run the external tests. For convenience, I have made a branch that runs only SteeringTest in CDN-in-a-Box:
https://github.com/zrhoffman/trafficcontrol/commits/steering-test

You can use it to run the Traffic Router tests, including external tests, as follows:

docker-compose up -d
docker-compose -f docker-compose.readiness.yml up readiness # Waits for Traffic Monitor to be usable by Traffic Routers
docker-compose -f docker-compose.traffic-router-test.yml up tr-integration-test

Anything else:

SteeringTest has failed since 18fe13a/#3534, and the failure made it into 4.1.0. Only these 2 assertions fail:

		assertThat(rehashedPaths.get(smallerTarget).size(), greaterThan(hashedPaths.get(smallerTarget).size()));
		assertThat(rehashedPaths.get(largerTarget).size(), lessThan(hashedPaths.get(largerTarget).size()));

I think this is because SteeringWatcher never gets to update after the test sends the POST request to the mock TO server that changes the steering attributes.

I suspect SteeringWatcher does not update after that point because LetsEncryptDnsChallengeWatcher's ScheduledExecutorService hangs while trying to update LetsEncryptDnsChallengeWatcher's database from https://${toHostname}/api/2.0/letsencrypt/dnsrecords/.

For whatever reason, Fetcher's request for LetsEncryptDnsChallengeWatcher never reaches HttpDataServer (the mock TO server), even though the host and port seem to line up. However, if you make Fetcher return early for LetsEncryptDnsChallengeWatcher requests, right before it would send the request:

			// Debugging workaround only: return before the Let's Encrypt request is sent
			if (url.contains("letsencrypt")) {
				return http;
			}

then SteeringTest succeeds. Without that early return, the request never seems to complete.
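One way to keep a request like this from blocking forever is to put explicit connect and read timeouts on the connection. The following is only a minimal sketch, assuming the request goes through java.net.HttpURLConnection. `FetchTimeoutSketch`, `timesOut`, and `demo` are hypothetical names for illustration and are not part of Traffic Router's Fetcher. The demo talks to a local socket that accepts TCP connections but never answers, which stands in for a request that never completes.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

public class FetchTimeoutSketch {
    // Hypothetical helper (not Traffic Router's Fetcher): issues a GET with
    // explicit timeouts and reports whether the request timed out.
    static boolean timesOut(String url, int timeoutMillis) throws IOException {
        HttpURLConnection http = (HttpURLConnection) URI.create(url).toURL().openConnection();
        http.setConnectTimeout(timeoutMillis); // limit on establishing the TCP connection
        http.setReadTimeout(timeoutMillis);    // limit on waiting for response bytes
        try {
            http.getResponseCode(); // sends the request and waits for the status line
            return false;
        } catch (SocketTimeoutException e) {
            return true; // the caller gets control back instead of blocking forever
        } finally {
            http.disconnect();
        }
    }

    // Binds a listening socket that is never accept()ed: the OS completes the
    // TCP handshake from the backlog, but no HTTP response is ever written.
    static boolean demo() throws IOException {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        try (ServerSocket silent = new ServerSocket(0, 1, loopback)) {
            return timesOut("http://127.0.0.1:" + silent.getLocalPort() + "/", 500);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + demo());
    }
}
```

With timeouts in place, a request that never gets an answer surfaces as a SocketTimeoutException that the caller can log and retry. Whether this would address the hang described here is untested.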

--

Other notes:

  • Changing DEFAULT_LE_DNS_CHALLENGE_URL to "https://${toHostname}/api/2.0/letsencrypt/dnsrecords" so that it doesn't end in a slash, and then adding a mock response for that URL, does not keep LetsEncryptDnsChallengeWatcher's Fetcher request from hanging.
  • One AbstractResourceWatcher hanging should not make another AbstractResourceWatcher hang. If this also happens in Traffic Router outside of tests, it's a serious bug that we should fix.
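One possible way for a hanging watcher to stall another is if both are scheduled on the same single-threaded ScheduledExecutorService. This issue does not confirm that Traffic Router is set up that way, so the following is only a sketch under that assumption. `SharedSchedulerSketch` and `secondTaskRuns` are hypothetical names, and the two tasks merely stand in for LetsEncryptDnsChallengeWatcher and SteeringWatcher.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SharedSchedulerSketch {
    // Schedules a task that blocks (standing in for the hanging watcher), then a
    // second task (standing in for SteeringWatcher) on the same scheduler.
    // Returns true if the second task ran within waitMillis.
    static boolean secondTaskRuns(int poolSize, long waitMillis) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch blockerStarted = new CountDownLatch(1);
        CountDownLatch secondRan = new CountDownLatch(1);

        Runnable blocker = () -> {
            blockerStarted.countDown();
            try {
                release.await(); // simulates a request that never completes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Runnable second = secondRan::countDown;

        try {
            scheduler.schedule(blocker, 0, TimeUnit.MILLISECONDS);
            blockerStarted.await(); // make sure the blocker occupies a worker thread first
            scheduler.schedule(second, 0, TimeUnit.MILLISECONDS);
            return secondRan.await(waitMillis, TimeUnit.MILLISECONDS);
        } finally {
            release.countDown();     // let the blocker finish
            scheduler.shutdownNow(); // and stop the worker threads
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pool size 1, second task ran: " + secondTaskRuns(1, 300));
        System.out.println("pool size 2, second task ran: " + secondTaskRuns(2, 2000));
    }
}
```

With a pool size of 1, the blocked task holds the only worker thread and the second task never starts. With a pool size of 2, the second task runs despite the blocked one. Giving each watcher its own thread, or bounding each task with a timeout, would be two ways to isolate them.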

Metadata

Labels

  • Traffic Router: related to Traffic Router
  • regression bug: a bug in existing functionality introduced by a new version
  • tests: related to tests and/or testing infrastructure
