Skip to content
This repository was archived by the owner on Nov 24, 2025. It is now read-only.

Fix TR NullPointerException in LetsEncryptDnsChallengeWatcher#5342

Merged
mattjackson220 merged 1 commit intoapache:masterfrom
rawlinp:fix-tr-le-npe
Dec 8, 2020
Merged

Fix TR NullPointerException in LetsEncryptDnsChallengeWatcher#5342
mattjackson220 merged 1 commit intoapache:masterfrom
rawlinp:fix-tr-le-npe

Conversation

@rawlinp
Copy link
Copy Markdown
Contributor

@rawlinp rawlinp commented Dec 2, 2020

What does this PR (Pull Request) do?

Fixes a NullPointerException in LetsEncryptDnsChallengeWatcher which I believe is breaking the Let's Encrypt functionality.

Because databasesDirectory is already a property of the extended class AbstractServiceUpdater, it should not be redeclared in LetsEncryptDnsChallengeWatcher. The Spring framework was setting the value in LetsEncryptDnsChallengeWatcher, but the AbstractServiceUpdater uses its own value which was null because setDatabasesDirectory was overridden by LetsEncryptDnsChallengeWatcher.

Also, log exceptions when catching them in order to capture the full stacktraces in the log.

Which Traffic Control components are affected by this PR?

  • Traffic Router

What is the best way to verify this PR?

Run TR, view the logs to ensure that the following NPE does not occur periodically:

2020-12-01 16:30:33,834 [ERROR] traffic_router.core.loc.AbstractServiceUpdater: [] Failed updating database!
java.lang.NullPointerException
	at java.nio.file.Files.provider(Files.java:97)
	at java.nio.file.Files.exists(Files.java:2385)
	at com.comcast.cdn.traffic_control.traffic_router.core.loc.AbstractServiceUpdater.updateDatabase(AbstractServiceUpdater.java:104)
	at com.comcast.cdn.traffic_control.traffic_router.core.loc.AbstractServiceUpdater$1.run(AbstractServiceUpdater.java:86)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

and instead see logs like this:

2020-12-01 16:33:06,704 [INFO ] traffic_router.core.util.Fetcher: GETing: https://trafficops.example.org/api/2.0/letsencrypt/dnsrecords/; timeout is 15000
2020-12-01 16:33:06,900 [INFO ] traffic_router.core.loc.AbstractServiceUpdater: [LetsEncryptDnsChallengeWatcher] database unchanged.

Verify the TR unit and integration tests pass.

If this is a bug fix, what versions of Traffic Control are affected?

  • master
  • 5.0.x

The following criteria are ALL met by this PR

  • This PR includes tests OR I have explained why tests are unnecessary
  • bug fix, no docs necessary
  • unreleased bug, no changelog necessary
  • This PR includes any and all required license headers
  • This PR DOES NOT FIX A SERIOUS SECURITY VULNERABILITY (see the Apache Software Foundation's security guidelines for details)

Because databasesDirectory is already a property of the extended class
AbstractServiceUpdater, it should not be redeclared in
LetsEncryptDnsChallengeWatcher.

Also, log exceptions when catching them in order to capture the full
stacktraces in the log.
@rawlinp rawlinp added bug something isn't working as intended Traffic Router related to Traffic Router labels Dec 2, 2020
@rawlinp rawlinp added this to the 5.0.0 milestone Dec 2, 2020
Copy link
Copy Markdown
Member

@zrhoffman zrhoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Traffic Router external tests never complete after this PR.

In a normal run, the integration and external tests complete within 9 minutes and 40 seconds of starting CDN in a Box: https://github.com/zrhoffman/trafficcontrol/runs/1495788949

When the commit from this PR is applied, the job times out after CDN in a Box has been running for 30 minutes (the TR integration and external tests have been running for about 27 minutes at this point): https://github.com/zrhoffman/trafficcontrol/runs/1495797296

I haven't PRed the GitHub action that runs the TR integration tests because it already comments out RouterTest in order for the external tests to complete until I can work out some SSL cert issues, but it should still be useful to test this PR.

@zrhoffman
Copy link
Copy Markdown
Member

I see, this PR partially reverses the changes in #5280, so the timeout is just #5069 again. So not actually a new problem.

@rawlinp
Copy link
Copy Markdown
Contributor Author

rawlinp commented Dec 4, 2020

So, if I understand correctly, #5280 "fixed" the external integration test timeout by causing a NullPointerException in the LetsEncryptDnsChallengeWatcher, and by fixing that, the external integration test still hangs again.

What's strange is that seems to be a CIAB issue, because when I run the integration/external tests locally (excluding that one failing RouterTest), they pass.

Copy link
Copy Markdown
Contributor

@mattjackson220 mattjackson220 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests pass, code looks good, fixes NPE as expected!

@mattjackson220 mattjackson220 merged commit d4780c3 into apache:master Dec 8, 2020
@rawlinp rawlinp deleted the fix-tr-le-npe branch December 8, 2020 16:57
ocket8888 pushed a commit that referenced this pull request Dec 8, 2020
Because databasesDirectory is already a property of the extended class
AbstractServiceUpdater, it should not be redeclared in
LetsEncryptDnsChallengeWatcher.

Also, log exceptions when catching them in order to capture the full
stacktraces in the log.

(cherry picked from commit d4780c3)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

bug something isn't working as intended Traffic Router related to Traffic Router

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants