-
Notifications
You must be signed in to change notification settings - Fork 656
ca+node: Allow expired certs when forcing a new cluster #1919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
LK4D4
merged 4 commits into
moby:master
from
cyli:allow-expired-certs-when-forcing-new-cluster
Feb 2, 2017
Merged
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
335b4d5
Do slightly more validation of the join token when downloading the ro…
cyli aefc3e9
Include timestamp information when validating a TLS certificate fails…
cyli d1d1ae6
LoadSecurityConfig can now take a parameter that allows the expiry ch…
cyli d6df02b
Allow nodes that are initialized with ForceNewCluster=true to start w…
cyli File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is clock skew, I see two cases, both of which have potential problems:
NotBeforewould become a timestamp in the future).I think we should address the renewal loop issue by not retrying if we receive a certificate that appears expired.
The self-issuing case is an inherent consequence of reissuing certs if they expire. If we decide to behave this way (which I think is the best choice, overall), we probably have to live with it. The only alternatives I can think of are:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we assuming clock skew can eventually fix itself (for instance, maybe it can't contact the NTP server for now)? If so, rather than terminate the loop entirely, would it make sense to set the retry interval to something really long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if it's really clock skew, is there any need to keep renewing? Presumably the certificate will become valid once the clock skew is fixed, unless we're talking about really short expiry intervals.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if we terminate the renewal loop, even if the clock skew goes away, we won't be able to renew near when the certificate actually nears expiry. Let's say that the cert expires in a month, but our clock skew is 4 months off. If we kill the renew loop because it's expired, then even if the clock skew fixes itself, the cert will eventually expire without because it won't be renewed unless the node is restarted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe that's ok, though, and our position is if there's a clock skew so drastic that it makes the certificate look expired, the only way to recover is to fix the clock skew and restart the node? (One has to restart, otherwise even if the clock skew fixes itself the node won't auto-renew certificates as it nears expiry)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems reasonable. Or we could renew every few minutes. But doing it every few seconds seems excessive.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally set it to renew every few seconds because it could take a little while for the cluster to start up in the
ForceNewClustercase (since we have to wait for the cluster data to load).But it renews based on exponential backoff, so given the current configuration, it should be:
0s35s1m5s2m5s4m5sand then maxes out at 5 minutes (so it renews every 5 minutes). Although looking at the actual exponential backoff code, the actual time seems to be chosen from a uniform distribution between 0 and those times, so could be every few seconds if unlucky.
We could max it out at 1 hour instead.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, sorry for the silly enumeration of the backoff, I think I misunderstood your comment:
If the certificate expires, then the renewal fails, since
RenewTLSConfigNowcallsRequestAndSaveNewCertificates, which validates the cert, including expiry before returning. If it fails the expiry validation,RequestAndSaveCertificatesand thusRenewTLSConfigNowfails, which means the exponential backoff doesn't get reset to 0.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the backoff config to use a factor of a minute and max out at an hour. So the progression would be:
0s -> 1m5s -> 2m5s -> 4m5s -> 8m5s -> 16m5s -> 32m5s -> 1h0m0sThis may make the integration test take longer though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pausing 65 seconds in the integration test doesn't sound good to me. Let's make this backoff configurable by the test if necessary.