[ca] Root rotation reconciliation loop in the CA server#2100
Force-pushed 28e7bcb to 0ead5dc.
Codecov Report

@@            Coverage Diff             @@
##           master    #2100      +/-   ##
==========================================
+ Coverage   59.32%   59.58%   +0.25%
==========================================
  Files         117      118       +1
  Lines       19362    19536     +174
==========================================
+ Hits        11487    11641     +154
  Misses       6556     6556
- Partials     1319     1339      +20

Continue to review the full report at Codecov.
s.mu.Unlock()
...
s.wg.Add(1)
logger := log.G(ctx).WithField("method", "(*Server).reconileNodeRootsAndCerts")
...
})
}
...
func (s *Server) reconileNodeRootsAndCerts() {
There is a typo in the name. reconile should be reconcile.
Ah, thanks! I was having trouble spotting that.
}
...
var signerCert []byte
if len(cluster.RootCA.RootRotation.CAKey) > 0 {
Don't we need to check that RootRotation is not nil?
The previous block checks that the root CA is the same as our expected root CA, modulo join tokens. The expected root CA is what one loop of the reconcileNodeRootsAndCerts attempts to converge on, and that loop checks to make sure the root rotation isn't nil. So if the cluster's root CA is the same as the root CA we are trying to finish a rotation for, then it should have a root rotation that isn't nil.
I should probably add a comment saying that at the top of this function, though.
Sorry, I don't follow. We are assigning cluster with cluster := store.GetCluster(tx, clusterID) above. It's possible something else changed the cluster in the store, so we can't rely on checks that happen before the GetCluster.
Another possibility is that there is a root rotation, but the certs in the root rotation have changed (because someone wants to rotate to yet a different cert). So for each cycle of the root rotation reconciliation, we store the (api).RootCA value we're trying to converge to, and ensure that all the nodes are converged on that particular one.
When we call finishRootRotation, we pass that saved RootCA value. If the cluster's RootCA value has changed in the meanwhile, we abort the update (and hence abort finishing the root rotation), because either someone else set RootRotation to nil, or they changed the values for RootRotation, so we want to wait for the next cycle of reconciliation.
Since the stored RootCA we're trying to converge on has already been checked to have a RootRotation value, if it and the cluster's RootCA value are equivalent modulo join tokens (which must be true before we proceed with the update), then the cluster's RootCA must have the same non-nil RootRotation.
I get it now. I had missed the implications of the RootCAEqualStable check. I think this is fine as-is.
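To make that invariant concrete, here is a rough sketch of the guard being described, assuming the finishRootRotation(tx, expectedRootCA) shape discussed later in this thread; RootCAEqualStable is the check referenced above (its package here is assumed), and s.clusterID is an assumed field:

func (s *Server) finishRootRotation(tx store.Tx, expectedRootCA *api.RootCA) error {
	// expectedRootCA was verified to have a non-nil RootRotation before this
	// reconciliation cycle began.
	cluster := store.GetCluster(tx, s.clusterID)
	if cluster == nil {
		return errors.New("unable to get cluster object")
	}
	// If the stored RootCA still equals expectedRootCA modulo join tokens, it
	// must carry the same non-nil RootRotation; otherwise the rotation was
	// changed or aborted, so bail and wait for the next reconciliation cycle.
	if !equality.RootCAEqualStable(expectedRootCA, &cluster.RootCA) {
		return errRootRotationChanged
	}
	// Safe to dereference RootRotation here: promote the rotation certs.
	cluster.RootCA.CACert = cluster.RootCA.RootRotation.CACert
	cluster.RootCA.CAKey = cluster.RootCA.RootRotation.CAKey
	cluster.RootCA.RootRotation = nil
	return store.UpdateCluster(tx, cluster)
}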
}, opsTimeout))
}
...
func TestSuccessfulRootRotation(t *testing.T) {
}, opsTimeout))
}
...
func TestRepeatedRootRotation(t *testing.T) {
return false
}
...
func (s *Server) finishRootRotation(expectedRootCA *api.RootCA) error {
Random cosmetic thought: If we make this take a store.Tx argument, it would remove a level of indentation. The call to finishRootRotation could be merged into the store.Batch above it.
I noticed that one of the integration tests takes a long time: if there's an easy way to speed it up, I think it would be worth the effort. It's difficult to balance exhaustive testing with fast test cycles, but I really like that the swarmkit integration test suite can run in ~25s. It encourages people to run the tests early and frequently.
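For illustration, a minimal sketch of the suggested refactor, with finishRootRotation taking the transaction so the caller can fold it into the surrounding batch (the variable names here are placeholders):

err := s.store.Batch(func(batch *store.Batch) error {
	return batch.Update(func(tx store.Tx) error {
		// one level of nesting instead of a separate store update
		return s.finishRootRotation(tx, expectedRootCA)
	})
})
if err != nil {
	log.G(ctx).WithError(err).Errorf("could not complete root rotation")
}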
s.mu.Unlock()
}()
...
for {
Would it add a lot of complexity to do this as a Watch that monitors node creations, node updates, and cluster updates? (edit: and node deletions)
Basically, you'd have a map where the keys are the nodes that still need to rotate their certificates. When you see a node rotate to the new certificate, you remove it from the map. When the size of the map reaches 0, you've converged.
I don't think there's anything inherently wrong with this polling-based approach, but I kind of like event-driven models. It might start making a difference in very large clusters.
I can try. :D The good thing is that we have tests already. We might be able to piggyback off of our existing watch of nodes and clusters.
I have created a mapping of all the nodes that is updated by events. On a root CA update, if the issuer has changed and there's a root rotation present, it will start a reconciliation loop similar to the previous one, except it no longer polls the store, it just checks its in-memory map of nodes.
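A rough sketch of that event-driven bookkeeping, using the field and helper names (allNodes, unconvergedNodes, hasIssuer) that appear in the hunks quoted below; the wiring is illustrative only:

// Called from the node watch; mirrors store events into in-memory maps
// instead of polling the store for all nodes.
func (r *rootRotationReconciler) UpdateNode(node *api.Node) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.allNodes[node.ID] = node
	if hasIssuer(node, &r.currentIssuer) {
		// this node already has a cert from the desired issuer: converged
		delete(r.unconvergedNodes, node.ID)
	} else {
		r.unconvergedNodes[node.ID] = node
	}
	// once unconvergedNodes is empty, the next reconciliation pass can
	// finish the root rotation
}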
Force-pushed 0a4bcae to 25d69c8.
…es all the nodes to have an IssuanceStateRotate to trigger all the nodes to get new certificates. When all the nodes have rotated their certificates to be signed by the desired issuer, complete root rotation. Signed-off-by: cyli <ying.li@docker.com>
Force-pushed 8bc5c04 to 31e85fe.
@aaronlehmann Not killing the leader in that long-running test seems to bring the time down to …
// close to the latest versions of all the nodes. If not, the node will be updated later and the
// next batch of updates should catch it.
for _, n := range toUpdate {
	if err := batch.Update(func(tx store.Tx) error { return store.UpdateNode(tx, n) }); err != nil {
How about formatting it like this:
if err := batch.Update(func(tx store.Tx) error {
	return store.UpdateNode(tx, n)
}); err != nil {

continue
}
iState := n.Certificate.Status.State
if iState != api.IssuanceStateRenew&iState && iState != api.IssuanceStatePending && iState != api.IssuanceStateRotate {
I think the &iState in here is unintentional
Oh nice, I didn't even see that, thanks
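For reference, the intended condition with the stray &iState removed (the rotate-marking body is inferred from the surrounding hunks):

iState := n.Certificate.Status.State
if iState != api.IssuanceStateRenew && iState != api.IssuanceStatePending && iState != api.IssuanceStateRotate {
	// the node is not already waiting on a new certificate: ask it to rotate
	n.Certificate.Status.State = api.IssuanceStateRotate
	toUpdate = append(toUpdate, n)
}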
defer func() {
	r.wg.Done()
	r.mu.Lock()
	r.isReconciling = false
Race: if runReconcilerLoop gets called after this goroutine decides to return, but before this lock is acquired and isReconciling is set to false.
One way to solve this would be to have a channel or context that can stop the goroutine, and to always stop an existing goroutine (if any) and then start a new one.
That makes sense - we only call this if something has changed.
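One way to implement that, sketched: keep a per-loop stop channel so starting a new reconciliation always tears down the old goroutine first, instead of racing on a shared isReconciling flag. stopCh and retryInterval are illustrative names:

func (r *rootRotationReconciler) runReconcilerLoop(ctx context.Context, loopRootCA *api.RootCA) {
	r.mu.Lock()
	if r.stopCh != nil {
		close(r.stopCh) // tell any existing loop to exit before starting a new one
	}
	stop := make(chan struct{})
	r.stopCh = stop
	r.mu.Unlock()

	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		ticker := time.NewTicker(r.retryInterval)
		defer ticker.Stop()
		for {
			select {
			case <-stop:
				return // superseded by a newer loop; no flag to race on
			case <-ctx.Done():
				return
			case <-ticker.C:
				// attempt one reconciliation pass against loopRootCA
			}
		}
	}()
}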
Force-pushed ac8cb55 to 2a91941.
"bytes"


"reflect"
Cosmetic: some extra blank lines
return nil, errors.Wrap(err, "invalid certificate in cluster root CA object")
}
if len(issuerCerts) == 0 {
	return nil, errors.Wrap(err, "invalid certificate in cluster root CA object")
err would be nil here
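Presumably the fix is to construct a fresh error on that path, since errors.Wrap returns nil when given a nil error:

if len(issuerCerts) == 0 {
	// err is nil on this path, so wrapping it would yield a nil error
	return nil, errors.New("invalid certificate in cluster root CA object")
}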
func (r *rootRotationReconciler) DeleteNode(node *api.Node) {
	if node == nil {
		return
	}
It looks like this function is never called with a nil node.
log.G(r.ctx).Infof("completed root rotation on cluster %s", r.clusterID)
return
}
log.G(r.ctx).WithError(err).Errorf("could not complete root rotation")
},
},
{
	descr: ("If all nodes have the right TLS info or are already rotated (or are not members), the " +
// Directly update the nodes rather than get + update, and ignore version errors. Since
// `rootRotationReconciler` should be hooked up to all node update/delete/create events, we should have
// close to the latest versions of all the nodes. If not, the node will be updated later and the
// next batch of updates should catch it.
BTW, this is what the allocator should be doing, ideally. Right now it overwrites fields in the current version of the object, and that's bad.
if err := batch.Update(func(tx store.Tx) error {
	return store.UpdateNode(tx, n)
}); err != nil {
	log.G(r.ctx).WithError(err).Debugf("unable to update node %s to request a certificate rotation")
It would be a good idea to suppress these messages when it's just a version conflict.
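A sketch of that suppression, assuming the memory store's out-of-sequence sentinel (store.ErrSequenceConflict) is what a version conflict surfaces as here:

if err := batch.Update(func(tx store.Tx) error {
	return store.UpdateNode(tx, n)
}); err != nil && err != store.ErrSequenceConflict {
	// a version conflict just means the node changed under us and a later
	// event or batch will retry, so only log unexpected failures
	log.G(r.ctx).WithError(err).Debugf("unable to update node %s to request a certificate rotation", n.ID)
}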
log.WithField(ctx, "method", "(*Server).rootRotationReconciler"),
s.securityConfig.ClientTLSCreds.Organization(),
s.rootReconciliationRetryInterval,
s.store, s.lastSeenClusterRootCA, nodes)
I'd prefer to just inline the contents of newReconciler here. Passing so many things through function arguments loses clarity over initializing a structure directly.
currentRootCA    *api.RootCA
currentIssuer    IssuerInfo
allNodes         map[string]*api.Node
unconvergedNodes map[string]struct{}
I feel like it should be possible to only have unconvergedNodes map[string]*api.Node and reinitialize it from the store when we start a new root rotation, so in the steady state we aren't maintaining a copy of every node here, but I can't justify why that would be better, so I'm not asking for the change. Just thought I'd mention the idea.
I was trying to check the store as little as possible, but that makes sense to not have 2 copies of everything.
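A sketch of that alternative: keep only unconvergedNodes and rebuild it from the store when a new rotation starts, so the steady state holds no copy of every node. The method name is hypothetical; store.FindNodes and store.All are the usual store helpers:

func (r *rootRotationReconciler) resetUnconvergedNodes(tx store.ReadTx) error {
	nodes, err := store.FindNodes(tx, store.All)
	if err != nil {
		return err
	}
	r.unconvergedNodes = make(map[string]*api.Node)
	for _, n := range nodes {
		// converged nodes are dropped as watch events arrive, so only the
		// stragglers are ever held in memory
		if !hasIssuer(n, &r.currentIssuer) {
			r.unconvergedNodes[n.ID] = n
		}
	}
	return nil
}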
Force-pushed 2a91941 to 65a0053.
(still working)
Force-pushed 65a0053 to 0837aea.
@aaronlehmann I think I have addressed all of your comments, if you wouldn't mind taking another look whenever you are free :) (+ one typo)
Force-pushed 0837aea to 2d3d90f.
return r.finishRootRotation(tx, loopRootCA)
})
if err == nil {
	log.G(r.ctx).Infof("completed root rotation on cluster %s", r.clusterID)
Let's just say "completed root rotation" for now. We don't support multiple cluster objects yet so the cluster ID isn't meaningful to the user.
LGTM ping @diogomonica
Force-pushed 3b3ee0b to cb71dbd.
)
...
// IssuanceStateRotateBatchSize is the maximum number of nodes we'll tell to rotate their certificates in any given update
const IssuanceStateRotateBatchSize = 30
If this is the maximum, shouldn't it have max in the name?
}, nil
}
...
var errRootRotationChanged = errors.New("target root rotation has changed")
Move the error declaration and struct before the IssuerFromAPIRootCA?
return
}
// If the issuer has changed, iterate through all the nodes to figure out which ones need rotation
if newRootCA.RootRotation != nil {
In what situation would we have an issuer mismatch, but no RootRotation?
If the root rotation were abandoned, for instance. Previously the issuer would have been the new root cert, but if someone rotated the desired cert back to the original cert before the root rotation finished, the rotation could be completed immediately, leaving a changed issuer with no RootRotation.
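In other words, the two branches look roughly like this (the else arm is an assumption about the abandoned-rotation case):

if newRootCA.RootRotation != nil {
	// issuer changed and a rotation is pending: rebuild the set of
	// unconverged nodes and kick off a reconciliation pass
} else {
	// issuer changed but no rotation is pending, e.g. it was abandoned by
	// rotating back to the original cert: nothing to converge on, so drop
	// any reconciliation state
	r.unconvergedNodes = nil
}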
if hasIssuer(node, &r.currentIssuer) {
	delete(r.unconvergedNodes, node.ID)
} else {
	r.unconvergedNodes[node.ID] = node
In which situation would a node not be in this r.unconvergedNodes map when UpdateNode gets called for it?
If everything is working correctly, then it's unlikely, but if you are rotating the root cert back and forth really quickly, it's possible that the root ca in the reconciler finishes updating before the root CA in the signer finishes updating, and the node could get a new TLS cert signed with the previous root.
Previously there was a bug where this update took too long, and blocked the signer root CA update, and the nodes got a few TLS cert updates that were signed with the wrong key, and it eventually recovered.
var toUpdate []*api.Node
for _, n := range r.unconvergedNodes {
	iState := n.Certificate.Status.State
	if iState != api.IssuanceStateRenew&iState && iState != api.IssuanceStatePending && iState != api.IssuanceStateRotate {
what is this IssuanceStateRenew&iState about?
Boo, I thought I fixed this already when @aaronlehmann pointed it out - I must have reverted, sorry.
}
...
var signerCert []byte
if len(cluster.RootCA.RootRotation.CAKey) > 0 {
Are we guaranteed to not have a nil cluster.RootCA and cluster.RootCA.RootRotation?
Yes, because we compare it to the expectedRootCA first, which is guaranteed not to be nil or have a nil RootRotation. If the current root CA is not equal to the expected one, we abort.
Force-pushed cb71dbd to 0ba0da4.
…ng the store for all nodes at intervals, rely on the cluster and node watches to update an in-memory mapping of the current nodes. At regular intervals, update the store to tell a throttled number of the unconverged nodes to rotate their certificates. Also, remove the leader rotation part of the root rotation integration test, as that takes a very long time. There are server tests to ensure that multiple CA servers running reconciliation loops, and starting a CA server from a stopped state, do not break root reconciliation. Signed-off-by: cyli <ying.li@docker.com>
The CA server, when it detects a root rotation, will start a reconciliation loop that tells all nodes to rotate their TLS certificates, and once all are done, completes the root rotation.
This is stacked on top of #2097 (merged). Addresses #1990.
Note that this does not throttle how quickly the nodes get TLS certificates yet. This should be throttled in the dispatcher, not the CA server. For instance, if it detects a root rotation, it could possibly delay the session message for every node by a random number of seconds. This seems more likely to correctly delay nodes than setting the issuance state in batches, since the dispatcher can set the delay based on the number of nodes connected to it.

Update: Was thinking through how to delay things on the dispatcher side, and we don't want to delay session messages just because of the certificate issuance change, in case other things such as peers or network keys have changed. So right now we batch up the updates to the nodes for certificate issuance changes, so that not all are rotated at once.
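The batching described in the update, sketched against the IssuanceStateRotateBatchSize constant and the loop quoted earlier in the thread (the rotate-marking body is inferred):

// Each reconciliation pass asks at most IssuanceStateRotateBatchSize nodes
// to rotate, so certificates are not all reissued at once.
var toUpdate []*api.Node
for _, n := range r.unconvergedNodes {
	if len(toUpdate) >= IssuanceStateRotateBatchSize {
		break // the rest will be picked up on a later pass
	}
	iState := n.Certificate.Status.State
	if iState != api.IssuanceStateRenew && iState != api.IssuanceStatePending && iState != api.IssuanceStateRotate {
		n.Certificate.Status.State = api.IssuanceStateRotate
		toUpdate = append(toUpdate, n)
	}
}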