Start of centralizing node writes via nodeWriter #3141
cgwalters wants to merge 7 commits into openshift:master
The NodeWriter is an instance of the "actor" pattern effectively; it processes messages to update its internal state. However, it's a bit redundant to have every update take the node lister and node name - we only use this on the daemon side, where there is exactly one node we will be updating. Have the clusterNodeWriter instance of this interface cache references to that data and avoid passing it on every message. Prep for reworking the NodeWriter to have its own clientset.
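As a rough illustration of the shape described above (not the actual MCO code), here is a minimal sketch of a writer that is handed the node name and a store once at construction, so individual messages only carry the change itself. The `node`, `nodeStore`, `message`, and `setAnnotations` names are hypothetical stand-ins; the real writer works against a Kubernetes node lister and client.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical stand-in for the Kubernetes Node object.
type node struct {
	name        string
	annotations map[string]string
}

// nodeStore is a hypothetical stand-in for the lister/client pair.
type nodeStore map[string]*node

// message carries only the requested change; the result comes back on errCh.
type message struct {
	annotations map[string]string
	errCh       chan error
}

// nodeWriter caches the node name and store so callers never pass them.
type nodeWriter struct {
	nodeName string
	store    nodeStore
	writes   chan message
}

func newNodeWriter(nodeName string, store nodeStore) *nodeWriter {
	return &nodeWriter{nodeName: nodeName, store: store, writes: make(chan message)}
}

// run processes messages one at a time until stop is closed.
func (w *nodeWriter) run(stop <-chan struct{}) {
	for {
		select {
		case <-stop:
			return
		case msg := <-w.writes:
			msg.errCh <- w.apply(msg.annotations)
		}
	}
}

// apply merges the annotations into the one node this writer is responsible for.
func (w *nodeWriter) apply(annos map[string]string) error {
	n, ok := w.store[w.nodeName]
	if !ok {
		return errors.New("node not found: " + w.nodeName)
	}
	for k, v := range annos {
		n.annotations[k] = v
	}
	return nil
}

// setAnnotations sends a message to the writer and waits for the result.
func (w *nodeWriter) setAnnotations(annos map[string]string) error {
	errCh := make(chan error, 1)
	w.writes <- message{annotations: annos, errCh: errCh}
	return <-errCh
}

func main() {
	store := nodeStore{"worker-0": {name: "worker-0", annotations: map[string]string{}}}
	w := newNodeWriter("worker-0", store)
	stop := make(chan struct{})
	go w.run(stop)
	defer close(stop)

	if err := w.setAnnotations(map[string]string{"example/state": "Working"}); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println(store["worker-0"].annotations["example/state"])
}
```

The point of the sketch is only the call site: `setAnnotations` takes the change and nothing else, because the writer already knows which node it owns.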
So I can use this from the daemon code.
|
Digging into this, there's a fair amount of stuff that's needed:
However, I think after that it would make the most sense to change nodeWriter into "nodeActor" and have it also be the sole thing providing read access to the node object to the rest of the MCD. This would obviously touch more code. But one immediate thing this would clean up is that today the MCD (I think) watches all nodes via informers, which is clearly just wrong. We should only watch our own node, and then the nodeActor provides a "node changed" event to the rest of the MCD. |
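To make the "nodeActor" idea a bit more concrete, here is a small sketch, assuming a hypothetical `nodeActor` type that holds the latest copy of our own node and publishes a "node changed" event to subscribers. Everything here (`nodeSnapshot`, `observe`, `subscribe`, `get`) is invented for illustration; in the real daemon the `observe` calls would presumably come from a watch scoped to the single node rather than from hand-written calls.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeSnapshot is a hypothetical, read-only view of the node object.
type nodeSnapshot struct {
	name            string
	resourceVersion string
	annotations     map[string]string
}

// nodeActor is the single source of node state for the rest of the daemon.
type nodeActor struct {
	nodeName string
	mu       sync.RWMutex
	current  nodeSnapshot
	subs     []chan nodeSnapshot
}

func newNodeActor(nodeName string) *nodeActor {
	return &nodeActor{nodeName: nodeName}
}

// subscribe returns a channel that receives the newest snapshot after a change.
func (a *nodeActor) subscribe() <-chan nodeSnapshot {
	ch := make(chan nodeSnapshot, 1)
	a.mu.Lock()
	defer a.mu.Unlock()
	a.subs = append(a.subs, ch)
	return ch
}

// get returns the latest snapshot the actor has seen.
func (a *nodeActor) get() nodeSnapshot {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return a.current
}

// observe records an update and reports whether a change event was published.
// Updates for other nodes, or with an unchanged resourceVersion, are ignored.
func (a *nodeActor) observe(n nodeSnapshot) bool {
	if n.name != a.nodeName {
		return false
	}
	a.mu.Lock()
	defer a.mu.Unlock()
	if n.resourceVersion == a.current.resourceVersion {
		return false
	}
	a.current = n
	for _, ch := range a.subs {
		// Drop a stale, unread snapshot so the send below never blocks.
		select {
		case <-ch:
		default:
		}
		ch <- n
	}
	return true
}

func main() {
	a := newNodeActor("worker-0")
	changes := a.subscribe()

	a.observe(nodeSnapshot{name: "worker-1", resourceVersion: "7"}) // not our node
	a.observe(nodeSnapshot{name: "worker-0", resourceVersion: "8"})

	n := <-changes
	fmt.Println(n.name, n.resourceVersion)
}
```

Subscribers only ever see the most recent snapshot, which matches a "something changed, go re-read" style of event rather than a full history of updates.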
Ensure that this code path is using kubelet credentials.
|
Ah yes, I forgot to start the informer; classic. Will look at fixing that up. |
9b7f193 to b8683ac
|
/test e2e-gcp-op |
|
Does this direction make sense to everyone? |
|
To back up to a high level on this - we want to use the kubelet credentials for everything ideally, but the problem is kubelet can't read e.g. MachineConfig objects today. One option is to just add that power - then we use kubelet creds for everything. It's a bit tempting. This patch is splitting up our node mutations to use the nodeWriter which uses kubelet credentials, then our service account is mostly just there to read machineconfigs (and our node - but a future enhancement to this work could also change the |
|
I do kind of feel the right long term direction is to change things so that we unify the hypershift and self-driving cases. Probably, we should indeed move away from a daemonset for self-driving too. Once we do that, it becomes natural to e.g. put the machineconfig in a configmap (or really, it should be a secret) like hypershift is doing. Then the daemon/pod doesn't need any special roles. Things like ssh access monitoring and config drift could be implemented instead with a periodically scheduled pod and not a daemonset (although at the cost of polling vs being event driven). Another possible in-between model is daemonset-per-pool: here we can mount the current/desired machineconfig into the pods specifically rather than having the daemonset fetch them. |
I looked at this briefly; I think it'd require us to carry a patch to the apiserver to do that. Definitely ugly. I think I'd lean towards going this direction instead, but open to debate. |
|
/approve cancel |
|
/test unit |
kikisdeliveryservice left a comment
As a first pass (without reviewing the discussion point on the PR), I think this is a solid optimization that yields more readable and centralized code. Always felt odd to have to pass nodenames through daemon functions bc well.. shouldn't it know where it is? 😆
q: there's a similar block above, but that one leads to a ctrl.common.WriteTerminationError, while this one is a fatalf with a WriteTerminationError for the block below. Is this consistent? (It might be; I might be misunderstanding.)
(Forgot to look at this one, will do in the next pass)
|
OK cool. I took the first two commits and put them in #3143 - I think we can get that in now. I'll add some more work on top of this one for the followup listed above. |
This ensures it uses kubelet credentials.
b8683ac to 2ba5dd8
We should be able to rely on `nodeWriter` using kubelet credentials now.
|
/test unit |
|
/retest |
I originally missed that `loadAnnotations` was actually calling this API, which should have been nodeWriter internal. But in going to fix it, I hit the problem that the code flow in the daemon relied on getting an updated node object back. Fixing that required some code churn to also pass the mutated node object back through the response message. I think what we *really* want is to wait for the informer to update and pass that back instead, which then leads to making this stuff part of a control loop on the daemon rather than randomly blocking here. For now though, let's keep the daemon code as is and just change this bit to use nodeWriter for kubelet credentials.
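For readers following along, this is roughly what "pass the mutated node object back through the response message" can look like. It is a minimal sketch with hypothetical names (`request`, `response`, `serve`, `setAnnotations`), not the code in this PR; the key detail is that the response carries both the updated node and an error, so the caller keeps working with the post-write object.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical stand-in for the Kubernetes Node object.
type node struct {
	name        string
	annotations map[string]string
}

// response returns the mutated node alongside any error.
type response struct {
	node *node
	err  error
}

// request asks the writer goroutine to merge annotations into the node.
type request struct {
	annotations map[string]string
	respCh      chan response
}

// serve owns the node and answers each request with the updated copy.
func serve(n *node, reqs <-chan request) {
	for req := range reqs {
		if len(req.annotations) == 0 {
			req.respCh <- response{err: errors.New("no annotations to set")}
			continue
		}
		// Build a fresh copy so callers never share state with the writer.
		updated := &node{name: n.name, annotations: map[string]string{}}
		for k, v := range n.annotations {
			updated.annotations[k] = v
		}
		for k, v := range req.annotations {
			updated.annotations[k] = v
		}
		n = updated
		req.respCh <- response{node: updated}
	}
}

// setAnnotations blocks until the writer replies, then returns the new node.
func setAnnotations(reqs chan<- request, annos map[string]string) (*node, error) {
	respCh := make(chan response, 1)
	reqs <- request{annotations: annos, respCh: respCh}
	r := <-respCh
	return r.node, r.err
}

func main() {
	reqs := make(chan request)
	go serve(&node{name: "worker-0", annotations: map[string]string{"a": "1"}}, reqs)
	defer close(reqs)

	updated, err := setAnnotations(reqs, map[string]string{"b": "2"})
	if err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println(updated.name, len(updated.annotations))
}
```

The blocking call in `setAnnotations` is the part the commit message is uneasy about; waiting for the informer to deliver the updated object would replace this request/response round trip.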
|
OK, let's see how this goes in a CI run; only compile tested locally. |
yuqi-zhang left a comment
Approving the general direction. Also rebased #3135 onto this
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: yuqi-zhang |
Trying this again, will report if I see the same failure
|
New error, will report. Reported, and it's an issue with some deprovisioning hitting us today.
Hmm, this was a bit concerning at first because I thought NTO might be generating some machineconfig, but AFAICS that isn't the case. I looked at their logs but didn't see the problem. EDIT: OK, looks like this is a common failure |
|
The installer PR merged. Let's try /test e2e-gcp-op |
|
/test e2e-gcp-op |
|
@cgwalters: The following tests failed, say
OK, I think we need to take this PR together with #3135 |
Part of #3135
daemon: Have nodeWriter maintain ref to lister and node name
The NodeWriter is an instance of the "actor" pattern effectively;
it processes messages to update its internal state.
However, it's a bit redundant to have every update take the node
lister and node name - we only use this on the daemon side,
where there is exactly one node we will be updating.
Have the clusterNodeWriter instance of this interface cache
references to that data and avoid passing it on every message.
Prep for reworking the NodeWriter to have its own clientset.
controller: Export default resync period function
So I can use this from the daemon code.
daemon: Have nodewriter use kubelet credentials
daemon: Change hypershift path to use nodewriter
Ensure that this code path is using kubelet credentials.