Bug 1947684: delay kubelet config readiness until after pools and controller config are ready #2517
|
/retitle Bug 1947684: delay kubelet config readiness until after pools and controller config are ready |
|
@harche: This pull request references Bugzilla bug 1947684, which is invalid:
|
/bugzilla refresh |
|
@rphillips: This pull request references Bugzilla bug 1947684, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Bugzilla (schoudha@redhat.com), skipping review request.
|
/lgtm |
|
/lgtm |
sinnykumari
left a comment
LGTM
Test needs to pass first and would like to see e2e-gcp-op passes.
|
@harche Can you add this to the PR: |
|
/lgtm |
yuqi-zhang
left a comment
Why are we gating the kubeletconfig on the controllerconfig? I don't think they depend on each other at all. Is this basically trying to wait until the initial configuration for a cluster has settled before applying master kubeletconfigs? In that case, wouldn't this cause one extra reboot for masters during install, because the kubeletconfig sometimes doesn't get applied until the initial configuration is generated, which is somewhat counter to what we want? (Maybe I'm misunderstanding something here.)
I feel like we should fix the bug so that the bootstrap already handles kubeletconfigs and the initial pool configuration has the kubeletconfig in place.
There was a problem hiding this comment.
Could you update the commit message to mention why this is being bumped?
There was a problem hiding this comment.
The kubelet config is dependent on the controller config here: https://github.com/harche/machine-config-operator/blob/f28b2c2a4102228af4b6754522e13384711969e7/pkg/controller/kubelet-config/kubelet_config_controller.go#L321-L341
Ideally, initial rollout would be nice, but that isn't how the feature works today. Changes to kubelet configurations require a reboot.
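To make the dependency concrete, here is a minimal sketch of the kind of gate the PR title describes: the kubelet config sync is refused until the controller config and every pool look settled. The types, field names, and the exact readiness conditions below are assumptions made for illustration; they are not copied from the machine-config-operator code.

```go
package main

import (
	"errors"
	"fmt"
)

// ControllerConfigStatus is a hypothetical, trimmed-down stand-in for the
// state a controller config reports. It is not the real MCO type.
type ControllerConfigStatus struct {
	Generation         int64 // generation of the current spec
	ObservedGeneration int64 // generation the controller last processed
	Completed          bool  // whether processing of that generation finished
}

// PoolStatus is a hypothetical stand-in for a machine config pool.
type PoolStatus struct {
	Name           string
	RenderedConfig string // empty until the pool has a rendered config
}

// readyForKubeletConfig returns nil only when the controller config has been
// fully observed and completed, and every pool already has a rendered config.
// These conditions are assumed for the sketch.
func readyForKubeletConfig(cc ControllerConfigStatus, pools []PoolStatus) error {
	if cc.ObservedGeneration != cc.Generation {
		return fmt.Errorf("controller config generation %d not observed yet (observed %d)",
			cc.Generation, cc.ObservedGeneration)
	}
	if !cc.Completed {
		return errors.New("controller config has not completed")
	}
	if len(pools) == 0 {
		return errors.New("no pools found yet")
	}
	for _, p := range pools {
		if p.RenderedConfig == "" {
			return fmt.Errorf("pool %q has no rendered config yet", p.Name)
		}
	}
	return nil
}

func main() {
	cc := ControllerConfigStatus{Generation: 1, ObservedGeneration: 1, Completed: true}
	pools := []PoolStatus{
		{Name: "master", RenderedConfig: "rendered-master-example"},
		{Name: "worker"}, // still waiting for its first rendered config
	}
	if err := readyForKubeletConfig(cc, pools); err != nil {
		fmt.Println("requeue kubelet config sync:", err)
		return
	}
	fmt.Println("safe to sync kubelet config")
}
```

Returning an error rather than a bool lets the caller log why the sync was deferred and requeue it.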
There was a problem hiding this comment.
Ah, my bad. I missed that connection. I think this code does make sense to make the wait explicit. That said, I'm not sure this will achieve the desired effect of waiting for the initial configuration of the pool to settle (the linked Bugzilla), since
was blocking it before now anyways. The race between the KCC generating MCs from kubeletconfigs and the in-cluster MCC generating the initial node configuration still exists I think.
If we want to go this route, maybe
Alternatively, I still think it might be a cleaner solution just to add a bootstrap reader mode to the kubeletconfigcontroller and have the bootstrap MCC call it (achieving day 1), which, if we want to do it in the future, would mean we'd have to revert our proposed fix here,
There was a problem hiding this comment.
Ideally, we need to update the code to run at bootstrap. Perhaps @QiWang19 can look into updating the KubeletConfig as a second pass. I would like to unblock Omer.
The node_controller does something similar to see if the pool is ready...
https://github.com/harche/machine-config-operator/blob/5527f842eda1fd51538590a592d3c95a3cd679d9/pkg/controller/node/node_controller.go#L720-L727
Signed-off-by: Ryan Phillips <rphillip@redhat.com>
|
/retest |
yuqi-zhang
left a comment
Approving since this seems to fix the bootstrap race condition for kubeletconfigs.
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: harche, rphillips, sinnykumari, yuqi-zhang.
Approvers can indicate their approval by writing |
|
@harche: The following test failed, say
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
@harche: All pull requests linked via external trackers have merged: Bugzilla bug 1947684 has been moved to the MODIFIED state.
Signed-off-by: Harshal Patil <harpatil@redhat.com>
- What I did
- How to verify it
- Description for the changelog