Bug 1986453: Check for API server and node versions skew #2658
|
/assign @sinnykumari |
kikisdeliveryservice
left a comment
some initial questions/comments
kikisdeliveryservice
left a comment
a few more comments
/hold
|
@QiWang19 I know that this PR is based on someone else's previous PR, but I think we can refine it. What most of my comments are asking for is: using that work as a baseline, how can we make this into an understandable & actionable status for a user. Also keep in mind a pool can be large, so the info needs to be easy to consume if we have, say, 50 nodes that have unsupported skew. Users can get into this situation bc a pool is paused and so it did not upgrade to match the apiserver and is still at an older version. They will likely have to let those pools upgrade (to at a minimum a supported skew) before they can initiate another clusterwide upgrade (which would cause the kubeapiserver to get even further away). So we need to think about telling them the state in a meaningful way, but also give them some hint about what they need to do to remedy it. Happy to discuss further if you have any questions. =) |
4cf72ac to 3664ebe
|
@kikisdeliveryservice Thanks for the explanation. I have cleaned up some reviews. PTAL. |
|
/retest |
|
@kikisdeliveryservice Thanks for the explanation. I have cleaned up some reviews. Could you PTAL? |
kikisdeliveryservice
left a comment
a few more comments
|
Just noting that when this is done, we need to get the commits updated so Qi is a co-author. |
|
I tested this PR locally and it does report the skew, and goes back to no skew as expected. |
kikisdeliveryservice
left a comment
one last thing otherwise looks good
Co-authored-by: Qi Wang <qiwan@redhat.com> Signed-off-by: Qi Wang <qiwan@redhat.com>
kikisdeliveryservice
left a comment
Thanks for all of the work on this. I think we've gotten it to a good state.
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: kikisdeliveryservice, QiWang19, rphillips |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
@QiWang19: The following tests failed, say |
|
@QiWang19: All pull requests linked via external trackers have merged: Bugzilla bug 1986453 has been moved to the MODIFIED state. |
The kubelet skew guards are from 1471d2c (Bug 1986453: Check for API server and node versions skew, 2021-07-27, openshift#2658). But the Kube API server also landed similar guards in openshift/cluster-kube-apiserver-operator@9ce4f74775 (add KubeletVersionSkewController, 2021-08-26, openshift/cluster-kube-apiserver-operator#1199). openshift/enhancements@0ba744e750 (eus-upgrades-mvp: don't enforce skew check in MCO, 2021-04-29, openshift/enhancements#762) had shifted the proposal from MCO-guards to KAS-guards, so I'm not entirely clear on why the MCO guards landed at all. But it's convenient for me that they did, because while I'm dropping them here, I'm recycling the Node lister for a new check.

4.19 is dropping bare-RHEL support, and I want the Node lister to look for RHEL entries like:

  osImage: Red Hat Enterprise Linux 8.6 (Ootpa)

but we are ok with RHCOS entries like:

  osImage: Red Hat Enterprise Linux CoreOS 419.96.202503032242-0
The kubelet skew guard is from 1471d2c (Bug 1986453: Check for API server and node versions skew, 2021-07-27, openshift#2658). But the Kube API server also landed a similar guard in openshift/cluster-kube-apiserver-operator@9ce4f74775 (add KubeletVersionSkewController, 2021-08-26, openshift/cluster-kube-apiserver-operator#1199). openshift/enhancements@0ba744e750 (eus-upgrades-mvp: don't enforce skew check in MCO, 2021-04-29, openshift/enhancements#762) had shifted the proposal from MCO-guards to KAS-guards, so I'm not clear on why the MCO guard landed. This commit drops it, to consolidate around the KAS-side guard.
The kubelet skew guards are from 1471d2c (Bug 1986453: Check for API server and node versions skew, 2021-07-27, openshift#2658). But the Kube API server also landed similar guards in openshift/cluster-kube-apiserver-operator@9ce4f74775 (add KubeletVersionSkewController, 2021-08-26, openshift/cluster-kube-apiserver-operator#1199). openshift/enhancements@0ba744e750 (eus-upgrades-mvp: don't enforce skew check in MCO, 2021-04-29, openshift/enhancements#762) had shifted the proposal from MCO-guards to KAS-guards, so I'm not entirely clear on why the MCO guards landed at all. But it's convenient for me that they did, because while I'm dropping them here, I'm recycling the Node lister for a new check.

4.19 is dropping bare, package-managed RHEL support. I'd initially thought about looking for RHEL entries like:

  osImage: Red Hat Enterprise Linux 8.6 (Ootpa)

while excluding RHCOS entries like:

  osImage: Red Hat Enterprise Linux CoreOS 419.96.202503032242-0

But instead of switching on osImage, I'm using the node.openshift.io/os_id label to find package-managed RHEL Nodes. The machine-config operator is setting up the label [1] based on the ID value in /etc/os-release. On RHCOS instances, the ID value is 'rhcos' [2]. On package-managed RHEL, it's 'rhel' [3,4].

[1]: https://github.com/openshift/machine-config-operator/blob/ddc18e84f4a0650e0e87aa0a4f90f9cf01b5259c/templates/worker/01-worker-kubelet/_base/units/kubelet.service.yaml#L19-L31
[2]: https://github.com/openshift/os/blob/41f6a028d37b750db0bf4257447d809bd9cbe4bf/manifest-ocp-rhel-9.6.yaml#L41
[3]: https://github.com/openshift/enhancements/blob/ea465e192bfb58ec8654f1c904a4af68777f68ec/enhancements/rhcos/split-rhcos-into-layers.md?plain=1#L416
[4]: https://github.com/openshift/machine-config-operator/blob/ddc18e84f4a0650e0e87aa0a4f90f9cf01b5259c/pkg/daemon/osrelease/osrelease.go#L69
ref: https://issues.redhat.com/browse/OCPNODE-595
replace #2552
- What I did
Check for API server and node versions skew.
Update the status with a message when the version skew from the Kube API server is too large, but do not force Upgradeable=False, according to enhancement https://github.com/openshift/enhancements/pull/762/files
- How to verify it
- Description for the changelog