MCO-1482: pkg/operator/status: Drop PoolUpdating as an Upgradeable=False condition #4760
Conversation
956e787 (Implement Upgrade-Monitor, FeatureGate, and MachineConfigNode types, 2023-11-28, openshift#4012) had added the "this should no longer trigger when adding a node to a pool" comment, but unfortunately, it's still triggering. For example, in [1]:

```console
$ curl -s https://storage.googleapis.com/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-gcp-ovn-serial-crun/1868424902256627712/build-log.txt | grep 'PoolUpdating' | sort | uniq
time="2024-12-16T01:43:52Z" level=info msg="operator status: processing event" event="Dec 16 00:55:35.662 W clusteroperator/machine-config condition/Upgradeable reason/PoolUpdating status/False One or more machine config pools are updating, please see `oc get mcp` for further details" operator=machine-config
```

Checking PromeCIeus, the `Upgradeable=False` window seems to have been 00:56 through 00:59, which correlates with the scale-up/scale-down of the serial suite:

```console
$ curl -s https://storage.googleapis.com/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-gcp-ovn-serial-crun/1868424902256627712/build-log.txt | grep 'Managed cluster should grow and decrease when scaling different machineSets simultaneously'
started: 0/20/74 "[sig-cluster-lifecycle][Feature:Machines][Serial] Managed cluster should grow and decrease when scaling different machineSets simultaneously [Timeout:30m][apigroup:machine.openshift.io] [Suite:openshift/conformance/serial]"
passed: (5m42s) 2024-12-16T00:57:49 "[sig-cluster-lifecycle][Feature:Machines][Serial] Managed cluster should grow and decrease when scaling different machineSets simultaneously [Timeout:30m][apigroup:machine.openshift.io] [Suite:openshift/conformance/serial]"
```

Confirmed via MCC logs:

```console
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-gcp-ovn-serial-crun/1868424902256627712/artifacts/e2e-gcp-ovn-serial-crun/gather-extra/artifacts/pods/openshift-machine-config-operator_machine-config-controller-6f4f46457c-v8b2l_machine-config-controller.log | grep rendered-
I1216 00:55:35.430231 1 node_controller.go:584] Pool worker[zone=us-central1-f]: node ci-op-k8c03v6z-9149a-r27w7-worker-f-t7rmb: changed annotation machineconfiguration.openshift.io/currentConfig = rendered-worker-6d0e61dc44f24db3272625b901024ed2
I1216 00:55:35.430252 1 node_controller.go:584] Pool worker[zone=us-central1-f]: node ci-op-k8c03v6z-9149a-r27w7-worker-f-t7rmb: changed annotation machineconfiguration.openshift.io/desiredConfig = rendered-worker-6d0e61dc44f24db3272625b901024ed2
I1216 00:55:36.174629 1 node_controller.go:584] Pool worker[zone=us-central1-a]: node ci-op-k8c03v6z-9149a-r27w7-worker-a-f7hkj: changed annotation machineconfiguration.openshift.io/currentConfig = rendered-worker-6d0e61dc44f24db3272625b901024ed2
I1216 00:55:36.174738 1 node_controller.go:584] Pool worker[zone=us-central1-a]: node ci-op-k8c03v6z-9149a-r27w7-worker-a-f7hkj: changed annotation machineconfiguration.openshift.io/desiredConfig = rendered-worker-6d0e61dc44f24db3272625b901024ed2
I1216 00:55:41.296273 1 node_controller.go:584] Pool worker[zone=us-central1-b]: node ci-op-k8c03v6z-9149a-r27w7-worker-b-554bt: changed annotation machineconfiguration.openshift.io/currentConfig = rendered-worker-6d0e61dc44f24db3272625b901024ed2
I1216 00:55:41.296306 1 node_controller.go:584] Pool worker[zone=us-central1-b]: node ci-op-k8c03v6z-9149a-r27w7-worker-b-554bt: changed annotation machineconfiguration.openshift.io/desiredConfig = rendered-worker-6d0e61dc44f24db3272625b901024ed2
I1216 00:55:47.106173 1 node_controller.go:584] Pool worker[zone=us-central1-c]: node ci-op-k8c03v6z-9149a-r27w7-worker-c-hshj2: changed annotation machineconfiguration.openshift.io/currentConfig = rendered-worker-6d0e61dc44f24db3272625b901024ed2
I1216 00:55:47.106201 1 node_controller.go:584] Pool worker[zone=us-central1-c]: node ci-op-k8c03v6z-9149a-r27w7-worker-c-hshj2: changed annotation machineconfiguration.openshift.io/desiredConfig = rendered-worker-6d0e61dc44f24db3272625b901024ed2
```

In this commit, I'm dropping the code that had been moving the ClusterOperator to `Upgradeable=False` on `PoolUpdating` entirely, instead of hoping that it doesn't trip. I haven't dug into why the code had still been tripping. But we want to stay `Upgradeable=True` while new nodes scale in, because clusters where nodes are joining should still be able to update to 4.(y+1). There are node-vs.-control-plane skew issues that should block updates to 4.(y+1), but they're enforced by the Kube API server operator [2], and don't need the MCO chipping in.

[1]: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-gcp-ovn-serial-crun/1868424902256627712
[2]: openshift/cluster-kube-apiserver-operator@9ce4f74
---
Unit test failure seems unrelated to my change:

```console
$ curl -s https://storage.googleapis.com/test-platform-results/pr-logs/pull/openshift_machine-config-operator/4760/pull-ci-openshift-machine-config-operator-master-unit/1868848364292935680/build-log.txt | grep 'build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab\|MachineConfig_changes_creates_a_new_MachineOSBuild'
=== RUN   TestOSBuildController/MachineConfig_changes_creates_a_new_MachineOSBuild
=== PAUSE TestOSBuildController/MachineConfig_changes_creates_a_new_MachineOSBuild
=== CONT  TestOSBuildController/MachineConfig_changes_creates_a_new_MachineOSBuild
I1217 02:53:50.070870 27103 wrappedqueue.go:249] Error executing "<kind: \"MachineOSConfig\", name: \"worker-os-config\", func: \"(*OSBuildController).addMachineOSConfig\">" in queue TestOSBuildController/MachineConfig_changes_creates_a_new_MachineOSBuild: Adding MachineOSConfig "worker-os-config" failed: could not sync MachineOSConfigs: sync MachineOSConfigs failed: could not sync MachineOSConfig "worker-os-config": Syncing MachineOSConfig "worker-os-config" failed: could not create new or reuse existing MachineOSBuild for MachineOSConfig "worker-os-config": could not create new MachineOSBuild "worker-os-config-4b619479eb172ec79b53c7f66901964a": machineosbuilds.machineconfiguration.openshift.io "worker-os-config-4b619479eb172ec79b53c7f66901964a" already exists
I1217 02:53:50.213825 27103 jobimagebuilder.go:103] Build job "build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab" created for MachineOSBuild "worker-os-config-57cd21cca292604d4624ef5c0f87d1ab"
I1217 02:53:50.213847 27103 reconciler.go:380] Started new build build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab for MachineOSBuild
I1217 02:53:50.214342 27103 reconciler.go:380] Started new build build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab for MachineOSBuild
I1217 02:53:50.214353 27103 reconciler.go:792] Adding Job "build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab"
I1217 02:53:50.216973 27103 reconciler.go:179] Adding build job "build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab"
I1217 02:53:50.219137 27103 reconciler.go:792] Updating Job "build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab"
I1217 02:53:50.219347 27103 jobimagebuilder.go:191] Build job "build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab" status {Conditions:[] StartTime:<nil> CompletionTime:<nil> Active:0 Succeeded:0 Failed:0 Terminating:<nil> CompletedIndexes: FailedIndexes:<nil> UncountedTerminatedPods:nil Ready:<nil>} mapped to MachineOSBuild progress "Prepared"
I1217 02:53:50.219425 27103 jobimagebuilder.go:191] Build job "build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab" status {Conditions:[] StartTime:<nil> CompletionTime:<nil> Active:0 Succeeded:1 Failed:0 Terminating:<nil> CompletedIndexes: FailedIndexes:<nil> UncountedTerminatedPods:nil Ready:<nil>} mapped to MachineOSBuild progress "Succeeded"
I1217 02:53:50.219873 27103 reconciler.go:795] Finished updating Job "build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab" after 1.240185ms
I1217 02:53:50.220931 27103 jobimagebuilder.go:266] Deleted build job build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab for MachineOSBuild worker-os-config-57cd21cca292604d4624ef5c0f87d1ab
I1217 02:53:50.221571 27103 reconciler.go:792] Deleting Job "build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab"
I1217 02:53:50.221700 27103 jobimagebuilder.go:191] Build job "build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab" status {Conditions:[] StartTime:<nil> CompletionTime:<nil> Active:0 Succeeded:1 Failed:0 Terminating:<nil> CompletedIndexes: FailedIndexes:<nil> UncountedTerminatedPods:nil Ready:<nil>} mapped to MachineOSBuild progress "Succeeded"
I1217 02:53:50.221812 27103 reconciler.go:200] Job "build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab" deleted
I1217 02:53:50.222112 27103 jobimagebuilder.go:103] Build job "build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab" created for MachineOSBuild "worker-os-config-57cd21cca292604d4624ef5c0f87d1ab"
I1217 02:53:50.222124 27103 reconciler.go:380] Started new build build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab for MachineOSBuild
I1217 02:53:50.222271 27103 reconciler.go:795] Finished adding Job "build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab" after 8.142732ms
I1217 02:53:50.222772 27103 reconciler.go:792] Adding Job "build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab"
I1217 02:53:50.222785 27103 reconciler.go:179] Adding build job "build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab"
I1217 02:53:50.223267 27103 reconciler.go:795] Finished deleting Job "build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab" after 1.950558ms
I1217 02:53:50.224482 27103 reconciler.go:795] Finished adding Job "build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab" after 1.933926ms
=== NAME  TestOSBuildController/MachineConfig_changes_creates_a_new_MachineOSBuild
    Test: TestOSBuildController/MachineConfig_changes_creates_a_new_MachineOSBuild
    Messages: Build job build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab did not reach specified state%!(EXTRA string=Expected the build job %s to be deleted, string=build-worker-os-config-57cd21cca292604d4624ef5c0f87d1ab)
--- FAIL: TestOSBuildController/MachineConfig_changes_creates_a_new_MachineOSBuild (5.00s)
```

And I also see that same test-case failing in the unit tests of other pulls, such as this run.
---

The e2e-gcp-op failure is a build02 cluster issue, also unrelated to my pull.

---
@wking: The following tests failed:

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

---
build02 had bumped into openshift/cincinnati-graph-data#6463, but has since been recovered. Trying again:

/retest-required

---
/payload-job periodic-ci-openshift-release-master-nightly-4.19-e2e-gcp-ovn-serial-crun

---
@wking: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e0e7ac40-bcb0-11ef-8916-660e893ad4c7-0

---
Previous payload job had trouble with build02 scheduling. Trying again:

/payload-job periodic-ci-openshift-release-master-nightly-4.19-e2e-gcp-ovn-serial-crun

---
@wking: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/939799e0-bcc5-11ef-9e7e-45e528fe631f-0

---
@wking: This pull request references MCO-1482, which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

---
New payload job failed, mostly on disruption that seems unrelated to my change. But PromeCIeus confirms the
yuqi-zhang left a comment:
/lgtm
Looks like we haven't (non-cosmetically) updated that code in a while, so I'm fine with removing the check. Degrades are probably what we should care about most of the time.
---
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking, yuqi-zhang

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.

---
/retest-required

---
[ART PR BUILD NOTIFIER] Distgit: ose-machine-config-operator

---
/cherrypick release-4.18

---
@wking: new pull request created: #5065

---
In 4.19:

* 377a78b (pkg/operator/status: Drop PoolUpdating as an Upgradeable=False condition, 2024-12-16, openshift#4760).
* 0c21907 (pkg/operator/status: Drop kubelet skew guard, 2025-04-03, openshift#4970).

But in 4.18, we're using the other order:

* 13cceb0 (pkg/operator/status: Drop kubelet skew guard, add RHEL guard, 2025-03-26, openshift#4956).
* 20fe075 (pkg/operator/status: Drop PoolUpdating as an Upgradeable=False condition, 2024-12-16, openshift#5065).

So I'm adding this follow-up commit within openshift#5065 to remove the `updating` variable that both the kubelet skew guard and the PoolUpdating guard had used, but which we no longer need now that both are gone in 4.18.

---
/cherrypick release-4.17

---
@wking: new pull request created: #5111

---

956e787 (#4012) had added the "this should no longer trigger when adding a node to a pool" comment, but unfortunately, it's still triggering. For example, in this serial 4.19 run:
Confirmed via MCC logs:
In this commit, I'm dropping the code that had been moving the ClusterOperator to `Upgradeable=False` on `PoolUpdating` entirely, instead of hoping that it doesn't trip. I haven't dug into why the code had still been tripping. But we want to stay `Upgradeable=True` while new nodes scale in, because clusters where nodes are joining should still be able to update to 4.(y+1). There are node-vs.-control-plane skew issues that should block updates to 4.(y+1), but they're enforced by the Kube API server operator (openshift/cluster-kube-apiserver-operator/pull/1199), and don't need the MCO chipping in.

Description for the changelog:
The machine-config ClusterOperator no longer goes `Upgradeable=False` on `PoolUpdating` when new Nodes join the cluster.