OCPBUGS-22364: Make MCO certificate observability fields optional for 4.15 #1637
Back in 4.14 we had to change the types of our certificate observability dates from strings to metav1.Time. This resulted in some API breakages, and in our haste for 4.14 we removed those fields but left the rest of the object. The fields have now been added back in 4.15, and we are hitting timing issues between when the "new" CRD gets applied and when the "old" pods get replaced. The resulting validation errors are unpleasant and are blocking CI, so we're going to make these fields optional for 4.15 and then lock them back down to required in 4.16. We will not have this issue during the 4.15->4.16 transition because the MCO has full control of these fields and will ensure they are populated in 4.15.
@jkyros: This pull request references Jira Issue OCPBUGS-22364, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hello @jkyros! Some important instructions when contributing to openshift/api:
/jira refresh
@jkyros: This pull request references Jira Issue OCPBUGS-22364, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact:
/hold for QE verification
Built an image based on this PR.
$ co machine-config
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
machine-config 4.14.0-rc.7 False True False 2m46s Cluster not available for [{operator 4.14.0-rc.7}]: ControllerConfig.machineconfiguration.openshift.io "machine-config-controller" is invalid: [status.controllerCertificates[0].notAfter: Required value, status.controllerCertificates[0].notBefore: Required value, status.controllerCertificates[1].notAfter: Required value, status.controllerCertificates[1].notBefore: Required value, status.controllerCertificates[2].notAfter: Required value, status.controllerCertificates[2].notBefore: Required value, status.controllerCertificates[3].notAfter: Required value, status.controllerCertificates[3].notBefore: Required value, status.controllerCertificates[4].notAfter: Required value, status.controllerCertificates[4].notBefore: Required value, status.controllerCertificates[5].notAfter: Required value, status.controllerCertificates[5].notBefore: Required value, status.controllerCertificates[6].notAfter: Required value, status.controllerCertificates[6].notBefore: Required value, status.controllerCertificates[7].notAfter: Required value, status.controllerCertificates[7].notBefore: Required value, status.controllerCertificates[8].notAfter: Required value, status.controllerCertificates[8].notBefore: Required value, status.controllerCertificates[9].notAfter: Required value, status.controllerCertificates[9].notBefore: Required value, <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]
Are my steps correct?
Thanks for testing! Merging this by itself shouldn't change anything, though; I'd still expect it to fail until we bring it back into the MCO. I've put up openshift/machine-config-operator#4003 to bring it back into the MCO, and that one should be testable.
Built a new image based on openshift/machine-config-operator#4003.
$ cv version -o yaml | yq -y '.status.history'
- acceptedRisks: 'Target release version="" image="registry.build05.ci.openshift.org/ci-ln-dl66frt/release:latest"
    cannot be verified, but continuing anyway because the update was forced: release
    images that are not accessed via digest cannot be verified
    Forced through blocking failures: Multiple precondition checks failed:
    * Precondition "EtcdRecentBackup" failed because of "ControllerStarted": RecentBackup:
    The etcd backup controller is starting, and will decide if recent backups are
    available or if a backup is required
    * Precondition "ClusterVersionRecommendedUpdate" failed because of "UnknownUpdate":
    RetrievedUpdates=False (VersionNotFound), so the recommended status of updating
    from 4.14.1 to 4.15.0-0.test-2023-10-30-013427-ci-ln-dl66frt-latest is unknown.'
  completionTime: '2023-10-30T06:36:49Z'
  image: registry.build05.ci.openshift.org/ci-ln-dl66frt/release:latest
  startedTime: '2023-10-30T05:36:33Z'
  state: Completed
  verified: false
  version: 4.15.0-0.test-2023-10-30-013427-ci-ln-dl66frt-latest
- completionTime: '2023-10-30T03:51:17Z'
  image: quay.io/openshift-release-dev/ocp-release@sha256:05ba8e63f8a76e568afe87f182334504a01d47342b6ad5b4c3ff83a2463018bd
  startedTime: '2023-10-30T03:33:35Z'
  state: Completed
  verified: false
  version: 4.14.1
$ logs openshift-machine-config-operator -c machine-config-controller machine-config-controller-f975c6f98-kdjhz | grep -i 'Required value'
>> empty
/label qe-approved
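The QE verification above boils down to two checks: the newest ClusterVersion history entry (history is listed newest first, as in the output above) reached state Completed on the target version, and the machine-config-controller log contains no "Required value" lines, matched case-insensitively like grep -i. A minimal Go sketch of those checks follows; historyEntry and upgradeVerified are hypothetical names, and in practice the inputs would come from the oc output and log shown above rather than from literals.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// historyEntry is a hypothetical struct holding the two ClusterVersion
// status.history fields this check reads.
type historyEntry struct {
	State   string
	Version string
}

// upgradeVerified returns nil if the newest history entry completed on the
// target version and the controller log has no "Required value" errors.
func upgradeVerified(history []historyEntry, target, controllerLog string) error {
	if len(history) == 0 {
		return errors.New("no update history")
	}
	latest := history[0] // newest entry comes first
	if latest.Version != target || latest.State != "Completed" {
		return fmt.Errorf("latest update is %s in state %s", latest.Version, latest.State)
	}
	// Case-insensitive match, equivalent to grep -i 'Required value'.
	for _, line := range strings.Split(controllerLog, "\n") {
		if strings.Contains(strings.ToLower(line), "required value") {
			return fmt.Errorf("validation error in log: %s", line)
		}
	}
	return nil
}

func main() {
	history := []historyEntry{
		{State: "Completed", Version: "4.15.0-0.test-2023-10-30-013427-ci-ln-dl66frt-latest"},
		{State: "Completed", Version: "4.14.1"},
	}
	err := upgradeVerified(history, history[0].Version, "")
	fmt.Println(err == nil) // true: upgrade completed and the grep was empty
}
```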
@jkyros: This pull request references Jira Issue OCPBUGS-22364, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact:
/unhold
yuqi-zhang left a comment:
/lgtm
Should be no harm in making this optional for now.
For approval
/test e2e-upgrade
/lgtm
This is needed to allow upgrades. The fields should always have been required, but some issues during the API review and migration meant they didn't ship when they should have done, so they need to be optional for now.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jkyros, JoelSpeed, yuqi-zhang
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@jkyros: all tests passed!
Full PR test history. Your PR dashboard.
@jkyros: Jira Issue OCPBUGS-22364: Some pull requests linked via external trackers have merged:
The following pull requests linked via external trackers have not merged:
These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with
Jira Issue OCPBUGS-22364 has not been moved to the MODIFIED state.
[ART PR BUILD NOTIFIER] This PR has been included in build ose-cluster-config-api-container-v4.15.0-202311220209.p0.ga295b8c.assembly.stream for distgit ose-cluster-config-api.
Mistakes were made:
metav1.Time
Available=False when the "old" MCO render_controller fails to supply the fields and returns an error
What this does:
Fixes: OCPBUGS-22364