Skip to content

OCPBUGS-22364: Make MCO certificate observability fields optional for 4.15#1637

Merged
openshift-merge-bot[bot] merged 2 commits into
openshift:masterfrom
jkyros:mco-optional-cert-times
Nov 21, 2023
Merged

OCPBUGS-22364: Make MCO certificate observability fields optional for 4.15#1637
openshift-merge-bot[bot] merged 2 commits into
openshift:masterfrom
jkyros:mco-optional-cert-times

Conversation

@jkyros
Copy link
Copy Markdown
Member

@jkyros jkyros commented Oct 25, 2023

Mistakes were made:

What this does:

  • makes these fields optional for 4.15 so the upgrades work properly and we unblock CI
  • (we intend to lock them back down to required in 4.16 for correctness )

Fixes: OCPBUGS-22364

So back in 4.14 we had to change the types of our cert observability
certificate dates from strings to metav1.Time. This resulted in some API
breakages, and in our haste for 4.14 we removed those fields but left
the rest of the object.

The fields have now been added back in 4.15, and we are hitting some
timing issues between when the "new" CRD gets applied, and when the
"old" pods get replaced.

The errors are unpleasant and are blocking CI, so we're going to make
these fields optional for 4.15 and then lock them back down to required
in 4.16. We will not have this issue during the 4.15->4.16 transition
because the MCO has full control of these fields and will ensure they
are populated in 4.15.
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 25, 2023
@openshift-ci-robot
Copy link
Copy Markdown

@jkyros: This pull request references Jira Issue OCPBUGS-22364, which is invalid:

  • expected the bug to target the "4.15.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Mistakes were made:

What this does:

  • makes these fields optional for 4.15 so the upgrades work properly and we unblock CI
  • (we intend to lock them back down to required in 4.16 for correctness )

Fixes: OCPBUGS-22364

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Oct 25, 2023
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Oct 25, 2023

Hello @jkyros! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci Bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Oct 25, 2023
@jkyros
Copy link
Copy Markdown
Member Author

jkyros commented Oct 25, 2023

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Oct 25, 2023
@openshift-ci-robot
Copy link
Copy Markdown

@jkyros: This pull request references Jira Issue OCPBUGS-22364, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Oct 25, 2023
@openshift-ci openshift-ci Bot requested a review from sergiordlr October 25, 2023 22:11
@rioliu-rh
Copy link
Copy Markdown

/hold for QE verification

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 26, 2023
@rioliu-rh
Copy link
Copy Markdown

built an image based on this PR. 4.15.0-0.test-2023-10-26-053849-ci-ln-phx1kjk-latest
upgrade a cluster from 4.14.0-rc.7 to 4.15.0-0.test-2023-10-26-053849-ci-ln-phx1kjk-latest
the validation error is occurred.

$ co machine-config
NAME             VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.14.0-rc.7   False       True          False      2m46s   Cluster not available for [{operator 4.14.0-rc.7}]: ControllerConfig.machineconfiguration.openshift.io "machine-config-controller" is invalid: [status.controllerCertificates[0].notAfter: Required value, status.controllerCertificates[0].notBefore: Required value, status.controllerCertificates[1].notAfter: Required value, status.controllerCertificates[1].notBefore: Required value, status.controllerCertificates[2].notAfter: Required value, status.controllerCertificates[2].notBefore: Required value, status.controllerCertificates[3].notAfter: Required value, status.controllerCertificates[3].notBefore: Required value, status.controllerCertificates[4].notAfter: Required value, status.controllerCertificates[4].notBefore: Required value, status.controllerCertificates[5].notAfter: Required value, status.controllerCertificates[5].notBefore: Required value, status.controllerCertificates[6].notAfter: Required value, status.controllerCertificates[6].notBefore: Required value, status.controllerCertificates[7].notAfter: Required value, status.controllerCertificates[7].notBefore: Required value, status.controllerCertificates[8].notAfter: Required value, status.controllerCertificates[8].notBefore: Required value, status.controllerCertificates[9].notAfter: Required value, status.controllerCertificates[9].notBefore: Required value, <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]

are my steps correct?

@jkyros
Copy link
Copy Markdown
Member Author

jkyros commented Oct 26, 2023

Thanks for testing! Just this merging by itself shouldn't change anything though, I'd still expect it to fail until we bring it back into the MCO. I've put up openshift/machine-config-operator#4003 for bringing it back into the MCO that should be testable.

@rioliu-rh
Copy link
Copy Markdown

build new image based on openshift/machine-config-operator#4003
upgrade cluster from 4.14.1 to this image
monitor co/machine-config during upgrade, check logs of pod/machine-config-controller-xx-yy, mcc pod was launched multiple times, no validation error found

$ cv version -o yaml | yq -y '.status.history'
- acceptedRisks: 'Target release version="" image="registry.build05.ci.openshift.org/ci-ln-dl66frt/release:latest"
    cannot be verified, but continuing anyway because the update was forced: release
    images that are not accessed via digest cannot be verified

    Forced through blocking failures: Multiple precondition checks failed:

    * Precondition "EtcdRecentBackup" failed because of "ControllerStarted": RecentBackup:
    The etcd backup controller is starting, and will decide if recent backups are
    available or if a backup is required

    * Precondition "ClusterVersionRecommendedUpdate" failed because of "UnknownUpdate":
    RetrievedUpdates=False (VersionNotFound), so the recommended status of updating
    from 4.14.1 to 4.15.0-0.test-2023-10-30-013427-ci-ln-dl66frt-latest is unknown.'
  completionTime: '2023-10-30T06:36:49Z'
  image: registry.build05.ci.openshift.org/ci-ln-dl66frt/release:latest
  startedTime: '2023-10-30T05:36:33Z'
  state: Completed
  verified: false
  version: 4.15.0-0.test-2023-10-30-013427-ci-ln-dl66frt-latest
- completionTime: '2023-10-30T03:51:17Z'
  image: quay.io/openshift-release-dev/ocp-release@sha256:05ba8e63f8a76e568afe87f182334504a01d47342b6ad5b4c3ff83a2463018bd
  startedTime: '2023-10-30T03:33:35Z'
  state: Completed
  verified: false
  version: 4.14.1
$ logs openshift-machine-config-operator -c machine-config-controller machine-config-controller-f975c6f98-kdjhz | grep -i 'Required value'
>> empty

/label qe-approved

@openshift-ci openshift-ci Bot added the qe-approved Signifies that QE has signed off on this PR label Oct 30, 2023
@openshift-ci-robot
Copy link
Copy Markdown

@jkyros: This pull request references Jira Issue OCPBUGS-22364, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

Details

In response to this:

Mistakes were made:

What this does:

  • makes these fields optional for 4.15 so the upgrades work properly and we unblock CI
  • (we intend to lock them back down to required in 4.16 for correctness )

Fixes: OCPBUGS-22364

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rioliu-rh
Copy link
Copy Markdown

/unhold

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 30, 2023
Copy link
Copy Markdown
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Should be no harm in making this optional for now

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Oct 30, 2023
@sinnykumari
Copy link
Copy Markdown

For approval
/assign @deads2k

@sinnykumari
Copy link
Copy Markdown

/test e2e-upgrade

@JoelSpeed
Copy link
Copy Markdown
Contributor

/lgtm

This is needed to allow upgrades, the fields should always have been required but some issues during the API review and migration meant they didn't ship when they should have done, so need to be optional now

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Nov 21, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jkyros, JoelSpeed, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 21, 2023
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Nov 21, 2023

@jkyros: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit a295b8c into openshift:master Nov 21, 2023
@openshift-ci-robot
Copy link
Copy Markdown

@jkyros: Jira Issue OCPBUGS-22364: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-22364 has not been moved to the MODIFIED state.

Details

In response to this:

Mistakes were made:

What this does:

  • makes these fields optional for 4.15 so the upgrades work properly and we unblock CI
  • (we intend to lock them back down to required in 4.16 for correctness )

Fixes: OCPBUGS-22364

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot
Copy link
Copy Markdown

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-cluster-config-api-container-v4.15.0-202311220209.p0.ga295b8c.assembly.stream for distgit ose-cluster-config-api.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants