OCPBUGS-59723: Add mechanism to delete MCN v1alpha1 CRD in 4.16+ clusters #5215
Conversation
@djoshy: This pull request references Jira Issue OCPBUGS-59723, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug.
Requesting review from QA contact. The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
isabella-janssen left a comment:
This looks great to me, thanks @djoshy. Since I cannot answer your note about the image, I'll leave final review & tagging to a more senior member of the team than I.
yuqi-zhang left a comment:
Overall makes sense to me, just some follow-up questions inline.
/retest
Force-pushed: 1bcdfd1 to 10718d1 (compare)
Force-pushed: d679071 to 61bce7e (compare)
/test e2e-gcp-op
Force-pushed: 43bc880 to a9b20dd (compare)
@djoshy: This pull request references Jira Issue OCPBUGS-59723, which is valid. 3 validation(s) were run on this bug.
Requesting review from QA contact.
Force-pushed: a9b20dd to 682dbeb (compare)
yuqi-zhang left a comment:
Overall the cronjob path makes sense to me, thanks for investigating thoroughly! Some last questions inline.
```yaml
restartPolicy: OnFailure
containers:
- name: crd-cleanup
  image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:48011539e9051f642e33e7aaa4617b93387e0d441f84d09d41f8f7966668bb04
```
yuqi-zhang: hmm, should this be hardcoded? I thought you wanted to reference the RHCOS image of the corresponding payload. Or if we want to remove this in 4.21+, maybe that's fine as well, to just use a singular image.
djoshy: oh, d'oh - I meant to change this back; I was doing some manual testing and it must have snuck in. Although I guess your argument is fair, using the placeholder just makes it a tad more readable.
```yaml
# Run every minute initially to trigger an immediate run
schedule: "* * * * *"
# Don't suspend initially - let it run once
suspend: false
```
yuqi-zhang: does this re-set every upgrade then? Since we patch it in the actual bash script, I thought the CVO would try to rectify it back to this template?
djoshy: No, I think the create-only annotation should prevent updates to this resource.
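For context on that last point: the CVO honors a create-only annotation on manifests, which tells it to create the resource if absent but never reconcile it afterwards. A minimal sketch of how that might look on this CronJob's metadata (the annotation key is the one used elsewhere in OpenShift manifests; the resource name here is illustrative, not necessarily the one in this PR):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mcn-crd-cleanup            # illustrative name
  namespace: openshift-machine-config-operator
  annotations:
    # CVO creates this object once and does not stomp later in-cluster
    # changes, so the script's self-suspend patch survives upgrades.
    release.openshift.io/create-only: "true"
spec:
  schedule: "* * * * *"
  suspend: false
```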
/payload 4.20 blocking just to be safe
/payload 4.20 nightly blocking
@djoshy: trigger 10 job(s) of type blocking for the nightly release of OCP 4.20. See details on https://pr-payload-tests.ci.openshift.org/runs/ci/69a756c0-73b3-11f0-8238-34e6750a6c1d-0
/test unit
/cherry-pick release-4.19
@djoshy: once the present PR merges, I will cherry-pick it on top of release-4.19.
/test all
/retest-required
/retest-required |
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: djoshy, yuqi-zhang. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Approvers can indicate their approval by writing /approve in a comment.
/override ci/prow/e2e-gcp-op-1of2
Overriding GCP tests; this PR has passed them before on this commit and these jobs are failing to launch due to some infra issues.
@djoshy: Overrode contexts on behalf of djoshy: ci/prow/e2e-gcp-op-1of2, ci/prow/e2e-gcp-op-2of2, ci/prow/e2e-gcp-op-single-node
Merged 8f6c60c into openshift:main
@djoshy: Jira Issue OCPBUGS-59723: All pull requests linked via external trackers have merged. Jira Issue OCPBUGS-59723 has been moved to the MODIFIED state.
@djoshy: The following tests failed; say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests. Full PR test history. Your PR dashboard.
@djoshy: new pull request created: #5233
[ART PR BUILD NOTIFIER] Distgit: ose-machine-config-operator
/cherry-pick release-4.19
@djoshy: new pull request could not be created: failed to create pull request against openshift/machine-config-operator#release-4.19 from head openshift-cherrypick-robot:cherry-pick-5215-to-release-4.19: status code 422 not one of [201], body: {"message":"Validation Failed","errors":[{"resource":"PullRequest","code":"custom","message":"A pull request already exists for openshift-cherrypick-robot:cherry-pick-5215-to-release-4.19."}],"documentation_url":"https://docs.github.com/rest/pulls/pulls#create-a-pull-request","status":"422"}
- What I did
This adds a new Kubernetes CronJob manifest in the MCO's install folder to delete the unused v1alpha1 MCN CRD. This job has a run level of 0000_80_machine-config_00, meaning that it would be deployed before the CVO applies the v1 MCN CRD, whose run level is 0000_80_machine-config_01. While this delete may not happen instantly, once completed, the CVO will be able to successfully apply the v1 CRD. I also moved up the RBAC manifest to the same run level so that the cronjob has the required permissions on its first try.
- How to verify it
Monitor the CVO logs in the openshift-cluster-version namespace. When the CVO finally gets to the MCO's install/ manifests (~780 mark), the last line is the MCN sync, which may fail initially while the job is still running. Examine cronjobs via:
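The exact command was not captured above; assuming the cronjob lives in the MCO's namespace, something like the following should work:

```shell
oc -n openshift-machine-config-operator get cronjobs
```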
This should've spawned a job:
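Again as a sketch, assuming the same namespace as above:

```shell
oc -n openshift-machine-config-operator get jobs
```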
Running an oc get pods | grep cleanup should get you the pod name, which can be examined to observe the logs:
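A command sketch for pulling those logs (namespace assumed; substitute the pod name found in the previous step):

```shell
oc -n openshift-machine-config-operator logs <cleanup-pod-name>
```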
This took about a minute for me (GCP cluster); the CVO sync should then progress to roll out the 4.20 MCO pods. The new operator will then begin to update the control-plane and worker nodes. It's possible that the job logs may get lost as the cluster updates, but the CronJob resource itself should persist.
On a fresh install, this cronjob should be a no-op; you should see pod logs like:
- Other notes
I chose a CronJob for two reasons: