OCPVE-719: feat: add support for olm capability #795
openshift-ci[bot] merged 1 commit into openshift:master
Conversation
@eggfoobar: This pull request references OCPVE-711 which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

@eggfoobar: This pull request references OCPVE-719 which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.
}

func isCapabilityEnabled(client configclientv1.ConfigV1Interface, expectedCap configv1.ClusterVersionCapability) bool {
	cv, err := client.ClusterVersions().Get(context.TODO(), "version", metav1.GetOptions{})
A one-shot ClusterVersion fetch attempt here on operator initiation, without a limiting Context, has some risks. A lower-stakes approach would be to not enable the OLM portions until we can confirm they're enabled, so you could have behavior like:
- Console operator is coming up.
- Console operator attempts to retrieve ClusterVersion with a 5m deadline, but this fails.
- Console operator assumes that OLM is not enabled, and goes on to operate its other aspects.
- Follow-up attempt to check in on ClusterVersion succeeds, and it turns out that OLM is enabled.
- Console operator starts operating on the OLM-gated stuff too.
You'll want something like that anyway, because admins may decide to enable capabilities post-install. Only post-install disabling is not supported.
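A minimal sketch of that retry-until-confirmed behavior, assuming the typed config client from openshift/client-go; the helper name, package name, and 10s poll interval are illustrative, and the 5m deadline mirrors the example in the comment above rather than anything in this PR:

```go
package operator

import (
	"context"
	"time"

	configv1 "github.com/openshift/api/config/v1"
	configclientv1 "github.com/openshift/client-go/config/clientset/versioned/typed/config/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
)

// isCapabilityConfirmedEnabled polls ClusterVersion under a bounded context
// instead of doing a single unbounded fetch. If the capability cannot be
// confirmed before the deadline, it reports false so the operator can run its
// non-OLM aspects and pick the capability up on a later check.
func isCapabilityConfirmedEnabled(ctx context.Context, client configclientv1.ConfigV1Interface, expectedCap configv1.ClusterVersionCapability) bool {
	enabled := false
	_ = wait.PollUntilContextTimeout(ctx, 10*time.Second, 5*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			cv, err := client.ClusterVersions().Get(ctx, "version", metav1.GetOptions{})
			if err != nil {
				return false, nil // transient fetch failure: keep polling
			}
			for _, c := range cv.Status.Capabilities.EnabledCapabilities {
				if c == expectedCap {
					enabled = true
				}
			}
			return true, nil // definitive answer from the cluster: stop polling
		})
	return enabled
}
```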
@wking but it looks like the current implementation registers the informer and fetches the OLM config only if the capability is enabled. Is the only issue the context?
I think @wking would just like this to be a bit more robust for future use: in the event that a user re-enables OLM after the fact, the pod will need to be restarted for the console to correctly see the resources.
I'm not entirely sure what to do about the informer since it's attached at startup. I updated the code a bit, since we just needed to know if a resource exists for that informer. I also updated the error checking so that we don't always skip based off of the initial state. If I understand this right, the informer won't be attached, but the lister will be queried after the fact, so if the user re-enables OLM, the CSV query will go through.
Right, but in case the capability gets enabled post-installation we are not registering the informer. Please check my comment below.
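A hedged guess at what that updated error checking might look like; the PR's actual isNoMatchError helper isn't shown in this excerpt, and which error a dynamic List returns for an absent CRD depends on the client path:

```go
package operator

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/api/meta"
)

// isNoMatchError reports whether err indicates that the requested resource
// type is not served by the cluster at all (e.g. the OLM CRDs are absent
// because the capability is disabled), as opposed to a transient failure.
func isNoMatchError(err error) bool {
	if err == nil {
		return false
	}
	// A RESTMapper-backed client returns *meta.NoKindMatchError; a raw
	// dynamic List against an uninstalled CRD surfaces a server-side NotFound.
	return meta.IsNoMatchError(err) || apierrors.IsNotFound(err)
}
```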
Force-pushed db606d2 to 4b3da5e
	defer cancel()

	// We check if the resource exists in the cluster at all before attaching an informer
	if _, err := dynamicClient.Resource(olmGroupVersionResource).List(ctx, metav1.ListOptions{}); !isNoMatchError(err) {
@eggfoobar what if the OLM capability is enabled post-installation? Can the capability be enabled at any time, or is there any restriction?
In case the capability is disabled at start and then enabled, I don't think the informer will be registered, since we are registering the informers in the NewConsoleOperator constructor.
Wondering if it wouldn't be better to restart the operator's pod on this change?
Yeah, that might be a good idea. I added some WIP logic to get this running in my e2e tests that just checks if the resource exists, but in the future it would be preferable to have an informer on the clusterversions and, if we see the resource enabled, trigger a reboot then. There might be a more elegant way of doing that, but that's what came to mind. What do you think?
So this is something I had in mind: basically kill the pod and force a new rollout so the informer is registered 👍
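A minimal sketch of that kill-the-pod idea, assuming a shared ClusterVersion informer; the function name, the startedWithOLM flag, the capability constant from openshift/api#1589, and exiting via os.Exit(0) are all illustrative choices, not this PR's implementation:

```go
package operator

import (
	"os"

	configv1 "github.com/openshift/api/config/v1"
	"k8s.io/client-go/tools/cache"
	"k8s.io/klog/v2"
)

// watchForCapabilityFlip exits the process once the OLM capability shows up in
// ClusterVersion status after the operator started without it; the Deployment
// then rolls out a fresh pod whose constructor registers the OLM informers.
func watchForCapabilityFlip(cvInformer cache.SharedIndexInformer, startedWithOLM bool) {
	cvInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(_, newObj interface{}) {
			cv, ok := newObj.(*configv1.ClusterVersion)
			if !ok || startedWithOLM {
				return
			}
			for _, c := range cv.Status.Capabilities.EnabledCapabilities {
				if c == configv1.ClusterVersionCapabilityOperatorLifecycleManager {
					klog.Info("OLM capability enabled post-install, exiting so the new pod registers the OLM informers")
					os.Exit(0)
				}
			}
		},
	})
}
```

Exiting with status 0 leans on the pod's restart policy to bring up a replacement whose constructor sees the capability and wires up the OLM informers.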
Force-pushed 34f1536 to e68c0a6
jhadvig left a comment:
@eggfoobar thanks for addressing this 👍 I like how this got implemented
/lgtm
/approve
QE Approver:
/label docs-approved
/label px-approved
/hold cancel
/hold Apologies for adding a hold on this so late; I'm noticing some errors in the no-cap job that I would like to look over before we merge this in. Will unhold tomorrow as soon as I can.
OLM is an optional component, adding conditional to remove informer on OLM resource
Signed-off-by: ehila <ehila@redhat.com>

fix: add olm check logic in sync loop for isCopiedCSVsDisabled
Signed-off-by: ehila <ehila@redhat.com>

refactor: small refinement, more logging and comments
Signed-off-by: ehila <ehila@redhat.com>
Force-pushed e68c0a6 to bbcf813
/unhold
@jhadvig I was able to verify that the error I was seeing isn't happening anymore. I went ahead and added some more comments and logs to make sure it's clear what's happening. Please take a look when you have a moment, thanks!
/retest-required
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: eggfoobar, jhadvig
I built an image against the PR using cluster-bot and have done these tests:
@eggfoobar @jhadvig besides the above tests, are any other tests needed for this PR?
@yanpzhan That looks good to me; just to make sure the current changes were in, can you tell me which 4.15 nightly release you used, and if you still have the cluster up, can you check
@eggfoobar The target 4.15 nightly payload is 4.15.0-0.nightly-2023-10-09-101435. On the cluster, I checked oc get clusterversions version -oyaml; it doesn't contain "OperatorLifecycleManager" in "knownCapabilities". I wonder if this build doesn't contain the changes about the OLM capability.
Ah yup, @yanpzhan would you be able to verify again with this nightly, 4.15.0-0.nightly-2023-10-17-065657? That should have the updated capabilities.
I launched a cluster with the PR and image 4.15.0-0.nightly-2023-10-17-065657 successfully, and checked clusterversion 'version'; it contains "OperatorLifecycleManager" in "knownCapabilities" now.
Perfect, thanks @yanpzhan! In terms of what we need to verify with this PR, two things:
@eggfoobar I've checked again on a new cluster. The 2 points you mentioned passed with no errors. Console works after re-enabling OLM by setting 'OperatorLifecycleManager' in clusterversion 'version'. Checked with: oc get clusterversions.config.openshift.io version -ojson | jq .status.capabilities
Perfect. Once OLM has been enabled, the specific resource that Console is looking for is
Thanks for your guidance. There is no issue for now.
@eggfoobar: This pull request references OCPVE-719 which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but it targets "openshift-4.15" instead.
/retest-required
/retest-required
@eggfoobar: all tests passed!
OLM is an optional component, adding conditional to remove informer on OLM resource
/hold
Need to find more context on the use of the OLM resource in console, and need to hold for the API to be merged in openshift/api#1589.