OCPVE-719: feat: add support for olm capability #795

Merged
openshift-ci[bot] merged 1 commit into openshift:master from eggfoobar:add-olm-capability-support
Oct 18, 2023

@eggfoobar
Contributor

OLM is an optional component, adding conditional to remove informer on OLM resource

/hold

Need to find more context on the use of OLM resource in console and need to hold for API to be merged in openshift/api#1589

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 23, 2023
@openshift-ci-robot
Contributor

openshift-ci-robot commented Sep 23, 2023

@eggfoobar: This pull request references OCPVE-711 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 23, 2023
@openshift-ci openshift-ci Bot requested review from jhadvig and spadgett September 23, 2023 05:08
@eggfoobar eggfoobar changed the title OCPVE-711: feat: add support for olm capability OCPVE-719: feat: add support for olm capability Sep 29, 2023
@openshift-ci-robot
Contributor

openshift-ci-robot commented Sep 29, 2023

@eggfoobar: This pull request references OCPVE-719 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

Comment thread pkg/console/operator/operator.go Outdated
}

func isCapabilityEnabled(client configclientv1.ConfigV1Interface, expectedCap configv1.ClusterVersionCapability) bool {
	cv, err := client.ClusterVersions().Get(context.TODO(), "version", metav1.GetOptions{})
Member

A one-shot ClusterVersion fetch attempt here on operator initiation, without a limiting Context, has some risks. A lower-stakes approach would be to leave the OLM portions disabled until we can confirm the capability is enabled, so you could have behavior like:

  1. Console operator is coming up.
  2. Console operator attempts to retrieve ClusterVersion with a 5m deadline, but this fails.
  3. Console operator assumes that OLM is not enabled, and goes on to operate its other aspects.
  4. Follow-up attempt to check in on ClusterVersion succeeds, and it turns out that OLM is enabled.
  5. Console operator starts operating on the OLM-gated stuff too.

You'll want something like that anyway, because admins may decide to enable capabilities post-install. Only post-install disabling is not supported.
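
The sequence above could be sketched roughly as follows. This is a minimal, stdlib-only illustration, not the code from this PR: capabilityLister, capabilityEnabled, and firstThenRetry are hypothetical names, and the lister stands in for a real ClusterVersion read. The lookup is bounded by a deadline, and a failed lookup is treated as "not enabled" so that a later attempt can still pick up the capability.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// capabilityLister is a hypothetical stand-in for a ClusterVersion read. It
// returns the names of the capabilities currently enabled on the cluster.
type capabilityLister func(ctx context.Context) ([]string, error)

// capabilityEnabled reports whether want is enabled. The lookup is bounded by
// timeout, and any failure is treated as "not enabled" so the caller can carry
// on with its ungated work and ask again later.
func capabilityEnabled(parent context.Context, list capabilityLister, want string, timeout time.Duration) bool {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	enabled, err := list(ctx)
	if err != nil {
		return false
	}
	for _, name := range enabled {
		if name == want {
			return true
		}
	}
	return false
}

// firstThenRetry simulates steps 2 to 5 above with a fake lister whose first
// call fails and whose later calls succeed.
func firstThenRetry() (first, second bool) {
	attempts := 0
	flaky := func(ctx context.Context) ([]string, error) {
		attempts++
		if attempts == 1 {
			return nil, errors.New("clusterversion unavailable")
		}
		return []string{"Console", "OperatorLifecycleManager"}, nil
	}

	ctx := context.Background()
	first = capabilityEnabled(ctx, flaky, "OperatorLifecycleManager", 5*time.Minute)
	second = capabilityEnabled(ctx, flaky, "OperatorLifecycleManager", 5*time.Minute)
	return first, second
}

func main() {
	first, second := firstThenRetry()
	fmt.Println(first, second)
}
```

In this simulation the first check reports false because the read failed, and the follow-up check reports true once the read succeeds.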

Member

@wking but it looks like the current implementation registers the informer and fetches the OLM config only if the capability is enabled. Is the context the only issue?

Contributor Author

I think @wking would just like this to be a bit more robust for future use. As it stands, if a user enables OLM after the fact, the pod will need to be restarted for the console to correctly see the resources.

I'm not entirely sure what to do about the informer, since it's attached at startup. I updated the code a bit, since we just needed to know whether a resource exists for that informer. I also updated the error checking so that we don't always skip based on the initial state. If I understand this right, the informer won't be attached, but the lister will be queried after the fact, so if the user enables OLM later, the CSV query will go through.

Member

Right, but if the capability gets enabled post-installation, we are not registering the informer. Please check my comment below.

@eggfoobar eggfoobar force-pushed the add-olm-capability-support branch 2 times, most recently from db606d2 to 4b3da5e Compare October 5, 2023 02:08
Comment thread pkg/console/operator/operator.go Outdated
defer cancel()

// We check if the resource exists in the cluster at all before attaching an informer
if _, err := dynamicClient.Resource(olmGroupVersionResource).List(ctx, metav1.ListOptions{}); !isNoMatchError(err) {
Member

@eggfoobar what if the OLM capability is enabled post-installation? Can the capability be enabled at any time, or is there any restriction?
If the capability is disabled at start and then enabled, I don't think the informer will be registered, since we register the informers in the NewConsoleOperator constructor.
I wonder if it wouldn't be better to restart the operator's pod on this change?

Contributor Author

Yeah, that might be a good idea. I added some WIP logic to get this running in my e2e tests that just checks whether the resource exists, but in the future it would be preferable to have an informer on the clusterversions and, if the resource is enabled, trigger a restart then. There might be a more elegant way of doing that, but that's what came to mind. What do you think?
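
For readers following the existence check in the hunk above, here is a small sketch of how the outcome of the List call might be interpreted. It is an assumption-laden illustration rather than the PR's code: errNoMatch and errTimeout are hypothetical sentinels, with errNoMatch standing in for whatever the isNoMatchError helper in the diff detects, and classifyList and shouldAttachInformer are made-up names.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors. errNoMatch stands in for the error that the
// isNoMatchError helper in the diff recognizes; errTimeout is any other failure.
var (
	errNoMatch = errors.New("no matches for resource")
	errTimeout = errors.New("request timed out")
)

type resourceState int

const (
	resourcePresent resourceState = iota // List succeeded
	resourceAbsent                       // the API does not know the resource type
	resourceUnknown                      // some other failure; existence is undetermined
)

// classifyList interprets the error returned by a List call on the resource.
func classifyList(err error) resourceState {
	switch {
	case err == nil:
		return resourcePresent
	case errors.Is(err, errNoMatch):
		return resourceAbsent
	default:
		return resourceUnknown
	}
}

// shouldAttachInformer mirrors the shape of the condition in the diff: attach
// unless the API reports that the resource type does not exist.
func shouldAttachInformer(err error) bool {
	return classifyList(err) != resourceAbsent
}

func main() {
	fmt.Println(shouldAttachInformer(nil))        // resource exists
	fmt.Println(shouldAttachInformer(errNoMatch)) // resource type is not served
	fmt.Println(shouldAttachInformer(errTimeout)) // unknown, so still attach
}
```

One consequence of writing the condition this way is that an unrelated failure, such as a timeout, still results in the informer being attached; only a definite "no such resource" answer skips it.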

Member

So this is something I had in mind: basically kill the pod and force a new rollout so the informer is registered 👍
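
The restart idea discussed in this thread can be illustrated with a small polling loop. This is only a sketch under stated assumptions, not how the PR implements it: watchForEnable and simulate are hypothetical names, check stands in for "is the OLM resource now available", and restart stands in for whatever ends the process so that a new pod registers the informer.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// watchForEnable polls check on every tick until it reports true, then calls
// restart once. It returns true if restart was called and false if ctx ended
// first. Only the disabled-to-enabled transition is watched.
func watchForEnable(ctx context.Context, interval time.Duration, check func() bool, restart func()) bool {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return false
		case <-ticker.C:
			if check() {
				restart()
				return true
			}
		}
	}
}

// simulate runs the watcher against a fake check that turns true on the third
// poll, and reports how many polls ran and whether the restart hook fired.
func simulate() (polls int, restarted bool) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	check := func() bool {
		polls++
		return polls >= 3
	}
	watchForEnable(ctx, 10*time.Millisecond, check, func() { restarted = true })
	return polls, restarted
}

func main() {
	polls, restarted := simulate()
	fmt.Println(polls, restarted)
}
```

In a real operator the restart hook would more likely cancel the root context or exit the process so that the Deployment rolls out a fresh pod; here it only flips a flag so the behavior can be observed.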

@eggfoobar eggfoobar force-pushed the add-olm-capability-support branch from 34f1536 to e68c0a6 Compare October 5, 2023 20:53
Member

@jhadvig jhadvig left a comment

@eggfoobar thanks for addressing this 👍 I like how this got implemented

/lgtm
/approve

@jhadvig
Member

jhadvig commented Oct 12, 2023

QE Approver:
/assign @yanpzhan
Docs Approver:
/assign @opayne1
PX Approver:
/assign @RickJWagner

@openshift-ci openshift-ci Bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 12, 2023
@opayne1
Contributor

opayne1 commented Oct 12, 2023

/label docs-approved

@openshift-ci openshift-ci Bot added the docs-approved Signifies that Docs has signed off on this PR label Oct 12, 2023
@RickJWagner

/label px-approved

@openshift-ci openshift-ci Bot added the px-approved Signifies that Product Support has signed off on this PR label Oct 12, 2023
@jhadvig
Member

jhadvig commented Oct 12, 2023

/hold cancel

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 12, 2023
@eggfoobar
Contributor Author

eggfoobar commented Oct 13, 2023

/hold

Apologies for adding a hold on this so late. I'm noticing some errors in the no-cap job that I would like to look over before we merge this in. Will unhold tomorrow as soon as I can.

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 13, 2023
OLM is an optional component, adding conditional to remove informer on OLM resource

Signed-off-by: ehila <ehila@redhat.com>

fix: add olm check logic in sync loop for isCopiedCSVsDisabled

Signed-off-by: ehila <ehila@redhat.com>

refactor: small refinement, more logging and comments

Signed-off-by: ehila <ehila@redhat.com>
@eggfoobar eggfoobar force-pushed the add-olm-capability-support branch from e68c0a6 to bbcf813 Compare October 13, 2023 06:39
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 13, 2023
@eggfoobar
Contributor Author

/unhold

@jhadvig I was able to verify that the error I was seeing isn't happening again. I went ahead and added some more comments and logs to make it clear what's happening. Please take a look when you have a moment, thanks!

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 13, 2023
@eggfoobar
Contributor Author

/retest-required

@jhadvig
Member

jhadvig commented Oct 13, 2023

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Oct 13, 2023
@openshift-ci
Contributor

openshift-ci Bot commented Oct 13, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eggfoobar, jhadvig

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yanpzhan

I built an image against the PR using cluster-bot and ran these tests:

  1. Launched a cluster using the image successfully, and updated it to a 4.15 nightly payload successfully.
  2. Launched another cluster on a 4.14 nightly and updated it to the image I created successfully.
  3. Launched a cluster using the image with the installation options "baselineCapabilitySet: None" and "additionalEnabledCapabilities: ["Console"]"; the cluster installed successfully. On the web console, there were no operators on the OperatorHub page.

@eggfoobar @jhadvig besides the tests above, are any other tests needed for this PR?

@eggfoobar
Contributor Author

@yanpzhan That looks good to me. Just to make sure the current changes were in, can you tell me which 4.15 nightly release you used? And if you still have the cluster up, can you check oc get clusterversions version -oyaml and make sure knownCapabilities includes OperatorLifecycleManager?

@yanpzhan

@eggfoobar The target 4.15 nightly payload is 4.15.0-0.nightly-2023-10-09-101435. On the cluster, I checked oc get clusterversions version -oyaml, and "knownCapabilities" doesn't contain "OperatorLifecycleManager". I wonder if this build doesn't contain the OLM capability changes.

@eggfoobar
Contributor Author

Ah yup. @yanpzhan would you be able to verify again with this nightly, 4.15.0-0.nightly-2023-10-17-065657? That should have the updated capabilities.

@yanpzhan

yanpzhan commented Oct 17, 2023

I launched a cluster with the PR and image 4.15.0-0.nightly-2023-10-17-065657 successfully and checked clusterversions 'version'; it now contains "OperatorLifecycleManager" in "knownCapabilities".

@eggfoobar
Contributor Author

eggfoobar commented Oct 17, 2023

Perfect, thanks @yanpzhan !

In terms of what we need to verify with this PR, two things:

  1. Console should function normally without OLM being enabled ✅ (you've already verified this part if no errors are happening)
  2. When you enable OperatorLifecycleManager via oc edit clusterversions version, we expect that, after CVO installs OLM, console will restart in about 5 minutes and continue operating as normal.

@yanpzhan

@eggfoobar I've checked again on a new cluster. The 2 points you mentioned passed with no errors. Console works after re-enabling OLM by setting 'OperatorLifecycleManager' in ClusterVersion 'version'.
Before:

oc get clusterversions.config.openshift.io version -ojson | jq .status.capabilities

{
  "enabledCapabilities": [
    "Console"
  ],
  "knownCapabilities": [
    "Build",
    "CSISnapshot",
    "Console",
    "DeploymentConfig",
    "ImageRegistry",
    "Insights",
    "MachineAPI",
    "NodeTuning",
    "OperatorLifecycleManager",
    "Storage",
    "baremetal",
    "marketplace",
    "openshift-samples"
  ]
}

After:

oc get clusterversions.config.openshift.io version -ojson | jq .status.capabilities

{
  "enabledCapabilities": [
    "Console",
    "OperatorLifecycleManager"
  ],
  "knownCapabilities": [
    "Build",
    "CSISnapshot",
    "Console",
    "DeploymentConfig",
    "ImageRegistry",
    "Insights",
    "MachineAPI",
    "NodeTuning",
    "OperatorLifecycleManager",
    "Storage",
    "baremetal",
    "marketplace",
    "openshift-samples"
  ]
}

Could you tell me which OLM resources I could check to compare the state before and after enabling OLM?
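
The before and after outputs above can also be compared programmatically. The following is a minimal sketch that assumes the JSON has the shape printed by the jq commands in this comment; newlyEnabled, the capabilities struct, and the shortened sample documents are illustrative names and data, not part of the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// capabilities mirrors the shape of .status.capabilities as printed above.
type capabilities struct {
	Enabled []string `json:"enabledCapabilities"`
	Known   []string `json:"knownCapabilities"`
}

// contains reports whether want appears in list.
func contains(list []string, want string) bool {
	for _, item := range list {
		if item == want {
			return true
		}
	}
	return false
}

// newlyEnabled returns the capabilities that are enabled in the after document
// but were not enabled in the before document.
func newlyEnabled(before, after []byte) ([]string, error) {
	var b, a capabilities
	if err := json.Unmarshal(before, &b); err != nil {
		return nil, fmt.Errorf("parsing before: %w", err)
	}
	if err := json.Unmarshal(after, &a); err != nil {
		return nil, fmt.Errorf("parsing after: %w", err)
	}
	var added []string
	for _, name := range a.Enabled {
		if !contains(b.Enabled, name) {
			added = append(added, name)
		}
	}
	return added, nil
}

// Shortened sample documents; knownCapabilities is trimmed for brevity.
const beforeJSON = `{"enabledCapabilities":["Console"],"knownCapabilities":["Console","OperatorLifecycleManager"]}`
const afterJSON = `{"enabledCapabilities":["Console","OperatorLifecycleManager"],"knownCapabilities":["Console","OperatorLifecycleManager"]}`

func main() {
	added, err := newlyEnabled([]byte(beforeJSON), []byte(afterJSON))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(added)
}
```

With the sample data, the only newly enabled capability reported is OperatorLifecycleManager, which matches the difference between the two outputs above.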

@eggfoobar
Contributor Author

eggfoobar commented Oct 18, 2023

Perfect. Once OLM has been enabled, the specific resource that Console is looking for is olmconfigs, so if you see that, things should be good. As long as oc get olmconfigs doesn't fail, Console will work. That resource does not exist when OLM is not enabled.

@yanpzhan

Thanks for your guidance. There are no issues for now.
/label qe-approved

@openshift-ci openshift-ci Bot added the qe-approved Signifies that QE has signed off on this PR label Oct 18, 2023
@openshift-ci-robot
Contributor

openshift-ci-robot commented Oct 18, 2023

@eggfoobar: This pull request references OCPVE-719 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but it targets "openshift-4.15" instead.

@eggfoobar
Contributor Author

/retest-required

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD d02bd9b and 2 for PR HEAD bbcf813 in total

@eggfoobar
Contributor Author

/retest-required

@openshift-ci
Contributor

openshift-ci Bot commented Oct 18, 2023

@eggfoobar: all tests passed!

@openshift-ci openshift-ci Bot merged commit 301b4bf into openshift:master Oct 18, 2023