
[release-4.13] OCPBUGS-38254: [CARRY] perform operator apiService certificate validity checks directly #836

Merged
openshift-merge-bot[bot] merged 4 commits into openshift:release-4.13 from ankitathomas:OCP25341-4.13 on Sep 12, 2024

@ankitathomas
Contributor

@ankitathomas ankitathomas commented Aug 9, 2024

Manual cherry-pick of #821 to 4.14

The issue for this is open for years and it's not super interesting to
go debug it. The test threads will exit when the test process does.
Having teardown fail means none of the other tests run for me.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
Upstream-repository: operator-lifecycle-manager
Upstream-commit: b683c28b31ad12f8acb8f7fd4d7beb85c74a751f
@openshift-ci openshift-ci Bot requested review from oceanc80 and perdasilva August 9, 2024 15:34
@openshift-ci
Contributor

openshift-ci Bot commented Aug 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ankitathomas

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ankitathomas ankitathomas changed the title from "Ocp25341 4.13" to "[release-4.13] OCPBUGS-38254: [CARRY] perform operator apiService certificate validity checks directly" Aug 9, 2024
@openshift-ci openshift-ci Bot added the "approved" label (Indicates a PR has been approved by an approver from all required OWNERS files.) Aug 9, 2024
@openshift-ci-robot openshift-ci-robot added the "jira/severity-moderate" (Referenced Jira bug's severity is moderate for the branch this PR is targeting.) and "jira/valid-reference" (Indicates that this PR references a valid Jira ticket of any type.) labels Aug 9, 2024
@openshift-ci-robot

@ankitathomas: This pull request references Jira Issue OCPBUGS-38254, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.z) matches configured target version for branch (4.13.z)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-36949 is in the state Verified, which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-36949 targets the "4.14.z" version, which is one of the valid target versions: 4.14.0, 4.14.z
  • bug has dependents

No GitHub users were found matching the public email listed for the QA contact in Jira (jiazha@redhat.com), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the "jira/valid-bug" label (Indicates that a referenced Jira bug is valid for the branch this PR is targeting.) Aug 9, 2024
@openshift-ci-robot

@ankitathomas: This pull request references Jira Issue OCPBUGS-38254, which is valid.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.z) matches configured target version for branch (4.13.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-36949 is in the state Verified, which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-36949 targets the "4.14.z" version, which is one of the valid target versions: 4.14.0, 4.14.z
  • bug has dependents

No GitHub users were found matching the public email listed for the QA contact in Jira (jiazha@redhat.com), skipping review request.

Details

In response to this:

Manual cherry-pick of #821 to 4.14

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jianzhangbjz
Contributor

jianzhangbjz commented Aug 12, 2024

Hi @ankitathomas, I got the error below when building a cluster with this PR. Could you take a look when you get a chance? Thanks! https://prow.ci.openshift.org/view/gs/test-platform-results/logs/release-openshift-origin-installer-launch-gcp-modern/1822827253600358400

Go compliance shim [6124] [rhel-8-golang-1.19][openshift-golang-builder]: invoking real go binary
# github.com/operator-framework/operator-lifecycle-manager/pkg/controller/install
vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/install/certresources.go:280:19: undefined: sets.New
Go compliance shim [6124] [rhel-8-golang-1.19][openshift-golang-builder]: Exited with: 1
make[1]: Leaving directory '/build'
make[1]: *** [Makefile:79: github.com/operator-framework/operator-lifecycle-manager/cmd/catalog] Error 1
make: *** [Makefile:67: build/olm] Error 2
error: build error: building at STEP "RUN make build/olm bin/cpb": while running runtime: exit status 2
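
For context, `undefined: sets.New` is what the Go compiler reports when the imported `sets` package exports no `New` function. One plausible cause, which is an assumption because the log does not show the vendored version, is that the release-4.13 vendor tree carries a `k8s.io/apimachinery` that predates the generic sets API. Below is a minimal, dependency-free sketch of tracking a set of host names with a plain map, which would not depend on that helper. `hostSet` and its methods are hypothetical names for illustration, not OLM code and not necessarily how this PR resolved the failure.

```go
package main

import (
	"fmt"
	"sort"
)

// hostSet is a hypothetical stand-in for a string set; it needs no
// k8s.io/apimachinery helper, so it builds regardless of the vendored version.
type hostSet map[string]struct{}

// newHostSet builds a set from the given items, collapsing duplicates.
func newHostSet(items ...string) hostSet {
	s := make(hostSet, len(items))
	for _, it := range items {
		s[it] = struct{}{}
	}
	return s
}

// Has reports whether item is in the set.
func (s hostSet) Has(item string) bool {
	_, ok := s[item]
	return ok
}

// Sorted returns the members in lexical order, for stable output.
func (s hostSet) Sorted() []string {
	out := make([]string, 0, len(s))
	for it := range s {
		out = append(out, it)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Illustrative host names only; the duplicate is stored once.
	hosts := newHostSet("my-svc.my-ns", "my-svc.my-ns.svc", "my-svc.my-ns")
	fmt.Println(hosts.Sorted(), hosts.Has("my-svc.my-ns.svc"))
}
```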

@jianzhangbjz
Contributor

Test passed, details: https://issues.redhat.com/browse/OCPBUGS-38254
/lgtm
/label qe-approved
/label cherry-pick-approved

@openshift-ci openshift-ci Bot added the "qe-approved" (Signifies that QE has signed off on this PR) and "cherry-pick-approved" (Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.) labels Aug 13, 2024
@openshift-ci openshift-ci Bot added the "lgtm" label (Indicates that a PR is ready to be merged.) Aug 13, 2024
@openshift-ci openshift-ci Bot removed the "lgtm" label (Indicates that a PR is ready to be merged.) Aug 13, 2024
@perdasilva
Contributor

/retest

@perdasilva
Contributor

/retest

1 similar comment
@ankitathomas
Contributor Author

/retest

@jianzhangbjz
Contributor

/test e2e-gcp-olm-flaky
/test e2e-gcp-olm

@ankitathomas
Contributor Author

/retest

@jianzhangbjz
Contributor

/test e2e-gcp-olm

@jianzhangbjz
Contributor

/test e2e-gcp-olm-flaky

@jianzhangbjz
Contributor

/test e2e-gcp-olm

@jianzhangbjz
Contributor

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_operator-framework-olm/836/pull-ci-openshift-operator-framework-olm-release-4.13-e2e-gcp-olm/1824389101466423296

level=error msg=Cluster operator kube-scheduler Degraded is True with MissingStaticPodController_SyncError::StaticPods_Error: MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "openshift-kube-scheduler" in namespace: "openshift-kube-scheduler" for revision: 7 on node: "ci-op-di2mvmr9-ae41b-g86vb-master-1" didn't show up, waited: 3m0s
level=error msg=StaticPodsDegraded: pod/openshift-kube-scheduler-ci-op-di2mvmr9-ae41b-g86vb-master-1 container "kube-scheduler" is terminated: Completed: 
level=error msg=StaticPodsDegraded: pod/openshift-kube-scheduler-ci-op-di2mvmr9-ae41b-g86vb-master-1 container "kube-scheduler-cert-syncer" is terminated: Error: go:169: Failed to watch *v1.Secret: failed to list *v1.Secret: Get "https://localhost:6443/api/v1/namespaces/openshift-kube-scheduler/secrets?limit=500&resourceVersion=0": x509: certificate signed by unknown authority
level=error msg=StaticPodsDegraded: W0816 10:51:46.493141       1 reflector.go:424] k8s.io/client-go@v0.26.10/tools/cache/reflector.go:169: failed to list *v1.ConfigMap: Get "https://localhost:6443/api/v1/namespaces/openshift-kube-scheduler/configmaps?limit=500&resourceVersion=0": x509: certificate signed by unknown authority
...

@perdasilva
Contributor

@ankitathomas e2e is failing because of another fetchCSV call with the parameters in the wrong order (namespace <-> name). The flaky test "blocks a CRD upgrade that could cause data loss" is a permanent failure. While I think the workload is still being protected (i.e. the upgrade doesn't go through), it seems the InstallPlan (IP) isn't being updated with the error (at least not in the right place), or isn't being put in the right state after the error is detected?


Eventually(func() error {
// Fetch the current csv
fetchedCSV, err := fetchCSV(crc, csv.Name, generatedNamespace.GetName(), csvSucceededChecker)
Contributor

Suggested change
- fetchedCSV, err := fetchCSV(crc, csv.Name, generatedNamespace.GetName(), csvSucceededChecker)
+ fetchedCSV, err := fetchCSV(crc, generatedNamespace.GetName(), csv.Name, csvSucceededChecker)

tmshort and others added 3 commits August 27, 2024 14:24
Some of the e2e loops are a bit flakey, make them more robust
* Retry on certain errors
* Use Eventually() consistently (avoid wait.Poll())
* Clarify logging
* Reduce some logging
* Change csvExists() to waitForCsvToDelete(), as that's how it's used
* Change awaitCSV() to fetchCSV()

Signed-off-by: Todd Short <todd.short@me.com>
Upstream-repository: operator-lifecycle-manager
Upstream-commit: e5f7320f29ee4e9def114d6dcc1d22b4c7bb2b0d
Fix #3151

Remove non-InstallPlan related checks for this test.

Also:
* Clean up some looping log messages
* Clean up some logging added when comments were converted

These comments/logs are at the beginning of the test, and are also part
of the test sequence, so they are redundant (and possibly confusing)

Signed-off-by: Todd Short <todd.short@me.com>
Upstream-repository: operator-lifecycle-manager
Upstream-commit: 5299830576c8e8e6cd728b08a3a2e60f212ba387
…cks directly (#3217)

* perform operator apiService certificate validity checks directly

Signed-off-by: Ankita Thomas <ankithom@redhat.com>

* use sets to track certs to install, revert to checking for installPlan
timeouts after API availability checks, add service FQDN to list of
hostnames.

Signed-off-by: Ankita Thomas <ankithom@redhat.com>
Upstream-repository: operator-lifecycle-manager
Upstream-commit: 908da0c05363da40ad09ab774d9904b22aca7869

---------

Signed-off-by: Ankita Thomas <ankithom@redhat.com>
@ankitathomas
Contributor Author

/retest

1 similar comment
@ankitathomas
Contributor Author

/retest

@jianzhangbjz
Contributor

/test e2e-gcp-olm-flaky

@jianzhangbjz
Contributor

It failed at:

Summarizing 1 Failure:
  [FAIL] CRD Versions [It] [FLAKE] blocks a CRD upgrade that could cause data loss
  /go/src/github.com/openshift/operator-framework-olm/staging/operator-lifecycle-manager/test/e2e/crd_e2e_test.go:275
Ran 8 of 199 Specs in 305.024 seconds
FAIL! -- 7 Passed | 1 Failed | 2 Pending | 189 Skipped
--- FAIL: TestEndToEnd (305.04s)

@tmshort
Contributor

tmshort commented Sep 6, 2024

That is a known flake.
/lgtm

@openshift-ci openshift-ci Bot added the "lgtm" label (Indicates that a PR is ready to be merged.) Sep 6, 2024
@ankitathomas
Contributor Author

/retest

@jianzhangbjz
Contributor

/test e2e-gcp-olm-flaky

@jianzhangbjz
Contributor

Hi @tmshort , I guess it needs the backport-risk-assessed label.

@perdasilva
Contributor

/label backport-risk-assessed

@openshift-ci openshift-ci Bot added the "backport-risk-assessed" label (Indicates a PR to a release branch has been evaluated and considered safe to accept.) Sep 12, 2024
@openshift-merge-bot openshift-merge-bot Bot merged commit 42d01eb into openshift:release-4.13 Sep 12, 2024
@openshift-ci-robot

@ankitathomas: Jira Issue OCPBUGS-38254: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-38254 has been moved to the MODIFIED state.

Details

In response to this:

Manual cherry-pick of #821 to 4.14

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: operator-lifecycle-manager
This PR has been included in build operator-lifecycle-manager-container-v4.13.0-202409120806.p0.g42d01eb.assembly.stream.el8.
All builds following this will include this PR.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: operator-registry
This PR has been included in build operator-registry-container-v4.13.0-202409120806.p0.g42d01eb.assembly.stream.el8.
All builds following this will include this PR.

@grokspawn
Contributor

/cherrypick release-4.12

@openshift-cherrypick-robot

@grokspawn: new pull request created: #866

Details

In response to this:

/cherrypick release-4.12

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • backport-risk-assessed: Indicates a PR to a release branch has been evaluated and considered safe to accept.
  • cherry-pick-approved: Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.
  • jira/severity-moderate: Referenced Jira bug's severity is moderate for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • qe-approved: Signifies that QE has signed off on this PR
