[release-4.13] OCPBUGS-38254: [CARRY] perform operator apiService certificate validity checks directly (#836)
Conversation
The issue for this is open for years and it's not super interesting to go debug it. The test threads will exit when the test process does. Having teardown fail means none of the other tests run for me.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
Upstream-repository: operator-lifecycle-manager
Upstream-commit: b683c28b31ad12f8acb8f7fd4d7beb85c74a751f
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ankitathomas

The full list of commands accepted by this bot can be found here. The pull request process is described here.
@ankitathomas: This pull request references Jira Issue OCPBUGS-38254, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug.

No GitHub users were found matching the public email listed for the QA contact in Jira (jiazha@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
@ankitathomas: This pull request references Jira Issue OCPBUGS-38254, which is valid. 7 validation(s) were run on this bug.

No GitHub users were found matching the public email listed for the QA contact in Jira (jiazha@redhat.com), skipping review request.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Hi @ankitathomas, I got the below error when building a cluster with this PR, could you help have a look when you get a chance? Thanks! https://prow.ci.openshift.org/view/gs/test-platform-results/logs/release-openshift-origin-installer-launch-gcp-modern/1822827253600358400

Go compliance shim [6124] [rhel-8-golang-1.19][openshift-golang-builder]: invoking real go binary
# github.com/operator-framework/operator-lifecycle-manager/pkg/controller/install
vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/install/certresources.go:280:19: undefined: sets.New
Go compliance shim [6124] [rhel-8-golang-1.19][openshift-golang-builder]: Exited with: 1
make[1]: Leaving directory '/build'
make[1]: *** [Makefile:79: github.com/operator-framework/operator-lifecycle-manager/cmd/catalog] Error 1
make: *** [Makefile:67: build/olm] Error 2
error: build error: building at STEP "RUN make build/olm bin/cpb": while running runtime: exit status 2
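The `undefined: sets.New` failure above happens because the generic `sets.New[T]` constructor only exists in newer `k8s.io/apimachinery` releases than the one vendored on this release branch; older code uses `sets.NewString`. A dependency-free sketch of the string-set shape involved (illustrative only, not the apimachinery implementation):

```go
package main

import (
	"fmt"
	"sort"
)

// Set is a minimal string set standing in for apimachinery's generic
// sets.New[string]. On branches whose vendored apimachinery predates
// generics support, only sets.NewString exists, which is why the
// backported call to sets.New fails to compile here.
type Set map[string]struct{}

// New builds a set from its arguments, deduplicating as it goes.
func New(items ...string) Set {
	s := Set{}
	for _, i := range items {
		s[i] = struct{}{}
	}
	return s
}

// Has reports membership.
func (s Set) Has(item string) bool { _, ok := s[item]; return ok }

// List returns the members in sorted order for stable output.
func (s Set) List() []string {
	out := make([]string, 0, len(s))
	for i := range s {
		out = append(out, i)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical hostnames, mirroring the kind of cert-hostname
	// tracking the carried commit uses sets for.
	hosts := New("olm-service.olm.svc", "olm-service.olm.svc.cluster.local")
	fmt.Println(hosts.Has("olm-service.olm.svc"))
	fmt.Println(hosts.List())
}
```

The fix on a branch like this is typically either bumping the vendored apimachinery or rewriting the call in terms of `sets.NewString`.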
Force-push: a7bd263 to 840c367
Tests pass, details: https://issues.redhat.com/browse/OCPBUGS-38254
Force-push: 840c367 to 493cfa9

/retest
Force-push: 493cfa9 to cb34d66
/retest

1 similar comment

/retest

/test e2e-gcp-olm-flaky

/retest

/test e2e-gcp-olm

/test e2e-gcp-olm-flaky

/test e2e-gcp-olm
level=error msg=Cluster operator kube-scheduler Degraded is True with MissingStaticPodController_SyncError::StaticPods_Error: MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "openshift-kube-scheduler" in namespace: "openshift-kube-scheduler" for revision: 7 on node: "ci-op-di2mvmr9-ae41b-g86vb-master-1" didn't show up, waited: 3m0s
level=error msg=StaticPodsDegraded: pod/openshift-kube-scheduler-ci-op-di2mvmr9-ae41b-g86vb-master-1 container "kube-scheduler" is terminated: Completed:
level=error msg=StaticPodsDegraded: pod/openshift-kube-scheduler-ci-op-di2mvmr9-ae41b-g86vb-master-1 container "kube-scheduler-cert-syncer" is terminated: Error: go:169: Failed to watch *v1.Secret: failed to list *v1.Secret: Get "https://localhost:6443/api/v1/namespaces/openshift-kube-scheduler/secrets?limit=500&resourceVersion=0": x509: certificate signed by unknown authority
level=error msg=StaticPodsDegraded: W0816 10:51:46.493141 1 reflector.go:424] k8s.io/client-go@v0.26.10/tools/cache/reflector.go:169: failed to list *v1.ConfigMap: Get "https://localhost:6443/api/v1/namespaces/openshift-kube-scheduler/configmaps?limit=500&resourceVersion=0": x509: certificate signed by unknown authority
...
@ankitathomas e2e is failing because of another fetchCSV call with the wrong parameter ordering (namespace <-> name). The flake ("blocks a CRD upgrade that could cause data loss") is a perma-fail. While I think the workload is still being protected (i.e. the upgrade doesn't go through), it seems the InstallPlan isn't getting updated with the error (at least not in the right place), or isn't being put in the right state after detecting it?
Eventually(func() error {
	// Fetch the current csv
	fetchedCSV, err := fetchCSV(crc, csv.Name, generatedNamespace.GetName(), csvSucceededChecker)

Suggested change:
- fetchedCSV, err := fetchCSV(crc, csv.Name, generatedNamespace.GetName(), csvSucceededChecker)
+ fetchedCSV, err := fetchCSV(crc, generatedNamespace.GetName(), csv.Name, csvSucceededChecker)
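The suggested change above swaps two arguments of the same type, `string`, which is exactly why the compiler could not catch the original bug. A hypothetical Go sketch (fetchCSV's real signature lives in the e2e suite; the named types and values here are illustrative, not OLM's) of how distinct string types would surface the swap at compile time:

```go
package main

import "fmt"

// Namespace and CSVName are hypothetical defined types. With plain strings,
// fetchCSV(crc, name, namespace, ...) compiles cleanly and only fails at
// runtime, as happened in this PR; with defined types the swap is a
// compile-time error.
type Namespace string
type CSVName string

// fetchCSV is a stub standing in for the e2e helper; it just renders the
// lookup key so the argument order is visible.
func fetchCSV(ns Namespace, name CSVName) string {
	return fmt.Sprintf("%s/%s", ns, name)
}

func main() {
	fmt.Println(fetchCSV(Namespace("olm-e2e"), CSVName("my-csv.v1.0.0")))
	// The swapped call below would not compile:
	// fetchCSV(CSVName("my-csv.v1.0.0"), Namespace("olm-e2e"))
}
```

This is a trade-off rather than a rule: defined types add conversion noise at call sites, which is presumably why the e2e helpers use plain strings.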
Some of the e2e loops are a bit flaky; make them more robust:
* Retry on certain errors
* Use Eventually() consistently (avoid wait.Poll())
* Clarify logging
* Reduce some logging
* Change csvExists() to waitForCsvToDelete(), as that's how it's used
* Change awaitCSV() to fetchCSV()

Signed-off-by: Todd Short <todd.short@me.com>
Upstream-repository: operator-lifecycle-manager
Upstream-commit: e5f7320f29ee4e9def114d6dcc1d22b4c7bb2b0d
Fix #3151. Remove non-InstallPlan related checks for this test. Also:
* Clean up some looping log messages
* Clean up some logging added when comments were converted

These comments/logs are at the beginning of the test, and are also part of the test sequence, so they are redundant (and possibly confusing).

Signed-off-by: Todd Short <todd.short@me.com>
Upstream-repository: operator-lifecycle-manager
Upstream-commit: 5299830576c8e8e6cd728b08a3a2e60f212ba387
perform operator apiService certificate validity checks directly (#3217)

* perform operator apiService certificate validity checks directly
* use sets to track certs to install, revert to checking for installPlan timeouts after API availability checks, add service FQDN to list of hostnames

Signed-off-by: Ankita Thomas <ankithom@redhat.com>
Upstream-repository: operator-lifecycle-manager
Upstream-commit: 908da0c05363da40ad09ab774d9904b22aca7869
Force-push: cb34d66 to c7ddc4f
/retest

1 similar comment

/retest

/test e2e-gcp-olm-flaky
It failed at:

Summarizing 1 Failure:
[FAIL] CRD Versions [It] [FLAKE] blocks a CRD upgrade that could cause data loss
/go/src/github.com/openshift/operator-framework-olm/staging/operator-lifecycle-manager/test/e2e/crd_e2e_test.go:275
Ran 8 of 199 Specs in 305.024 seconds
FAIL! -- 7 Passed | 1 Failed | 2 Pending | 189 Skipped
--- FAIL: TestEndToEnd (305.04s)
That is a known flake.

/retest

/test e2e-gcp-olm-flaky
Hi @tmshort, I guess it needs the backport-risk-assessed label.

/label backport-risk-assessed
@ankitathomas: Jira Issue OCPBUGS-38254: All pull requests linked via external trackers have merged. Jira Issue OCPBUGS-38254 has been moved to the MODIFIED state.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[ART PR BUILD NOTIFIER] Distgit: operator-lifecycle-manager

[ART PR BUILD NOTIFIER] Distgit: operator-registry

/cherrypick release-4.12
@grokspawn: new pull request created: #866

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Manual cherry-pick of #821 to 4.14