OCPBUGS-38859: add a test (that flakes) to detect faulty load balancer #29034
openshift-merge-bot[bot] merged 1 commit into openshift:master
Force-pushed from 2799f60 to 8afd3c8.
pkg/monitortests/kubeapiserver/faultyloadbalancer/monitortest.go
        lbType := unreachable.Condition.Locator.Keys[monitorapi.LocatorAPIUnreachableHostKey]
        msg := fmt.Sprintf("client observed connection error, type: %s\nkube-apiserver: %s\n, client: %s\n", lbType, shutdown.String(), unreachable.String())
        junit.testCase.FailureOutput.Output = fmt.Sprintf("%s\n%s", junit.testCase.FailureOutput.Output, msg)
    }
I think this test has to go out in flake-only mode; otherwise there's a very good chance it shuts down all payloads. You'll need to refactor a little to return an additional junit test case with no failure output, which triggers it as a flake.
Sippy can then be used to find occurrences where it flaked. Once you know it's fully passing, we can remove that.
Yes, that makes sense. It already detects a faulty load balancer on AWS. I have added a new junit test case with no failure output.
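The flake-only approach discussed in this thread can be sketched as below. This is a minimal illustration, not the code from this PR: `JUnitTestCase`, `FailureOutput`, and `asFlake` are simplified, hypothetical stand-ins for the framework's junit types, and the test name is made up.

```go
package main

import "fmt"

// FailureOutput and JUnitTestCase are simplified, hypothetical stand-ins
// for the junit types used by the test framework.
type FailureOutput struct {
	Output string
}

type JUnitTestCase struct {
	Name          string
	FailureOutput *FailureOutput // nil means this test case passed
}

// asFlake returns a failing test case followed by a passing test case with
// the same name. Reporting both lets the result count as a flake instead
// of a hard failure.
func asFlake(name, output string) []*JUnitTestCase {
	failure := &JUnitTestCase{
		Name:          name,
		FailureOutput: &FailureOutput{Output: output},
	}
	pass := &JUnitTestCase{Name: name} // no failure output
	return []*JUnitTestCase{failure, pass}
}

func main() {
	// The test name here is an assumption made for the example.
	cases := asFlake("faulty load balancer (hypothetical name)", "client observed connection error")
	for _, tc := range cases {
		fmt.Printf("%s failed=%t\n", tc.Name, tc.FailureOutput != nil)
	}
}
```

Keeping the failure output on the first case preserves the diagnostic message, so occurrences can still be inspected when searching for flakes.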
@tkashem: This pull request references Jira Issue OCPBUGS-38859, which is valid. 3 validation(s) were run on this bug.
The bug has been updated to refer to the pull request using the external bug tracker.
/label acknowledge-critical-fixes-only
(It does not fail yet; it only flakes so that we can measure and fix. Once the fixes are made, we can change it to a test that fails.)
pkg/monitortests/kubeapiserver/faultyloadbalancer/monitortest.go
        }
        testCases = append(testCases, junit.testCase, flake)
    }
    return testCases
Am I reading this correctly that this is not going to return anything if the test fully passes? From what I can see, on success junit.testCase is nil, because Evaluate is never called. Then we get here and return an empty slice.
You need to return a success case as well, otherwise your pass rates will be all out of whack.
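One way to address this review point is to build the passing case up front and always return it, so the result slice is never empty. The sketch below is illustrative only: the types and the `results` helper are hypothetical and do not reflect the PR's actual structure.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified, hypothetical stand-ins for the framework's junit types.
type FailureOutput struct {
	Output string
}

type JUnitTestCase struct {
	Name          string
	FailureOutput *FailureOutput // nil means this test case passed
}

// results always returns at least one test case. With no failures it
// returns a single passing case, so the run is counted as a pass. With
// failures it returns a failing case plus the passing case (a flake).
func results(name string, failures []string) []*JUnitTestCase {
	pass := &JUnitTestCase{Name: name}
	if len(failures) == 0 {
		return []*JUnitTestCase{pass}
	}
	failed := &JUnitTestCase{
		Name:          name,
		FailureOutput: &FailureOutput{Output: strings.Join(failures, "\n")},
	}
	return []*JUnitTestCase{failed, pass}
}

func main() {
	fmt.Println(len(results("example test", nil)))
	fmt.Println(len(results("example test", []string{"connection error"})))
}
```

Without the explicit passing case, successful runs would report nothing for this test, and the pass rate would be computed only from runs that flaked.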
Force-pushed from fa053c0 to 8e36b1b.
/lgtm
I would just check that you can find passes and fails in the rehearsals once they're in, but it looks good now.
/hold
/retest
/test e2e-metal-ipi-ovn-kube-apiserver-rollout
add test that detects faulty load balancer using the client metric and the apiserver graceful shutdown interval
junit test output under "Tests Passed":
Skipped: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/29034/pull-ci-openshift-origin-master-e2e-aws-ovn-cgroupsv2/1828905495277080576
junit test output:
monitor tests log:
junit output:
    // b) we find at least one valid kube-apiserver shutdown interval, but no
    // overlapping client error interval, this test is a pass
    // c) we find at least one valid kube-apiserver shutdown interval, and at
    // least one overlapping client error interval, this test is a flake
@dgoodwin I revised the junit test output for Pass, Skip, and Flake. Let me know your thoughts; examples are here: #29034 (comment)
Cool, I don't recall seeing a skip in a monitortest yet.
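The pass and flake rules quoted in the code comment above can be sketched as a small classifier over time intervals. This is a rough illustration under stated assumptions: `Interval`, `Outcome`, `classify`, and `at` are hypothetical names, and the skip rule (no valid shutdown interval found) is an assumption, since case a) is not shown in the quoted comment.

```go
package main

import (
	"fmt"
	"time"
)

// Interval is a hypothetical, simplified time window.
type Interval struct {
	From, To time.Time
}

// overlaps reports whether two intervals share any span of time.
func (a Interval) overlaps(b Interval) bool {
	return a.From.Before(b.To) && b.From.Before(a.To)
}

type Outcome string

const (
	Skip  Outcome = "skip"
	Pass  Outcome = "pass"
	Flake Outcome = "flake"
)

// classify applies the rules from the quoted comment.
func classify(shutdowns, clientErrors []Interval) Outcome {
	if len(shutdowns) == 0 {
		// Assumption: with no valid shutdown interval there is nothing
		// to evaluate, so the test is skipped.
		return Skip
	}
	for _, s := range shutdowns {
		for _, c := range clientErrors {
			if s.overlaps(c) {
				return Flake // case c): a client error overlaps a shutdown
			}
		}
	}
	return Pass // case b): shutdowns found, no overlapping client error
}

// at is a small helper that builds a timestamp from seconds since the epoch.
func at(sec int64) time.Time {
	return time.Unix(sec, 0)
}

func main() {
	shutdowns := []Interval{{From: at(0), To: at(10)}}
	clientErrors := []Interval{{From: at(5), To: at(6)}}
	fmt.Println(classify(shutdowns, clientErrors))
}
```

The nested loop is quadratic in the number of intervals, which is usually acceptable for the small interval counts in a single job run; sorting both lists by start time would allow a linear sweep if that ever mattered.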
/hold cancel
/retest-required
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dgoodwin, sanchezl, tkashem
/retest-required
@tkashem: Jira Issue OCPBUGS-38859: All pull requests linked via external trackers have merged. Jira Issue OCPBUGS-38859 has been moved to the MODIFIED state.
[ART PR BUILD NOTIFIER] Distgit: openshift-enterprise-tests