Flake and improve alert tests #27559
We can parse this in sippy when we extract metadata.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dgoodwin, stbenjam.
/override ci/prow/e2e-gcp-ovn-upgrade
The job just missed on pod sandbox.
@dgoodwin: Overrode contexts on behalf of dgoodwin: ci/prow/e2e-gcp-ovn-upgrade
TRT has decided, with input from @deads2k, that we're OK with moving the generic alert backstop tests to always flake, as their signal is not high value. The post-upgrade alert test fails 30% of the time globally, the post-conformance variant is also not stellar, and both disproportionately affect certain NURPs.
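For readers unfamiliar with the term, here is a minimal sketch of what "always flake" could mean in JUnit terms, assuming a flake is reported as one failing and one passing entry under the same test name. The `JUnitTestCase` type and `asFlake` helper are hypothetical illustrations, not the types used in this PR.

```go
package main

import "fmt"

// JUnitTestCase is a hypothetical, simplified stand-in for a JUnit result entry.
type JUnitTestCase struct {
	Name          string
	FailureOutput string // empty means the case passed
}

// asFlake reports a failure as a flake: one failing entry followed by one
// passing entry with the same test name. This is an assumed convention.
func asFlake(name, failure string) []JUnitTestCase {
	return []JUnitTestCase{
		{Name: name, FailureOutput: failure},
		{Name: name},
	}
}

func main() {
	// The test name and failure text below are made up for illustration.
	for _, tc := range asFlake("alert backstop test", "alert ExampleAlert fired") {
		fmt.Printf("%s failed=%t\n", tc.Name, tc.FailureOutput != "")
	}
}
```

With this shape the failure text is still recorded for later analysis, while the passing entry keeps the job from going red.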
In sippy, as of openshift/sippy#685, we will begin tracking why this test flakes by storing metadata in the db about which alerts fired, so we'll have good insight into what may be causing issues, provided somebody remembers to look.
This PR also re-introduces the refactor that merges the two backstop alert tests into a common code path, from which they were once forked. We thought the refactor caused a regression, but it turned out to be something else, so this change should be good, provided we start getting good payloads again after #27553.
It also improves the output of the alert tests for better parsing in sippy: the output clearly identifies whether we accept or reject the failure, and whether there is an associated bug.
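As a rough illustration of parse-friendly output, the sketch below renders one line per alert with its disposition and bug, then parses it back. The `key=value` line format, the `AlertFinding` type, and the `parseFinding` helper are all assumptions made for this example; the actual format emitted by this PR and consumed by sippy may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// AlertFinding is a hypothetical record of one alert seen by the test.
type AlertFinding struct {
	Alert    string
	Accepted bool   // true if the failure is accepted, false if rejected
	Bug      string // associated bug reference, empty if none
}

// String renders the finding as space-separated key=value fields.
func (f AlertFinding) String() string {
	disposition := "rejected"
	if f.Accepted {
		disposition = "accepted"
	}
	bug := "none"
	if f.Bug != "" {
		bug = f.Bug
	}
	return fmt.Sprintf("alert=%s disposition=%s bug=%s", f.Alert, disposition, bug)
}

// parseFinding reverses String, as a consumer of the output might.
func parseFinding(line string) (AlertFinding, error) {
	var f AlertFinding
	for _, field := range strings.Fields(line) {
		key, value, ok := strings.Cut(field, "=")
		if !ok {
			return f, fmt.Errorf("malformed field %q", field)
		}
		switch key {
		case "alert":
			f.Alert = value
		case "disposition":
			f.Accepted = value == "accepted"
		case "bug":
			if value != "none" {
				f.Bug = value
			}
		}
	}
	if f.Alert == "" {
		return f, fmt.Errorf("no alert name in %q", line)
	}
	return f, nil
}

func main() {
	// ExampleAlert and BUG-123 are placeholders, not real identifiers.
	line := AlertFinding{Alert: "ExampleAlert", Accepted: true, Bug: "BUG-123"}.String()
	parsed, err := parseFinding(line)
	fmt.Println(line, parsed.Accepted, err)
}
```

Keeping the disposition and bug as explicit fields means a consumer does not have to infer them from free-form failure text.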