Commit 56609ae

test: Don't allow alerts to fire during a test run
The alert test has not been working, and a PR that fired alerts during the test run was able to merge. Fix the test logic based on:

1. Remove the filter on KubeAPILatencyHigh - no longer needed in 4.6+
2. Remove the filter on KubePodCrashLooping - no longer needed in 4.6+
3. Sort the metric by number of seconds the alert was firing

The bug that allowed the merge was the filter for KubePodCrashLooping: to omit a series we must use `unless X`. If we reintroduce a filter we should use that syntax.
1 parent c822846 commit 56609ae
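
The commit message says a series must be omitted with `unless` rather than subtraction. A minimal sketch of why, assuming PromQL's default one-to-one vector matching: `lhs - rhs` yields results only for label sets present on both sides, so every alert other than the subtracted one silently disappears, while `lhs unless rhs` keeps exactly the series that have no match. The `vector` type, the helper functions, and the `HypotheticalAlert` series below are illustrative stand-ins, not Prometheus APIs or real data.

```go
package main

import (
	"fmt"
	"sort"
)

// vector is a toy stand-in for a PromQL instant vector: each key is a
// series' full label set rendered as a string, each value is its sample.
type vector map[string]float64

// subtract mimics one-to-one matching for `lhs - rhs`: only label sets
// present on BOTH sides produce a result; everything else is dropped.
func subtract(lhs, rhs vector) vector {
	out := vector{}
	for labels, l := range lhs {
		if r, ok := rhs[labels]; ok {
			out[labels] = l - r
		}
	}
	return out
}

// unless mimics `lhs unless rhs`: keep every lhs series that has no
// series with an identical label set in rhs.
func unless(lhs, rhs vector) vector {
	out := vector{}
	for labels, l := range lhs {
		if _, ok := rhs[labels]; !ok {
			out[labels] = l
		}
	}
	return out
}

// keys returns the label sets of v in sorted order for stable printing.
func keys(v vector) []string {
	ks := make([]string, 0, len(v))
	for k := range v {
		ks = append(ks, k)
	}
	sort.Strings(ks)
	return ks
}

func main() {
	// Hypothetical sample data: seconds each alert was firing.
	firing := vector{
		`{alertname="KubePodCrashLooping",namespace="openshift-kube-controller-manager"}`: 12,
		`{alertname="HypotheticalAlert",namespace="example"}`:                             30,
	}
	ignored := vector{
		`{alertname="KubePodCrashLooping",namespace="openshift-kube-controller-manager"}`: 12,
	}
	fmt.Println("minus: ", keys(subtract(firing, ignored)))
	fmt.Println("unless:", keys(unless(firing, ignored)))
}
```

Under this model the subtraction keeps only the KubePodCrashLooping series, whose value becomes 0 and then fails the `>= 1` comparison, so the old query could never report a failure. The `unless` form keeps the other alert and drops only the ignored one.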

1 file changed: +3 -4 lines changed
test/extended/prometheus/prometheus.go

Lines changed: 3 additions & 4 deletions
@@ -65,10 +65,9 @@ var _ = g.Describe("[sig-instrumentation][Late] Alerts", func() {
 	testDuration := exutil.DurationSinceStartInSeconds().String()
 
 	tests := map[string]bool{
-		// Checking Watchdog alert state is done in "should have a Watchdog alert in firing state".
-		// TODO: remove KubePodCrashLooping subtraction logic once https://bugzilla.redhat.com/show_bug.cgi?id=1842002
-		// is fixed, but for now we are ignoring KubePodCrashLooping alerts in the openshift-kube-controller-manager namespace.
-		fmt.Sprintf(`count_over_time(ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|KubeAPILatencyHigh",alertstate="firing",severity!="info"}[%[1]s]) - count_over_time(ALERTS{alertname="KubePodCrashLooping",namespace="openshift-kube-controller-manager",alertstate="firing",severity!="info"}[%[1]s]) >= 1`, testDuration): false,
+		// Invariant: No alerts should have fired during the test run except the known alerts
+		// Returns number of seconds the alerts were firing
+		fmt.Sprintf(`sort_desc(count_over_time(ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured",alertstate="firing",severity!="info"}[%[1]s:1s]) > 0)`, testDuration): false,
 	}
 	err := helper.RunQueries(tests, oc, ns, execPod.Name, url, bearerToken)
 	o.Expect(err).NotTo(o.HaveOccurred())
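
If a filter ever has to be reintroduced, the commit message asks for the `unless` syntax. The following is a rough sketch of how such a query string might be assembled, not code from the repository: `alertQuery`, its parameters, and the `45m` window are hypothetical. The parentheses around the `unless` expression matter because, as far as I understand PromQL precedence, comparison operators bind more tightly than `unless`.

```go
package main

import "fmt"

// alertQuery is a hypothetical helper. It builds the "no alerts fired"
// query over the given window and, when exclude is non-empty, omits the
// series selected by exclude using `unless` instead of subtraction.
func alertQuery(window, exclude string) string {
	// The 1s subquery step makes count_over_time approximate the number
	// of seconds each alert was firing.
	q := fmt.Sprintf(`count_over_time(ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured",alertstate="firing",severity!="info"}[%[1]s:1s])`, window)
	if exclude != "" {
		// Parenthesize so that `> 0` applies to the whole filtered vector.
		q = fmt.Sprintf(`(%[1]s unless count_over_time(%[2]s[%[3]s:1s]))`, q, exclude, window)
	}
	return fmt.Sprintf(`sort_desc(%s > 0)`, q)
}

func main() {
	// Without an exclusion this reproduces the query shape from the diff.
	fmt.Println(alertQuery("45m", ""))

	// With an exclusion selector (illustrative only).
	fmt.Println(alertQuery("45m", `ALERTS{alertname="KubePodCrashLooping",namespace="openshift-kube-controller-manager",alertstate="firing"}`))
}
```

Because `unless` matches on identical label sets, the excluded selector should return the same series, with the same labels, that the main selector would return for that alert.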