#2216 fixed a bug where the csv_succeeded metric was lost across deployment pod restarts. Those changes added a new metrics e2e test that restarts the deployment (i.e. scales it down and back up) to verify the metric is retained after pod restarts:
```go
When("the OLM pod restarts", func() {
	BeforeEach(func() {
		restartDeploymentWithLabel(c, "app=olm-operator")
	})
	It("CSV metric is preserved", func() {
		Expect(getMetricsFromPod(c, getPodWithLabel(c, "app=olm-operator"))).To(
			ContainElement(LikeMetric(WithFamily("csv_succeeded"), WithName(csv.Name), WithValue(1))),
		)
	})
})
```
It looks like restartDeploymentWithLabel(...) doesn't have enough safeguards to verify that the restarted deployment is ready and available. This causes the e2e test to fail on more heavily loaded clusters: we attempt to scrape the metric before the metrics endpoint has been set up and is ready to serve traffic.