[release-4.16] OCPBUGS-33917: add alert data to upgrade health in oc adm upgrade status #1794
This commit adds alerts that fire during the upgrade to the `upgrade health` section. By default, all alerts that started firing after the upgrade was initiated appear in the upgrade health section. There is also a list of allowed alerts, which are shown even if they were already firing before the upgrade started. We have not added examples for alerts and are only testing this with unit tests, to reduce the input data in the examples.
Previously, the method iterated over URIs from a route, but instead of searching for a success it stopped at the first failure, which defeats the point of iterating over possible URIs in the first place. Refactor the method so that it does not return on error and returns on success instead. Only return an error if all URIs failed to yield a workable result. Slightly optimize the error for the common case where there is only a single URI to try, and shorten the string by using namespace/name notation.
- skip alerts without required labels
- add context on why we show the insight (started firing during update or is known to affect updates)
- skip alerts with info level
- explicitly mention the alert does not have a runbook
- fix `shortDuration` for more cases, including `now`, add tests
- handle also `message` annotation on alerts
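A rough sketch of some of the filtering and fallback rules listed above is shown here. Everything in it is an assumption made for illustration: the `alert` shape, the choice of `alertname` and `severity` as the required labels, the annotation preference order, and the placeholder runbook text are not taken from the PR.

```go
package main

// alert is a hypothetical, simplified alert shape with Prometheus-style
// labels and annotations.
type alert struct {
	Labels      map[string]string
	Annotations map[string]string
}

// requiredLabels is an assumption; the PR does not list them in this text.
var requiredLabels = []string{"alertname", "severity"}

// noRunbook is illustrative wording for alerts without a runbook.
const noRunbook = "<alert does not have a runbook_url annotation>"

// shouldShow skips alerts that lack a required label or have info severity.
func shouldShow(a alert) bool {
	for _, label := range requiredLabels {
		if a.Labels[label] == "" {
			return false
		}
	}
	return a.Labels["severity"] != "info"
}

// describe picks the first non-empty descriptive annotation, falling back to
// `message`, and states explicitly when no runbook is available.
func describe(a alert) (message, runbook string) {
	for _, key := range []string{"description", "summary", "message"} {
		if v := a.Annotations[key]; v != "" {
			message = v
			break
		}
	}
	runbook = a.Annotations["runbook_url"]
	if runbook == "" {
		runbook = noRunbook
	}
	return message, runbook
}
```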
@petr-muller: This pull request references Jira Issue OCPBUGS-33917, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker.
@petr-muller: The following test failed, say
/cc @PratikMahajan @wking @jan--f
wking
left a comment
Clean backports; nothing but some context difference vs. the dev-branch changes:
$ cherry-pick-diff origin/release-4.16..origin/pr/1794 origin/master
7515161 -> 3955f41 upgrade status: polish alert insights
51487f9 -> b2c1189 inspectalerts: refactor getWithBearer to try all urls in route
2715cc4 -> 579e757 inspectalerts: use client-go wrappers for thanos call debugging
518c0e7 -> c3cee5d OCPBUGS-33896: status/inspect-alerts: handle non-200 by Thanos
d5f52ff -> 48c7595 add mock tests for alerts in oc adm upgrade status
10c41f5 -> 8b0dc50 add alerts to update health in oc adm upgrade status
751516199 -> 3955f41d7 `upgrade status`: polish alert insights
--- 751516199
+++ 3955f41d7
@@ -11,3 +11,3 @@
diff --git a/pkg/cli/admin/upgrade/status/alerts.go b/pkg/cli/admin/upgrade/status/alerts.go
-index cf66178d9..a9387b865 100644
+index 503c4ebf9..bc690baec 100644
--- a/pkg/cli/admin/upgrade/status/alerts.go
@@ -23,3 +23,3 @@
@@ -57,27 +58,72 @@ func parseAlertDataToInsights(alertData AlertData, startedAt time.Time) []update
- var updateInsights []updateInsight
+ var updateInsights []updateInsight = []updateInsight{}
@@ -102,3 +102,3 @@
diff --git a/pkg/cli/admin/upgrade/status/alerts_test.go b/pkg/cli/admin/upgrade/status/alerts_test.go
-index db68ab2a0..654976165 100644
+index bb79a3e8b..be25cd932 100644
--- a/pkg/cli/admin/upgrade/status/alerts_test.go
d5f52ff69 -> 48c75952f add mock tests for alerts in oc adm upgrade status
--- d5f52ff69
+++ 48c75952f
@@ -269,6 +269,6 @@
diff --git a/pkg/cli/admin/upgrade/status/examples/4.15.0-ec2-unavailable-mco-20m.detailed-output b/pkg/cli/admin/upgrade/status/examples/4.15.0-ec2-unavailable-mco-20m.detailed-output
-index 6225bf52a..a7381ef27 100644
+index 6c3c5bcc0..1c6d86694 100644
--- a/pkg/cli/admin/upgrade/status/examples/4.15.0-ec2-unavailable-mco-20m.detailed-output
+++ b/pkg/cli/admin/upgrade/status/examples/4.15.0-ec2-unavailable-mco-20m.detailed-output
-@@ -33,3 +33,21 @@ Message: Cluster Operator machine-config is unavailable (MachineConfigController
+@@ -34,3 +34,21 @@ Message: Cluster Operator machine-config is unavailable (MachineConfigController
Resources:
@@ -295,6 +295,6 @@
diff --git a/pkg/cli/admin/upgrade/status/examples/4.15.0-ec2-unavailable-mco-20m.output b/pkg/cli/admin/upgrade/status/examples/4.15.0-ec2-unavailable-mco-20m.output
-index 5289e9e55..c40a74b8c 100644
+index 155bfa393..f230be862 100644
--- a/pkg/cli/admin/upgrade/status/examples/4.15.0-ec2-unavailable-mco-20m.output
+++ b/pkg/cli/admin/upgrade/status/examples/4.15.0-ec2-unavailable-mco-20m.output
-@@ -25,7 +25,9 @@ ip-10-0-4-159.us-east-2.compute.internal Outdated Pending 4.14.0-rc.3
+@@ -26,7 +26,9 @@ ip-10-0-4-159.us-east-2.compute.internal Outdated Pending 4.14.0-rc.3
ip-10-0-99-40.us-east-2.compute.internal Outdated Pending 4.14.0-rc.3 ?
@@ -311,3 +311,3 @@
diff --git a/pkg/cli/admin/upgrade/status/status.go b/pkg/cli/admin/upgrade/status/status.go
-index ad24f7e14..d79113d0f 100644
+index b54e530b3..a56e60459 100644
--- a/pkg/cli/admin/upgrade/status/status.go
@@ -322,3 +322,3 @@
-@@ -283,7 +284,6 @@ func (o *options) Run(ctx context.Context) error {
+@@ -280,7 +281,6 @@ func (o *options) Run(ctx context.Context) error {
if err := json.Unmarshal(alertBytes, &alertData); err != nil {
10c41f526 -> 8b0dc5057 add alerts to update health in oc adm upgrade status
--- 10c41f526
+++ 8b0dc5057
@@ -277,3 +277,3 @@
diff --git a/pkg/cli/admin/upgrade/status/health.go b/pkg/cli/admin/upgrade/status/health.go
-index ef047b0c4..07e895352 100644
+index 8d77250b6..24c23b5e4 100644
--- a/pkg/cli/admin/upgrade/status/health.go
@@ -309,3 +309,3 @@
diff --git a/pkg/cli/admin/upgrade/status/status.go b/pkg/cli/admin/upgrade/status/status.go
-index 0612a0745..ad24f7e14 100644
+index 82d0de1c4..b54e530b3 100644
--- a/pkg/cli/admin/upgrade/status/status.go
@@ -371,3 +371,3 @@
if err != nil {
-@@ -245,6 +266,27 @@ func (o *options) Run(ctx context.Context) error {
+@@ -242,6 +263,27 @@ func (o *options) Run(ctx context.Context) error {
}
/lgtm
pre-merge verified
@evakhoni can you slap the
np.
/label backport-risk-assessed
simonpasquier
left a comment
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: petr-muller, simonpasquier, wking
@petr-muller: Jira Issue OCPBUGS-33917: All pull requests linked via external trackers have merged. Jira Issue OCPBUGS-33917 has been moved to the MODIFIED state.
[ART PR BUILD NOTIFIER] This PR has been included in build ose-tools-container-v4.16.0-202406042036.p0.gd9c3227.assembly.stream.el9 for distgit ose-tools.
Replaces automated #1771 because multiple PRs were needed for OCPBUGS-33896.
Backports #1771 #1782 and #1787 to release 4.16:
- status/inspect-alerts: handle non-200 by Thanos
- inspectalerts: refactor `getWithBearer` to try all urls in route
- `upgrade status`: polish alert insights