Bug 1762920: install: add alerts for cluster-version-operator #253
The `manifests` directory on the bootstrap is used by cluster-bootstrap to push manifests to the cluster. A `servicemonitor` for the CVO was added by openshift#214. The `servicemonitor` API is created by the cluster-monitoring-operator, so bootstrapping gets stuck until the monitoring operator is running. This skips the `servicemonitor` in the bootstrap render, as it is not required for the bootstrap CVO pod.
…ling Cherry-picked from fad0688 (pkg/cvo/metrics/go: the cluster operators report Degraded and not Failing, 2019-08-07, openshift#232) and manually removed the import shuffling from metrics.go.
Adds alerts for cluster-version-operator and cluster operators:

* `ClusterVersionOperatorDown`: This alert fires when the cluster-version-operator is not providing any metrics. Severity is critical, as upgrades will not work and clusters can drift from the expected state.
* `ClusterOperatorDegraded`: This alert fires when a cluster operator is degraded. This is important because the cluster might be in a state that is unacceptable for a production cluster, for example, using emptyDir as the storage backend for the registry. Severity is critical, as a degraded operator implies the operands are in a state that is not correct for the cluster.
* `ClusterOperatorDown`: This alert fires when a cluster operator is not up, i.e. `cluster_operator_up` is 0. This means that the operator might not be reconciling the operands.
* `ClusterOperatorFlapping`: This alert fires when a cluster operator is continuously flapping between up and down because of some unexpected condition.
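For illustration only, alerts like these could be expressed as Prometheus alerting rules roughly as follows. This is a sketch, not the rules from this PR: only the alert names and the `cluster_operator_up == 0` condition come from the description above. The resource name, namespace, `job` label, `for` durations, the flapping threshold, and any severity not stated in the description are assumptions.

```yaml
# Hypothetical sketch; not the manifest shipped by this PR.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-version-operator          # assumed name
  namespace: openshift-cluster-version    # assumed namespace
spec:
  groups:
  - name: cluster-version
    rules:
    - alert: ClusterVersionOperatorDown
      # No healthy scrape target for the operator; the job label is an assumption.
      expr: absent(up{job="cluster-version-operator"} == 1)
      for: 10m                            # assumed duration
      labels:
        severity: critical                # stated in the description
    - alert: ClusterOperatorDown
      # Condition taken from the description: cluster_operator_up is 0.
      expr: cluster_operator_up == 0
      for: 10m                            # assumed duration
      labels:
        severity: critical                # assumed; the description states no severity
    - alert: ClusterOperatorFlapping
      # More than two up/down transitions in the window; window and threshold are assumptions.
      expr: changes(cluster_operator_up[2m]) > 2
      for: 10m                            # assumed duration
      labels:
        severity: warning                 # assumed; the description states no severity
```

The `for` clause keeps an alert pending until its expression has held for the whole duration, which avoids firing on a single failed scrape.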
@wking: This pull request references Bugzilla bug 1762920, which is invalid:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@wking: PR needs rebase.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking
… for ServiceMonitor
@wking: This pull request references Bugzilla bug 1762920, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.
And also #235 (like #234, linked from rhbz#1738527).
/retest
Hrm, the integration test is checking out master and trying to merge my 4.1 branch? Maybe this is fallout from my accidentally filing the PR against master and then fixing the base to be release-4.1?
@wking: The following tests failed, say
Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Closing in favor of #261
@wking: Closed this PR.
Backports #232 to release-4.1. Also pulls in the precursor commits from #234, #221, and #214. I think that's all that's required, but I guess we'll see in CI ;).