Bug 1762920: install: add alerts for cluster-version-operator #261

abhinavdahiya · 2019-10-22T18:22:32Z

Backports #232 to release-4.1. Also pulls in the precursor commits from #221 and #214.

This is a smaller cherry-pick of #253, because that one needs changes in the CMO to move the servicemonitor under CVO management.

This only moves the alerts, using a PrometheusRule and skipping the ServiceMonitor.

This is not a clean cherry-pick and definitely needs review.

/cc @wking @smarterclayton

openshift-ci-robot · 2019-10-22T18:22:37Z

@abhinavdahiya: This pull request references Bugzilla bug 1762920, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Bug 1762920: install: add alerts for cluster-version-operator

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

The `manifests` directory on the bootstrap node is used by cluster-bootstrap to push manifests to the cluster. The `servicemonitor` for the CVO was added by openshift#214. The `servicemonitor` API is created by the cluster-monitoring-operator, so including it causes bootstrapping to get stuck until the monitoring operator is running. This commit skips the `servicemonitor` in the bootstrap render, as it is not required for the bootstrap CVO pod.
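For reviewers who want a concrete picture of the skip described above, here is a minimal sketch in Go. It is not the code from this PR: the `manifest` type, the `skipForBootstrap` and `filterBootstrapManifests` helpers, and the file names are all hypothetical, and the real render code may select manifests differently. It only illustrates the idea of dropping `ServiceMonitor` objects from the set handed to cluster-bootstrap.

```go
package main

import (
	"fmt"
	"strings"
)

// manifest is a hypothetical, simplified stand-in for a rendered manifest.
type manifest struct {
	Name string // illustrative file name
	Kind string // value of the manifest's top-level `kind` field
}

// skipForBootstrap reports whether a manifest should be left out of the
// bootstrap render because its API is not registered that early.
// Assumption: matching on the kind is enough for this illustration.
func skipForBootstrap(m manifest) bool {
	return strings.EqualFold(m.Kind, "ServiceMonitor")
}

// filterBootstrapManifests returns the manifests that remain after the
// skip rule is applied, preserving their original order.
func filterBootstrapManifests(all []manifest) []manifest {
	kept := make([]manifest, 0, len(all))
	for _, m := range all {
		if skipForBootstrap(m) {
			continue
		}
		kept = append(kept, m)
	}
	return kept
}

func main() {
	rendered := []manifest{
		{Name: "namespace.yaml", Kind: "Namespace"},
		{Name: "deployment.yaml", Kind: "Deployment"},
		{Name: "servicemonitor.yaml", Kind: "ServiceMonitor"},
	}
	for _, m := range filterBootstrapManifests(rendered) {
		fmt.Println(m.Name)
	}
}
```

Filtering at render time keeps the in-cluster manifests untouched, so the `servicemonitor` can still be applied once the monitoring operator has registered its API.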


Adds alerts for cluster-version-operator and cluster operators

* `ClusterVersionOperatorDown`: This alert fires when the cluster-version-operator is not providing any metrics. Severity is critical, as upgrades will not work and clusters can drift from the expected state.
* `ClusterOperatorDegraded`: This alert fires when a cluster operator is degraded. This is important because the cluster might be in a state that is unacceptable for a production cluster, for example, using emptyDir as the storage backend for the registry. Severity is critical, as a degraded operator implies the operands are in a state that is not correct for the cluster.
* `ClusterOperatorDown`: This alert fires when a cluster operator is not up, i.e. cluster_operator_up is 0. This means that the operator might not be reconciling its operands.
* `ClusterOperatorFlapping`: This alert fires when a cluster operator is continuously flapping between up and down because of some unusual condition.
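To make the difference between the "down" and "flapping" conditions concrete, here is a rough Go sketch that evaluates both over a short window of `cluster_operator_up` samples. It is an illustration only: the actual alerts are Prometheus expressions in a PrometheusRule, and the helper names, the sample windows and the `maxChanges` threshold below are assumptions, not values taken from this PR.

```go
package main

import "fmt"

// changes counts how many times consecutive samples differ, similar in
// spirit to PromQL's changes() over a range of samples.
func changes(samples []float64) int {
	n := 0
	for i := 1; i < len(samples); i++ {
		if samples[i] != samples[i-1] {
			n++
		}
	}
	return n
}

// isDown mirrors the "cluster_operator_up is 0" condition, checked on the
// most recent sample in the window.
func isDown(samples []float64) bool {
	return len(samples) > 0 && samples[len(samples)-1] == 0
}

// isFlapping reports whether the series changed value more than maxChanges
// times within the window. maxChanges is a hypothetical threshold.
func isFlapping(samples []float64, maxChanges int) bool {
	return changes(samples) > maxChanges
}

func main() {
	const maxChanges = 2 // assumed threshold, for illustration only

	windows := []struct {
		name    string
		samples []float64
	}{
		{"steady", []float64{1, 1, 1, 1, 1}},
		{"down", []float64{1, 1, 0, 0, 0}},
		{"flapping", []float64{1, 0, 1, 0, 1}},
	}
	for _, w := range windows {
		fmt.Printf("%s: down=%t flapping=%t\n",
			w.name, isDown(w.samples), isFlapping(w.samples, maxChanges))
	}
}
```

Note that a flapping operator can look healthy on its latest sample, which is why a separate alert on the number of changes is useful alongside the plain "down" check.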

wking · 2019-10-23T04:29:49Z

/retest

wking · 2019-11-12T05:51:05Z

/lgtm

openshift-ci-robot · 2019-11-12T05:51:11Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [abhinavdahiya,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sdodson · 2019-12-20T14:49:55Z

/bugzilla refresh

openshift-ci-robot requested review from smarterclayton and wking October 22, 2019 18:22

abhinavdahiya added 3 commits October 22, 2019 12:49

pkg/cvo/metrics/go: the cluster operators report Degraded and not Failing

9ea9199

abhinavdahiya force-pushed the release-4-1-pick-232 branch from f2ae221 to 2125d2a Compare October 22, 2019 19:49

wking mentioned this pull request Oct 22, 2019

Bug 1762920: install: add alerts for cluster-version-operator #253

Closed

openshift-ci-robot assigned wking Nov 12, 2019

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 12, 2019

sdodson added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Nov 18, 2019

openshift-merge-robot merged commit 2d57b0a into openshift:release-4.1 Nov 18, 2019
