
@abhinavdahiya
Contributor

Backports #232 to release-4.1. Also pulls in the precursor commits from #221 and #214.

This one is a smaller cherry-pick of #253, because that one needs changes in the CMO to move the ServiceMonitor under CVO management.

This only moves the alerts using PrometheusRule, skipping the ServiceMonitor.

This is not a clean cherry-pick and definitely needs review.

/cc @wking @smarterclayton

@openshift-ci-robot
Contributor

@abhinavdahiya: This pull request references Bugzilla bug 1762920, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Bug 1762920: install: add alerts for cluster-version-operator


@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 22, 2019
The `manifests` directory on the bootstrap is used by cluster-bootstrap to push manifests to the cluster.
The `servicemonitor` for the CVO was added by openshift#214.
The `servicemonitor` API is created by the cluster-monitoring-operator, so including it causes bootstrapping to get stuck until the monitoring operator is running.

This skips the `servicemonitor` in the bootstrap render, as it is not required for the bootstrap CVO pod.
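
The skip described above can be pictured as a filter on manifest kind. The following is only a minimal sketch under stated assumptions: the function name `filter_bootstrap_manifests`, the constant `SKIPPED_BOOTSTRAP_KINDS`, and the dictionary shape of a manifest are all hypothetical and are not taken from the cluster-version-operator source.

```python
from typing import Dict, Iterable, List

# Hypothetical: kinds whose API is only registered once another operator
# (here, the cluster-monitoring-operator) is running.
SKIPPED_BOOTSTRAP_KINDS = {"ServiceMonitor"}


def filter_bootstrap_manifests(manifests: Iterable[Dict]) -> List[Dict]:
    """Return only the manifests whose kind is not in the skip set."""
    return [m for m in manifests if m.get("kind") not in SKIPPED_BOOTSTRAP_KINDS]


# Illustrative input: two already-parsed manifests for the CVO.
manifests = [
    {"kind": "Deployment", "metadata": {"name": "cluster-version-operator"}},
    {"kind": "ServiceMonitor", "metadata": {"name": "cluster-version-operator"}},
]
kept = filter_bootstrap_manifests(manifests)
```

With this input, only the `Deployment` manifest remains in `kept`, so nothing pushed during bootstrap depends on an API that does not exist yet.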
Adds alerts for cluster-version-operator and cluster operators:

* `ClusterVersionOperatorDown`
  This alert fires when the cluster-version-operator is not providing any metrics. Severity is critical, as upgrades will not work and clusters can drift from the expected state.
* `ClusterOperatorDegraded`
  This alert fires when a cluster operator is degraded. This is important because the cluster might be in a state that is unacceptable for a production cluster, for example, using emptyDir as the storage backend for the registry. Severity is critical, as a degraded operator implies the operands are in a state that is not correct for the cluster.
* `ClusterOperatorDown`
  This alert fires when a cluster operator is not up, i.e. `cluster_operator_up` is 0. This means that the operator might not be reconciling its operands.
* `ClusterOperatorFlapping`
  This alert fires when a cluster operator is continuously flapping between up and down because of some unexpected condition.
@wking
Member

wking commented Oct 23, 2019

/retest

@wking
Member

wking commented Nov 12, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 12, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, wking


Details Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sdodson sdodson added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Nov 18, 2019
@openshift-merge-robot openshift-merge-robot merged commit 2d57b0a into openshift:release-4.1 Nov 18, 2019
@sdodson
Member

sdodson commented Dec 20, 2019

/bugzilla refresh

