Closed
3 changes: 1 addition & 2 deletions pkg/cvo/status.go
@@ -27,8 +27,7 @@ import (

 const (
 // ClusterStatusFailing is set on the ClusterVersion status when a cluster
-// cannot reach the desired state. It is considered more serious than Degraded
-// and indicates the cluster is not healthy.
+// cannot reach the desired state. It indicates the cluster is not healthy.
Member

Why can't we say that it probably means one or more operators are in a degraded state, rather than saying the cluster is not healthy?

Member Author

Because there could be reasons other than unavailable or degraded operators for Failing=True. Without #867 in place, though, it's hard to get Telemetry stats on how frequent the various modes are.

Member

Though this is not customer-facing documentation, it still makes me nervous to say the cluster is not healthy while also saying we do not think this is more serious than a degraded operator condition.

Member Author

Failing may be more serious than Degraded (e.g. it may be because a ClusterOperator is Available=False). Failing may be less serious than Degraded (e.g. we may be having trouble rolling out a peripheral alert rule). I'm not saying Failing is not serious. I'm just dropping the apples-to-oranges Degraded comparison.

Contributor

It does seem that Failing is more serious, given that it indicates a cluster cannot reach its desired state, is unhealthy, and requires an administrator to intervene. Degraded's consequences vary with the specific cluster operator, and Degraded is only an indication that something may need investigation and adjustment. As long as the operator is available, the Degraded condition does not cause user workload failure or application downtime. Failing does sound scarier in the documentation, I am not going to lie, and a failing cluster sounds more serious than a degraded operator.

I agree with Trevor's statement:

dropping the apples-to-oranges Degraded comparison

The comparison seems to depend on the specific reasons for the conditions, and since we can't tell which reasons seem to be more frequent (https://github.com/openshift/cluster-version-operator/pull/905/files#r1116077435) we can drop the comparison. I would simply say Failing is for reporting one group of things, and Degraded is for reporting another group of things, and both can be reported due to more or less serious issues, and thus the comparison can be dropped.


It's also worth pointing out that if we end up modifying the comment, we can also modify the comment in the openshift/oc repository (https://github.com/openshift/oc/blob/master/pkg/cli/admin/upgrade/upgrade.go#L32-L35).
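
To make the point of this thread concrete, here is a minimal Go sketch of why the condition type alone does not rank severity. The `Condition` struct is a simplified stand-in loosely modeled on the condition types in openshift/api, not the real API, and both reason strings are hypothetical values invented for illustration.

```go
package main

import "fmt"

// ConditionType and Condition are simplified, hypothetical stand-ins for
// the condition types in openshift/api; they are NOT the real API types.
type ConditionType string

const (
	// Failing is reported on the ClusterVersion object.
	Failing ConditionType = "Failing"
	// Degraded is reported on individual ClusterOperator objects.
	Degraded ConditionType = "Degraded"
)

type Condition struct {
	Type   ConditionType
	Status string // "True", "False", or "Unknown"
	Reason string // machine-readable cause; values below are made up
}

// isTrue reports whether the named condition is present with Status "True".
// A missing condition is treated as not true.
func isTrue(conds []Condition, t ConditionType) bool {
	for _, c := range conds {
		if c.Type == t {
			return c.Status == "True"
		}
	}
	return false
}

// describe renders a condition so that the reason is always shown next to
// the type, since the type by itself says little about severity.
func describe(c Condition) string {
	return fmt.Sprintf("%s=%s (reason: %s)", c.Type, c.Status, c.Reason)
}

func main() {
	// Hypothetical case 1: Failing because an operator is unavailable.
	serious := Condition{Type: Failing, Status: "True", Reason: "ExampleOperatorNotAvailable"}
	// Hypothetical case 2: Failing because a peripheral alert rule
	// could not be rolled out.
	minor := Condition{Type: Failing, Status: "True", Reason: "ExampleAlertRuleNotApplied"}

	for _, c := range []Condition{serious, minor} {
		fmt.Println(describe(c))
	}
	fmt.Println("Failing is true:", isTrue([]Condition{serious}, Failing))
}
```

Both conditions have the same type and status; only the reason separates the serious case from the minor one, which is the argument for dropping the comparison with Degraded from the comment.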

ClusterStatusFailing = configv1.ClusterStatusConditionType("Failing")

// MaxHistory is the maximum size of ClusterVersion history. Once exceeded