Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can not we say that it probably means one or more operators are in degraded state, rather than saying not healthy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because there could be other reasons besides unavailable/degraded operators to be
Failing=True. Although without #867 in place, it's hard to get Telemetry stats on how frequent the various modes are.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though this is not customer facing documentation still makes me nervous about telling cluster is not healthy but we do not think this is more serious degraded condition of operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failingmay be more serious thanDegraded(e.g. it may be because a ClusterOperator isAvailable=False).Failingmay be less serious thanDegraded(e.g. we may be having trouble rolling out a peripheral alert rule). I'm not sayingFailingis not serious. I'm just dropping the apples-to-orangesDegradedcomparison.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though it seems
Failingis more serious given it indicates that a cluster cannot reach its desired state, is unhealthy, and requires an administrator to intervene. AndDegraded's consequences may vary given the specific cluster operator andDegradedis only an indication that something may need investigation and adjustment. As long as the Operator is available, the Degraded condition does not cause user workload failure or application downtime. TheFailingdoes sound scarier in the documentation, I am not going to lie. And a failing cluster sounds more serious than a degraded operator.I agree with Trevor's statement:
The comparison seems to depend on the specific reasons for the conditions, and since we can't tell which reasons seem to be more frequent (https://github.com/openshift/cluster-version-operator/pull/905/files#r1116077435) we can drop the comparison. I would simply say
Failingis for reporting one group of things, andDegradedis for reporting another group of things, and both can be reported due to more or less serious issues, and thus the comparison can be dropped.It's also worth pointing out that if we end up modifying the comment, we can also modify the comment in the openshift/oc repository (https://github.com/openshift/oc/blob/master/pkg/cli/admin/upgrade/upgrade.go#L32-L35).