From f99f4bb273143d4b9aa8a80e0a14070f4dd514e1 Mon Sep 17 00:00:00 2001
From: "W. Trevor King"
Date: Wed, 1 Sep 2021 21:29:33 -0700
Subject: [PATCH] config/v1/types_cluster_operator: Explain that conditions cover components

Some operators have no configured operands (e.g. the bare-metal
operator on non-metal platforms, or the image-registry operator when
the admins have configured managementState:Removed [1]). Some
operators have many configured operands. Operators writing
ClusterOperator conditions should not limit the conditions to speak
about just the operator, or just a particular operand. Instead,
operators should speak about the component as a whole. Is something
about the service that the component provides gone? If so,
Available=False (midnight admin page). Is something about the service
that the component provides not hitting its service-level objectives?
If so, Degraded=True (working-hours admin queue). Doesn't matter if
that thing is "the operator is having trouble talking to the API to
figure out how the operands are doing" or "the operator is
super-happy, and sees that some operand is sad". That's all stuff
that can be distinguished in the reason/message.

[1]: https://docs.openshift.com/container-platform/4.8/registry/configuring_registry_storage/configuring-registry-storage-baremetal.html#registry-removed_configuring-registry-storage-baremetal
---
 config/v1/types_cluster_operator.go | 73 +++++++++++++++--------------
 1 file changed, 37 insertions(+), 36 deletions(-)

diff --git a/config/v1/types_cluster_operator.go b/config/v1/types_cluster_operator.go
index fe292b581a1..bbe3596793d 100644
--- a/config/v1/types_cluster_operator.go
+++ b/config/v1/types_cluster_operator.go
@@ -143,48 +143,49 @@ type ClusterOperatorStatusCondition struct {
 type ClusterStatusConditionType string
 
 const (
-	// Available indicates that the operand (eg: openshift-apiserver for the
-	// openshift-apiserver-operator), is functional and available in the cluster.
-	// Available=False means at least part of the component is non-functional,
-	// and that the condition requires immediate administrator intervention.
+	// Available indicates that the component (operator and all configured operands)
+	// is functional and available in the cluster. Available=False means at least
+	// part of the component is non-functional, and that the condition requires
+	// immediate administrator intervention.
 	OperatorAvailable ClusterStatusConditionType = "Available"
 
-	// Progressing indicates that the operator is actively rolling out new code,
-	// propagating config changes, or otherwise moving from one steady state to
-	// another. Operators should not report progressing when they are reconciling
-	// (without action) a previously known state. If the observed cluster state
-	// has changed and the operator/operand is reacting to it (scaling up for instance),
-	// Progressing should become true since it is moving from one steady state to
-	// another.
+	// Progressing indicates that the component (operator and all configured operands)
+	// is actively rolling out new code, propagating config changes, or otherwise
+	// moving from one steady state to another. Operators should not report
+	// progressing when they are reconciling (without action) a previously known
+	// state. If the observed cluster state has changed and the component is
+	// reacting to it (scaling up for instance), Progressing should become true
+	// since it is moving from one steady state to another.
 	OperatorProgressing ClusterStatusConditionType = "Progressing"
 
-	// Degraded indicates that the operator's current state does not match its
-	// desired state over a period of time resulting in a lower quality of service.
-	// The period of time may vary by component, but a Degraded state represents
-	// persistent observation of a condition. As a result, a component should not
-	// oscillate in and out of Degraded state. A service may be Available even
-	// if its degraded. For example, your service may desire 3 running pods, but 1
-	// pod is crash-looping. The service is Available but Degraded because it
-	// may have a lower quality of service. A component may be Progressing but
-	// not Degraded because the transition from one state to another does not
-	// persist over a long enough period to report Degraded. A service should not
-	// report Degraded during the course of a normal upgrade. A service may report
-	// Degraded in response to a persistent infrastructure failure that requires
-	// eventual administrator intervention. For example, if a control plane host
-	// is unhealthy and must be replaced. An operator should report Degraded if
-	// unexpected errors occur over a period, but the expectation is that all
-	// unexpected errors are handled as operators mature.
+	// Degraded indicates that the component (operator and all configured operands)
+	// does not match its desired state over a period of time resulting in a lower
+	// quality of service. The period of time may vary by component, but a Degraded
+	// state represents persistent observation of a condition. As a result, a
+	// component should not oscillate in and out of Degraded state. A component may
+	// be Available even if its degraded. For example, a component may desire 3
+	// running pods, but 1 pod is crash-looping. The component is Available but
+	// Degraded because it may have a lower quality of service. A component may be
+	// Progressing but not Degraded because the transition from one state to
+	// another does not persist over a long enough period to report Degraded. A
+	// component should not report Degraded during the course of a normal upgrade.
+	// A component may report Degraded in response to a persistent infrastructure
+	// failure that requires eventual administrator intervention. For example, if
+	// a control plane host is unhealthy and must be replaced. A component should
+	// report Degraded if unexpected errors occur over a period, but the
+	// expectation is that all unexpected errors are handled as operators mature.
 	OperatorDegraded ClusterStatusConditionType = "Degraded"
 
-	// Upgradeable indicates whether the operator is safe to upgrade based on the
-	// current cluster state. When status is False, the cluster-version operator
-	// will prevent the cluster from performing impacted updates unless forced.
-	// When set on ClusterVersion, the message will explain which updates (minor
-	// or patch) are impacted. When set on ClusterOperator, False will block
-	// minor OpenShift updates. The message field should contain a human
-	// readable description of what the administrator should do to allow the
-	// cluster or operator to successfully update. The cluster-version operator
-	// will allow updates when this condition is not False, including when it is
+	// Upgradeable indicates whether the component (operator and all configured
+	// operands) is safe to upgrade based on the current cluster state. When
+	// Upgradeable is False, the cluster-version operator will prevent the
+	// cluster from performing impacted updates unless forced. When set on
+	// ClusterVersion, the message will explain which updates (minor or patch)
+	// are impacted. When set on ClusterOperator, False will block minor
+	// OpenShift updates. The message field should contain a human readable
+	// description of what the administrator should do to allow the cluster or
+	// component to successfully update. The cluster-version operator will
+	// allow updates when this condition is not False, including when it is
 	// missing, True, or Unknown.
 	OperatorUpgradeable ClusterStatusConditionType = "Upgradeable"
 )
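
The component-wide reading of Available and Degraded described in the commit message can be illustrated with a small, self-contained Go sketch. It is not the openshift/api types or any operator's real logic: `Health`, `Condition`, `ComponentConditions`, and the reason strings are hypothetical names introduced only for this example. Only the condition type strings ("Available", "Degraded") and the rule (service gone means Available=False, service missing its objectives means Degraded=True, with the source of the problem carried in reason/message) come from the patch.

```go
package main

import "fmt"

// ConditionStatus is a hypothetical stand-in for the status values used on conditions.
type ConditionStatus string

const (
	ConditionTrue  ConditionStatus = "True"
	ConditionFalse ConditionStatus = "False"
)

// Condition is a simplified, hypothetical condition record for this sketch.
type Condition struct {
	Type    string
	Status  ConditionStatus
	Reason  string
	Message string
}

// Health is a hypothetical summary of one part of a component:
// either the operator itself or one configured operand.
type Health struct {
	Name        string // CamelCase label used to build the reason, e.g. "Operator"
	ServiceGone bool   // the service this part provides is unavailable
	MissingSLO  bool   // the part is persistently below its service-level objective
	Detail      string // human-readable explanation for the message field
}

// ComponentConditions folds the health of the operator and all configured
// operands into one Available and one Degraded condition for the component.
// The first failing part supplies the reason and message.
func ComponentConditions(parts []Health) []Condition {
	available := Condition{Type: "Available", Status: ConditionTrue, Reason: "AsExpected"}
	degraded := Condition{Type: "Degraded", Status: ConditionFalse, Reason: "AsExpected"}
	for _, p := range parts {
		if p.ServiceGone && available.Status == ConditionTrue {
			available.Status = ConditionFalse
			available.Reason = p.Name + "Unavailable"
			available.Message = p.Detail
		}
		if p.MissingSLO && degraded.Status == ConditionFalse {
			degraded.Status = ConditionTrue
			degraded.Reason = p.Name + "Degraded"
			degraded.Message = p.Detail
		}
	}
	return []Condition{available, degraded}
}

func main() {
	// A happy operator that sees one sad operand: the component is still
	// Available, but Degraded, and the reason/message says which part is at fault.
	parts := []Health{
		{Name: "Operator"},
		{Name: "Operand", MissingSLO: true, Detail: "1 of 3 desired pods is crash-looping"},
	}
	for _, c := range ComponentConditions(parts) {
		fmt.Printf("%s=%s reason=%q message=%q\n", c.Type, c.Status, c.Reason, c.Message)
	}
}
```

In this sketch an operator with no configured operands passes a single `Health` entry for itself, so the conditions still describe the component as a whole rather than any one operand.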