diff --git a/docs/dev/clusteroperator.md b/docs/dev/clusteroperator.md
index 7f6d900823..b63f72cbb2 100644
--- a/docs/dev/clusteroperator.md
+++ b/docs/dev/clusteroperator.md
@@ -135,41 +135,3 @@ status:
 #### Version reporting during an upgrade
 
 When your operator begins rolling out a new version it must continue to report the previous operator version in its ClusterOperator status. While any of your operands are still running software from the previous version then you are in a mixed version state, and you should continue to report the previous version. As soon as you can guarantee you are not and will not run any old versions of your operands, you can update the operator version on your ClusterOperator status.
-
-### Conditions
-
-Refer [the godocs](https://godoc.org/github.com/openshift/api/config/v1#ClusterStatusConditionType) for conditions.
-
-In general, ClusterOperators should contain at least three core conditions:
-
-* `Progressing` must be true if the operator is actually making change to the operand.
-The change may be anything: desired user state, desired user configuration, observed configuration, version update, etc.
-If this is false, it means the operator is not trying to apply any new state.
-If it remains true for an extended period of time, it suggests something is wrong in the cluster. It can probably wait until Monday.
-* `Available` must be true if the operand is functional and available in the cluster at the level in status.
-If this is false, it means there is an outage. Someone is probably getting paged.
-* `Failing` should be true if the operator has encountered an error that is preventing it or its operand from working properly.
-The operand may still be available, but intent may not have been fulfilled.
-If this is true, it means that the operand is at risk of an outage or improper configuration. It can probably wait until the morning, but someone needs to look at it.
-
-The message reported for each of these conditions is important. All messages should start with a capital letter (like a sentence) and be written for an end user / admin to debug the problem. `Failing` should describe in detail (a few sentences at most) why the current controller is blocked. The detail should be sufficient for an engineer or support person to triage the problem. `Available` should convey useful information about what is available, and be a single sentence without punctuation. `Progressing` is the most important message because it is shown by default in the CLI as a column and should be a terse, human-readable message describing the current state of the object in 5-10 words (the more succinct the better).
-
-For instance, if the CVO is working towards 4.0.1 and has already successfully deployed 4.0.0, the conditions might be reporting:
-
-* `Failing` is false with no message
-* `Available` is true with message `Cluster has deployed 4.0.0`
-* `Progressing` is true with message `Working towards 4.0.1`
-
-If the controller reaches 4.0.1, the conditions might be:
-
-* `Failing` is false with no message
-* `Available` is true with message `Cluster has deployed 4.0.1`
-* `Progressing` is false with message `Cluster version is 4.0.1`
-
-If an error blocks reaching 4.0.1, the conditions might be:
-
-* `Failing` is true with a detailed message `Unable to apply 4.0.1: could not update 0000_70_network_deployment.yaml because the resource type NetworkConfig has not been installed on the server.`
-* `Available` is true with message `Cluster has deployed 4.0.0`
-* `Progressing` is true with message `Unable to apply 4.0.1: a required object is missing`
-
-The progressing message is the first message a human will see when debugging an issue, so it should be terse, succinct, and summarize the problem well. The failing message can be more verbose. Start with simple, easy to understand messages and grow them over time to capture more detail.