config/v1/types_cluster_operator: Explain that conditions cover components #1000
bparees
left a comment
i put comments on the Upgradeable message in the other PR.
but the net new updates in this PR look reasonable to me.
4f64bd6 to
a3258e8
Compare
…nents Some operators have no configured operands (e.g. the bare-metal operator on non-metal platforms, or the image-registry operator when the admins have configured managementState:Removed [1]). Some operators have many configured operands. Operators writing ClusterOperator conditions should not limit the conditions to speak about just the operator, or just a particular operand. Instead, operators should speak about the component as a whole. Is something about the service that the component provides gone? If so, Available=False (midnight admin page). Is something about the service that the component provides not hitting its service-level objectives? If so, Degraded=True (working-hours admin queue). Doesn't matter if that thing is "the operator is having trouble talking to the API to figure out how the operands are doing" or "the operator is super-happy, and sees that some operand is sad". That's all stuff that can be distinguished in the reason/message. [1]: https://docs.openshift.com/container-platform/4.8/registry/configuring_registry_storage/configuring-registry-storage-baremetal.html#registry-removed_configuring-registry-storage-baremetal
a3258e8 to
f99f4bb
Compare
still looks reasonable to me but this seems worthy of a second set of eyes.
/approve

looks great to me, really helps clarify things for baremetal 👍
type ClusterStatusConditionType string

const (
// Available indicates that the operand (eg: openshift-apiserver for the
In case it helps with review, I personally find the output of:
$ git show --word-diff=color
easier to read for this commit than GitHub's rendering.
@wking Thanks for providing this clarification for the various CO states. Here are some remaining questions specific to the cluster-baremetal-operator. The cluster-baremetal-operator (CBO) is responsible for deploying the metal3 pod when the Provisioning CR is present and when the platform is "Baremetal". Let us consider the following scenarios:
yes. CBO is doing exactly what it is expected to be doing in this scenario, so it is Available=True. (note: if a CR is created, but CBO is actively ignoring the CR because of the platform type, it would be appropriate for the CBO to include some sort of message in its status conditions that makes this clear, like "Available=True Reason=NonMetalPlatform Message=operator is functioning normally, although there is a CR, the CR is ignored because the platform is not metal")
I don't see why it would be Available=False. The function that is expected to be provided is being provided. Available=True, Reason=NoProvisionRequested Message="CBO is running and responding to requests, however no provisioning is currently requested". It's only when a CR exists but the request can't be fulfilled that Available=False would make sense. That is the point at which the CBO (and its operands) are not providing the functionality they are expected to provide. (or in other cases where things are going wrong with the operator or operand)
+1. We are also currently setting Disabled=True. Do we stop doing that?
+1 for this too. Since the operand wasn't running, we were setting Disabled=False, Available=False. As you pointed out, CBO is behaving exactly as expected, even when the CR is absent. So, we will go ahead and set Available=True with an appropriate Reason and not use the Disabled flag at all.
it is up to you. It has no official API meaning or implications for upgrades/alerts/etc, so you can set it or not set it as you like.
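To make the CBO scenarios discussed above concrete, here is a minimal Go sketch of how the Available condition could be chosen. It is an illustration under stated assumptions, not the cluster-baremetal-operator's actual code: the `Condition` struct is a simplified stand-in for the real ClusterOperator condition type, the function and parameter names are hypothetical, and the `ProvisioningUnavailable` and `AsExpected` reasons are made up for the example. Only `NonMetalPlatform` and `NoProvisionRequested` come from this thread.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a ClusterOperator
// status condition; it is not the real API type.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// availableCondition maps the scenarios from this thread to an Available
// condition. The function and its parameters are assumptions for illustration.
func availableCondition(platformIsBareMetal, provisioningCRExists, metal3Healthy bool) Condition {
	switch {
	case !platformIsBareMetal:
		// The operator is doing exactly what is expected on this platform.
		msg := "operator is functioning normally"
		if provisioningCRExists {
			msg += "; the Provisioning CR is ignored because the platform is not metal"
		}
		return Condition{Type: "Available", Status: "True", Reason: "NonMetalPlatform", Message: msg}
	case !provisioningCRExists:
		// Nothing was requested, so nothing is missing.
		return Condition{Type: "Available", Status: "True", Reason: "NoProvisionRequested",
			Message: "CBO is running and responding to requests, however no provisioning is currently requested"}
	case !metal3Healthy:
		// A CR exists but the request cannot be fulfilled.
		// "ProvisioningUnavailable" is a made-up reason.
		return Condition{Type: "Available", Status: "False", Reason: "ProvisioningUnavailable",
			Message: "a Provisioning CR exists but the metal3 pod is not healthy"}
	default:
		// "AsExpected" is a made-up reason for the healthy case.
		return Condition{Type: "Available", Status: "True", Reason: "AsExpected",
			Message: "provisioning service is running"}
	}
}

func main() {
	fmt.Printf("%+v\n", availableCondition(false, true, false))
	fmt.Printf("%+v\n", availableCondition(true, false, false))
	fmt.Printf("%+v\n", availableCondition(true, true, false))
}
```

The sketch leaves out the Disabled condition entirely, since, as noted above, it carries no official API meaning.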
/lgtm

/hold

@wking let's give it another day or two in case anyone else cares enough to weigh in, and then I'd say you can remove the hold.

/lgtm
@awolffredhat: changing LGTM is restricted to collaborators

[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: awolffredhat, bparees, sadasu, wking

/hold cancel
Builds on #995; consider reviewing that one first.
Some operators have no configured operands (e.g. the bare-metal operator on non-metal platforms, or the image-registry operator when the admins have configured managementState:Removed). Some operators have many configured operands. Operators writing ClusterOperator conditions should not limit the conditions to speak about just the operator, or just a particular operand. Instead, operators should speak about the component as a whole. Is something about the service that the component provides gone? If so, Available=False (midnight admin page). Is something about the service that the component provides not hitting its service-level objectives? If so, Degraded=True (working-hours admin queue). Doesn't matter if that thing is "the operator is having trouble talking to the API to figure out how the operands are doing" or "the operator is super-happy, and sees that some operand is sad". That's all stuff that can be distinguished in the reason/message.
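As a rough illustration of "speak about the component as a whole", the following Go sketch folds problems from any source (the operator itself or any operand) into a single Available and a single Degraded condition, keeping the source only in the reason/message. Everything here is an assumption made for the example: the `Condition` and `Problem` types, the `componentConditions` function, and the `AsExpected` reason are hypothetical and are not part of the API touched by this PR. Whether a lost service should also set Degraded=True is left out of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a simplified, hypothetical stand-in for a ClusterOperator
// status condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// Problem is a hypothetical record of one thing the operator noticed.
// Source may be "operator" or the name of an operand.
type Problem struct {
	Source      string
	ServiceGone bool // true: part of the service is gone; false: it is below its SLO
	Reason      string
	Message     string
}

// componentConditions reports on the component as a whole. The source of a
// problem never changes which condition flips; it only shows up in the
// message, and the first matching problem supplies the reason.
func componentConditions(problems []Problem) (Condition, Condition) {
	available := Condition{Type: "Available", Status: "True", Reason: "AsExpected"}
	degraded := Condition{Type: "Degraded", Status: "False", Reason: "AsExpected"}
	var goneMsgs, sloMsgs []string
	for _, p := range problems {
		msg := p.Source + ": " + p.Message
		if p.ServiceGone {
			if available.Status == "True" {
				available.Status, available.Reason = "False", p.Reason
			}
			goneMsgs = append(goneMsgs, msg)
			continue
		}
		if degraded.Status == "False" {
			degraded.Status, degraded.Reason = "True", p.Reason
		}
		sloMsgs = append(sloMsgs, msg)
	}
	available.Message = strings.Join(goneMsgs, "; ")
	degraded.Message = strings.Join(sloMsgs, "; ")
	return available, degraded
}

func main() {
	available, degraded := componentConditions([]Problem{
		{Source: "operator", Reason: "APIUnreachable", Message: "cannot reach the API to check operands"},
		{Source: "operand-x", ServiceGone: true, Reason: "OperandDown", Message: "no replicas are running"},
	})
	fmt.Printf("%+v\n%+v\n", available, degraded)
}
```

With no problems at all, including the case of an operator that has no configured operands, the sketch reports Available=True and Degraded=False.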