OTA-1159: pkg/cvo: Render status.Failure in ReconciliationIssues#1041
Conversation
35df0d8 to
b47edd8
Compare
petr-muller
left a comment
There was a problem hiding this comment.
Let's do this over #1033. I admit I had a knee-jerk negative reaction to this b/c I am not too fond of Unwrap() APIs because the concept of "error is silently an opaque tree of errors that need to be interrogated essentially with pattern matching but in a language without pattern matching" is IMO super clunky.
But this is not a real API, and serializing the full tree can be really useful for the experimental purposes, and it makes the PR much simpler, concentrating the complexity in reconciliation_issues where it hurts nobody 👍 👍
Probably worth doing something to stabile-sort children. Possibly
doing something to get more UpdateError / Manifest. I haven't done
either of those yet though.
I think sorting would be great to have to avoid churning status when we see same set of errors. Digging deeper for UpdateError / Manifest can wait for a followup, if needed, IMO.
We should probably also make sure we fit the resulting json into a condition message? Although maybe not needed, b/c that ClusterOperatorStatusCondition does not have any constraint:
...unlike metav1.Condition:
Walk the error's tree using Go's [1]:
Unwrap() error
Unwrap() []error
APIs. Also include Errors(), because that's what our
k8s.io/apimachinery implementation currently uses:
$ git --no-pager grep '^func.* [A-Z].*) \[]error {' -- vendor
vendor/k8s.io/apimachinery/pkg/runtime/error.go:func (e *strictDecodingError) Errors() []error {
vendor/k8s.io/apimachinery/pkg/runtime/helper.go:func DecodeList(objects []Object, decoders ...Decoder) []error {
vendor/k8s.io/apimachinery/pkg/util/errors/errors.go:func (agg aggregate) Errors() []error {
vendor/k8s.io/client-go/tools/clientcmd/validation.go:func (e errConfigurationInvalid) Errors() []error {
I'm stable-sorting the children so consumers don't have to worry about
confusing users by flipping things around depending on which worker
goroutine happened to bump into trouble with a manifest first.
I'm using the meta.v1 length limit [2], even though that technically
doesn't apply to ClusterOperatorStatusCondition [3], because it's
unlikely that the signal-to-noise in such a long message is high
enough for users to be able to find actionable information. And
failing to write the ClusterVersion resource because we've overrun an
object-size cap wouldn't be much help to users either.
[1]: https://pkg.go.dev/errors#pkg-overview
[2]: https://github.com/kubernetes/apimachinery/blob/e696ec55a32e2177cd5f2fe1ded91d9485aa5c64/pkg/apis/meta/v1/types.go#L1575-L1577
[3]: https://github.com/openshift/api/blob/1e963d8dc4663f4f004f44fd58459381a771bdb5/config/v1/types_cluster_operator.go#L149-L153
b47edd8 to
31e2bca
Compare
I made an attempt at this too.
Sure, I made an attempt at this as well. |
|
@wking: This pull request references OTA-1159 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: petr-muller, wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Known flake Hypershift jobs are busted right now because of permissions, lets not wait for them; all new code is behind a feature gate. /override ci/prow/e2e-hypershift This is not a user-facing change and no sensitive logic is changed; we only add an experimental pseudo-api that we will eventually remove, so there is nothing to test (and definitely nothing that needs pre-merge testing). |
|
@petr-muller: Overrode contexts on behalf of petr-muller: ci/prow/e2e-hypershift, ci/prow/e2e-hypershift-conformance, ci/prow/unit DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@wking: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
[ART PR BUILD NOTIFIER] This PR has been included in build cluster-version-operator-container-v4.16.0-202404100642.p0.g063a87b.assembly.stream.el9 for distgit cluster-version-operator. |
Walk the error's tree using Go's:
APIs. Also include
Errors(), because that's what ourk8s.io/apimachineryimplementation currently uses:Probably worth doing something to stabile-sort children. Possibly doing something to get more
UpdateError/Manifest. I haven't done either of those yet though.