Skip to content

OTA-1159: pkg/cvo: Render status.Failure in ReconciliationIssues#1041

Merged
openshift-merge-bot[bot] merged 1 commit intoopenshift:masterfrom
wking:ReconciliationIssues-content
Apr 9, 2024
Merged

OTA-1159: pkg/cvo: Render status.Failure in ReconciliationIssues#1041
openshift-merge-bot[bot] merged 1 commit intoopenshift:masterfrom
wking:ReconciliationIssues-content

Conversation

@wking
Copy link
Copy Markdown
Member

@wking wking commented Mar 30, 2024

Walk the error's tree using Go's:

  Unwrap() error
  Unwrap() []error

APIs. Also include Errors(), because that's what our k8s.io/apimachinery implementation currently uses:

$ git --no-pager grep '^func.* [A-Z].*) \[]error {' -- vendor
vendor/k8s.io/apimachinery/pkg/runtime/error.go:func (e *strictDecodingError) Errors() []error {
vendor/k8s.io/apimachinery/pkg/runtime/helper.go:func DecodeList(objects []Object, decoders ...Decoder) []error {
vendor/k8s.io/apimachinery/pkg/util/errors/errors.go:func (agg aggregate) Errors() []error {
vendor/k8s.io/client-go/tools/clientcmd/validation.go:func (e errConfigurationInvalid) Errors() []error {

Probably worth doing something to stabile-sort children. Possibly doing something to get more UpdateError / Manifest. I haven't done either of those yet though.

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 30, 2024
@wking wking force-pushed the ReconciliationIssues-content branch 2 times, most recently from 35df0d8 to b47edd8 Compare March 30, 2024 17:00
Copy link
Copy Markdown
Member

@petr-muller petr-muller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do this over #1033. I admit I had a knee-jerk negative reaction to this b/c I am not too fond of Unwrap() APIs because the concept of "error is silently an opaque tree of errors that need to be interrogated essentially with pattern matching but in a language without pattern matching" is IMO super clunky.

But this is not a real API, and serializing the full tree can be really useful for the experimental purposes, and it makes the PR much simpler, concentrating the complexity in reconciliation_issues where it hurts nobody 👍 👍

Probably worth doing something to stabile-sort children. Possibly
doing something to get more UpdateError / Manifest. I haven't done
either of those yet though.

I think sorting would be great to have to avoid churning status when we see same set of errors. Digging deeper for UpdateError / Manifest can wait for a followup, if needed, IMO.

We should probably also make sure we fit the resulting json into a condition message? Although maybe not needed, b/c that ClusterOperatorStatusCondition does not have any constraint:

https://github.com/openshift/api/blob/1e963d8dc4663f4f004f44fd58459381a771bdb5/config/v1/types_cluster_operator.go#L149-L153

...unlike metav1.Condition:

https://github.com/kubernetes/apimachinery/blob/e696ec55a32e2177cd5f2fe1ded91d9485aa5c64/pkg/apis/meta/v1/types.go#L1575-L1577

Comment thread pkg/cvo/reconciliation_issues.go Outdated
Comment thread pkg/cvo/reconciliation_issues.go Outdated
Walk the error's tree using Go's [1]:

  Unwrap() error
  Unwrap() []error

APIs.  Also include Errors(), because that's what our
k8s.io/apimachinery implementation currently uses:

  $ git --no-pager grep '^func.* [A-Z].*) \[]error {' -- vendor
  vendor/k8s.io/apimachinery/pkg/runtime/error.go:func (e *strictDecodingError) Errors() []error {
  vendor/k8s.io/apimachinery/pkg/runtime/helper.go:func DecodeList(objects []Object, decoders ...Decoder) []error {
  vendor/k8s.io/apimachinery/pkg/util/errors/errors.go:func (agg aggregate) Errors() []error {
  vendor/k8s.io/client-go/tools/clientcmd/validation.go:func (e errConfigurationInvalid) Errors() []error {

I'm stable-sorting the children so consumers don't have to worry about
confusing users by flipping things around depending on which worker
goroutine happened to bump into trouble with a manifest first.

I'm using the meta.v1 length limit [2], even though that technically
doesn't apply to ClusterOperatorStatusCondition [3], because it's
unlikely that the signal-to-noise in such a long message is high
enough for users to be able to find actionable information.  And
failing to write the ClusterVersion resource because we've overrun an
object-size cap wouldn't be much help to users either.

[1]: https://pkg.go.dev/errors#pkg-overview
[2]: https://github.com/kubernetes/apimachinery/blob/e696ec55a32e2177cd5f2fe1ded91d9485aa5c64/pkg/apis/meta/v1/types.go#L1575-L1577
[3]: https://github.com/openshift/api/blob/1e963d8dc4663f4f004f44fd58459381a771bdb5/config/v1/types_cluster_operator.go#L149-L153
@wking wking force-pushed the ReconciliationIssues-content branch from b47edd8 to 31e2bca Compare April 9, 2024 02:20
@wking
Copy link
Copy Markdown
Member Author

wking commented Apr 9, 2024

I think sorting would be great...

Done with b47edd8 -> 31e2bca.

Digging deeper for UpdateError / Manifest can wait for a followup, if needed, IMO.

I made an attempt at this too.

We should probably also make sure we fit the resulting json into a condition message?

Sure, I made an attempt at this as well.

@wking wking changed the title pkg/cvo: Render status.Failure in ReconciliationIssues OTA-1159: pkg/cvo: Render status.Failure in ReconciliationIssues Apr 9, 2024
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

openshift-ci-robot commented Apr 9, 2024

@wking: This pull request references OTA-1159 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

Details

In response to this:

Walk the error's tree using Go's:

 Unwrap() error
 Unwrap() []error

APIs. Also include Errors(), because that's what our k8s.io/apimachinery implementation currently uses:

$ git --no-pager grep '^func.* [A-Z].*) \[]error {' -- vendor
vendor/k8s.io/apimachinery/pkg/runtime/error.go:func (e *strictDecodingError) Errors() []error {
vendor/k8s.io/apimachinery/pkg/runtime/helper.go:func DecodeList(objects []Object, decoders ...Decoder) []error {
vendor/k8s.io/apimachinery/pkg/util/errors/errors.go:func (agg aggregate) Errors() []error {
vendor/k8s.io/client-go/tools/clientcmd/validation.go:func (e errConfigurationInvalid) Errors() []error {

Probably worth doing something to stabile-sort children. Possibly doing something to get more UpdateError / Manifest. I haven't done either of those yet though.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 9, 2024
@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 9, 2024
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@petr-muller
Copy link
Copy Markdown
Member

petr-muller commented Apr 9, 2024

--- FAIL: TestRunGraph (1.05s)
  --- FAIL: TestRunGraph/mid-task_cancellation_with_work_in_queue_does_not_deadlock (0.01s) 

Known flake
/override ci/prow/unit

Hypershift jobs are busted right now because of permissions, lets not wait for them; all new code is behind a feature gate.

/override ci/prow/e2e-hypershift
/override ci/prow/e2e-hypershift-conformance

This is not a user-facing change and no sensitive logic is changed; we only add an experimental pseudo-api that we will eventually remove, so there is nothing to test (and definitely nothing that needs pre-merge testing).
/label no-qe

@openshift-ci openshift-ci Bot added the no-qe Allows PRs to merge without qe-approved label label Apr 9, 2024
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 9, 2024

@petr-muller: Overrode contexts on behalf of petr-muller: ci/prow/e2e-hypershift, ci/prow/e2e-hypershift-conformance, ci/prow/unit

Details

In response to this:

--- FAIL: TestRunGraph (1.05s)
 --- FAIL: TestRunGraph/mid-task_cancellation_with_work_in_queue_does_not_deadlock (0.01s) 

/override ci/prow/unit

Hypershift jobs are busted right now because of permissions, lets not wait for them; all new code is behind a feature gate.

/override ci/prow/e2e-hypershift
/override ci/prow/e2e-hypershift-conformance

This is not a user-facing change and no sensitive logic is changed; we only add an experimental pseudo-api that we will eventually remove, so there is nothing to test (and definitely nothing that needs pre-merge testing).
/label no-qe

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 9, 2024

@wking: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit 063a87b into openshift:master Apr 9, 2024
@wking wking deleted the ReconciliationIssues-content branch April 9, 2024 23:21
@openshift-bot
Copy link
Copy Markdown
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build cluster-version-operator-container-v4.16.0-202404100642.p0.g063a87b.assembly.stream.el9 for distgit cluster-version-operator.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. no-qe Allows PRs to merge without qe-approved label

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants