
Conversation

@benjaminapetersen (Contributor) commented Feb 15, 2019

Based on #139

  • Squashed various fixes
  • Refactored into cleaner function calls
  • Renumbered manifests to ensure the clusteroperator/console is created after the operator. This should avoid a race that may cause the clusteroperator/console to report failure status simply because the operator does not yet exist.

Some screenshots:
screen shot 2019-02-19 at 3 49 03 pm
screen shot 2019-02-19 at 11 28 34 am
screen shot 2019-02-19 at 11 28 10 am

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 15, 2019
operatorsv1 "github.com/openshift/api/operator/v1"
"github.com/openshift/library-go/pkg/operator/v1helpers"
)

@benjaminapetersen (Contributor Author):

Detailing out the purpose behind the status/conditions here to ensure we get it right.

@benjaminapetersen (Contributor Author)

/retest

@benjaminapetersen (Contributor Author)

I think tests are frozen right now on:

level=fatal msg="failed to initialize the cluster: Cluster operator machine-config is reporting a failure: Failed to resync 3.11.0-673-gadf12809-dirty because: Get https://172.30.0.1:443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/controllerconfigs.machineconfiguration.openshift.io: dial tcp 172.30.0.1:443: connect: connection refused"

@benjaminapetersen (Contributor Author)

/retest

@benjaminapetersen (Contributor Author)

level=fatal msg="failed to initialize the cluster: Cluster operator machine-config is reporting a failure: Failed to resync 3.11.0-676-g745693cd-dirty because: error syncing: request declared a Content-Length of 483 but only wrote 0 bytes"

@benjaminapetersen (Contributor Author)

/retest

@benjaminapetersen (Contributor Author) commented Feb 18, 2019

level=fatal msg="failed to initialize the cluster: Cluster operator network has not yet reported success"

level=fatal msg="failed to initialize the cluster: Cluster operator console has not yet reported success"

The network operator failed in one test, the console operator in the other. Not sure if these are flakes or if they are related.

The console and console-operator pods have no logs; apparently they never came up.

@benjaminapetersen (Contributor Author)

/retest

@benjaminapetersen (Contributor Author)

No console-operator logs at all on that run.

@benjaminapetersen (Contributor Author)

/retest

2 similar comments
@benjaminapetersen (Contributor Author)

/retest

@spadgett (Member)

/retest

@benjaminapetersen (Contributor Author)

error: unable to read image registry.svc.ci.openshift.org/ci-op-8yj29n81/stable@sha256:0d6c76c0d202665f7a16b899f4c62d94c9ac8a14c7540c841a8e802a91775253: received unexpected HTTP status: 504 Gateway Time-out

@benjaminapetersen (Contributor Author)

/retest

5 similar comments (/retest from @benjaminapetersen, Contributor Author)

@benjaminapetersen benjaminapetersen force-pushed the operator-status-revisions-squash branch from 67b79d7 to 09284a8 Compare February 19, 2019 16:39
@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 19, 2019
@benjaminapetersen (Contributor Author)

/assign @zherman0 @jhadvig @spadgett

I'd welcome some feedback. If the tests finally pass at some point, I'd rather not have to make changes and restart the whole process 😄

@benjaminapetersen (Contributor Author)

rebased

// To use when another more specific status function is not sufficient.
// examples:
// setStatusCondition(operatorConfig, Failing, True, "SyncLoopError", "Sync loop failed to complete successfully")
func (c *consoleOperator) SetStatusCondition(operatorConfig *operatorsv1.Console, conditionType string, conditionStatus operatorsv1.ConditionStatus, conditionReason string, conditionMessage string) *operatorsv1.Console {
Member:

I don't see how this is better than calling v1helpers.SetOperatorCondition directly. It actually seems worse to me, since it would be easy to mix up the order of the arguments.

Member:

I also don't see where this is used.

Contributor Author:

It's not used at this point; it's been factored out. I'll remove it.

Contributor Author:

It was an initial step to condense the inline noise:

if aBadThing {
	logrus.Errorf("Bad things are happening")
	// logic...
	v1helpers.SetOperatorCondition(&operatorConfig.Status.Conditions, operatorsv1.OperatorCondition{
		Type:               BadType,
		Status:             BadStatus,
		Reason:             "ABadThing",
		Message:            "a bad thing",
		LastTransitionTime: metav1.Now(),
	})
	// ...and again for each additional condition
	// do other logic
}
// down to a one-liner
setStatusCondition(operatorConfig, Failing, True, "SyncLoopError", "Sync loop failed to complete successfully")

That said, I agree, it was still too many things to take care of. That's when I moved to

// one condition or 3, still one line... not 7, 14, 21, etc.
co.ConditionABadThing(operatorConfig)

@benjaminapetersen (Contributor Author) commented Feb 21, 2019:

The right way to do this is probably:

// operator.go ideally could deal with all the status stuff when it calls sync, rather than having it
// scattered across multiple files:
structuredThing, err := sync_v400()
// then we could handle it all in one place in operator.go:
handleConditions(structuredThing, err)

Tech debt...

v1helpers.SetOperatorCondition(&operatorConfig.Status.Conditions, operatorsv1.OperatorCondition{
	Type:               operatorsv1.OperatorStatusTypeFailing,
	Status:             operatorsv1.ConditionFalse,
	LastTransitionTime: metav1.Now(),
Member:

Ideally we'd add messages to all of these, but OK as a follow-on.

Contributor Author:

Failing: False is the desired state, along with Progressing: False and Available: True. I assumed that in the desired state no other information should be needed, but I'm open to adding info if we feel it is necessary or helpful.

Member:

> I assumed if in the desired state no other information should be needed, but I'm open to adding info if we feel it is necessary or helpful.

I think it would make things clearer if we add a message saying things are good, particularly because you have to think through a double negative like failing: false. The messages are displayed on the cluster settings page in the UI.

(For a follow-on)

return operatorConfig
}

func (c *consoleOperator) ConditionResourceSyncSuccess(operatorConfig *operatorsv1.Console) *operatorsv1.Console {
Member:

How is this different than ConditionNotFailing?

Contributor Author:

It is not, at this point. Initially I was considering putting the rest of this logic (sync_v400, line 110) in this function, as it may have been necessary to set more than one status.

I may be willing to eliminate this wrapper, as I believe it won't be a "set multiple statuses" kind of function.

@benjaminapetersen benjaminapetersen force-pushed the operator-status-revisions-squash branch from 0787a25 to fdc153a Compare February 21, 2019 20:59
@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 21, 2019
@benjaminapetersen benjaminapetersen force-pushed the operator-status-revisions-squash branch from fdc153a to 873f74b Compare February 21, 2019 21:14
@benjaminapetersen (Contributor Author)

/retest

2 similar comments
@benjaminapetersen (Contributor Author)

/retest

@spadgett (Member)

/retest

@spadgett (Member) left a comment:

/lgtm

// To use when another more specific status function is not sufficient.
// examples:
// setStatusCondition(operatorConfig, Failing, True, "SyncLoopError", "Sync loop failed to complete successfully")
func (c *consoleOperator) SetStatusCondition(operatorConfig *operatorsv1.Console, conditionType string, conditionStatus operatorsv1.ConditionStatus, conditionReason string, conditionMessage string) *operatorsv1.Console {
Member:

We should remove this if unused, but we can do that as a follow-on if this passes CI.

// the operand is in a transitional state if any of the above resources changed
// or if we have not settled on the desired number of replicas
if toUpdate || actualDeployment.Status.ReadyReplicas != deploymentsub.ConsoleReplicas {
co.ConditionResourceSyncProgressing(operatorConfig, "Changes made during sync updates, additional sync expected.")
Member:

We might revisit this message, but OK for now.

Contributor Author:

Agree, it's not the best message. Progressing is quite fast, so it's unlikely to be seen, but we can definitely revisit.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 22, 2019
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: benjaminapetersen, spadgett

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:
  • OWNERS [benjaminapetersen,spadgett]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@spadgett (Member)

level=warning msg="Found override for ReleaseImage. Please be warned, this is not advised"
level=info msg="Consuming \"Install Config\" from target directory"
level=info msg="Creating cluster..."
level=info msg="Waiting up to 30m0s for the Kubernetes API..."
level=fatal msg="waiting for Kubernetes API: context deadline exceeded"

/retest

@benjaminapetersen (Contributor Author)

/retest

Died before generating any artifacts due to:
received unexpected HTTP status: 504 Gateway Time-out

@spadgett (Member)

/retest

@benjaminapetersen (Contributor Author)

woohoo, one set succeeded this time...

@benjaminapetersen (Contributor Author)

/retest

1 similar comment
@spadgett (Member)

/retest

@benjaminapetersen (Contributor Author)

Nice.

@spadgett (Member)

level=error msg="\t* module.vpc.aws_route_table_association.route_net[3]: 1 error occurred:"
level=error msg="\t* aws_route_table_association.route_net.3: timeout while waiting for state to become 'success' (timeout: 

/retest

@benjaminapetersen (Contributor Author)

Bah, looked like they passed.

@spadgett (Member)

> Bah, looked like they passed.

Yeah, they have to run again, since another PR merged to master in between.

@spadgett (Member)

e2e-aws passed, waiting on e2e-aws-operator

@spadgett (Member)

all green!

@openshift-merge-robot openshift-merge-robot merged commit cc814fa into openshift:master Feb 23, 2019
@benjaminapetersen (Contributor Author)

Fantastic.
