@benjaminapetersen
Contributor

Extracted a bunch of operator status updates out into separate named functions. Was a bit pedantic about naming, but hopefully it helps understand the flow and see if we can tease out any bugs.
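A minimal sketch of what "separate named functions" for status updates could look like. The `operatorStatus` type and the `mark*` helper names are hypothetical and simplified for illustration; they are not the names or types used in this PR, which works against the real ClusterOperator conditions.

```go
package main

import "fmt"

// operatorStatus is a hypothetical, simplified model of the three
// conditions discussed in this PR (Available, Progressing, Failing).
type operatorStatus struct {
	Available   bool
	Progressing bool
	Failing     bool
	Reason      string
}

// markFailing records that a sync step failed and why.
func (s *operatorStatus) markFailing(reason string) {
	s.Failing = true
	s.Reason = reason
}

// markNotFailing clears a previous failure.
func (s *operatorStatus) markNotFailing() {
	s.Failing = false
	s.Reason = ""
}

// markProgressing records that resources are still being rolled out.
func (s *operatorStatus) markProgressing() {
	s.Progressing = true
}

// markSettled records that the rollout finished and the operand is usable.
func (s *operatorStatus) markSettled() {
	s.Progressing = false
	s.Available = true
}

func main() {
	status := &operatorStatus{}
	status.markNotFailing()
	status.markProgressing()
	fmt.Printf("during rollout: %+v\n", *status)
	status.markSettled()
	fmt.Printf("after rollout:  %+v\n", *status)
}
```

Giving each transition its own name makes the call sites in the sync loop read as a sequence of state changes, which is the readability goal described above.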

@jhadvig

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: benjaminapetersen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 14, 2019
@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 14, 2019

  // do we need the if(configChanged) update bits?
- operatorConfigOut, consoleConfigOut, configChanged, err := sync_v400(c, operatorConfig, consoleConfig)
+ _, _, _, err := sync_v400(c, operatorConfig, consoleConfig)
Contributor Author

I didn't notice this change, I need to look into it a bit.

@@ -0,0 +1,167 @@
package operator
Contributor Author

These are pretty verbose / repetitive.


toUpdate = toUpdate || depChanged

// at this point, we should not be failing anymore
Contributor Author

Does this now make sense? Thinking through it.

Member

it does, I'm counting on it as well

@benjaminapetersen
Contributor Author

Note that all of these ClusterOperator PRs will cause the Console operator to appear on the Cluster Settings page, under Cluster Operators:

[Screenshot: 2019-02-14 at 3:38:53 pm]

@benjaminapetersen
Contributor Author

A test:
oc delete deployment console -n openshift-console
oc get events -n openshift-console-operator

shows:

LAST SEEN   TYPE      REASON                  KIND         MESSAGE
9m11s       Normal    Scheduled               Pod          Successfully assigned openshift-console-operator/console-operator-779cc894db-8b8vz to ip-10-0-133-68.us-west-1.compute.internal
9m3s        Normal    Pulling                 Pod          pulling image "quay.io/benjaminapetersen/console-operator:latest"
8m58s       Normal    Pulled                  Pod          Successfully pulled image "quay.io/benjaminapetersen/console-operator:latest"
8m58s       Normal    Created                 Pod          Created container
8m58s       Normal    Started                 Pod          Started container
9m11s       Normal    Killing                 Pod          Killing container with id cri-o://console-operator:Need to kill Pod
9m11s       Normal    SuccessfulCreate        ReplicaSet   Created pod: console-operator-779cc894db-8b8vz
8m39s       Normal    LeaderElection          ConfigMap    d4803d32-3097-11e9-a364-0a580a81020b became leader
8m39s       Warning   StatusNotFound          Deployment   Unable to determine current operator status for console
8m39s       Normal    OperatorStatusChanged   Deployment   Status for operator console changed: Failing set to False (""),Available set to True (""),Progressing set to True (""),status.relatedObjects changed from [] to [{"operator.openshift.io" "consoles" "" "console"} {"config.openshift.io" "consoles" "" "console"} {"oauth.openshift.io" "oauthclients" "" "console"} {"" "namespaces" "" "openshift-console-operator"} {"" "namespaces" "" "openshift-console"}]
8m39s       Normal    OperatorStatusChanged   Deployment   Status for operator console changed: Progressing changed from True to False (""),status.relatedObjects changed from [] to [{"operator.openshift.io" "consoles" "" "console"} {"config.openshift.io" "consoles" "" "console"} {"oauth.openshift.io" "oauthclients" "" "console"} {"" "namespaces" "" "openshift-console-operator"} {"" "namespaces" "" "openshift-console"}]
34s         Normal    DeploymentCreated       Deployment   Created Deploymentapps./console -n openshift-console because it was missing
34s         Normal    OperatorStatusChanged   Deployment   Status for operator console changed: Progressing changed from False to True ("")
34s         Normal    OperatorStatusChanged   Deployment   Status for operator console changed: Progressing changed from True to False ("")
34s         Normal    OperatorStatusChanged   Deployment   Status for operator console changed: Available changed from True to False ("NoDeploymentPodsAvailableOnAnyNode.")
32s         Normal    OperatorStatusChanged   Deployment   Status for operator console changed: Available changed from False to True ("")

Which is interesting. And
oc describe clusteroperator console
results in:

Name:         console
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-02-14T20:34:04Z
  Generation:          1
  Resource Version:    113010
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/console
  UID:                 de84e7e6-3097-11e9-968a-02a640b76a0a
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-02-14T20:42:24Z
    Status:                True
    Type:                  Available
    Last Transition Time:  2019-02-14T20:42:09Z
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2019-02-14T20:34:04Z
    Status:                False
    Type:                  Failing
  Extension:               <nil>
  Related Objects:
    Group:     operator.openshift.io
    Name:      console
    Resource:  consoles
    Group:     config.openshift.io
    Name:      console
    Resource:  consoles
    Group:     oauth.openshift.io
    Name:      console
    Resource:  oauthclients
    Group:
    Name:      openshift-console-operator
    Resource:  namespaces
    Group:
    Name:      openshift-console
    Resource:  namespaces
  Versions:    <nil>
Events:        <none>

Which seems promising.

@jhadvig
Member

jhadvig commented Feb 14, 2019

/retest

@benjaminapetersen
Contributor Author

/retest

@benjaminapetersen
Contributor Author

Fail message:

level=warning msg="Found override for ReleaseImage. Please be warned, this is not advised"
level=info msg="Consuming \"Install Config\" from target directory"
level=info msg="Creating cluster..."
level=info msg="Waiting up to 30m0s for the Kubernetes API..."
level=info msg="API v1.12.4+bb5f71c up"
level=info msg="Waiting up to 30m0s for the bootstrap-complete event..."
level=warning msg="RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 4768"
level=info msg="Destroying the bootstrap resources..."
level=info msg="Waiting up to 30m0s for the cluster to initialize..."
level=fatal msg="failed to initialize the cluster: Cluster operator console has not yet reported success"

@benjaminapetersen
Contributor Author

/retest


toUpdate = toUpdate || depChanged

// at this point, we should not be failing anymore

// but we may be in a transitional state, if any of the above resources changed
if toUpdate {
co.operatorStatusProgressing(operatorConfig)
Member

was debugging a bit more and figured out that even if none of the resources are updated, toUpdate will be set to true here, which shouldn't happen if nothing gets updated, right?

Contributor Author

Nope, if nothing is updated, we want it false. So that is definitely something worth tracking down.
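To make the expected behavior concrete, here is a small sketch of the `toUpdate = toUpdate || changed` accumulation. The `syncStep` type and `runSteps` function are hypothetical stand-ins for the real per-resource apply calls in the operator; the point is only that the flag should stay false unless at least one step reports a change.

```go
package main

import "fmt"

// syncStep is a hypothetical stand-in for applying one resource
// (service, configmap, deployment, ...). It reports whether the
// resource was actually changed.
type syncStep func() (bool, error)

// runSteps applies each step in order and reports whether any of them
// changed something. It stops at the first error.
func runSteps(steps ...syncStep) (bool, error) {
	toUpdate := false
	for _, step := range steps {
		changed, err := step()
		if err != nil {
			return toUpdate, err
		}
		toUpdate = toUpdate || changed
	}
	return toUpdate, nil
}

func main() {
	unchanged := func() (bool, error) { return false, nil }

	toUpdate, err := runSteps(unchanged, unchanged, unchanged)
	if err != nil {
		fmt.Println("sync failed:", err)
		return
	}
	if toUpdate {
		fmt.Println("something changed: report Progressing=True")
	} else {
		fmt.Println("nothing changed: leave Progressing=False")
	}
}
```

If the flag comes back true when every resource is unchanged, one of the steps is reporting a change it did not make, and that step is where to look.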

@benjaminapetersen
Contributor Author

Just pasting again....

So this error:

level=fatal msg="failed to initialize the cluster: Cluster operator console has not yet reported success"

https://github.com/openshift/cluster-version-operator/blob/9335e43c21efc9df7eed48f05585997087fc04db/pkg/cvo/internal/operatorstatus.go#L126

This means we need Available: true, Progressing: false, and Failing: false for the CVO to be happy and tell the installer it's happy.
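As a rough illustration of that rule (not the CVO's actual code, which is at the link above), the check can be sketched as follows. The `Condition` type here is a simplified, hypothetical stand-in for the real ClusterOperator status condition, and `reportsSuccess` is a name invented for this example.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a ClusterOperator
// status condition; the real type lives in the OpenShift config API.
type Condition struct {
	Type   string // "Available", "Progressing", or "Failing"
	Status string // "True", "False", or "Unknown"
}

// reportsSuccess returns true only when Available is True and both
// Progressing and Failing are False. A missing condition counts as
// "not yet reported success".
func reportsSuccess(conds []Condition) bool {
	want := map[string]string{
		"Available":   "True",
		"Progressing": "False",
		"Failing":     "False",
	}
	seen := map[string]bool{}
	for _, c := range conds {
		expected, tracked := want[c.Type]
		if !tracked {
			continue // ignore condition types this check does not care about
		}
		if c.Status != expected {
			return false
		}
		seen[c.Type] = true
	}
	return len(seen) == len(want)
}

func main() {
	conds := []Condition{
		{Type: "Available", Status: "True"},
		{Type: "Progressing", Status: "False"},
		{Type: "Failing", Status: "False"},
	}
	fmt.Println("reported success:", reportsSuccess(conds))
}
```

Under this reading, an operator that sets Progressing to True on every sync would never satisfy the check, which is consistent with the install failure pasted above.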

@benjaminapetersen
Contributor Author

/retest

The brand fix PR went in.

@benjaminapetersen
Contributor Author

Prefer #142 for tidiness (squashed version of this)

@openshift-ci-robot
Contributor

@benjaminapetersen: The following tests failed, say /retest to rerun them all:

Test name                  Commit    Details   Rerun command
ci/prow/e2e-aws            a7cca40   link      /test e2e-aws
ci/prow/e2e-aws-operator   a7cca40   link      /test e2e-aws-operator

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@benjaminapetersen
Contributor Author

benjaminapetersen commented Feb 21, 2019

Closing, #142 has a few other necessary components for operator status, such as renumbering manifests, etc.

@benjaminapetersen benjaminapetersen deleted the operator-status-revisions branch February 21, 2019 16:49
