Skip to content

run bundle: Add deployment status check when verifying installed operator.#3819

Merged
bharathi-tenneti merged 3 commits intooperator-framework:masterfrom
bharathi-tenneti:verifycsv
Sep 9, 2020
Merged

run bundle: Add deployment status check when verifying installed operator.#3819
bharathi-tenneti merged 3 commits intooperator-framework:masterfrom
bharathi-tenneti:verifycsv

Conversation

@bharathi-tenneti
Copy link
Copy Markdown
Contributor

@bharathi-tenneti bharathi-tenneti commented Aug 31, 2020

Description: Add deployment status check in DoCSVWait() function, to determine further root cause on why operator has failed to run, and report back to operator developer.

Copy link
Copy Markdown
Member

@estroz estroz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see the deployment polling function refactored into a separate function.

Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
@jmrodri jmrodri changed the title runbundle: Add deployment status check when verifying installed operator. run bundle: Add deployment status check when verifying installed operator. Sep 2, 2020
Copy link
Copy Markdown
Member

@jmrodri jmrodri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits; A couple questions and then a suggestion to rename the method. Overall it looks correct.

Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment on lines +241 to +259
depErrors[depName] = err.Error()
break
}
for _, s := range dep.Status.Conditions {
if s.Type == appsv1.DeploymentAvailable && s.Status == "False" {
depErrors[depName] = s.Message
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get all deployment errors:

Suggested change
depErrors[depName] = err.Error()
break
}
for _, s := range dep.Status.Conditions {
if s.Type == appsv1.DeploymentAvailable && s.Status == "False" {
depErrors[depName] = s.Message
}
depErrors[depName] = err.Error()
} else {
for _, s := range dep.Status.Conditions {
if s.Type == appsv1.DeploymentAvailable && s.Status == corev1.ConditionFalse {
depErrors[depName] = s.Message
}
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, this was supposed to be continue. As we want to to skip to next deployment itself, in case of error with Get

Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated

return wait.PollImmediateUntil(time.Second, csvPhaseSucceeded, ctx.Done())
err := wait.PollImmediateUntil(time.Second, csvPhaseSucceeded, ctx.Done())
if err != nil && errors.Is(err, context.DeadlineExceeded) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming back to this from a prior review: why check deployments/pods only when a timeout occurs? Shouldn't we print deployment/pod status when any error occurs? We can ignore "not found" errors in case something happened before the operator deployment was created.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelanford WDYT? I know you have mentioned to check only for deadline exceeded situation.

Copy link
Copy Markdown
Member

@jmrodri jmrodri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few changes.

Comment thread internal/olm/client/client.go
Comment thread internal/olm/client/client_test.go Outdated
Comment thread internal/olm/client/client_test.go
Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
@@ -0,0 +1,13 @@
package client
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file needs the license header that's why it's failing travis.

Copy link
Copy Markdown
Member

@jmrodri jmrodri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concerns were addressed. Travis is failing because of the missing license header. I do not need to re-review this for that change.

Comment thread internal/olm/client/client.go
Copy link
Copy Markdown
Member

@estroz estroz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine, although I'm concerned that the tests aren't actually testing the functionality added here. Client.printDeploymentErrors() should take an io.Writer so you can check that it collects and prints deployment and pod errors correctly.

Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client.go Outdated
log.Printf("failed to run operator, deployment not found for : %v\n", ds.Name)
continue
}
for _, s := range dep.Status.Conditions {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally you shouldn't interleave network calls and print statements; instead all errors should be collected, then printed together. Doing so is ok for now, but can you add a TODO to separate these two actions?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, so modify these two functions to return errors back to DoCSVWait() instead of printing.?

Copy link
Copy Markdown
Member

@estroz estroz Sep 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. printDeploymentErrors() should look something like (in pseudocode):

depErrs := make(map[string]string)
podErrsForDep := make(map[string]map[string]string)
for dep in deployements {
	for cond in dep.Status.Conditions {
		if isStatusNotAvailable(dep) {
			depErrs[dep.Name] = dep.Status.Reason
			podErrsForDep[dep.Name] = getPodErrs(dep)
			break
		}
	}
}

for depName, depReason in depErrs {
	printf("dep %q error: %s\n", depName, depReason)
	for podName, podErr in podErrsForDep[depName] {
		printf("\tpod %q error: %s\n", podName, podErr)
	}
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TODO comments , and addressed rest of the suggestions.

Comment thread internal/olm/client/client.go Outdated
Comment thread internal/olm/client/client_test.go Outdated
Copy link
Copy Markdown
Member

@estroz estroz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2020
@bharathi-tenneti bharathi-tenneti merged commit 9100552 into operator-framework:master Sep 9, 2020
return fmt.Errorf("error getting Pods: %v", err)
}
for _, p := range podList.Items {
if p.Status.Phase != corev1.PodSucceeded {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Succeeded means the Pod finished and exited. For the operator pods, they should be running all the time to monitor the CRs, so, the expected status should be Running, not the Succeeded.

for _, p := range podList.Items {
if p.Status.Phase != corev1.PodSucceeded {
for _, cs := range p.Status.ContainerStatuses {
if !cs.Ready {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes, there are multi containers in a Pod.

@jianzhangbjz
Copy link
Copy Markdown
Contributor

Since this PR has been merged, I submit PR #3884 to fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants