run bundle: Add deployment status check when verifying installed operator #3819
estroz left a comment:
I'd like to see the deployment polling function refactored into a separate function.
jmrodri left a comment:
Some nits, a couple of questions, and a suggestion to rename the method. Overall it looks correct.
Branch force-pushed from 1efb1d7 to a80291a.
        depErrors[depName] = err.Error()
        break
    }
    for _, s := range dep.Status.Conditions {
        if s.Type == appsv1.DeploymentAvailable && s.Status == "False" {
            depErrors[depName] = s.Message
        }
Get all deployment errors:
-        depErrors[depName] = err.Error()
-        break
-    }
-    for _, s := range dep.Status.Conditions {
-        if s.Type == appsv1.DeploymentAvailable && s.Status == "False" {
-            depErrors[depName] = s.Message
-        }
+        depErrors[depName] = err.Error()
+    } else {
+        for _, s := range dep.Status.Conditions {
+            if s.Type == appsv1.DeploymentAvailable && s.Status == corev1.ConditionFalse {
+                depErrors[depName] = s.Message
+            }
+        }
Yeah, this was supposed to be continue, since we want to skip to the next deployment when Get returns an error.
-    return wait.PollImmediateUntil(time.Second, csvPhaseSucceeded, ctx.Done())
+    err := wait.PollImmediateUntil(time.Second, csvPhaseSucceeded, ctx.Done())
+    if err != nil && errors.Is(err, context.DeadlineExceeded) {
Coming back to this from a prior review: why check deployments/pods only when a timeout occurs? Shouldn't we print deployment/pod status when any error occurs? We can ignore "not found" errors in case something happened before the operator deployment was created.
@joelanford WDYT? I know you mentioned checking only in the deadline-exceeded case.
@@ -0,0 +1,13 @@
+package client
This file needs the license header; that's why Travis is failing.
jmrodri left a comment:
My concerns were addressed. Travis is failing because of the missing license header. I do not need to re-review this for that change.
Branch force-pushed from 6338678 to 0aba980.
Branch force-pushed from 0aba980 to 1fdf445.
estroz left a comment:
This looks fine, although I'm concerned that the tests aren't actually testing the functionality added here. Client.printDeploymentErrors() should take an io.Writer so you can check that it collects and prints deployment and pod errors correctly.
        log.Printf("failed to run operator, deployment not found for : %v\n", ds.Name)
        continue
    }
    for _, s := range dep.Status.Conditions {
Ideally you shouldn't interleave network calls and print statements; instead, collect all errors first, then print them together. Interleaving them is OK for now, but can you add a TODO to separate these two steps?
Sure. So should I modify these two functions to return errors to DoCSVWait() instead of printing them?
Not necessarily. printDeploymentErrors() should look something like this (in pseudocode):
depErrs := make(map[string]string)
podErrsForDep := make(map[string]map[string]string)
for dep in deployments {
    for cond in dep.Status.Conditions {
        if isStatusNotAvailable(cond) {
            depErrs[dep.Name] = cond.Message
            podErrsForDep[dep.Name] = getPodErrs(dep)
            break
        }
    }
}
for depName, depReason in depErrs {
    printf("dep %q error: %s\n", depName, depReason)
    for podName, podErr in podErrsForDep[depName] {
        printf("\tpod %q error: %s\n", podName, podErr)
    }
}
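The pseudocode above translates to Go roughly as follows. This is a sketch, not the PR's code: deployment and deploymentCondition are stand-ins for the appsv1 types, and isNotAvailable, collectErrors, printErrors and the getPodErrs callback are hypothetical names. Collection and printing are separate functions, so the step that would make API calls never writes output.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// Simplified stand-ins for appsv1 types; not the real k8s.io/api structs.
type deploymentCondition struct {
	Type, Status, Message string
}

type deployment struct {
	Name       string
	Conditions []deploymentCondition
}

// isNotAvailable checks a single condition, as in the pseudocode.
func isNotAvailable(c deploymentCondition) bool {
	return c.Type == "Available" && c.Status == "False"
}

// collectErrors only gathers data; it does no printing. getPodErrs is a
// hypothetical callback returning pod name -> error message for a deployment.
func collectErrors(deps []deployment, getPodErrs func(deployment) map[string]string) (map[string]string, map[string]map[string]string) {
	depErrs := make(map[string]string)
	podErrsForDep := make(map[string]map[string]string)
	for _, dep := range deps {
		for _, cond := range dep.Conditions {
			if isNotAvailable(cond) {
				depErrs[dep.Name] = cond.Message
				podErrsForDep[dep.Name] = getPodErrs(dep)
				break
			}
		}
	}
	return depErrs, podErrsForDep
}

// printErrors is the separate output step. Go map iteration order is random,
// so a real implementation may want to sort names first.
func printErrors(w io.Writer, depErrs map[string]string, podErrsForDep map[string]map[string]string) {
	for depName, depReason := range depErrs {
		fmt.Fprintf(w, "dep %q error: %s\n", depName, depReason)
		for podName, podErr := range podErrsForDep[depName] {
			fmt.Fprintf(w, "\tpod %q error: %s\n", podName, podErr)
		}
	}
}

func main() {
	deps := []deployment{
		{Name: "healthy", Conditions: []deploymentCondition{{Type: "Available", Status: "True"}}},
		{Name: "broken", Conditions: []deploymentCondition{{Type: "Available", Status: "False", Message: "minimum replicas unavailable"}}},
	}
	getPodErrs := func(d deployment) map[string]string {
		return map[string]string{d.Name + "-pod": "ImagePullBackOff"}
	}
	depErrs, podErrs := collectErrors(deps, getPodErrs)
	printErrors(os.Stdout, depErrs, podErrs)
}
```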
Added TODO comments and addressed the rest of the suggestions.
        return fmt.Errorf("error getting Pods: %v", err)
    }
    for _, p := range podList.Items {
        if p.Status.Phase != corev1.PodSucceeded {
Succeeded means the Pod's containers finished and exited. Operator pods should run continuously to watch their CRs, so the expected phase is Running, not Succeeded.
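A small sketch of the phase check this comment describes, using a stand-in podPhase type with the same string values as corev1.PodPhase; in the real code the comparison would be against corev1.PodRunning. operatorPodProblem is a hypothetical helper name.

```go
package main

import "fmt"

// podPhase mirrors the string values of corev1.PodPhase; this is a stand-in
// type, not the real k8s.io/api definition.
type podPhase string

const (
	podPending   podPhase = "Pending"
	podRunning   podPhase = "Running"
	podSucceeded podPhase = "Succeeded"
	podFailed    podPhase = "Failed"
	podUnknown   podPhase = "Unknown"
)

// operatorPodProblem returns a non-empty message when a long-running operator
// pod is not in the Running phase. Succeeded counts as a problem: it means
// every container exited, so nothing is left to watch the CRs.
func operatorPodProblem(name string, phase podPhase) string {
	if phase == podRunning {
		return ""
	}
	return fmt.Sprintf("pod %q is %s, expected %s", name, phase, podRunning)
}

func main() {
	for _, p := range []podPhase{podRunning, podSucceeded, podPending} {
		fmt.Printf("%s -> %q\n", p, operatorPodProblem("memcached-operator-0", p))
	}
}
```

Running only says the containers have been started; readiness is a separate signal, which is why a container-level Ready check is still useful alongside the phase check.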
    for _, p := range podList.Items {
        if p.Status.Phase != corev1.PodSucceeded {
            for _, cs := range p.Status.ContainerStatuses {
                if !cs.Ready {
Sometimes a Pod has multiple containers.
Since this PR has already been merged, I submitted PR #3884 to fix it.
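One way to handle multi-container Pods, sketched under assumptions (this is not necessarily what PR #3884 does): check every entry in the container statuses and report all non-ready containers instead of keeping only one. containerStatus is a simplified stand-in for corev1.ContainerStatus with the waiting reason flattened into one field, and podContainerErrors is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// containerStatus is a stand-in for the fields of corev1.ContainerStatus used here.
type containerStatus struct {
	Name   string
	Ready  bool
	Reason string // e.g. a waiting reason such as "CrashLoopBackOff"
}

// podContainerErrors reports every non-ready container in a pod, so a pod
// with several containers does not hide all but one of its problems.
func podContainerErrors(statuses []containerStatus) string {
	var problems []string
	for _, cs := range statuses {
		if !cs.Ready {
			problems = append(problems, fmt.Sprintf("container %q not ready: %s", cs.Name, cs.Reason))
		}
	}
	return strings.Join(problems, "; ")
}

func main() {
	statuses := []containerStatus{
		{Name: "manager", Ready: false, Reason: "CrashLoopBackOff"},
		{Name: "kube-rbac-proxy", Ready: true},
		{Name: "sidecar", Ready: false, Reason: "ImagePullBackOff"},
	}
	// container "manager" not ready: CrashLoopBackOff; container "sidecar" not ready: ImagePullBackOff
	fmt.Println(podContainerErrors(statuses))
}
```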
Description: Add a deployment status check to the DoCSVWait() function to help determine the root cause when an operator fails to run, and report it back to the operator developer.