run bundle: Verify operator is installed/upgraded #3766
bharathi-tenneti wants to merge 0 commits into operator-framework:master from
```go
}

if csvKey.Name == "" {
	return v1alpha1.InstallPlanFailed, fmt.Errorf("could not find installed CSV in install plan")
```
It seems this should return the CSV phase here, not the InstallPlan's.
```go
		return true, nil
	}
	return false, nil
}, ctx.Done()); err != nil {
```
I'm confused here: when we fail to get the CSV, why do we still need to check the Deployment? As I understand it, if both the CSV and the Deployment are healthy, we can say the operator was installed successfully; if either of them fails, we can say the operator failed to install.
As I understand it, this is to get the reason the deployment failed.
@joelanford Is that what you were suggesting?
The idea is that if the CSV never succeeds, we should try to be helpful and tell a user why the CSV didn't succeed.
One common reason is that the operator deployment hasn't rolled out.
I think we should only check the deployment status if the error from the wait.PollImmediateUntil call is context.DeadlineExceeded.
Thank you, I see now. Yes, we should try to return helpful information. Besides the Deployment object, RBAC rules (Role, ClusterRole, RoleBinding, ...) cause many failures.
@bharathi-tenneti Could you add a unit test or an e2e test for this feature? Thanks!
IMO, we ought to call the command in:
```diff
@@ -17,9 +17,13 @@ package internal
 import (
```
A changelog fragment is missing, since this adds a new command.
E2E tests and unit tests are coming this sprint.
12d9a86 to 8927719
```go
for _, s := range dep.Status.Conditions {
	if s.Type != "Progressing" && s.Type != "Available" {
		return nil, fmt.Errorf("Deployment failed: %v", s.Message)
	}
}
```
I ran a quick test by creating a deployment with an image I know doesn't exist. The deployment conditions I see are:

```yaml
conditions:
- lastTransitionTime: "2020-08-26T17:11:29Z"
  lastUpdateTime: "2020-08-26T17:11:29Z"
  message: Deployment does not have minimum availability.
  reason: MinimumReplicasUnavailable
  status: "False"
  type: Available
- lastTransitionTime: "2020-08-26T17:11:29Z"
  lastUpdateTime: "2020-08-26T17:11:29Z"
  message: ReplicaSet "test-dep-97bfdcd79" is progressing.
  reason: ReplicaSetUpdated
  status: "True"
  type: Progressing
```

And the underlying pod has this:
```yaml
conditions:
- lastProbeTime: null
  lastTransitionTime: "2020-08-26T17:11:29Z"
  status: "True"
  type: Initialized
- lastProbeTime: null
  lastTransitionTime: "2020-08-26T17:11:29Z"
  message: 'containers with unready status: [test-fake-jwl-image]'
  reason: ContainersNotReady
  status: "False"
  type: Ready
- lastProbeTime: null
  lastTransitionTime: "2020-08-26T17:11:29Z"
  message: 'containers with unready status: [test-fake-jwl-image]'
  reason: ContainersNotReady
  status: "False"
  type: ContainersReady
- lastProbeTime: null
  lastTransitionTime: "2020-08-26T17:11:29Z"
  status: "True"
  type: PodScheduled
containerStatuses:
- image: test-fake-jwl-image
  imageID: ""
  lastState: {}
  name: test-fake-jwl-image
  ready: false
  restartCount: 0
  started: false
  state:
    waiting:
      message: Back-off pulling image "test-fake-jlanford-image"
      reason: ImagePullBackOff
```

We probably need to be checking all of the deployment conditions and looking for any of them that are indicative of failure. And then if we find one, perhaps we could try digging further, such that we eventually end up being able to tell the user something like:

```
failed to run operator: timed out waiting for csv success: deployment unavailable: pod is Pending with ImagePullBackOff: Back-off pulling image "test-fake-jlanford-image"
```
@joelanford So, DeploymentConditions of type Progressing and Available are not considered failures. In the current code, we treat a condition of any other type as a failure:

```go
for _, s := range dep.Status.Conditions {
	if s.Type != "Progressing" && s.Type != "Available" {
		return nil, fmt.Errorf("Deployment failed: %v", s.Message)
	}
}
```

Currently, I return the Message field in that case. However, since you suggested digging deeper, we would need to look into the pods and grab the error message from there. Is that what you meant?
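Digging into the pods could look roughly like this sketch. The types are trimmed, hypothetical stand-ins for `corev1.PodStatus` and `corev1.ContainerStatus` (the real `ContainerStatus` nests the waiting state under `State.Waiting`, which is flattened here), and `podFailure` is an illustrative name. Listing the pods that belong to the deployment is left out, and the `Pending` phase in the example is assumed from the target message above, since the pod dump does not show it.

```go
package main

import "fmt"

// ContainerStateWaiting mirrors the Reason/Message pair of corev1.ContainerStateWaiting.
type ContainerStateWaiting struct {
	Reason  string
	Message string
}

// ContainerStatus is a flattened, hypothetical stand-in for corev1.ContainerStatus.
type ContainerStatus struct {
	Name    string
	Ready   bool
	Waiting *ContainerStateWaiting
}

// PodStatus is a trimmed, hypothetical stand-in for corev1.PodStatus.
type PodStatus struct {
	Phase             string
	ContainerStatuses []ContainerStatus
}

// podFailure builds an error from the first unready container that is stuck
// waiting, or returns nil if no such container is found.
func podFailure(p PodStatus) error {
	for _, cs := range p.ContainerStatuses {
		if cs.Ready || cs.Waiting == nil {
			continue
		}
		return fmt.Errorf("pod is %s with %s: %s", p.Phase, cs.Waiting.Reason, cs.Waiting.Message)
	}
	return nil
}

func main() {
	pod := PodStatus{
		Phase: "Pending",
		ContainerStatuses: []ContainerStatus{{
			Name: "test-fake-jwl-image",
			Waiting: &ContainerStateWaiting{
				Reason:  "ImagePullBackOff",
				Message: `Back-off pulling image "test-fake-jlanford-image"`,
			},
		}},
	}
	fmt.Println(podFailure(pod))
}
```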
> So, DeploymentConditions of Progressing and Available are not considered failure
We have to look at the condition status too. For example, Available = False and Progressing = False mean something's not working correctly.
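A sketch of checking both the type and the status of each condition. `DeploymentCondition` is a trimmed, hypothetical stand-in for `appsv1.DeploymentCondition`, and `deploymentFailure` is an illustrative name. Besides `Available` and `Progressing`, it also treats `ReplicaFailure=True` as a failure. Note that `Available=False` on its own can just mean a rollout is still in progress, so this check assumes it runs only after the CSV wait has already timed out.

```go
package main

import "fmt"

// ConditionStatus mirrors the "True"/"False"/"Unknown" values of corev1.ConditionStatus.
type ConditionStatus string

const (
	ConditionTrue  ConditionStatus = "True"
	ConditionFalse ConditionStatus = "False"
)

// DeploymentCondition is a trimmed, hypothetical stand-in for appsv1.DeploymentCondition.
type DeploymentCondition struct {
	Type    string
	Status  ConditionStatus
	Reason  string
	Message string
}

// deploymentFailure looks at Type and Status together and returns an error
// for the first condition that indicates a problem, or nil if none does.
func deploymentFailure(conds []DeploymentCondition) error {
	for _, c := range conds {
		bad := (c.Type == "ReplicaFailure" && c.Status == ConditionTrue) ||
			(c.Type == "Progressing" && c.Status == ConditionFalse) ||
			(c.Type == "Available" && c.Status == ConditionFalse)
		if bad {
			return fmt.Errorf("deployment condition %s=%s (%s): %s", c.Type, c.Status, c.Reason, c.Message)
		}
	}
	return nil
}

func main() {
	// Conditions from the test deployment with a nonexistent image.
	conds := []DeploymentCondition{
		{Type: "Available", Status: ConditionFalse, Reason: "MinimumReplicasUnavailable", Message: "Deployment does not have minimum availability."},
		{Type: "Progressing", Status: ConditionTrue, Reason: "ReplicaSetUpdated", Message: `ReplicaSet "test-dep-97bfdcd79" is progressing.`},
	}
	fmt.Println(deploymentFailure(conds))
}
```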
```go
			return nil, fmt.Errorf("Deployment failed: %v", s.Message)
		}
	}
}
```
If the error is not nil, but we were unable to determine a root cause, we should still return the error, possibly wrapped with some extra context if we know anything else about it.
```go
csvKey := types.NamespacedName{
	Namespace: o.cfg.Namespace,
}

if csvKey.Name == "" {
	return nil, fmt.Errorf("could not find installed CSV in install plan")
}
```
This will always result in the error being returned, right? Should we pass the install plan to this function so we can find the CSV key?
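A sketch of deriving the key from the install plan instead. OLM's real `InstallPlanSpec` lists CSV names in `ClusterServiceVersionNames`; the `InstallPlan` and `NamespacedName` types below are trimmed, hypothetical stand-ins, and the sketch assumes the CSV lives in the plan's namespace. Because choosing among several CSVs needs a rule the thread does not specify, it only accepts a plan naming exactly one.

```go
package main

import (
	"errors"
	"fmt"
)

// NamespacedName is a hypothetical stand-in for k8s.io/apimachinery's types.NamespacedName.
type NamespacedName struct {
	Namespace string
	Name      string
}

// InstallPlan is a trimmed, hypothetical stand-in for OLM's InstallPlan.
type InstallPlan struct {
	Namespace                  string
	ClusterServiceVersionNames []string
}

// csvKeyFromInstallPlan fills in Name from the plan, so the "not found"
// error only fires when the plan really names no CSV.
func csvKeyFromInstallPlan(ip InstallPlan) (NamespacedName, error) {
	switch len(ip.ClusterServiceVersionNames) {
	case 0:
		return NamespacedName{}, errors.New("could not find installed CSV in install plan")
	case 1:
		return NamespacedName{Namespace: ip.Namespace, Name: ip.ClusterServiceVersionNames[0]}, nil
	default:
		// Picking one of several CSVs needs a rule this sketch does not assume.
		return NamespacedName{}, fmt.Errorf("install plan names %d CSVs; expected exactly one", len(ip.ClusterServiceVersionNames))
	}
}

func main() {
	ip := InstallPlan{Namespace: "default", ClusterServiceVersionNames: []string{"example-operator.v0.0.1"}}
	key, err := csvKeyFromInstallPlan(ip)
	fmt.Println(key, err)
}
```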
Moved the csvKey logic into the InstallPlan function.
@bharathi-tenneti: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
c6a8772 to d8a7556
Description:
The verification logic will verify that the operator is installed starting with the starting CSV and is upgraded through the ending CSV.
Log what happened and the final state (e.g. the operator is running in namespace foo, watching namespace bar).
Use the CSV status and the deployment status to verify the operator.
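A sketch of the final check and log line described above. `OperatorState` and `verifyOperator` are hypothetical; none of these names come from the PR. The sketch only checks the final state (it does not walk the upgrade path from the starting CSV), and treating an empty `WatchNamespaces` list as "all namespaces" is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// OperatorState is a hypothetical snapshot of what verification observed.
type OperatorState struct {
	CSVName             string
	CSVPhase            string
	DeploymentAvailable bool
	Namespace           string
	WatchNamespaces     []string
}

// verifyOperator treats the operator as installed only when the CSV has
// succeeded and its deployment is available, then returns a summary line.
func verifyOperator(s OperatorState) (string, error) {
	if s.CSVPhase != "Succeeded" {
		return "", fmt.Errorf("csv %q is in phase %q, not Succeeded", s.CSVName, s.CSVPhase)
	}
	if !s.DeploymentAvailable {
		return "", fmt.Errorf("csv %q succeeded but its deployment is not available", s.CSVName)
	}
	watching := "all namespaces" // assumption: no watch namespaces means all
	switch len(s.WatchNamespaces) {
	case 0:
	case 1:
		watching = "namespace " + s.WatchNamespaces[0]
	default:
		watching = "namespaces " + strings.Join(s.WatchNamespaces, ", ")
	}
	return fmt.Sprintf("operator %q is running in namespace %s, watching %s", s.CSVName, s.Namespace, watching), nil
}

func main() {
	summary, err := verifyOperator(OperatorState{
		CSVName:             "example-operator.v0.0.2",
		CSVPhase:            "Succeeded",
		DeploymentAvailable: true,
		Namespace:           "foo",
		WatchNamespaces:     []string{"bar"},
	})
	fmt.Println(summary, err)
}
```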