BUG 1875598: Ensure the Virtual Machine provider state is set to Unknown when Failed#696
Conversation
|
@JoelSpeed: This pull request references Bugzilla bug 1875598, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
| // Any updates to the status must be done after this point. | ||
| baseToPatch := client.MergeFrom(machine.DeepCopy()) | ||
|
|
||
| if phase == phaseFailed { |
There was a problem hiding this comment.
shouldn't this reuse the if block in line 429?
There was a problem hiding this comment.
If we use the block from above, then it's before the baseToPatch is created and as such, the patching mechanism thinks that the changes were there already. This modification has to come after baseToPatch is created, but the other one has to come before it, because it is modifying the spec.
I added the line on 438 to try and make this more obvious
| baseToPatch := client.MergeFrom(machine.DeepCopy()) | ||
|
|
||
| if phase == phaseFailed { | ||
| if err := r.patchFailedMachineProviderStatusState(machine); err != nil { |
There was a problem hiding this comment.
this is called patchFailedMachineProviderStatusState but is not actually patching but just setting right?
We should be consistent with a patchFailedMachineInstanceAnnotation and either let patchFailedMachineProviderStatusState to create a baseToPatch and do its patch request or rename both funcs and let them just set.
There was a problem hiding this comment.
I agree the naming is wrong, but I think the Instance annotation needs to stay as a patching function otherwise we have the issue of not actually updating the resource. The annotation needs to send a client.Patch but the rest of the changes need a client.Status().Patch because of the subresource.
I think the preferable route here is to just rename the new one not to include the patching word and leave the logic as is
This ensures that if the Machine goes into a Failed state, and the provider has already set the providerStatus to include an instanceState or vmState, that we override this with the value `Unknown`. This ensures consistency between the providerStatus and the instance state annotation.
6bdbf76 to
00d6df0
Compare
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: enxebre The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
|
/test e2e-aws-operator |
|
/test e2e-aws-operator |
1 similar comment
|
/test e2e-aws-operator |
|
/lgtm |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
10 similar comments
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/lgtm |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
| const instanceStateField = "instanceState" | ||
| const vmStateField = "vmState" | ||
|
|
||
| providerStatus, err := runtime.DefaultUnstructuredConverter.ToUnstructured(machine.Status.ProviderStatus) |
There was a problem hiding this comment.
Will this work correctly if ProviderStatus is nil?
There was a problem hiding this comment.
There is a check on L#480 which prevents us getting this far if the providerStatus is nil, hence that scenario is avoided (and if the check is removed, the tests fail)
| const vmStateField = "vmState" | ||
|
|
||
| providerStatus, err := runtime.DefaultUnstructuredConverter.ToUnstructured(machine.Status.ProviderStatus) | ||
| if err != nil { |
There was a problem hiding this comment.
Should this really prevent us from doing the rest?
| if phase == phaseFailed { | ||
| if err := r.overrideFailedMachineProviderStatusState(machine); err != nil { | ||
| klog.Errorf("Failed to update machine provider status %q: %v", machine.GetName(), err) | ||
| return err |
There was a problem hiding this comment.
No need to fail the whole run here, we should log the error and continue processing. Nothing about this added behavior for setting subsequent status.
There was a problem hiding this comment.
I've had a look through the kinds of errors that we could get returned from this function and I don't think in a running environment we would ever actually see them. To me, they all seem to be that they would exist only from programming errors (https://github.com/kubernetes/apimachinery/blob/94222d04a59075a01fddedd696037db9e61db6e9/pkg/runtime/converter.go#L404), digging into this, an error could come up if either the object passed is nil (Which is already being checked), or if the type of object cannot be converted there is a possibility for an error. In either of these cases we should see this during unit testing so I think this is better to leave as it is, else we may miss our programming errors as we make changes to this in the future
|
/hold |
|
/retest |
2 similar comments
|
/retest |
|
/retest |
|
/hold cancel |
|
/retest |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
3 similar comments
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
@JoelSpeed: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
@JoelSpeed: All pull requests linked via external trackers have merged: Bugzilla bug 1875598 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This ensures that if the Machine goes into a Failed state, and the provider has already set the providerStatus to include an instanceState or vmState, that we override this with the value
Unknown.This ensures consistency between the providerStatus and the instance state annotation.
Ref:
machine-api-operator/pkg/apis/vsphereprovider/v1beta1/vsphereproviderstatus_types.go
Line 64 in 184fb73
The OpenStack and Baremetal providers do not have an equivalent field.