[full ci] Prevent vic-machine from deleting non cVMs #6679

Closed

andrewtchin wants to merge 3 commits into vmware:master from andrewtchin:6603/vic-machine-overzealous

Conversation

@andrewtchin (Contributor):

Fixes #6603

}
if d.parentResourcepool != nil {
if d.parentResourcepool.Reference() == defaultrp.Reference() {
return fmt.Errorf("Target VCH is in cluster's default resource pool. Refusing to delete it.")
Contributor:

Why would we not remove it if it's in the cluster default? We won't install there, so it must have been moved there... why not remove it?

Contributor Author:

I thought this was a requirement, but that makes sense to me. I removed this.

Contributor:

Well... hell... looking at the issue I see why you'd think that was a requirement. I think the first item should have said "vic-machine create will fail if the target RP is the default RP"; I was really thinking of adding a new validation rule. I'm not sure that's really needed...

I think we prevent the install into a cluster default RP by creating an RP on each deploy. Previously there was an issue that would (in some cases) result in a VCH deployed to the default RP, but I think since 1.2.1 that's fixed...

Contributor Author:

Yeah, you're right 👍

return err
}
// Assume the cVMs and RP have already been deleted
log.Warnf("Proceeding with delete of VCH due to --force")
Contributor:

Will there be later feedback/logging that no containerVMs have been removed? If not, should that be stated here? e.g. "No container VMs found, but proceeding with delete of VCH due to use of --force"

IMO there's a possibility that the containerVMs exist elsewhere, so they would be "orphaned". Just want to make sure the user understands that no other VMs will be removed...

Contributor Author:

Good point. Added this.

@andrewtchin changed the title from "Prevent vic-machine from deleting non cVMs" to "[specific ci=6-03-Delete] Prevent vic-machine from deleting non cVMs" on Nov 6, 2017
@mdharamadas1 (Contributor) left a comment:

lgtm

err = errors.Errorf("Failed to fetch guest info of appliance vm: %s", err)
return false, err
}
extraconfig.Decode(extraconfig.MapSource(info), &cspec)
Contributor:

This requires a call to migrate to be version tolerant. While we've not updated the id field yet, we should code defensively.

Alternative is that this function is changed to check for the presence of the version field itself: https://github.com/vmware/vic/blob/master/lib/config/executor/container_vm.go#L206
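The suggested alternative, keying off the presence of the version field, could be sketched roughly as follows. This is a minimal illustration against a plain guestinfo map rather than vic's real extraconfig decoder, and the key name below is a hypothetical placeholder, not vic's actual key:

```go
package main

import "fmt"

// looksLikeContainerVM is an isContainerVM-style check keyed off the presence
// of a version entry in the VM's extraconfig. The key name is a made-up
// placeholder for illustration only.
func looksLikeContainerVM(guestinfo map[string]string) bool {
	// A VM written by vic would carry a version entry; absence of the key
	// means the VM was not created by vic and must be left alone.
	v, ok := guestinfo["guestinfo.example/version"]
	return ok && v != ""
}

func main() {
	cvm := map[string]string{"guestinfo.example/version": "v1.2.1"}
	fmt.Println(looksLikeContainerVM(cvm), looksLikeContainerVM(nil))
}
```

The point of checking the field itself, rather than decoding the full spec, is that the check stays tolerant of config-version skew between the VCH and older cVMs.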

Contributor Author:

fixed

log.Warnf("No container VMs found, but proceeding with delete of VCH due to --force")
err = nil
}
if d.parentResourcepool != nil {
Contributor:

Why is this check here instead of at point of use in DeleteVCHInstances where it was before?

Contributor Author:

This is required since I use d.parentResourcepool above, and that could be nil if the parent RP was deleted out of band (OOB).
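The guard being discussed can be illustrated with a simplified stand-in for the real dispatcher; the struct, field, and method names here are assumptions for illustration, not vic's actual API:

```go
package main

import "fmt"

// dispatcher is a stand-in for the real delete dispatcher; the pool reference
// is nil when the original RP was deleted out-of-band.
type dispatcher struct {
	parentResourcepool *string
}

// poolLabel guards every dereference of the pool reference, since the pool
// may have been removed outside of vic-machine's control.
func (d *dispatcher) poolLabel() string {
	if d.parentResourcepool == nil {
		return "(parent RP deleted out-of-band)"
	}
	return *d.parentResourcepool
}

func main() {
	name := "vch-pool"
	fmt.Println((&dispatcher{parentResourcepool: &name}).poolLabel())
	fmt.Println((&dispatcher{}).poolLabel())
}
```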

// if container delete failed, do not remove anything else
log.Infof("Specify --force to force delete")
return err
d.parentResourcepool, err = d.getComputeResource(vmm, conf)
Contributor:

I'm not sure what adding this check here gains us - the reported error condition was having had the appliance moved without updating the ComputeResource reference in the configuration, so we ended up with d.parentResourcePool == the cluster - this change will not prevent that.

DeleteVCHInstances would return an error in its old form if no parent resource pool could be found.

@andrewtchin (Contributor Author), Nov 7, 2017:

This and the next change below allow a VCH that was moved to another RP to be deleted when identified by moid (--id), even if the RP the VCH was originally created in no longer exists. In that case parentResourcepool would previously have returned an error, causing this to bail and not delete the VCH even though it was found by its moid.

Contributor:

d.parentResourcePool can have the following values:
a. moid of originally targeted resource pool (endpointVM may not still be in that pool)
b. current parent pool of the endpointVM

In either case we now delete children of that pool that can be identified as cVMs, then attempt to destroy that pool if it's empty.

I'm unsure what happens if the endpointVM is the last VM in the cluster and the pool returned is the root pool for the cluster. I do not know what happens if you call destroy on the root resource pool of a cluster, but to be safe I'd rather we never do so.
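The flow described above (delete only the children identifiable as cVMs, then destroy the pool only if it ends up empty) can be sketched with stub types standing in for govmomi objects; all names here are illustrative assumptions:

```go
package main

import "fmt"

// vm and pool are stub stand-ins for govmomi VM and resource-pool objects.
type vm struct {
	name        string
	containerVM bool // result of an isContainerVM-style check
}

type pool struct {
	children []vm
}

// deleteContainerVMs removes only children identified as cVMs and reports
// whether the pool is now empty and therefore a candidate for destruction.
func deleteContainerVMs(p *pool) bool {
	var kept []vm
	for _, c := range p.children {
		if c.containerVM {
			fmt.Printf("deleting cVM %s\n", c.name)
			continue
		}
		// Never touch VMs that vic did not create.
		kept = append(kept, c)
	}
	p.children = kept
	return len(p.children) == 0
}

func main() {
	p := &pool{children: []vm{
		{"container-abc", true},
		{"unrelated-vm", false},
	}}
	fmt.Println("destroy pool:", deleteContainerVMs(p))
}
```

In the cluster-root-pool scenario raised above, this function would report the pool as empty if the endpointVM were the last VM, which is exactly why an additional "never destroy the root pool" guard is being discussed.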

if ok {
// child is vch; detach all attached disks so later removal of images is successful
// Do not delete a VCH in the target RP if it is not the target VCH
if child.Reference() != vmm.Reference() {
Contributor:

We should not be allowing multiple VCHs in a single resource pool - the pool is part of the VCH. If it's possible to deploy in this manner then a check should be added in the create path, and we should look at preventing the appliance from being moved out of its parent pool.
Not going to block this PR on it however.

Contributor Author:

This was to address the case where the user OOB moved a VCH into another VCH's RP. I will open an issue to disable moving a VCH.

@andrewtchin force-pushed the 6603/vic-machine-overzealous branch from ee88c38 to c37c105 on November 7, 2017
@hickeng (Contributor) left a comment:

We need to be more failsafe in this when it comes to the parent resource pool. I think we should remove the failover path for the actual parent pool - this code was added in #3116 but I'm unsure what specific scenario was being addressed with this path. We should also confirm that it's not possible to deploy the endpointVM directly into the root pool, and perhaps a sanity check to return nil if the ComputeResource[0] is the root pool.

Given this then for delete, if the parent pool is nil and --force is supplied, we do not attempt to delete children or delete the pool afterwards.

isContainerVM is okay, and used effectively for deleting the children.
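The sanity check suggested above (return nil if ComputeResource[0] is the root pool) could look roughly like this; stub string references stand in for govmomi ManagedObjectReference values, and all names are illustrative assumptions:

```go
package main

import "fmt"

// safeParentPool rejects the cluster's root pool as a VCH parent pool.
// Returning "" here corresponds to returning nil in the real code: skip
// child deletion and never attempt to destroy that pool.
func safeParentPool(parent, clusterRoot string) string {
	if parent == clusterRoot {
		// Destroying the cluster's root pool, or deleting its children
		// wholesale, could take out VMs unrelated to the VCH.
		return ""
	}
	return parent
}

func main() {
	fmt.Println(safeParentPool("resgroup-8", "resgroup-8") == "")
	fmt.Println(safeParentPool("resgroup-42", "resgroup-8"))
}
```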

return err
}
// Assume the cVMs and RP have already been deleted
log.Warnf("No container VMs found, but proceeding with delete of VCH due to --force")
Contributor:

I do not see a path by which both err and d.parentResourcepool are non-nil so this logic will simply skip the attempt to delete the VCH when we hit the d.parentResourcePool != nil condition.

The text about containerVMs is also misleading given we've not looked for containerVMs at this point.


5. Delete the VM and RP to cleanup

### Expected Outcome:
1. All steps should succeed
Contributor:

We need another two tests:

  1. endpointVM is moved out of its pool, pool is left alone
  2. endpointVM is moved into the root of the cluster and the original pool is deleted.

Contributor Author:

Just to be clear: for 1), the endpointVM should be successfully deleted, and the RP and its children should not be deleted; and for 2), the original pool is deleted OOB after the endpointVM is moved to the root pool, and then vic-machine delete should successfully delete the endpointVM?
Thanks

Contributor:

  1. correct
  2. correct - and other VMs in the root pool should be left alone (which is what the current test asserts but worth being explicit)

@andrewtchin (Contributor Author):

Added the integration tests, working through them now.
@hickeng

  • For "I think we should remove the failover path for the actual parent pool - this code was added in Delete old VCH #3116 but I'm unsure what specific scenario was being addressed with this path" - which failover path is it that I need to remove?
  • I have confirmed that on create we always create a new RP, so we can't have a VCH in the root RP. Given that, do we still need the "sanity check to return nil if the ComputeResource[0] is the root pool"?
  • Does "I do not know what happens if you call destroy on the root resource pool of a cluster, but to be safe I'd rather we never do so" mean that after we have deleted a VCH that was moved into the root RP, we should not call destroy on the pool?

@hickeng (Contributor) commented on Nov 16, 2017:

There should be a follow up issue to look at disabling Move for the individual VMs.

@andrewtchin force-pushed the 6603/vic-machine-overzealous branch 2 times, most recently from 820279d to 451b14b on November 17, 2017
@hickeng (Contributor) left a comment:

Thanks for the repeated iteration and addressing my paranoia!

}
// Can't find the RP VCH was created in to delete cVMs, continue anyway
log.Warnf("No container VMs found, but proceeding with delete of VCH due to --force")
err = nil
Contributor:

getComputeResource never returns non-nil for both rp and err. If you set err = nil here then we will still skip over the delete of the containerVMs (not sure why the method is called DeleteVCHInstances).

I don't know what that will mean for deleting images if they are still attached to cVMs - I suspect it will fail with errors about locked vmdks if any of the cVMs are powered on. I don't see that that will cause any other "interesting" behaviours like deleting non-VCH related things, but please sanity check me if you haven't looked at this path.


# Delete with force
${moid}= Get VM Moid %{VCH-NAME}
${ret} ${output}= Run And Return Rc And Output bin/vic-machine-linux delete --target %{TEST_URL} --user %{TEST_USERNAME} --password=%{TEST_PASSWORD} --compute-resource=%{TEST_RESOURCE} --id ${moid} --force
Contributor:

As a note, if you're specifying --id then I don't think you need --compute-resource. I frequently use --id to override which VCH I'm deleting when working with nimbus or vmc.



Delete VCH moved from its RP
Run Keyword If '%{HOST_TYPE}' == 'ESXi' Pass Execution Test skipped on ESX due to unable to move into RP
Contributor:

ESX does have resource pools and we should be able to move VMs between them. It's possible that govc is missing exposure of the appropriate command.
(I'm not sure how to view resource pools in the H5 client however).

@andrewtchin changed the title from "[specific ci=6-03-Delete] Prevent vic-machine from deleting non cVMs" to "[full ci] Prevent vic-machine from deleting non cVMs" on Nov 18, 2017
@andrewtchin force-pushed the 6603/vic-machine-overzealous branch from 451b14b to 71e19b4 on November 19, 2017
@andrewtchin (Contributor Author):

Replacement PR in #6816 to get the build to run
