[full ci] Prevent vic-machine from deleting non cVMs #6679
andrewtchin wants to merge 3 commits into vmware:master
Conversation
Force-pushed from 47c8440 to 72fd963
lib/install/management/delete.go (Outdated)
```go
}
if d.parentResourcepool != nil {
	if d.parentResourcepool.Reference() == defaultrp.Reference() {
		return fmt.Errorf("Target VCH is in cluster's default resource pool. Refusing to delete it.")
```
Why would we not remove if it's in the cluster default? We won't install there, so it's been moved there...why not remove it?
I thought this was a requirement, but that makes sense to me. I removed this
Well..hell...looking at the issue I see why you'd think that was a requirement..I think the first item should have said "vic-machine create will fail if the target rp is the default rp"...was really thinking of adding a new validation rule. I'm not sure that's really needed...
I think we prevent the install into a cluster default RP by creating an RP on each deploy. Previously there was an issue that would (in some cases) result in a VCH deployed to the default RP, but I think since 1.2.1 that's fixed...
yea you're right 👍
lib/install/management/delete.go (Outdated)
```go
	return err
}
// Assume the cVMs and RP have already been deleted
log.Warnf("Proceeding with delete of VCH due to --force")
```
Will there be later feedback / logging that no containerVMs have been removed? If not should that be stated here? i.e. "No container VMs found, but proceeding with delete of VCH due to use of --force"
IMO there's a possibility that the containerVMs exist elsewhere, so they would be "orphaned". Just want to make sure the user understands that no other VMs will be removed...
Good point. Added this
```go
	err = errors.Errorf("Failed to fetch guest info of appliance vm: %s", err)
	return false, err
}
extraconfig.Decode(extraconfig.MapSource(info), &cspec)
```
This requires a call to migrate to be version tolerant. While we've not updated the id field yet we should code defensively.
Alternative is that this function is changed to check for the presence of the version field itself: https://github.com/vmware/vic/blob/master/lib/config/executor/container_vm.go#L206
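A minimal sketch of that alternative: decide whether a VM is a containerVM by checking for the presence of the version field in its extraConfig, rather than decoding the full spec. The key name `guestinfo.vice./init/version` is an assumption for illustration only; the real field is defined in lib/config/executor/container_vm.go.

```go
package main

import "fmt"

// hasVersionField is a hypothetical, version-tolerant stand-in for
// isContainerVM: it only tests that the (assumed) version key exists and is
// non-empty, so it keeps working even if the id field layout changes.
func hasVersionField(extraConfig map[string]string) bool {
	v, ok := extraConfig["guestinfo.vice./init/version"]
	return ok && v != ""
}

func main() {
	cvm := map[string]string{"guestinfo.vice./init/version": "v1.2.1"}
	plainVM := map[string]string{}
	fmt.Println(hasVersionField(cvm))     // a cVM carries the field
	fmt.Println(hasVersionField(plainVM)) // an unrelated VM does not
}
```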
```go
	log.Warnf("No container VMs found, but proceeding with delete of VCH due to --force")
	err = nil
}
if d.parentResourcepool != nil {
```
Why is this check here instead of at point of use in DeleteVCHInstances where it was before?
this is required since I use d.parentResourcepool above and that could be nil if the parent RP was deleted OOB
```go
// if container delete failed, do not remove anything else
log.Infof("Specify --force to force delete")
return err
d.parentResourcepool, err = d.getComputeResource(vmm, conf)
```
I'm not sure what adding this check here gains us - the reported error condition was having had the appliance moved without updating the ComputeResource reference in the configuration, therefore we ended up with d.parentResourcePool == the cluster - this change will not prevent that.
DeleteVCHInstances would return an error in its old form if no parent resource pool could be found.
This and the next change below were to allow a VCH that has been moved to another RP to be deleted when identified by moid (--id), in the case where the original RP the VCH was created in no longer exists. Previously parentResourcepool would have returned an error in that case, causing this to bail and not delete the VCH even though it was found by its moid.
d.parentResourcePool can have the following values:
a. moid of originally targeted resource pool (endpointVM may not still be in that pool)
b. current parent pool of the endpointVM
In either case we now delete children of that pool that can be identified as cVMs, then attempt to destroy that pool if it's empty.
I'm unsure what happens if the endpointVM is the last VM in the cluster and the pool returned is the root pool for the cluster. I do not know what happens if you call destroy on the root resource pool of a cluster, but to be safe I'd rather we never do so.
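The failsafe being asked for can be sketched as a simple guard: never call destroy on a pool that is (or might be) the cluster's root pool. Pool references are modeled as plain strings here purely for illustration; in vic-machine this would compare govmomi ManagedObjectReferences, and the function name is hypothetical.

```go
package main

import "fmt"

// safeToDestroyPool refuses to destroy the pool when it is unknown or when
// it is the root pool of the cluster/host - the "to be safe I'd rather we
// never do so" case from the discussion above.
func safeToDestroyPool(pool, clusterRootPool string) bool {
	if pool == "" || pool == clusterRootPool {
		return false // never destroy the root pool, or a pool we never found
	}
	return true
}

func main() {
	fmt.Println(safeToDestroyPool("resgroup-101", "resgroup-8")) // dedicated VCH pool: ok
	fmt.Println(safeToDestroyPool("resgroup-8", "resgroup-8"))   // endpointVM in root pool: refuse
}
```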
| if ok { | ||
| // child is vch; detach all attached disks so later removal of images is successful | ||
| // Do not delete a VCH in the target RP if it is not the target VCH | ||
| if child.Reference() != vmm.Reference() { |
We should not be allowing multiple VCHs in a single resource pool - the pool is part of the VCH. If it's possible to deploy in this manner then a check should be added in the create path and we should look at preventing the appliance from being moved out of its parent pool.
Not going to block this PR on it however.
This was to address if the user OOB moved a VCH into another VCH's RP. I will open an issue to disable moving a VCH
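The guard in the diff above amounts to this rule, sketched here with VM references as plain strings (the function name and string modeling are illustrative assumptions, not the vic-machine API):

```go
package main

import "fmt"

// shouldDeleteChild decides whether a child of the target pool may be
// removed: a child that is itself a VCH appliance is only deleted when it is
// the VCH actually being removed; containerVMs in the pool are fair game.
func shouldDeleteChild(childRef, targetVCHRef string, childIsVCH bool) bool {
	if childIsVCH && childRef != targetVCHRef {
		// another VCH moved into this pool out of band; leave it alone
		return false
	}
	return true
}

func main() {
	fmt.Println(shouldDeleteChild("vm-10", "vm-10", true))  // the target VCH itself
	fmt.Println(shouldDeleteChild("vm-11", "vm-10", true))  // a sibling VCH: skip
	fmt.Println(shouldDeleteChild("vm-12", "vm-10", false)) // a containerVM: delete
}
```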
Force-pushed from ee88c38 to c37c105
We need to be more failsafe in this when it comes to the parent resource pool. I think we should remove the failover path for the actual parent pool - this code was added in #3116 but I'm unsure what specific scenario was being addressed with this path. We should also confirm that it's not possible to deploy the endpointVM directly into the root pool, and perhaps a sanity check to return nil if the ComputeResource[0] is the root pool.
Given this then for delete, if the parent pool is nil and --force is supplied, we do not attempt to delete children or delete the pool afterwards.
isContainerVM is okay, and used effectively for deleting the children.
lib/install/management/delete.go (Outdated)
```go
	return err
}
// Assume the cVMs and RP have already been deleted
log.Warnf("No container VMs found, but proceeding with delete of VCH due to --force")
```
I do not see a path by which both err and d.parentResourcepool are non-nil so this logic will simply skip the attempt to delete the VCH when we hit the d.parentResourcePool != nil condition.
The text about containerVMs is also misleading given we've not looked for containerVMs at this point.
```md
5. Delete the VM and RP to cleanup

### Expected Outcome:
1. All steps should succeed
```
We need another two tests:
- endpointVM is moved out of its pool, pool is left alone
- endpointVM is moved into the root of the cluster and the original pool is deleted.
Just to be clear, for 1) the endpointVM should be successfully deleted and the RP and its children should not be deleted
and for 2) the original pool should be deleted OOB after endpointVM is moved to the root pool then vic-machine delete should successfully delete the endpointVM?
Thanks
- correct
- correct - and other VMs in the root pool should be left alone (which is what the current test asserts but worth being explicit)
Added the integration tests, working through them now.
There should be a follow-up issue to look at disabling Move for the individual VMs.
Force-pushed from 820279d to 451b14b
hickeng left a comment
Thanks for the repeated iteration and addressing my paranoia!
```go
}
// Can't find the RP VCH was created in to delete cVMs, continue anyway
log.Warnf("No container VMs found, but proceeding with delete of VCH due to --force")
err = nil
```
getComputeResource never returns non-nil for both rp and err. If you set err = nil here then we will still skip over the delete of the containerVMs (not sure why the method is called DeleteVCHInstances).
I don't know what that will mean for deleting images if they are still attached to cVMs - I suspect it will fail with errors about locked vmdks if any of the cVMs are powered on. I don't see that that will cause any other "interesting" behaviours like deleting non-VCH related things, but please sanity check me if you haven't looked at this path.
```robot
# Delete with force
${moid}=  Get VM Moid  %{VCH-NAME}
${ret}  ${output}=  Run And Return Rc And Output  bin/vic-machine-linux delete --target %{TEST_URL} --user %{TEST_USERNAME} --password=%{TEST_PASSWORD} --compute-resource=%{TEST_RESOURCE} --id ${moid} --force
```
As a note, if you're specifying --id then I don't think you need --compute-resource. I frequently use --id to override which VCH I'm deleting when working with nimbus or vmc.
```robot
Delete VCH moved from its RP
    Run Keyword If  '%{HOST_TYPE}' == 'ESXi'  Pass Execution  Test skipped on ESX due to unable to move into RP
```
ESX does have resource pools and we should be able to move VMs between them. It's possible that govc is missing exposure of the appropriate command.
(I'm not sure how to view resource pools in the H5 client however).
Force-pushed from 451b14b to 71e19b4
Force-pushed from 68e9530 to 92cf5cd
Replacement PR in #6816 to get the build to run
Fixes #6603