destroy: return aws resources that could not be deleted #3772
staebler wants to merge 1 commit into openshift:master
|
/hold This builds on #3765. |
7d5fcb4 to
9465ece
Compare
|
As suggested in #3765 (comment), the destroyer will now collect--but not attempt to delete--the un-deleted tagged resources when the context expires while trying to delete the EC2 instances. |
ed89e9c to
464da6b
Compare
|
/hold cancel #3765 has merged. |
|
/assign @patrickdillon @jstuever |
|
Hi, we really need someone to take a look at this. @patrickdillon @jstuever do you have some cycles to review? @sdodson @abhinavdahiya can you assign other reviewers if @patrickdillon and @jstuever can't get to it? |
|
/cc @abhinavdahiya |
|
I'll take another look at this after code freeze; sorry for the delay. |
abhinavdahiya left a comment
I am not in favor of the second level of tracking that is being added here for the remaining resources, i.e. we need to delete an instance (which we identified by its tag), but that usually requires us to also clean up the instance IAM profile, so we also add that profile resource to the tracker.
I think the best way to provide the most information is to track just the tagged resources and report those to users as remaining. Our discovery is best effort, and a lot of the time there are other resources blocking deletion of the tagged resources that only AWS knows about or reports in the error message.
So telling the user that the VPC deletion is left is all the information the user needs to go ahead and delete the VPC in case we cannot.
Thanks for the review, @abhinavdahiya. I can live with only returning the tagged resources that could not be deleted rather than also including the next-level resources. I will rework this PR with that in mind. |
464da6b to
5a09a63
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED |
@abhinavdahiya I've re-written the changes. The logic should be much easier to follow now. |
I would like to add a comment about why EC2 instances are special and need to be in the terminated state first before attempting to delete other resources. Can someone shed light on this for me?
This commit message should provide context on why we stop instances before anything else. cf69c1e
I think this needs a region argument, otherwise it might not function correctly. Something like `aws.NewConfig().WithRegion(region)`.
The current code in master does not specify the region. The awsSession should already be configured with the region.
installer/pkg/destroy/aws/aws.go
Line 168 in d91c40b
|
The change looks good at a high level, but the destroy isn't completing currently: see e2e-aws exceeding the 4 hour timeout with no deprovision logs. So I think there is something missing here. Maybe https://github.com/openshift/installer/pull/3772/files#discussion_r504111660? |
The `RunWithContext` method has been modified to return a slice of ARNs that could not be destroyed. Only the ARNs of the first-level resources will be returned. For example, when deleting a VPC, the uninstaller will first delete other resources that use the VPC. Any of those resources that are blocked from being deleted but are not tagged for the cluster will not show up in the list of blocked resources. However, the first-level VPC will show up as blocked in this case.

This will be used by Hive to expose to the user the resources that cannot be deleted so that the user can take action on those resources.

https://issues.redhat.com/browse/CO-973
5a09a63 to
5db7fb7
Compare
|
@staebler: The following tests failed, say
The tests are timing out running e2e-aws-gather-core-dump, before any attempts to deprovision. |
I think that it is a problem with this PR in particular and not the code in general. This may be because this PR is quite old and is maybe not getting some changes that have been made to the tests in the past couple months. I copied the branch and opened a new PR with the same code (#4270), and the e2e-aws test passes. I am going to close this PR in favor of the new one. /close |
|
@staebler: Closed this PR. |