Commit d4b9756

contrib/pkg/awstagdeprovision: Switch to DescribeInstancesPages

In case there are too many instances to fit on a single DescribeInstances page. Docs for the new function are in [1].

This might address an issue I saw yesterday, where the instance reaper exited early:

  $ oc logs --timestamps -f e2e-aws -c teardown | tee /tmp/teardown.log
  $ cat /tmp/teardown.log
  2018-10-24T01:45:49.406653647Z Gathering artifacts ...
  ...
  2018-10-24T01:46:16.142364238Z Waiting for logs ...
  ...
  2018-10-24T01:46:19.33849685Z Deprovisioning cluster ...
  ...
  2018-10-24T01:46:19.359616557Z level=debug msg="Deleting instances"
  ...
  2018-10-24T01:46:19.988936278Z level=debug msg="deleting instance: i-0bfc77b0fd7bbe707"
  ...
  2018-10-24T01:46:20.173421738Z level=debug msg="deleting instance: i-0905586e42655a097"
  ...
  2018-10-24T01:46:20.362874514Z level=debug msg="deleting instance: i-06ae20414f46aaccc"
  ...
  2018-10-24T01:46:20.527601571Z level=debug msg="deleting instance: i-0bd8dc53eb954d0b8"
  ...
  2018-10-24T01:46:20.713777056Z level=debug msg="deleting instance: i-01c91b49aba53d43b"
  ...
  2018-10-24T01:46:20.891650892Z level=debug msg="deleting instance: i-0326b5e815732422e"
  ...
  2018-10-24T01:46:21.0556686Z level=debug msg="deleting instance: i-05c9d0368d46be9b2"
  ...
  2018-10-24T01:46:21.186803438Z level=debug msg="Exiting deleting instances"
  ...
  2018-10-24T01:46:31.187047842Z level=debug msg="Deleting instances"
  ...
  2018-10-24T01:46:31.533318629Z level=debug msg="Exiting deleting instances"
  2018-10-24T01:46:31.533340968Z level=debug msg="goroutine deleteInstances complete"
  ...
  2018-10-24T02:34:00.038768501Z level=debug msg="Deleting VPCs"
  2018-10-24T02:34:00.463719417Z level=debug msg="deleting VPC: vpc-057311209bfc67050"
  2018-10-24T02:34:00.528213402Z level=debug msg="error deleting VPC vpc-057311209bfc67050: DependencyViolation: The vpc 'vpc-057311209bfc67050' has dependencies and cannot be deleted.\n\tstatus code: 400, request id: 65f40f64-8e08-467f-8ae8-cd320d9630c7"
  2018-10-24T02:34:00.528272636Z level=debug msg="Exiting deleting VPCs"
  ...
  2018-10-24T02:48:25.570568032Z level=debug msg="Deleting VPCs"
  2018-10-24T02:48:25.739046406Z level=debug msg="Exiting deleting VPCs"
  2018-10-24T02:48:25.739139735Z level=debug msg="goroutine deleteVPCs complete"
  ...

That attempts deletion for seven instances, which sounds right (one bootstrap, plus three masters and three workers). But you can see that VPC deletion hung for over an hour due to a blocking dependency. I ended up deleting a leftover master via the AWS console, which allowed me to delete the VPC (also from the console). It's possible that the destroy logic would have cleaned up the VPC on its own, but with 14 minutes between attempts I didn't want to wait (can we cap the exponential backoff, or just poll every two minutes or so without backoff?).

Unfortunately I did not collect tag information from that master, so I'm not entirely sure why the automated destroyer missed it. My initial guess was that we had more than one page of instances in the account and the leftover master missed the first page, causing the instance goroutine to exit thinking its task was complete. But it looks like the instance requests are filtered on the server side, which makes "no instances in the first page that match but there are instances in later pages" less likely ;). Still, solid pagination seems like a useful thing to have even if it wasn't the cause of this particular issue.

[1]: https://docs.aws.amazon.com/sdk-for-go/api/service/ec2/#EC2.DescribeInstancesPages

1 parent 35b7998 commit d4b9756

1 file changed: +9, -14 lines

contrib/pkg/awstagdeprovision/awstagdeprovision.go

Lines changed: 9 additions & 14 deletions
@@ -700,17 +700,9 @@ func deleteInstances(session *session.Session, filter AWSFilter, clusterName str
 		Values: []*string{aws.String("running")},
 	})
 
-	for {
-		results, err := ec2Client.DescribeInstances(&describeInstancesInput)
-		if err != nil {
-			logger.Debugf("error listing instances: %v", err)
-			return false, nil
-		}
-
-		if len(results.Reservations) == 0 {
-			break
-		}
-
+	found := false
+	err := ec2Client.DescribeInstancesPages(&describeInstancesInput, func(results *ec2.DescribeInstancesOutput, lastPage bool) bool {
+		found = found || len(results.Reservations) > 0
 		for _, reservation := range results.Reservations {
 			for _, instance := range reservation.Instances {
 				// first delete any instance profiles (they are not tagged)
@@ -724,7 +716,7 @@ func deleteInstances(session *session.Session, filter AWSFilter, clusterName str
 
 				// now delete the instance
 				logger.Debugf("deleting instance: %v", *instance.InstanceId)
-				_, err = ec2Client.TerminateInstances(&ec2.TerminateInstancesInput{
+				_, err := ec2Client.TerminateInstances(&ec2.TerminateInstancesInput{
 					InstanceIds: []*string{instance.InstanceId},
 				})
 				if err != nil {
@@ -736,10 +728,13 @@ func deleteInstances(session *session.Session, filter AWSFilter, clusterName str
 			}
 		}
 
-		return false, nil
+		return lastPage
+	})
+	if err != nil {
+		logger.Debugf("error describing instances: %v", err)
 	}
 
-	return true, nil
+	return found, nil
 }
 
 // deleteSecurityGroupRules will attempt to delete all the rules defined in the given security group
