Commit f945dbb
awstagdeprovision: Ignore more errors

We're leaking clusters in CI because of errors like [1]:

  time="2018-11-27T18:48:25Z" level=fatal msg="Unrecoverable error/timed out: error converting route53 zones to internal AWS objects: Throttling: Rate exceeded\n\tstatus code: 400, request id: 0573f1b4-f275-11e8-b479-fd079d6c6b48"

With this commit, we just assume that any error will go away
eventually, and keep rolling forward with exponential backoff. When
that assumption breaks down, we expect the caller (e.g. ci-operator or
a human user) to kill teardown (and optionally fix whatever was
blocking it).
Docs for AWS rate limits are in [2]; the main takeaway is that these
limits are set by AWS with no way for us to request changes, and that
most are per-account (not per-VPC or other resource that scales with
the number of simultaneous CI clusters).
[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/738/pull-ci-openshift-installer-master-e2e-aws/1639/artifacts/e2e-aws/installer/.openshift_install.log
[2]: https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html

1 parent 91c0cee commit f945dbb
1 file changed, +3 -2 lines changed

[Diff body not captured. Two hunks: original line 1160 replaced by new lines 1160-1161; original line 1384 replaced by new line 1385.]