ci-operator/step-registry/ipi/deprovision: 25m active_deadline_seconds#13884
Conversation
c6de8ef to
538bec7
Compare
Like 3c915e2 (ci-operator/step-registry/openshift/e2e/test: Add 2h active_deadline_seconds, 2020-10-09, openshift#12647), but for the deprovision step. This helps hold time to gather assets, if ipi-deprovision gets hung up, like it did for over 2h here [1]: $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-api-actuator-pkg/190/pull-ci-openshift-cluster-api-actuator-pkg-master-e2e-aws-operator/1329702034437312512/build-log.txt | grep -A2 'Executing pod.*ipi- deprovision' 2020/11/20 09:54:54 Executing pod "e2e-aws-operator-ipi-deprovision-deprovision" 2020/11/20 09:54:57 Container cp-secret-wrapper in pod e2e-aws-operator-ipi-deprovision-deprovision completed successfully {"component":"entrypoint","file":"prow/entrypoint/run.go:165","func":"k8s.io/test-infra/prow/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 4h0m0s timeout","severity":"error","time":"2020-11-20T12:24:09Z"} 25 minutes should be enough for most teardown. We have a number of successful runs which take longer: $ curl -s 'https://search.ci.openshift.org/search?maxAge=24h&type=build-log&context=0&search=Pod+.*ipi-deprovision-deprovision+succeeded+after' | jq -r 'to_entries[].value | to_entries[].value[].context[]' | sed 's/.*after \([0-9]*\)\([smh]\).*/\2 \1/' | sort -n | uniq -c 29 h 1 1 h 2 16 m 1 21 m 10 36 m 11 33 m 12 47 m 13 40 m 14 23 m 15 21 m 16 22 m 17 20 m 18 17 m 19 18 m 2 11 m 20 13 m 21 19 m 22 22 m 23 8 m 24 17 m 25 12 m 26 8 m 27 14 m 28 9 m 29 16 m 3 15 m 30 17 m 31 ... Success after over an hour is crazy. Which jobs were those? $ curl -s 'https://search.ci.openshift.org/search?maxAge=24h&type=build-log&context=0&search=Pod+.*ipi-deprovision-deprovision+succeeded+after+[0-9]h' | jq -r 'to_entries[] | .key as $uri | .value | to_entries[].value[].co ntext[] | $uri + " " + .' https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-kube-reporting-metering-operator-release-4.5-metering-periodic-aws/1333923915013033984 2020/12/02 01:27:54 Pod metering-periodic-aws-ipi-deprovision-deprovision succeeded after 1h12m0s https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-odo-master-v4.6-integration-e2e-periodic/1333923958377943040 2020/12/02 01:28:12 Pod integration-e2e-periodic-ipi-deprovision-deprovision succeeded after 1h5m41s https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.6-e2e-aws-proxy/1333804827905363968 2020/12/01 19:44:10 Pod e2e-aws-proxy-ipi-deprovision-deprovision succeeded after 1h4m2s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/che-incubator_chectl/992/pull-ci-che-incubator-chectl-master-v5-chectl-e2e-operator-installer/1333897643826352128 2020/12/02 01:02:07 Pod chectl-e2e-operator-installer-ipi-deprovision-deprovision succeeded after 1h41m5s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/cri-o_cri-o/4377/pull-ci-cri-o-cri-o-master-e2e-agnostic/1333901999330037760 2020/12/02 00:10:14 Pod e2e-agnostic-ipi-deprovision-deprovision succeeded after 1h8m0s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/cri-o_cri-o/4394/pull-ci-cri-o-cri-o-master-e2e-agnostic/1333866701950816256 2020/12/01 23:03:06 Pod e2e-agnostic-ipi-deprovision-deprovision succeeded after 1h15m3s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift-psap_special-resource-operator/41/pull-ci-openshift-psap-special-resource-operator-master-e2e-aws/1333903530364243968 2020/12/02 00:16:52 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 1h12m8s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-logging-operator/821/pull-ci-openshift-cluster-logging-operator-master-functional/1333919264872075264 2020/12/02 01:00:51 Pod functional-ipi-deprovision-deprovision succeeded after 1h3m35s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/852/pull-ci-openshift-cluster-network-operator-master-e2e-ovn-hybrid-step-registry/1333766191600111616 2020/12/01 16:15:42 Pod e2e-ovn-hybrid-step-registry-ipi-deprovision-deprovision succeeded after 1h4m35s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/886/pull-ci-openshift-cluster-network-operator-master-e2e-aws-ovn-windows-custom-vxlan/1333813238646706176 2020/12/01 18:01:39 Pod e2e-aws-ovn-windows-custom-vxlan-ipi-deprovision-deprovision succeeded after 1h8m32s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/891/pull-ci-openshift-cluster-network-operator-master-e2e-aws-ovn-windows-custom-vxlan/1333907617315033088 2020/12/02 01:32:28 Pod e2e-aws-ovn-windows-custom-vxlan-ipi-deprovision-deprovision succeeded after 1h23m3s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_console-operator/491/pull-ci-openshift-console-operator-master-e2e-aws-operator/1333879999068901376 2020/12/02 00:11:37 Pod e2e-aws-operator-ipi-deprovision-deprovision succeeded after 1h7m16s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/4429/pull-ci-openshift-installer-master-e2e-aws-workers-rhel7/1333912709506273280 2020/12/02 01:34:18 Pod e2e-aws-workers-rhel7-ipi-deprovision-deprovision succeeded after 1h12m35s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/4440/pull-ci-openshift-installer-master-e2e-aws-shared-vpc/1333864843601514496 2020/12/01 23:32:40 Pod e2e-aws-shared-vpc-ipi-deprovision-deprovision succeeded after 1h2m49s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_kubernetes-autoscaler/180/pull-ci-openshift-kubernetes-autoscaler-master-e2e-aws-operator/1333874595857436672 2020/12/01 23:02:01 Pod e2e-aws-operator-ipi-deprovision-deprovision succeeded after 1h19m11s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-api-operator/769/pull-ci-openshift-machine-api-operator-master-e2e-aws/1333776539803717632 2020/12/01 16:58:56 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 1h15m39s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2210/pull-ci-openshift-machine-config-operator-master-e2e-aws-serial/1333867792788623360 2020/12/02 00:09:53 Pod e2e-aws-serial-ipi-deprovision-deprovision succeeded after 1h19m58s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2210/pull-ci-openshift-machine-config-operator-master-okd-e2e-aws/1333839577021943808 2020/12/01 21:59:09 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 1h2m1s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2272/pull-ci-openshift-machine-config-operator-master-e2e-aws-serial/1333899974395564032 2020/12/02 01:07:48 Pod e2e-aws-serial-ipi-deprovision-deprovision succeeded after 1h5m13s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/657/pull-ci-openshift-oc-master-e2e-aws/1333850678639988736 2020/12/01 21:43:41 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 1h6m6s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/657/pull-ci-openshift-oc-master-e2e-aws/1333910781766406144 2020/12/02 00:49:46 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 1h17m17s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/660/pull-ci-openshift-oc-master-e2e-aws-serial/1333891301321478144 2020/12/01 23:33:01 Pod e2e-aws-serial-ipi-deprovision-deprovision succeeded after 1h3m49s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/660/pull-ci-openshift-oc-master-e2e-aws-serial/1333919666019504128 2020/12/02 01:01:44 Pod e2e-aws-serial-ipi-deprovision-deprovision succeeded after 1h2m25s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/365/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-windows-custom-vxlan/1333781276464779264 2020/12/01 17:09:03 Pod e2e-aws-ovn-windows-custom-vxlan-ipi-deprovision-deprovision succeeded after 1h31m37s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/366/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-windows-custom-vxlan/1333877629530411008 2020/12/02 00:02:31 Pod e2e-aws-ovn-windows-custom-vxlan-ipi-deprovision-deprovision succeeded after 1h7m45s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_prometheus/67/pull-ci-openshift-prometheus-master-e2e-aws/1333873401026056192 2020/12/02 00:45:30 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 1h23m0s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/13998/rehearse-13998-pull-ci-openshift-windows-machine-config-operator-release-4.7-aws-e2e-operator/1333906001283256320 2020/12/02 01:22:07 Pod aws-e2e-operator-ipi-deprovision-deprovision succeeded after 1h1m36s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_sriov-network-operator/416/pull-ci-openshift-sriov-network-operator-master-e2e-aws/1333878269035941888 2020/12/02 00:29:39 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 1h51m18s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_sriov-network-operator/425/pull-ci-openshift-sriov-network-operator-master-e2e-aws/1333762244420308992 2020/12/01 16:07:41 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 1h22m15s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_image-registry/257/pull-ci-openshift-image-registry-master-e2e-aws/1333873914832490496 2020/12/02 01:06:12 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 2h15m28s Checking that slowest job: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_image-registry/257/pull-ci-openshift-image-registry-master-e2e-aws/1333873914832490496/artifacts/e2e-aws/ipi-deprovision-deprovision/.openshift_install.log 2>/dev/null | head -n1 time="2020-12-01T22:51:17Z" level=debug msg="OpenShift Installer unreleased-master-3988-g47b4c1a046ee43bc500ff19c98a231a754cc2a70-dirty" $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_image-registry/257/pull-ci-openshift-image-registry-master-e2e-aws/1333873914832490496/artifacts/e2e-aws/ipi-deprovision-deprovision/.openshift_install.log | tail -n1 time="2020-12-02T01:06:11Z" level=info msg="Time elapsed: 2h14m54s" So the delay is legitimate. But it's due to a hiccup in IAM-user listing (AWS-side bug? Throttling?): $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_image-registry/257/pull-ci-openshift-image-registry-master-e2e-aws/1333873914832490496/artifacts/e2e-aws/ipi-deprovision-deprovision/.openshift_install.log | grep -A1 'search for IAM users' time="2020-12-01T22:53:53Z" level=debug msg="search for IAM users" time="2020-12-02T00:08:08Z" level=debug msg="search for IAM instance profiles" -- time="2020-12-02T00:38:54Z" level=debug msg="search for IAM users" time="2020-12-02T01:04:14Z" level=debug msg="search for IAM instance profiles" -- time="2020-12-02T01:05:27Z" level=debug msg="search for IAM users" time="2020-12-02T01:05:42Z" level=debug msg="search for IAM instance profiles" -- time="2020-12-02T01:05:45Z" level=debug msg="search for IAM users" time="2020-12-02T01:05:48Z" level=debug msg="search for IAM instance profiles" Anyhow, letting a few of those slip through the cracks isn't a problem, because we will still have logs from the first 25 minutes of the deprovision logs in the job itself, and any remaining resources will eventually be reaped by [1] and other backstops. Picking a number that's less than 30m gives us a change to wrap things up within the termination grace period granted to Prow jobs. [1]: https://github.com/openshift/ci-tools/blob/9dd6f87a5cd070f9e81af7a6cb181bb466f47933/cmd/ipi-deprovision/ipi-deprovision.sh
538bec7 to
33d8d63
Compare
|
I've bumped the timeout to 25m with c6de8ef8ea -> 33d8d63. Notes in the new commit message in an attempt to motivate the choice of 25m (it's messy). |
|
@wking: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JoelSpeed, wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@wking: Updated the following 2 configmaps:
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Like 3c915e2 (#12647), but for the deprovision step. This helps hold time to gather assets, if ipi-deprovision gets hung up, like it did for over 2h here (rhbz#1901737):