From 33d8d63261f4834865099ab7f78f0007257bd25c Mon Sep 17 00:00:00 2001 From: "W. Trevor King" Date: Wed, 25 Nov 2020 15:40:45 -0800 Subject: [PATCH] ci-operator/step-registry/ipi/deprovision: 25m active_deadline_seconds Like 3c915e2f0c (ci-operator/step-registry/openshift/e2e/test: Add 2h active_deadline_seconds, 2020-10-09, #12647), but for the deprovision step. This helps hold time to gather assets, if ipi-deprovision gets hung up, like it did for over 2h here [1]: $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-api-actuator-pkg/190/pull-ci-openshift-cluster-api-actuator-pkg-master-e2e-aws-operator/1329702034437312512/build-log.txt | grep -A2 'Executing pod.*ipi- deprovision' 2020/11/20 09:54:54 Executing pod "e2e-aws-operator-ipi-deprovision-deprovision" 2020/11/20 09:54:57 Container cp-secret-wrapper in pod e2e-aws-operator-ipi-deprovision-deprovision completed successfully {"component":"entrypoint","file":"prow/entrypoint/run.go:165","func":"k8s.io/test-infra/prow/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 4h0m0s timeout","severity":"error","time":"2020-11-20T12:24:09Z"} 25 minutes should be enough for most teardown. We have a number of successful runs which take longer: $ curl -s 'https://search.ci.openshift.org/search?maxAge=24h&type=build-log&context=0&search=Pod+.*ipi-deprovision-deprovision+succeeded+after' | jq -r 'to_entries[].value | to_entries[].value[].context[]' | sed 's/.*after \([0-9]*\)\([smh]\).*/\2 \1/' | sort -n | uniq -c 29 h 1 1 h 2 16 m 1 21 m 10 36 m 11 33 m 12 47 m 13 40 m 14 23 m 15 21 m 16 22 m 17 20 m 18 17 m 19 18 m 2 11 m 20 13 m 21 19 m 22 22 m 23 8 m 24 17 m 25 12 m 26 8 m 27 14 m 28 9 m 29 16 m 3 15 m 30 17 m 31 ... Success after over an hour is crazy. Which jobs were those? $ curl -s 'https://search.ci.openshift.org/search?maxAge=24h&type=build-log&context=0&search=Pod+.*ipi-deprovision-deprovision+succeeded+after+[0-9]h' | jq -r 'to_entries[] | .key as $uri | .value | to_entries[].value[].co ntext[] | $uri + " " + .' https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-kube-reporting-metering-operator-release-4.5-metering-periodic-aws/1333923915013033984 2020/12/02 01:27:54 Pod metering-periodic-aws-ipi-deprovision-deprovision succeeded after 1h12m0s https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-odo-master-v4.6-integration-e2e-periodic/1333923958377943040 2020/12/02 01:28:12 Pod integration-e2e-periodic-ipi-deprovision-deprovision succeeded after 1h5m41s https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.6-e2e-aws-proxy/1333804827905363968 2020/12/01 19:44:10 Pod e2e-aws-proxy-ipi-deprovision-deprovision succeeded after 1h4m2s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/che-incubator_chectl/992/pull-ci-che-incubator-chectl-master-v5-chectl-e2e-operator-installer/1333897643826352128 2020/12/02 01:02:07 Pod chectl-e2e-operator-installer-ipi-deprovision-deprovision succeeded after 1h41m5s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/cri-o_cri-o/4377/pull-ci-cri-o-cri-o-master-e2e-agnostic/1333901999330037760 2020/12/02 00:10:14 Pod e2e-agnostic-ipi-deprovision-deprovision succeeded after 1h8m0s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/cri-o_cri-o/4394/pull-ci-cri-o-cri-o-master-e2e-agnostic/1333866701950816256 2020/12/01 23:03:06 Pod e2e-agnostic-ipi-deprovision-deprovision succeeded after 1h15m3s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift-psap_special-resource-operator/41/pull-ci-openshift-psap-special-resource-operator-master-e2e-aws/1333903530364243968 2020/12/02 00:16:52 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 1h12m8s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-logging-operator/821/pull-ci-openshift-cluster-logging-operator-master-functional/1333919264872075264 2020/12/02 01:00:51 Pod functional-ipi-deprovision-deprovision succeeded after 1h3m35s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/852/pull-ci-openshift-cluster-network-operator-master-e2e-ovn-hybrid-step-registry/1333766191600111616 2020/12/01 16:15:42 Pod e2e-ovn-hybrid-step-registry-ipi-deprovision-deprovision succeeded after 1h4m35s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/886/pull-ci-openshift-cluster-network-operator-master-e2e-aws-ovn-windows-custom-vxlan/1333813238646706176 2020/12/01 18:01:39 Pod e2e-aws-ovn-windows-custom-vxlan-ipi-deprovision-deprovision succeeded after 1h8m32s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/891/pull-ci-openshift-cluster-network-operator-master-e2e-aws-ovn-windows-custom-vxlan/1333907617315033088 2020/12/02 01:32:28 Pod e2e-aws-ovn-windows-custom-vxlan-ipi-deprovision-deprovision succeeded after 1h23m3s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_console-operator/491/pull-ci-openshift-console-operator-master-e2e-aws-operator/1333879999068901376 2020/12/02 00:11:37 Pod e2e-aws-operator-ipi-deprovision-deprovision succeeded after 1h7m16s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/4429/pull-ci-openshift-installer-master-e2e-aws-workers-rhel7/1333912709506273280 2020/12/02 01:34:18 Pod e2e-aws-workers-rhel7-ipi-deprovision-deprovision succeeded after 1h12m35s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/4440/pull-ci-openshift-installer-master-e2e-aws-shared-vpc/1333864843601514496 2020/12/01 23:32:40 Pod e2e-aws-shared-vpc-ipi-deprovision-deprovision succeeded after 1h2m49s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_kubernetes-autoscaler/180/pull-ci-openshift-kubernetes-autoscaler-master-e2e-aws-operator/1333874595857436672 2020/12/01 23:02:01 Pod e2e-aws-operator-ipi-deprovision-deprovision succeeded after 1h19m11s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-api-operator/769/pull-ci-openshift-machine-api-operator-master-e2e-aws/1333776539803717632 2020/12/01 16:58:56 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 1h15m39s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2210/pull-ci-openshift-machine-config-operator-master-e2e-aws-serial/1333867792788623360 2020/12/02 00:09:53 Pod e2e-aws-serial-ipi-deprovision-deprovision succeeded after 1h19m58s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2210/pull-ci-openshift-machine-config-operator-master-okd-e2e-aws/1333839577021943808 2020/12/01 21:59:09 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 1h2m1s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2272/pull-ci-openshift-machine-config-operator-master-e2e-aws-serial/1333899974395564032 2020/12/02 01:07:48 Pod e2e-aws-serial-ipi-deprovision-deprovision succeeded after 1h5m13s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/657/pull-ci-openshift-oc-master-e2e-aws/1333850678639988736 2020/12/01 21:43:41 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 1h6m6s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/657/pull-ci-openshift-oc-master-e2e-aws/1333910781766406144 2020/12/02 00:49:46 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 1h17m17s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/660/pull-ci-openshift-oc-master-e2e-aws-serial/1333891301321478144 2020/12/01 23:33:01 Pod e2e-aws-serial-ipi-deprovision-deprovision succeeded after 1h3m49s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/660/pull-ci-openshift-oc-master-e2e-aws-serial/1333919666019504128 2020/12/02 01:01:44 Pod e2e-aws-serial-ipi-deprovision-deprovision succeeded after 1h2m25s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/365/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-windows-custom-vxlan/1333781276464779264 2020/12/01 17:09:03 Pod e2e-aws-ovn-windows-custom-vxlan-ipi-deprovision-deprovision succeeded after 1h31m37s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/366/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-windows-custom-vxlan/1333877629530411008 2020/12/02 00:02:31 Pod e2e-aws-ovn-windows-custom-vxlan-ipi-deprovision-deprovision succeeded after 1h7m45s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_prometheus/67/pull-ci-openshift-prometheus-master-e2e-aws/1333873401026056192 2020/12/02 00:45:30 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 1h23m0s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/13998/rehearse-13998-pull-ci-openshift-windows-machine-config-operator-release-4.7-aws-e2e-operator/1333906001283256320 2020/12/02 01:22:07 Pod aws-e2e-operator-ipi-deprovision-deprovision succeeded after 1h1m36s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_sriov-network-operator/416/pull-ci-openshift-sriov-network-operator-master-e2e-aws/1333878269035941888 2020/12/02 00:29:39 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 1h51m18s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_sriov-network-operator/425/pull-ci-openshift-sriov-network-operator-master-e2e-aws/1333762244420308992 2020/12/01 16:07:41 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 1h22m15s https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_image-registry/257/pull-ci-openshift-image-registry-master-e2e-aws/1333873914832490496 2020/12/02 01:06:12 Pod e2e-aws-ipi-deprovision-deprovision succeeded after 2h15m28s Checking that slowest job: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_image-registry/257/pull-ci-openshift-image-registry-master-e2e-aws/1333873914832490496/artifacts/e2e-aws/ipi-deprovision-deprovision/.openshift_install.log 2>/dev/null | head -n1 time="2020-12-01T22:51:17Z" level=debug msg="OpenShift Installer unreleased-master-3988-g47b4c1a046ee43bc500ff19c98a231a754cc2a70-dirty" $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_image-registry/257/pull-ci-openshift-image-registry-master-e2e-aws/1333873914832490496/artifacts/e2e-aws/ipi-deprovision-deprovision/.openshift_install.log | tail -n1 time="2020-12-02T01:06:11Z" level=info msg="Time elapsed: 2h14m54s" So the delay is legitimate. But it's due to a hiccup in IAM-user listing (AWS-side bug? Throttling?): $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_image-registry/257/pull-ci-openshift-image-registry-master-e2e-aws/1333873914832490496/artifacts/e2e-aws/ipi-deprovision-deprovision/.openshift_install.log | grep -A1 'search for IAM users' time="2020-12-01T22:53:53Z" level=debug msg="search for IAM users" time="2020-12-02T00:08:08Z" level=debug msg="search for IAM instance profiles" -- time="2020-12-02T00:38:54Z" level=debug msg="search for IAM users" time="2020-12-02T01:04:14Z" level=debug msg="search for IAM instance profiles" -- time="2020-12-02T01:05:27Z" level=debug msg="search for IAM users" time="2020-12-02T01:05:42Z" level=debug msg="search for IAM instance profiles" -- time="2020-12-02T01:05:45Z" level=debug msg="search for IAM users" time="2020-12-02T01:05:48Z" level=debug msg="search for IAM instance profiles" Anyhow, letting a few of those slip through the cracks isn't a problem, because we will still have logs from the first 25 minutes of the deprovision logs in the job itself, and any remaining resources will eventually be reaped by [1] and other backstops. Picking a number that's less than 30m gives us a change to wrap things up within the termination grace period granted to Prow jobs. [1]: https://github.com/openshift/ci-tools/blob/9dd6f87a5cd070f9e81af7a6cb181bb466f47933/cmd/ipi-deprovision/ipi-deprovision.sh --- .../deprovision/deprovision/ipi-deprovision-deprovision-ref.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/ci-operator/step-registry/ipi/deprovision/deprovision/ipi-deprovision-deprovision-ref.yaml b/ci-operator/step-registry/ipi/deprovision/deprovision/ipi-deprovision-deprovision-ref.yaml index 5df3d6716c066..13414bca181a3 100644 --- a/ci-operator/step-registry/ipi/deprovision/deprovision/ipi-deprovision-deprovision-ref.yaml +++ b/ci-operator/step-registry/ipi/deprovision/deprovision/ipi-deprovision-deprovision-ref.yaml @@ -2,6 +2,7 @@ ref: as: ipi-deprovision-deprovision from: installer commands: ipi-deprovision-deprovision-commands.sh + active_deadline_seconds: 1500 resources: requests: cpu: 1000m