
test: increase e2e test run with 15 minutes #2184

Closed
sinnykumari wants to merge 1 commit into openshift:master from sinnykumari:gcp-op-timeout

Conversation

@sinnykumari
Contributor

Fixes the e2e-gcp-op test failing in CI due to a timeout.
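The single-commit diff isn't rendered in this thread. As a rough sketch only (the constant names and values below are assumptions, not the PR's actual change), a 15-minute bump to a Go e2e wait timeout might look like:

```go
// Illustrative sketch only: constant names and values are assumed,
// not taken from this PR's diff.
package e2e

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

const (
	pollInterval = 2 * time.Second
	// Assumed previous value: 20 * time.Minute. The +15m headroom absorbs
	// the slower node reboots discussed later in this thread.
	poolTimeout = 35 * time.Minute
)

// waitForPoolComplete polls until done reports the MachineConfigPool has
// converged, failing when the (now longer) timeout elapses first.
func waitForPoolComplete(done wait.ConditionFunc) error {
	return wait.Poll(pollInterval, poolTimeout, done)
}
```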

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 27, 2020
@sinnykumari
Contributor Author

sinnykumari commented Oct 27, 2020

Analyzing some of the gcp-op test run logs, it seems the system reboot time has increased by around 30 seconds (a sketch of the delta computation follows the log excerpts below).

  1. With increased reboot time (~90 seconds):
    https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2182/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/1320812936498778112/artifacts/e2e-gcp-op/pods/openshift-machine-config-operator_machine-config-daemon-5c86g_machine-config-daemon.log
I1026 20:31:17.810600    1972 update.go:1590] initiating reboot: Node will reboot into config rendered-worker-ddf64801127323a695f308d91109d951
I1026 20:31:17.893443    1972 daemon.go:641] Shutting down MachineConfigDaemon
I1026 20:32:50.940436    2088 start.go:108] Version: machine-config-daemon-4.6.0-202006240615.p0-370-g999521b6-dirty (999521b61c81577b156331b7bf8495347a8503c1)
I1026 20:32:50.949591    2088 start.go:121] Calling chroot("/rootfs")
I1026 20:32:50.949813    2088 rpm-ostree.go:261] Running captured: rpm-ostree status --json
I1026 20:32:51.457544    2088 daemon.go:226] Booted osImageURL: registry.build01.ci.openshift.org/ci-op-y4rk2lpr/stable@sha256:ce348cfb50d39297969c9a0c2f928d23eb2ab8ded7cacd5e39685bd0931bbfac (47.82.202010261347-0)

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2181/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/1321047711511744512/artifacts/e2e-gcp-op/pods/openshift-machine-config-operator_machine-config-daemon-h5bd9_machine-config-daemon.log

I1027 12:09:08.575381    1848 update.go:1607] initiating reboot: Node will reboot into config rendered-worker-4ad27ca30531f77416a00e38bda8c8e6
I1027 12:09:08.668785    1848 daemon.go:641] Shutting down MachineConfigDaemon
I1027 12:10:44.764737    2142 start.go:108] Version: machine-config-daemon-4.6.0-202006240615.p0-370-gf72a5ace-dirty (f72a5ace0b5432ad96bba59b2b91633d8bb8315c)
I1027 12:10:44.772268    2142 start.go:121] Calling chroot("/rootfs")
I1027 12:10:44.772468    2142 rpm-ostree.go:261] Running captured: rpm-ostree status --json
I1027 12:10:45.343467    2142 daemon.go:226] Booted osImageURL: registry.build01.ci.openshift.org/ci-op-gdsvbywh/stable@sha256:cd6d0d6c4cbaa7ceaca75e7a00fad2ced6344bc89c38cadfa121b24209038e2f (47.82.202010270142-0)
  2. Previous reboot time (~60 seconds):

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2177/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/1319724839077941248/artifacts/e2e-gcp-op/pods/openshift-machine-config-operator_machine-config-daemon-wmvdr_machine-config-daemon.log

I1023 21:06:25.466932    1842 update.go:1590] initiating reboot: Node will reboot into config rendered-infra-54a8f08a03796f8c94187a986edcbc7a
I1023 21:06:25.594958    1842 daemon.go:641] Shutting down MachineConfigDaemon
I1023 21:07:30.693671    1850 start.go:108] Version: machine-config-daemon-4.6.0-202006240615.p0-370-g56ded555-dirty (56ded5550c030d88cdffa8de630ff1a1287303f3)
I1023 21:07:30.700093    1850 start.go:121] Calling chroot("/rootfs")
I1023 21:07:30.700507    1850 rpm-ostree.go:261] Running captured: rpm-ostree status --json
I1023 21:07:31.107832    1850 daemon.go:226] Booted osImageURL: registry.build01.ci.openshift.org/ci-op-vcw3wmdq/stable@sha256:113a36da35d4aff5f8ef43ff97bed0e97cdaea2139c8db44fbac31051bec43c8 (47.82.202010231442-0)
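
The ~90-second and ~60-second figures above are just the gap between the MCD's "Shutting down MachineConfigDaemon" timestamp and the first log line of the restarted daemon. A minimal sketch (illustration only, not code from this repo) of computing that delta from klog timestamp prefixes:

```go
// Rough illustration: derive the reboot deltas quoted above from
// klog timestamp prefixes.
package main

import (
	"fmt"
	"time"
)

// klogTime parses the "I1026 20:31:17.893443" prefix of a klog line.
// klog omits the year, so this is only valid for deltas within one year.
func klogTime(prefix string) (time.Time, error) {
	// Skip the severity letter ("I"); layout is month/day plus microseconds.
	return time.Parse("0102 15:04:05.000000", prefix[1:])
}

func main() {
	// Timestamps copied from the first log excerpt above.
	shutdown, _ := klogTime("I1026 20:31:17.893443")  // daemon shuts down
	restarted, _ := klogTime("I1026 20:32:50.940436") // first line after reboot
	fmt.Println("reboot took ~" + restarted.Sub(shutdown).Round(time.Second).String())
	// Prints: reboot took ~1m33s (the older run's delta works out to ~1m5s)
}
```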

@sinnykumari
Contributor Author

I am not sure why the time has increased; we can investigate that later. Until then, let's increase the timeout so that we unblock PRs.

@kikisdeliveryservice
Contributor

kikisdeliveryservice commented Oct 27, 2020

I think we should figure out the underlying problem and not extend the test time, since the reboot time per node is 50% higher.

@kikisdeliveryservice
Contributor

(As per slack, we're doing some investigation on this before deciding how to resolve)

@sinnykumari
Contributor Author

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 27, 2020
@cgwalters
Member

Sadly, we never summarized this bug anywhere public on the MCO repo, AFAICS.

openshift/cluster-network-operator#859 merged, which should help; let's try to verify that.
The other case is https://bugzilla.redhat.com/show_bug.cgi?id=1893360 - it needs some design work.

@cgwalters
Member

OK, another status update on this: I kept being confused about why it wasn't helping, but it turns out CI (and nightly) payload generation had been broken until just now, so our CI runs were still using an old cluster-network-operator.

Let's keep an eye out now to see if openshift/cluster-network-operator#859 actually improves things!

fixes e2e-gcp-op test failing in ci due to timeout
@sinnykumari
Contributor Author

Nightly images are green now, so openshift/cluster-network-operator#859 should be included in the recent payload in CI runs. As a sanity check, I re-triggered the test here.

@cgwalters
Member

We should probably consider this to start clearing out the PR backlog.
/retest

@kikisdeliveryservice
Contributor

We should probably consider this to start clearing out the PR backlog.

e2e-aws and e2e-aws-serial are also both currently broken and being worked on. We shouldn't be overriding all of those (3+) tests just to get PRs in, IMO.

For gcp-op: openshift/cluster-dns-operator#213 (comment) (and #2229) need to be merged, but can't because of the above e2e-aws issues.

So it seems the e2e-aws tests need to get fixed first, because they're blocking the DNS PR, which, once it merges, unblocks our CI.

@cgwalters
Member

It's all interlinked though. We now have so many PRs outstanding that we're waiting on AWS "leases" for some of them; e.g.
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-dns-operator/213/pull-ci-openshift-cluster-dns-operator-master-e2e-aws/1328393945482268672
is waiting and won't even start until we clear the queue some.

For example, we could combine this PR with #2229 and merge when e2e-gcp-op goes green, even if e2e-aws isn't (changes in our tests clearly can't affect that).

Basically I think we should try to do something other than be blocked.

@kikisdeliveryservice
Contributor

kikisdeliveryservice commented Nov 16, 2020

Right, but a bigger problem mentioned in Slack is that CI and nightly payload acceptance are also broken due to this...

I'm going to go and try to create more urgency on the AWS issue, because we shouldn't be merging in these conditions, and it should be treated as more important than it currently is.

I don't think overriding required tests across the board is the right choice. We can land #2229, but we will still be blocked on other required tests. If there are problems with payloads and across OCP, AWS really needs to get fixed, because otherwise green means nothing. ☹️

@openshift-merge-robot
Contributor

@sinnykumari: The following tests failed, say /retest to rerun all failed tests:

Test name                      Commit   Details  Rerun command
ci/prow/e2e-gcp-op             dcf26d4  link     /test e2e-gcp-op
ci/prow/e2e-ovn-step-registry  dcf26d4  link     /test e2e-ovn-step-registry
ci/prow/e2e-agnostic-upgrade   dcf26d4  link     /test e2e-agnostic-upgrade
ci/prow/e2e-aws-serial         dcf26d4  link     /test e2e-aws-serial
ci/prow/e2e-aws                dcf26d4  link     /test e2e-aws
ci/prow/okd-e2e-aws            dcf26d4  link     /test okd-e2e-aws

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kikisdeliveryservice
Contributor

As an update: this comes down to https://bugzilla.redhat.com/show_bug.cgi?id=1897604 (which is also blocking the DNS fix, openshift/cluster-dns-operator#213) and is blocking all merges across OCP. There's a new channel (incident-kcm..) now where people have started working on it.

Instead of trying to hack around and merge this, which won't help because other required tests are blocking on all repos, including this one, we are waiting for the above BZ to be resolved.

@sinnykumari
Contributor Author

Closing this PR since the actual slowness issue has been fixed with openshift/cluster-dns-operator#213.
/close

@openshift-ci-robot
Contributor

@sinnykumari: Closed this PR.


In response to this:

Closing this PR since the actual slowness issue has been fixed with openshift/cluster-dns-operator#213.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
team-mco
