Conversation

@jluhrsen
Contributor

@jluhrsen jluhrsen commented Jun 4, 2021

The OVN upgrade jobs almost always take longer than 75m,
so this check fails. There is more context on this here:
https://bugzilla.redhat.com/show_bug.cgi?id=1942164

Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>

@jluhrsen
Contributor Author

jluhrsen commented Jun 4, 2021

/retitle Bug 1942164: Increase upgrade timeout to 90m from 75m

@openshift-ci openshift-ci bot changed the title Increase upgrade timeout to 90m from 75m Bug 1942164: Increase upgrade timeout to 90m from 75m Jun 4, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 4, 2021

@jluhrsen: This pull request references Bugzilla bug 1942164, which is invalid:

  • expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.


In response to this:

Bug 1942164: Increase upgrade timeout to 90m from 75m

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added bugzilla/severity-unspecified Referenced Bugzilla bug's severity is unspecified for the PR. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jun 4, 2021
@openshift-ci openshift-ci bot requested review from mfojtik and soltysh June 4, 2021 18:35
@vrutkovs
Contributor

#26219 would extend the soft limit on upgrade time. Let's rework this PR to do the same for OVN (probably extending the OVN-on-AWS configuration even more).

@stbenjam
Member

I think we want to bump it only for OVN, like we did only for AWS: #26219

@jluhrsen jluhrsen force-pushed the longer-upgrade-timeout branch from 6f44b29 to 20fb10a on June 24, 2021 18:22
@jluhrsen
Contributor Author

@vrutkovs, @stbenjam, please see the latest version of this effort. Basically, if it's AWS it should end up at 105m; if it's not AWS, then 75m for OpenShiftSDN and 90m for OVN.
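
For illustration only, a minimal Go sketch of the rules described above; the function name, parameters, and string values are hypothetical and not taken from the actual test code:

package main

import (
  "fmt"
  "time"
)

// expectedSoftFailureLimit illustrates the intended soft-failure timeouts:
// AWS ends up at 105m, non-AWS OVN clusters at 90m, and everything else
// (e.g. OpenShiftSDN off AWS) stays at 75m.
func expectedSoftFailureLimit(onAWS bool, networkType string) time.Duration {
  base := 75 * time.Minute
  if onAWS {
    return base + 30*time.Minute // 105m
  }
  if networkType == "OVNKubernetes" {
    return base + 15*time.Minute // 90m
  }
  return base // 75m
}

func main() {
  fmt.Println(expectedSoftFailureLimit(true, "OVNKubernetes"))  // 1h45m0s
  fmt.Println(expectedSoftFailureLimit(false, "OVNKubernetes")) // 1h30m0s
  fmt.Println(expectedSoftFailureLimit(false, "OpenShiftSDN"))  // 1h15m0s
}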

@jluhrsen
Contributor Author

/retest

2 similar comments
@jluhrsen
Contributor Author

/retest

@jluhrsen
Contributor Author

/retest

@jluhrsen
Contributor Author

/bugzilla refresh

@openshift-ci openshift-ci bot added bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. and removed bugzilla/severity-unspecified Referenced Bugzilla bug's severity is unspecified for the PR. labels Jun 29, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 29, 2021

@jluhrsen: This pull request references Bugzilla bug 1942164, which is invalid:

  • expected the bug to target the "4.9.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.


In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jluhrsen
Contributor Author

/retest

The OVN upgrade jobs are expected to take longer than
OpenShiftSDN.

There is more context to this here:
  https://bugzilla.redhat.com/show_bug.cgi?id=1942164

Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
@jluhrsen jluhrsen force-pushed the longer-upgrade-timeout branch from 20fb10a to 5ec3836 on June 29, 2021 20:37
@jluhrsen
Contributor Author

/retest

1 similar comment
@jluhrsen
Contributor Author

/retest

@jluhrsen
Contributor Author

/bugzilla refresh

@openshift-ci openshift-ci bot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jun 30, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 30, 2021

@jluhrsen: This pull request references Bugzilla bug 1942164, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.9.0) matches configured target release for branch (4.9.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @wangke19


In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from wangke19 June 30, 2021 18:07
durationToSoftFailure = (baseDurationToSoftFailure + 30) * time.Minute
} else {
// if the cluster is on AWS we've already bumped the timeout enough, but if not we need to check if
// the CNI is OVN and increase our timeout for that
@wking
Member

@wking wking Jun 30, 2021

OVN and provider-LB delay sources seem orthogonal. If that's right, can we use:

if infra.Status.PlatformStatus.Type == configv1.AWSPlatformType {
  // due to https://bugzilla.redhat.com/show_bug.cgi?id=1943804 upgrades take ~12 extra minutes on AWS
  // and see commit d69db34a816f3ce8a9ab567621d145c5cd2d257f which notes that some AWS upgrades can
  // take an undiagnosed ~15m beyond that.
  durationToSoftFailure += 30 * time.Minute
}
network, err := c.ConfigV1().Networks().Get(context.Background(), "cluster", metav1.GetOptions{})
framework.ExpectNoError(err)
if network.Status.NetworkType == "OVNKubernetes" {
  // deploying with OVN is expected to take longer, on average ~15m longer
  // compared to OpenShiftSDN. some extra context to this increase, which
  // links to a jira showing which operators take longer:
  //   https://bugzilla.redhat.com/show_bug.cgi?id=1942164
  durationToSoftFailure += 15 * time.Minute
}

or similar?

@jluhrsen
Contributor Author

But if it's OVN + AWS we will end up with only +15 when we want +30. I can do the platform check second so that even if it's OVN it will still end up with +30, but that was sort of why I did it this way to start with, so I could comment the code about the AWS increase being good enough for OVN anyway. Let me know what's best, though; I'd like to get this in so we can start making small progress toward improving that ovn-upgrade job.
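
For context, a sketch (not the actual diff) of the nested structure jluhrsen describes, matching the snippet at the top of this thread: the AWS check comes first and its bump is treated as covering OVN-on-AWS as well, and the OVN bump applies only off AWS. It assumes baseDurationToSoftFailure is already a time.Duration and that the network object has been fetched as in the suggestion above:

if infra.Status.PlatformStatus.Type == configv1.AWSPlatformType {
  // the AWS bump is treated as large enough to cover OVN-on-AWS upgrades too
  durationToSoftFailure = baseDurationToSoftFailure + 30*time.Minute
} else if network.Status.NetworkType == "OVNKubernetes" {
  // off AWS, only OVN needs extra time beyond the base
  durationToSoftFailure = baseDurationToSoftFailure + 15*time.Minute
} else {
  durationToSoftFailure = baseDurationToSoftFailure
}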

@stbenjam
Member

OVN upgrades are sitting at 0% mostly because of this; the changes look reasonable to me.

/assign @bparees @smarterclayton
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 12, 2021
@bparees bparees changed the title Bug 1942164: Increase upgrade timeout to 90m from 75m Bug 1942164: Increase OVN upgrade timeout to 90m from 75m Jul 12, 2021
@bparees
Contributor

bparees commented Jul 12, 2021

/approve

@openshift-ci
Contributor

openshift-ci bot commented Jul 12, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, jluhrsen, stbenjam

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 12, 2021
@vrutkovs
Contributor

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

4 similar comments
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@jluhrsen
Contributor Author

the upgrade job will never pass

@openshift-ci
Contributor

openshift-ci bot commented Jul 13, 2021

@jluhrsen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                   Commit   Details  Rerun command
ci/prow/e2e-gcp-disruptive  6f44b29  link     /test e2e-gcp-disruptive
ci/prow/e2e-aws-disruptive  6f44b29  link     /test e2e-aws-disruptive

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 4439d90 into openshift:master Jul 13, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 13, 2021

@jluhrsen: All pull requests linked via external trackers have merged:

Bugzilla bug 1942164 has been moved to the MODIFIED state.


In response to this:

Bug 1942164: Increase OVN upgrade timeout to 90m from 75m

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jluhrsen added a commit to jluhrsen/origin that referenced this pull request Jul 13, 2021
The original PR openshift#26202 used
parens in the wrong place, so the actual time calculations were
wrong and garbage values were being used, like in this job [0] where
the test output looked like:
  : [sig-cluster-lifecycle] cluster upgrade should complete in -1386946h6m26.707003392s minutes

This should fix that.

[0] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-gcp-ovn-upgrade/1414923427529101312

Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
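
As a side note, a minimal sketch of the parenthesization problem that commit message describes, assuming (as the review snippet above suggests) that the base timeout is already a time.Duration; the variable names here are illustrative only:

package main

import (
  "fmt"
  "time"
)

func main() {
  base := 75 * time.Minute // already a time.Duration, i.e. 75m in nanoseconds

  // buggy form: the Duration gets multiplied by time.Minute again, which
  // overflows int64 and produces garbage like the negative value in [0]
  bad := (base + 30) * time.Minute

  // intended form: add 30 minutes to the existing Duration
  good := base + 30*time.Minute

  fmt.Println(bad)  // a meaningless overflowed value
  fmt.Println(good) // 1h45m0s
}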
@jluhrsen
Contributor Author

This didn't work like I thought. This new PR should clean it up:
#26324

wking pushed a commit to wking/origin that referenced this pull request Nov 30, 2021
DavidHurta pushed a commit to DavidHurta/origin that referenced this pull request Mar 2, 2022
DavidHurta pushed a commit to DavidHurta/origin that referenced this pull request Mar 3, 2022
DavidHurta pushed a commit to DavidHurta/origin that referenced this pull request Mar 4, 2022