Conversation

@jluhrsen
Contributor

@jluhrsen jluhrsen commented Jun 4, 2021

The OVN upgrade jobs almost always take longer than 75m,
so this check fails. There is more context on this here:
https://bugzilla.redhat.com/show_bug.cgi?id=1942164

Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>

@jluhrsen
Contributor Author

jluhrsen commented Jun 4, 2021

/retitle Bug 1942164: Increase upgrade timeout to 90m from 75m

@openshift-ci openshift-ci bot changed the title Increase upgrade timeout to 90m from 75m Bug 1942164: Increase upgrade timeout to 90m from 75m Jun 4, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 4, 2021

@jluhrsen: This pull request references Bugzilla bug 1942164, which is invalid:

  • expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.


In response to this:

Bug 1942164: Increase upgrade timeout to 90m from 75m

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added bugzilla/severity-unspecified Referenced Bugzilla bug's severity is unspecified for the PR. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jun 4, 2021
@openshift-ci openshift-ci bot requested review from mfojtik and soltysh June 4, 2021 18:35
@vrutkovs
Contributor

#26219 would extend the soft limit on upgrade time. Let's rework this PR to do the same for OVN (probably extending the OVN-on-AWS configuration even more).

@stbenjam
Member

I think we want to bump it only for OVN, like we did only for AWS: #26219

@jluhrsen jluhrsen force-pushed the longer-upgrade-timeout branch from 6f44b29 to 20fb10a on June 24, 2021 18:22
@jluhrsen
Contributor Author

@vrutkovs, @stbenjam, please see the latest version of this effort. Basically, if it's AWS it should end up at 105m; if it's not AWS, then 75m for OpenShiftSDN and 90m for OVN.
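
For illustration only, a minimal Go sketch of the rules described above; the function name, parameters, and string values are hypothetical and not taken from the actual test code:

package main

import (
  "fmt"
  "time"
)

// expectedSoftFailureLimit illustrates the intended soft-failure timeouts:
// AWS ends up at 105m, non-AWS OVN clusters at 90m, and everything else
// (e.g. OpenShiftSDN off AWS) stays at 75m.
func expectedSoftFailureLimit(onAWS bool, networkType string) time.Duration {
  base := 75 * time.Minute
  if onAWS {
    return base + 30*time.Minute // 105m
  }
  if networkType == "OVNKubernetes" {
    return base + 15*time.Minute // 90m
  }
  return base // 75m
}

func main() {
  fmt.Println(expectedSoftFailureLimit(true, "OVNKubernetes"))  // 1h45m0s
  fmt.Println(expectedSoftFailureLimit(false, "OVNKubernetes")) // 1h30m0s
  fmt.Println(expectedSoftFailureLimit(false, "OpenShiftSDN"))  // 1h15m0s
}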

@jluhrsen
Contributor Author

/retest

2 similar comments
@jluhrsen
Contributor Author

/retest

@jluhrsen
Contributor Author

/retest

@jluhrsen
Contributor Author

/bugzilla refresh

@openshift-ci openshift-ci bot added bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. and removed bugzilla/severity-unspecified Referenced Bugzilla bug's severity is unspecified for the PR. labels Jun 29, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 29, 2021

@jluhrsen: This pull request references Bugzilla bug 1942164, which is invalid:

  • expected the bug to target the "4.9.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.


In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jluhrsen
Contributor Author

/retest

The OVN upgrade jobs are expected to take longer than
OpenShiftSDN.

There is more context to this here:
  https://bugzilla.redhat.com/show_bug.cgi?id=1942164

Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
@jluhrsen jluhrsen force-pushed the longer-upgrade-timeout branch from 20fb10a to 5ec3836 on June 29, 2021 20:37
@jluhrsen
Contributor Author

/retest

1 similar comment
@jluhrsen
Contributor Author

/retest

@jluhrsen
Contributor Author

/bugzilla refresh

@openshift-ci openshift-ci bot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jun 30, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 30, 2021

@jluhrsen: This pull request references Bugzilla bug 1942164, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.9.0) matches configured target release for branch (4.9.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @wangke19


In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from wangke19 June 30, 2021 18:07
durationToSoftFailure = (baseDurationToSoftFailure + 30) * time.Minute
} else {
// if the cluster is on AWS we've already bumped the timeout enough, but if not we need to check if
// the CNI is OVN and increase our timeout for that
@wking
Member

@wking wking Jun 30, 2021

OVN and provider-LB delay sources seem orthogonal. If that's right, can we use:

if infra.Status.PlatformStatus.Type == configv1.AWSPlatformType {
  // due to https://bugzilla.redhat.com/show_bug.cgi?id=1943804 upgrades take ~12 extra minutes on AWS
  // and see commit d69db34a816f3ce8a9ab567621d145c5cd2d257f which notes that some AWS upgrades can
  // take an undiagnosed ~15m beyond that.
  durationToSoftFailure += 30 * time.Minute
}
network, err := c.ConfigV1().Networks().Get(context.Background(), "cluster", metav1.GetOptions{})
framework.ExpectNoError(err)
if network.Status.NetworkType == "OVNKubernetes" {
  // deploying with OVN is expected to take longer, on average ~15m longer
  // compared to OpenShiftSDN. some extra context to this increase, which
  // links to a jira showing which operators take longer:
  //   https://bugzilla.redhat.com/show_bug.cgi?id=1942164
  durationToSoftFailure += 15 * time.Minute
}

or similar?

@jluhrsen
Contributor Author

But if it's OVN + AWS we will end up with only +15 when we want +30. I can do the platform check second so that even if it's OVN it will still end up with +30, but that was sort of why I did it this way to start with, so I could comment the code about the AWS increase being good enough for OVN anyway. Let me know what's best, though; I'd like to get this in so we can start making small progress toward improving that ovn-upgrade job.
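
For context, a sketch (not the actual diff) of the nested structure jluhrsen describes, matching the snippet at the top of this thread: the AWS check comes first and its bump is treated as covering OVN-on-AWS as well, and the OVN bump applies only off AWS. It assumes baseDurationToSoftFailure is already a time.Duration and that the network object has been fetched as in the suggestion above:

if infra.Status.PlatformStatus.Type == configv1.AWSPlatformType {
  // the AWS bump is treated as large enough to cover OVN-on-AWS upgrades too
  durationToSoftFailure = baseDurationToSoftFailure + 30*time.Minute
} else if network.Status.NetworkType == "OVNKubernetes" {
  // off AWS, only OVN needs extra time beyond the base
  durationToSoftFailure = baseDurationToSoftFailure + 15*time.Minute
} else {
  durationToSoftFailure = baseDurationToSoftFailure
}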

@stbenjam
Member

OVN upgrades are sitting at 0% mostly because of this; the changes look reasonable to me.

/assign @bparees @smarterclayton
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 12, 2021
@bparees bparees changed the title Bug 1942164: Increase upgrade timeout to 90m from 75m Bug 1942164: Increase OVN upgrade timeout to 90m from 75m Jul 12, 2021
@bparees
Contributor

bparees commented Jul 12, 2021

/approve

@openshift-ci
Contributor

openshift-ci bot commented Jul 12, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, jluhrsen, stbenjam

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 12, 2021
@vrutkovs
Contributor

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

4 similar comments
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@jluhrsen
Contributor Author

the upgrade job will never pass

@openshift-ci
Contributor

openshift-ci bot commented Jul 13, 2021

@jluhrsen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                   Commit   Details  Rerun command
ci/prow/e2e-gcp-disruptive  6f44b29  link     /test e2e-gcp-disruptive
ci/prow/e2e-aws-disruptive  6f44b29  link     /test e2e-aws-disruptive

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 4439d90 into openshift:master Jul 13, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 13, 2021

@jluhrsen: All pull requests linked via external trackers have merged:

Bugzilla bug 1942164 has been moved to the MODIFIED state.


In response to this:

Bug 1942164: Increase OVN upgrade timeout to 90m from 75m

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jluhrsen added a commit to jluhrsen/origin that referenced this pull request Jul 13, 2021
The original PR openshift#26202 used
parens in the wrong place, so the actual time calculations were
wrong and garbage values were being used, like in this job [0] where
the test output looked like:
  : [sig-cluster-lifecycle] cluster upgrade should complete in -1386946h6m26.707003392s minutes

This should fix that.

[0] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-gcp-ovn-upgrade/1414923427529101312

Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
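
As a side note, a minimal sketch of the parenthesization problem that commit message describes, assuming (as the review snippet above suggests) that the base timeout is already a time.Duration; the variable names here are illustrative only:

package main

import (
  "fmt"
  "time"
)

func main() {
  base := 75 * time.Minute // already a time.Duration, i.e. 75m in nanoseconds

  // buggy form: the Duration gets multiplied by time.Minute again, which
  // overflows int64 and produces garbage like the negative value in [0]
  bad := (base + 30) * time.Minute

  // intended form: add 30 minutes to the existing Duration
  good := base + 30*time.Minute

  fmt.Println(bad)  // a meaningless overflowed value
  fmt.Println(good) // 1h45m0s
}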
@jluhrsen
Contributor Author

This didn't work like I thought. This new PR should clean it up:
#26324

wking pushed a commit to wking/origin that referenced this pull request Nov 30, 2021
DavidHurta pushed a commit to DavidHurta/origin that referenced this pull request Mar 2, 2022
DavidHurta pushed a commit to DavidHurta/origin that referenced this pull request Mar 3, 2022
DavidHurta pushed a commit to DavidHurta/origin that referenced this pull request Mar 4, 2022