Bug 1942164: Increase OVN upgrade timeout to 90m from 75m #26202
Conversation
/retitle Bug 1942164: Increase upgrade timeout to 90m from 75m
@jluhrsen: This pull request references Bugzilla bug 1942164, which is invalid.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
#26219 would extend the soft limit on upgrade time. Let's rework this PR to do the same for OVN (probably extending the OVN-on-AWS configuration even more).
I think we want this to bump it only for OVN, like we did only for AWS: #26219
Force-pushed from 6f44b29 to 20fb10a
/retest
2 similar comments
/bugzilla refresh
@jluhrsen: This pull request references Bugzilla bug 1942164, which is invalid.
/retest
The OVN upgrade jobs are expected to take longer than OpenShiftSDN. There is more context on this here: https://bugzilla.redhat.com/show_bug.cgi?id=1942164
Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
Force-pushed from 20fb10a to 5ec3836
/retest
1 similar comment
/bugzilla refresh
@jluhrsen: This pull request references Bugzilla bug 1942164, which is valid. The bug has been moved to the POST state and has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug. Requesting review from QA contact.
	durationToSoftFailure = (baseDurationToSoftFailure + 30) * time.Minute
} else {
	// if the cluster is on AWS we've already bumped the timeout enough, but if not we need to check if
	// the CNI is OVN and increase our timeout for that
OVN and provider-LB delay sources seem orthogonal. If that's right, can we use:

if infra.Status.PlatformStatus.Type == configv1.AWSPlatformType {
	// due to https://bugzilla.redhat.com/show_bug.cgi?id=1943804 upgrades take ~12 extra minutes on AWS
	// and see commit d69db34a816f3ce8a9ab567621d145c5cd2d257f which notes that some AWS upgrades can
	// take an undiagnosed ~15m beyond that.
	durationToSoftFailure += 30 * time.Minute
}
network, err := c.ConfigV1().Networks().Get(context.Background(), "cluster", metav1.GetOptions{})
framework.ExpectNoError(err)
if network.Status.NetworkType == "OVNKubernetes" {
	// deploying with OVN is expected to take longer compared to OpenShiftSDN; on average, ~15m longer.
	// some extra context to this increase, which links to a jira showing which operators take longer:
	// https://bugzilla.redhat.com/show_bug.cgi?id=1942164
	durationToSoftFailure += 15 * time.Minute
}

or similar?
But if it's OVN + AWS we will end up with only +15 when we want +30. I can do the platform check second so that even if it's OVN it will still end up with +30, but that was sort of why I did it this way to start with: so I could comment the code about the AWS increase being good enough for OVN anyway. Let me know what's best, though. I'd like to get this in so we can start making small progress on improving that ovn-upgrade job.
OVN upgrades are sitting at 0% mostly because of this; changes look reasonable to me.
/assign @bparees @smarterclayton
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: bparees, jluhrsen, stbenjam. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Approvers can indicate their approval by writing /approve in a comment.
/retest
/retest Please review the full test history for this PR and help us cut down flakes.
4 similar comments
the upgrade job will never pass
@jluhrsen: The following tests failed, say /retest to rerun all failed tests.
Full PR test history. Your PR dashboard.
/retest Please review the full test history for this PR and help us cut down flakes.
1 similar comment
@jluhrsen: All pull requests linked via external trackers have merged. Bugzilla bug 1942164 has been moved to the MODIFIED state.
The original PR openshift#26202 used parens in the wrong place, so the actual time calculations were wrong and garbage values were being used, like this job [0], where the test output looked like:
: [sig-cluster-lifecycle] cluster upgrade should complete in -1386946h6m26.707003392s minutes
This should fix that.
[0] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-gcp-ovn-upgrade/1414923427529101312
Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
This didn't work like I thought. This new PR should clean it up:
The OVN upgrade jobs almost always take longer than 75m
so this fails. There is more context to this here:
https://bugzilla.redhat.com/show_bug.cgi?id=1942164
Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>