Bug 1834895: Bump waitForPoolComplete to 30 mins #1723
yuqi-zhang wants to merge 1 commit into openshift:master
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: yuqi-zhang |
|
LOL I was just about to submit a PR for 25 😄 |
|
/retest Different errors |
|
/retest |
|
Looks like waiting 30 mins for the MCP to complete the update on all nodes is not enough for all the tests we run. Two of the e2e-op tests failed due to timeout.
Not sure why this is happening all of a sudden. From the e2e-gcp-op test history, this seems to have started around 2020-05-08 19:11:36 UTC. It doesn't seem related to the MCO, since the last PR merged into MCO master was on 2020-05-06. @miabbott @cgwalters Wondering if something changed on the RHCOS side which could be related to this 🤔 |
|
Right, that sounds like we hit the 2h timeout, let me also try bumping that |
Force-pushed: 15b5236 to 050bdee
|
Looks like I spoke a bit early about the timing issue. After looking closely at the controller log, it seems each node takes approximately 8 mins, so 30 mins should be enough? |
|
It seems that the time it takes for the MCD to get ready post reboot went up from 30s to 5min (comparing an example successful run with an example run now), so it's taking the informer 4min extra to get ready? |
|
@yuqi-zhang: This pull request references Bugzilla bug 1834895, which is invalid:
|
/bugzilla refresh |
|
@yuqi-zhang: This pull request references Bugzilla bug 1834895, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
|
The underlying cause of the timeouts should be figured out, and fixing that will fix our CI... I don't think bumping the timeout is going to facilitate that... Holding for now while I look further. /hold |
|
Well, turns out bumping didn't do anything anyway and the test ran for 3.5 hours.. Looking at TestMCDeployed from this run with the extra time, it is taking 3750.12s, but when I look at a passing run from recently (last Thursday) it takes ~1558.13s. In 4.4 it took ~1508.10s, and in 4.3 the same test took ~926.09s? Something very recent is causing this to just take a ton of time and I'm not yet sure what it is.. Here's a passing run from a few days ago for comparison, showing the timing differences.. |
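To put those numbers side by side, here is a small Go sketch that computes how much slower the failing run is than each baseline. It assumes all four figures are in seconds (only the first two carry a unit in the comment above); `slowdown` is a hypothetical helper name.

```go
package main

import "fmt"

// slowdown reports how many times longer current took than baseline.
func slowdown(current, baseline float64) float64 {
	return current / baseline
}

func main() {
	// TestMCDeployed durations quoted in the comment above.
	// The unit for the 4.4 and 4.3 figures is assumed to be seconds.
	const failing = 3750.12
	baselines := []struct {
		name string
		secs float64
	}{
		{"passing run (last Thursday)", 1558.13},
		{"4.4", 1508.10},
		{"4.3", 926.09},
	}
	for _, b := range baselines {
		fmt.Printf("%-28s %.2fx slower\n", b.name, slowdown(failing, b.secs))
	}
}
```

By this measure the failing run is roughly 2.4 times slower than the recent passing run and about 4 times slower than 4.3.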
|
Opened 2 BZs for the differences I see between new runs that fail and older runs (from a few days ago) that pass: |
Tests are timing out for e2e-gcp-op due to hitting this timeout, which takes just a little over 20 minutes at the moment. Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>
Force-pushed: 050bdee to aeb0328
|
Just for curiosity purposes I bumped to 4h to see how long it actually takes the test to finish |
|
@yuqi-zhang: The following test failed, say
|
Fixed in #1731 |