@fjetter
Member

@fjetter fjetter commented Feb 18, 2022

Ever since we split the CI into partitioned jobs, the test runtime per job has typically been 30-40 min. A 180-minute timeout is way too large. I ran into this in #5820 and #5824.

Edit: I linked the wrong PR. Here is the test failure I meant to link, where the test suite teardown blocked the job for 3h: https://github.com/dask/distributed/runs/5248080147?check_suite_focus=true

@fjetter fjetter requested a review from crusaderky February 18, 2022 15:05
@github-actions
Contributor

github-actions bot commented Feb 18, 2022

Unit Test Results

       11 files  ±0        11 suites  ±0   6h 23m 5s ⏱️ -11m 43s
  2 607 tests    ±0     2 525 ✔️  -1     80 💤  -1   1 ❌ +1   1 🔥 +1
 14 173 runs    -18    13 267 ✔️ +37    903 💤 -58   2 ❌ +2   1 🔥 +1

For more details on these failures and errors, see this check.

Results for commit 4e8a54e. ± Comparison against base commit d2d76c0.

♻️ This comment has been updated with latest results.

@crusaderky
Collaborator

crusaderky commented Feb 19, 2022

Windows is currently taking 56 minutes. A 50% margin on top of that feels a bit too thin to me, and in the past we have seen cases where the CI took substantially longer.

Note that this will not help with a stuck test - there's pytest-timeout after 5 minutes for that. It only changes what happens when the whole CI host is very slow:

from: many tests randomly failing because of previously unexplored timeouts
to: the whole suite going K.O. without a junit report

I'm a bit conflicted about which of the two behaviours is more desirable.
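
To make the two behaviours above concrete, here is a minimal toy model, not taken from the dask/distributed CI. It assumes every test on a slow host takes `slowdown` times its usual duration, that the per-test limit is the 5 minutes mentioned above, and that tests run one after another. The names `simulate` and `Outcome` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

PER_TEST_TIMEOUT_S = 5 * 60  # per-test limit (the 5 minutes mentioned above)


@dataclass
class Outcome:
    job_killed: bool      # True: the whole job is cancelled, so no junit report
    timed_out_tests: int  # tests stopped by the per-test timeout so far
    elapsed_min: float    # wall time used, capped at the job timeout


def simulate(durations_s, slowdown, job_timeout_min):
    """Toy model: each test runs `slowdown` times slower than usual."""
    job_limit_s = job_timeout_min * 60
    elapsed_s = 0.0
    timed_out = 0
    for duration in durations_s:
        actual = duration * slowdown
        if actual > PER_TEST_TIMEOUT_S:
            # The per-test timeout fires and the suite moves on.
            actual = PER_TEST_TIMEOUT_S
            timed_out += 1
        elapsed_s += actual
        if elapsed_s > job_limit_s:
            # The job-level timeout fires and the remaining tests never run.
            return Outcome(True, timed_out, float(job_timeout_min))
    return Outcome(False, timed_out, elapsed_s / 60)


# Hypothetical suite: 30 tests of 100 s each (50 min on a healthy host).
suite = [100] * 30
print(simulate(suite, slowdown=4, job_timeout_min=180))
print(simulate(suite, slowdown=4, job_timeout_min=120))
```

In this made-up scenario the 180-minute limit lets the job finish with every test reported as timed out, while the 120-minute limit cancels the job partway through. Which outcome is preferable is exactly the open question in this comment.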

@crusaderky
Collaborator

crusaderky commented Feb 20, 2022

To back up my previous post, today CI is being sluggish across all OSs:
https://github.com/dask/distributed/runs/5264425081?check_suite_focus=true

[EDIT] NVM, I had accidentally nuked the ci1/not ci1 separation

@fjetter
Member Author

fjetter commented Feb 21, 2022

I xrefed the wrong PR. I ran into a job timeout over here: https://github.com/dask/distributed/runs/5248080147?check_suite_focus=true

I typically see Windows builds finish in 40-50 min. I'm fine with keeping the threshold as is. I figured we introduced this number back when we didn't partition the tests.

@crusaderky
Collaborator

I think it's reasonable to reduce it from 3 to 2 hours. A 100% buffer on top of the typical runtime sounds ok, and if we exceed it a wealth of tests will get random timeouts anyway.
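
As a rough check of the buffer arithmetic, the sketch below computes the margin a job timeout leaves over a typical runtime, using only the runtimes quoted in this thread (40, 50 and 56 minutes) and the two limits discussed (120 and 180 minutes). The helper name `buffer_pct` is hypothetical.

```python
def buffer_pct(timeout_min, typical_min):
    """Margin of the job timeout over the typical runtime, in percent."""
    return (timeout_min - typical_min) * 100 / typical_min


# Runtimes quoted in this thread, against the old and new limits.
for typical in (40, 50, 56):
    for timeout in (120, 180):
        margin = buffer_pct(timeout, typical)
        print(f"typical {typical} min, timeout {timeout} min: {margin:.0f}% buffer")
```

Under these numbers a 2-hour limit is exactly a 100% buffer for a 60-minute job, and somewhat more than that for the 40-56 minute runtimes reported here.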

@crusaderky crusaderky merged commit 43dfb61 into dask:main Feb 21, 2022