Skip to content

Conversation

@hendrikmakait
Copy link
Member

@hendrikmakait hendrikmakait commented Nov 3, 2022

This PR improves the helper functions used in a number of test_balance* tests by simplifying the logic. Inspired by #7250.

Might fix

cc @crusaderky

  • Tests added / passed
  • Passes pre-commit run --all-files

@hendrikmakait hendrikmakait changed the title Improve robustness of helper functions in test_steal Improve robustness of helper functions in test_steal.py Nov 3, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Nov 3, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       24 files  ±0         24 suites  ±0   10h 12m 11s ⏱️ + 20m 57s
  3 319 tests ±0    3 211 ✔️ +1     105 💤  - 1  3 ±0 
39 132 runs  ±0  37 242 ✔️ +1  1 887 💤  - 1  3 ±0 

For more details on these failures, see this check.

Results for commit c0a1abb. ± Comparison against base commit 6015a2c.

♻️ This comment has been updated with latest results.

@hendrikmakait hendrikmakait self-assigned this Nov 4, 2022
config or {},
{
"distributed.scheduler.unknown-task-duration": "1s",
"distributed.scheduler.default-task-durations": default_task_durations,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was changed to avoid the reliance on stealing tasks of unknown duration. Instead, we now provide default durations for all relevant tasks. This concern was initially voiced here:

The test implicitly relies on tasks of unknown duration to be stolen (#5572). It should be changed not to rely on this specific use case.

#7243 (comment)

list(key_split(key[5:])) # Remove "task-" prefix
for key in w.data.keys()
if key.startswith("task-")
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain the reasoning behind this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests should now check where the tasks have eventually been executed after stealing took place. This is the behavior we want to test here and relies less on internals as the previous version. The latter relied on checking the scheduler's worker state after stealing but during processing.

hendrikmakait and others added 2 commits November 8, 2022 14:33
Co-authored-by: crusaderky <crusaderky@gmail.com>
@hendrikmakait
Copy link
Member Author

CI seems to be broken, I'm not sure what's causing this.

@crusaderky
Copy link
Collaborator

I fixed a trivial issue but test_balance_expensive_tasks is still failing

@hendrikmakait
Copy link
Member Author

I fixed a trivial issue but test_balance_expensive_tasks is still failing

Thanks for fixing the pre-commit issue, @crusaderky, that one totally tripped me up because I only checked the code on the branch but not the diff against main. I managed to fix test_balance_expensive_tasks by further adjusting the test setup helpers and slightly changing the tests. This was necessary since occupancy slightly changed with the new test helpers. This should be good for another review now.

@hendrikmakait hendrikmakait marked this pull request as draft November 14, 2022 16:07
@hendrikmakait
Copy link
Member Author

After test_balance_expensive_tasks flaked 25% of the time on CI, I investigated and found that heartbeats affected the cost estimation, e.g., adding await asyncio.sleep(1) before the calls to steal.balance() would reliably fail the test. I've adjusted the code to avoid heartbeats and the test has not flaked in 2000x local runs. For comparison, before it flaked once in 500 runs. As long as we don't see any problems on CI, I'd finally consider this problem fixed.

@hendrikmakait
Copy link
Member Author

...still investigating after CI keeps failing.

@hendrikmakait hendrikmakait marked this pull request as ready for review November 28, 2022 10:48
@hendrikmakait hendrikmakait changed the title Improve robustness of helper functions in test_steal.py Fix test_balance_expensive_tasks and improve helper functions in test_steal.py Jan 23, 2023
@crusaderky crusaderky merged commit 99d4112 into dask:main Jan 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants