-
-
Notifications
You must be signed in to change notification settings - Fork 748
Store ready and constrained tasks in heapsets #6711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
crusaderky
commented
Jul 11, 2022
- Closes Tasks with resource constraints ignore priority #6137
- Closes Resources are ignored if a task is cancelled and then added again #6710
bfc7ca5 to
09631ae
Compare
09631ae to
1ddce73
Compare
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ± 0 15 suites ±0 6h 49m 48s ⏱️ - 12m 41s For more details on these failures, see this check. Results for commit 1ddce73. ± Comparison against base commit 91a9e5d. |
gjoseph92
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love this refactor! _next_ready_task is great, and using HeapSets makes things simpler and more readable.
I'm concerned about this making test_steal_concurrent_simple and maybe other stealing tests flaky. Indeed, that seems to be happening: https://github.com/dask/distributed/runs/7288303161. Adding something like popright to HeapSet would be a very simple fix that would let us get this in quickly (instead of having to rewrite those tests to not rely on race conditions).
| self, ts: TaskState, *, stimulus_id: str | ||
| ) -> RecsInstrs: | ||
| if self.validate: | ||
| assert ts.state == "constrained" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I thought these sorts of assertions were superfluous (the transition function is guaranteed to only be called if this is the case (unless of course it's been registered for tasks of the wrong state))?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily; there are a lot of _transition_* methods that are called directly from other _transition_ methods.
distributed/tests/test_steal.py
Outdated
|
|
||
| # ready is a heap but we don't need last, just not the next | ||
| _, victim_key = w0.state.ready[-1] | ||
| victim_key = w0.state.ready.peek().key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment above would make me think peek isn't right for these (since it is the next). Seems like this test is relying on race conditions already (that move_task_request happens faster than slowinc); picking the first task might significantly increase the probability of those race conditions being triggered. I would think assert ws1.has_what may fail intermittently if ws0 starts executing victim_key before move_task_request lands?
In #6598 I added a HeapSet.popright method which might be useful here: https://github.com/dask/distributed/pull/6598/files#diff-8d01f1d2f0aff66e9a55d31f036edddb0c44d82c8bfcc1eefc015b9ac8d661edR100-R109. Note that this isn't a PR I'd ever expect to merge, so not sure if popright would ever have any usage beyond this test—just mentioning that it's come up once before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks for spotting it!