RFC: Set priorities for p2p shuffle tasks #7926
Closed
This is currently just a working theory and I don't have a proper reproducer yet. However, I observed some suboptimal scheduling behavior when two shuffles were running back-to-back, e.g. `df1.merge(df2).groupby(...).agg(..., split_out=2, shuffle="p2p")`.

The suboptimal behavior is that the transfer tasks of the second shuffle are not necessarily prioritized. However, they basically act as a memory sink and should likely have strict priority over other tasks if we want to schedule memory-optimally. A similar argument holds for the shuffle output tasks, which literally read data from disk into memory and therefore act as memory producers.
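For concreteness, a minimal sketch of such a back-to-back shuffle pattern (the dataframes, column names, and sizes are made up for illustration; this assumes a dask/distributed version where `merge` and groupby aggregations accept `shuffle="p2p"`):

```python
# Hypothetical back-to-back P2P shuffle workload: the merge shuffles both
# inputs, and the groupby aggregation with split_out=2 shuffles the merge
# result again, so the second shuffle's transfer tasks compete with the
# first shuffle's output tasks for scheduling slots.
import pandas as pd
import dask.dataframe as dd
from distributed import Client

client = Client()  # P2P shuffling requires the distributed scheduler

df1 = dd.from_pandas(
    pd.DataFrame({"key": range(1_000), "x": range(1_000)}), npartitions=10
)
df2 = dd.from_pandas(
    pd.DataFrame({"key": range(1_000), "y": range(1_000)}), npartitions=10
)

result = (
    df1.merge(df2, on="key", shuffle="p2p")
    .groupby("key")
    .agg({"x": "sum"}, split_out=2, shuffle="p2p")
)
result.compute()
```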
We were already able to trace a significant portion of the root task queuing impact back to improper ordering (see dask/dask#9995 and #7526 (comment)), such that queuing de facto deprioritizes root tasks.
I believe that the shuffle tasks should be classified as root tasks in this scheduling paradigm, but teaching this to our heuristic seems cumbersome, if not impossible. Instead, we can just literally (de-)prioritize the tasks accordingly.

I need to follow up with a decent measurement to solidify this theory. For context, I stumbled over this scheduling behavior while playing with coiled/benchmarks#883 but couldn't get it to succeed yet, for a couple of other reasons.
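For illustration only, the coarse-grained priority knobs that exist today are sketched below. They apply to a whole computation rather than to the shuffle's individual task groups, so the change proposed here would presumably live inside the P2P shuffle extension instead; also, annotations can be dropped during low-level graph optimization, so treat this as a sketch rather than a fix:

```python
# Illustrative only: bump the priority of a shuffle-heavy computation.
# The actual proposal is to assign priorities to the shuffle's transfer
# and output tasks inside the P2P machinery, not in user code.
import dask
import pandas as pd
import dask.dataframe as dd
from distributed import Client

client = Client()
df = dd.from_pandas(
    pd.DataFrame({"key": range(1_000), "x": range(1_000)}), npartitions=10
)

# Option 1: annotate the graph; the distributed scheduler consults the
# ``priority`` annotation when ordering runnable tasks.
with dask.annotate(priority=10):
    shuffled = df.shuffle("key", shuffle="p2p")

# Option 2: set a priority for the whole computation at submission time.
future = client.compute(shuffled, priority=10)
```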
cc @hendrikmakait