RFC: Set priorities for p2p shuffle tasks #7926
Closed
This is currently just a working theory and I don't have a proper reproducer yet. However, I observed some suboptimal scheduling behavior when two shuffles were running back-to-back, e.g. `df1.merge(df2).groupby(...).agg(..., split_out=2, shuffle="p2p")`.

The suboptimal behavior is that the transfer tasks of the second shuffle are not necessarily prioritized. However, they basically act as a memory sink and should likely have strict priority over other tasks if we want to schedule memory-optimally. A similar argument holds for the shuffle output tasks, which literally read data from disk into memory and therefore act as memory producers.
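For concreteness, a minimal sketch of such a back-to-back shuffle pattern (the dataframes, column names, and sizes are made up for illustration; this assumes a dask/distributed version where `merge` and groupby aggregations accept `shuffle="p2p"`):

```python
# Hypothetical back-to-back P2P shuffle workload: the merge shuffles both
# inputs, and the groupby aggregation with split_out=2 shuffles the merge
# result again, so the second shuffle's transfer tasks compete with the
# first shuffle's output tasks for scheduling slots.
import pandas as pd
import dask.dataframe as dd
from distributed import Client

client = Client()  # P2P shuffling requires the distributed scheduler

df1 = dd.from_pandas(
    pd.DataFrame({"key": range(1_000), "x": range(1_000)}), npartitions=10
)
df2 = dd.from_pandas(
    pd.DataFrame({"key": range(1_000), "y": range(1_000)}), npartitions=10
)

result = (
    df1.merge(df2, on="key", shuffle="p2p")
    .groupby("key")
    .agg({"x": "sum"}, split_out=2, shuffle="p2p")
)
result.compute()
```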
We were already able to trace a significant portion of the root task queuing impact back to improper ordering (see dask/dask#9995 and #7526 (comment)), such that queuing de facto deprioritizes root tasks.
I believe that the shuffle tasks should be classified as root tasks in this scheduling paradigm, but teaching this to our heuristic seems cumbersome, if not impossible. Instead, we can just literally (de-)prioritize the tasks accordingly.

I need to follow up with a decent measurement to solidify this theory. For context, I stumbled over this scheduling behavior while playing with coiled/benchmarks#883 but couldn't get it to succeed yet, for a couple of other reasons.
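For illustration only, the coarse-grained priority knobs that exist today are sketched below. They apply to a whole computation rather than to the shuffle's individual task groups, so the change proposed here would presumably live inside the P2P shuffle extension instead; also, annotations can be dropped during low-level graph optimization, so treat this as a sketch rather than a fix:

```python
# Illustrative only: bump the priority of a shuffle-heavy computation.
# The actual proposal is to assign priorities to the shuffle's transfer
# and output tasks inside the P2P machinery, not in user code.
import dask
import pandas as pd
import dask.dataframe as dd
from distributed import Client

client = Client()
df = dd.from_pandas(
    pd.DataFrame({"key": range(1_000), "x": range(1_000)}), npartitions=10
)

# Option 1: annotate the graph; the distributed scheduler consults the
# ``priority`` annotation when ordering runnable tasks.
with dask.annotate(priority=10):
    shuffled = df.shuffle("key", shuffle="p2p")

# Option 2: set a priority for the whole computation at submission time.
future = client.compute(shuffled, priority=10)
```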
cc @hendrikmakait