Conversation

@hendrikmakait (Member) commented Dec 15, 2023

Follow-up to #361 and dask-expr's version of dask/distributed#8416

@phofl (Collaborator) commented Dec 15, 2023

Can you explain what this is doing?

@phofl (Collaborator) commented Dec 15, 2023

My understanding: we are no longer reusing shuffles for merges (not a fan of that, fwiw) in order to solve issues when a worker gets killed between the merges (spot replacement, for example). But does this actually solve the underlying problem, in the sense that it is also resilient against workers leaving during the merge?

@hendrikmakait (Member, Author) commented:

Generally speaking, merge relies on coordination between the input shuffles so that they end up putting matching output partitions on the same workers. This coordination is manageable for an individual merge. With reuse, we suddenly have to coordinate all the shuffles used across several merges. If that coordination fails, the merge will deadlock (we could also add a kill switch, but that's only marginally better). I am convinced that I can draw up scenarios where this fails right now. I'm not yet sure whether it is at all possible to coordinate within a reasonable amount of time; at the very least it would add significant complexity.
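For context, here is a minimal sketch (not taken from this PR) of the workload shape under discussion: one frame that is hash-partitioned on the join key and fed into two separate merges. With shuffle reuse, the shuffle of `left` would be computed once and shared by both merges, so every shuffle involved has to agree on which worker holds each output partition; without reuse, each merge only coordinates its own two input shuffles. The config key and cluster setup below are just one assumed way to force P2P shuffling.

```python
import pandas as pd
import dask
import dask.dataframe as dd
from distributed import Client

client = Client()  # P2P shuffling needs a distributed cluster
dask.config.set({"dataframe.shuffle.method": "p2p"})  # select P2P shuffles

left = dd.from_pandas(
    pd.DataFrame({"key": list(range(1000)), "x": [1] * 1000}), npartitions=8
)
right_a = dd.from_pandas(
    pd.DataFrame({"key": list(range(1000)), "a": [2] * 1000}), npartitions=8
)
right_b = dd.from_pandas(
    pd.DataFrame({"key": list(range(0, 2000, 2)), "b": [3] * 1000}), npartitions=8
)

# Both merges hash-partition `left` on "key"; reuse would share that shuffle
# across the two merges, which is what requires cross-merge coordination.
merged_a = left.merge(right_a, on="key")
merged_b = left.merge(right_b, on="key")

res_a, res_b = dask.compute(merged_a, merged_b)
```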

@hendrikmakait (Member, Author) commented:

Note that this is unrelated to workers leaving; it is about workers joining. When workers leave, the usual P2P restart mechanism kicks in.

@phofl (Collaborator) commented Dec 18, 2023

Are there other cases besides workers joining where this could deadlock, then?

If not, can we somehow pass the number of workers we want to operate on into the function that moves the output partitions to the workers?

That's not possible right now, but we should be able to do this when we generate the graph on the scheduler.

@hendrikmakait (Member, Author) commented:

> Are there other cases besides workers joining where this could deadlock, then?

None that I am aware of.

> If not, can we somehow pass the number of workers we want to operate on into the function that moves the output partitions to the workers?
>
> That's not possible right now, but we should be able to do this when we generate the graph on the scheduler.

Possibly, I guess? As you said, it's not possible right now and would depend on the implementation of materialization on the scheduler. I generally expect us to be able to leverage the expression DAG for more optimizations. Note that reuse also doesn't work with diskless P2P (which is still experimental).
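To make the suggestion above concrete, here is a purely hypothetical sketch (these names exist in neither dask nor distributed): if the worker list were captured when the graph is generated on the scheduler and baked into the graph, output partitions could be assigned deterministically against that pinned list, so shuffles created at different times, e.g. before and after new workers join, would still agree on placement.

```python
# Hypothetical illustration only; not an existing API in dask or distributed.
from typing import Sequence


def pick_worker_for_output(output_partition: int, pinned_workers: Sequence[str]) -> str:
    """Deterministically map an output partition to one of the pinned workers.

    `pinned_workers` is assumed to be the worker list captured at
    graph-generation time, rather than whatever workers happen to be
    present when the shuffle actually starts.
    """
    return pinned_workers[output_partition % len(pinned_workers)]


# Two shuffles handed the same pinned list always agree on placement,
# even if the cluster gained workers in between.
pinned = ["tcp://10.0.0.1:40000", "tcp://10.0.0.2:40000"]
assert pick_worker_for_output(5, pinned) == pick_worker_for_output(5, pinned)
```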

@phofl merged commit 60e4cd7 into dask:main on Dec 18, 2023
@phofl (Collaborator) commented Dec 18, 2023

thx
