Do we need bounded channels in RepartitionExec? #18538

@adriangb

Now that we have a good spilling implementation in #18207, do we still want bounded channels for the in-memory data? This was first introduced in #4867.
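For readers less familiar with the distinction, here is a minimal sketch of what "bounded" buys you. It uses the standard library's `std::sync::mpsc` channels purely for illustration; this is not RepartitionExec's actual channel implementation, and the helper names `bounded_accepts` and `unbounded_accepts` are hypothetical.

```rust
use std::sync::mpsc::{channel, sync_channel, TrySendError};

/// Try to push `attempts` batches into a bounded channel that nobody drains.
/// Returns how many were accepted before the channel reported it was full.
fn bounded_accepts(capacity: usize, attempts: usize) -> usize {
    // `_rx` is kept alive so the channel does not report "disconnected".
    let (tx, _rx) = sync_channel::<Vec<u8>>(capacity);
    let mut accepted = 0;
    for _ in 0..attempts {
        match tx.try_send(vec![0u8; 1024]) {
            Ok(()) => accepted += 1,
            // Backpressure: the producer has to wait (or stop) here.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

/// Same experiment with an unbounded channel: every send succeeds,
/// so buffered memory grows with the number of batches sent.
fn unbounded_accepts(attempts: usize) -> usize {
    let (tx, _rx) = channel::<Vec<u8>>();
    let mut accepted = 0;
    for _ in 0..attempts {
        if tx.send(vec![0u8; 1024]).is_ok() {
            accepted += 1;
        }
    }
    accepted
}

fn main() {
    println!("bounded(2) accepted {} of 5", bounded_accepts(2, 5));
    println!("unbounded accepted {} of 5", unbounded_accepts(5));
}
```

The bounded channel caps in-flight data by pushing back on the producer. The unbounded one never pushes back, so something else (a memory budget plus spilling) has to cap it.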

My feeling is that we could probably drop them. The one situation I am worried about is RepartitionExec filling its buffers and consuming the entire memory budget, at which point the query fails. If we could do "cooperative" spilling, RepartitionExec would be the ideal candidate to spill and this would not be a problem: an upstream GroupBy could ask other operators to spill, and RepartitionExec would spill easily and free up memory. But today that's not the case.
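To make the scenario concrete, here is a toy model of it. Everything here is hypothetical: `Budget`, `RepartitionBuffer`, and `spill_all` are illustrative names, not DataFusion APIs, and "spilling" is simulated by re-tagging a batch rather than writing it to disk.

```rust
use std::collections::VecDeque;

/// Hypothetical shared memory budget (a stand-in, not DataFusion's memory pool).
struct Budget {
    limit: usize,
    used: usize,
}

impl Budget {
    fn new(limit: usize) -> Self {
        Budget { limit, used: 0 }
    }
    /// Reserve `bytes` if they fit under the limit; report whether that worked.
    fn try_grow(&mut self, bytes: usize) -> bool {
        if self.used + bytes <= self.limit {
            self.used += bytes;
            true
        } else {
            false
        }
    }
    fn shrink(&mut self, bytes: usize) {
        self.used -= bytes;
    }
}

/// A buffered batch is either held in memory (counted against the budget)
/// or "spilled" (here just tagged; a real operator would write it to disk).
enum Slot {
    Memory(Vec<u8>),
    Spilled(Vec<u8>),
}

/// Toy unbounded buffer that spills on its own when a reservation fails.
struct RepartitionBuffer {
    slots: VecDeque<Slot>,
}

impl RepartitionBuffer {
    fn new() -> Self {
        RepartitionBuffer { slots: VecDeque::new() }
    }
    fn push(&mut self, batch: Vec<u8>, budget: &mut Budget) {
        if budget.try_grow(batch.len()) {
            self.slots.push_back(Slot::Memory(batch));
        } else {
            self.slots.push_back(Slot::Spilled(batch));
        }
    }
    /// The hypothetical "cooperative" hook: spill everything held in memory
    /// and hand the bytes back to the budget. Returns the bytes freed.
    fn spill_all(&mut self, budget: &mut Budget) -> usize {
        let mut freed = 0;
        for slot in self.slots.iter_mut() {
            let old = std::mem::replace(slot, Slot::Spilled(Vec::new()));
            *slot = match old {
                Slot::Memory(batch) => {
                    freed += batch.len();
                    Slot::Spilled(batch)
                }
                other => other,
            };
        }
        budget.shrink(freed);
        freed
    }
    /// Pop batches in FIFO order, releasing memory for in-memory ones.
    fn pop(&mut self, budget: &mut Budget) -> Option<Vec<u8>> {
        match self.slots.pop_front()? {
            Slot::Memory(batch) => {
                budget.shrink(batch.len());
                Some(batch)
            }
            Slot::Spilled(batch) => Some(batch),
        }
    }
}

/// Returns (upstream reservation before spilling, bytes freed, reservation after).
fn demo() -> (bool, usize, bool) {
    let mut budget = Budget::new(100);
    let mut buf = RepartitionBuffer::new();
    for _ in 0..3 {
        buf.push(vec![0u8; 40], &mut budget); // third batch no longer fits
    }
    let before = budget.try_grow(50); // upstream operator asks for 50 bytes
    let freed = buf.spill_all(&mut budget);
    let after = budget.try_grow(50);
    while buf.pop(&mut budget).is_some() {} // drain the buffer
    (before, freed, after)
}

fn main() {
    println!("{:?}", demo());
}
```

Without the `spill_all` step, the upstream reservation simply fails, which is the failure mode described above. With it, the buffer gives memory back on request.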

One way to collect more information is to run ClickBench (and other benchmarks) without the Distribution infrastructure and compare runtimes, peak memory use, and behavior under constrained memory budgets.

cc @Dandandan @alamb @crepererum @2010YOUY01

Labels: performance (Make DataFusion faster), physical-plan (Changes to the physical-plan crate)
