fix: repartition for grouping set #16983

chenkovsky · 2025-07-30T16:26:50Z

Which issue does this PR close?

Closes #16965 (Incorrect results of aggregation with grouping sets with single target partition).

Rationale for this change

For an aggregation with grouping sets, the group-by expressions differ between the partial and final stages, and the final-stage aggregation depends on the order of its input.

What changes are included in this PR?

Repartition if the group by contains a grouping set.

Are these changes tested?

Yes, with unit tests.

Are there any user-facing changes?

No

chenkovsky · 2025-07-31T02:39:30Z

datafusion/core/src/physical_planner.rs

+                        || has_grouping_id)
+                    && session_state.config().repartition_aggregations();

                let next_partition_mode = if can_repartition {


Maybe we should put the check here instead: if can_repartition || has_grouping_id {

alamb · 2025-09-02T13:03:15Z

@thinkharderdev / @avantgardnerio -- do you have some time to help review this PR?

thinkharderdev · 2025-09-03T13:00:23Z

datafusion/sqllogictest/test_files/aggregate.slt

+01)ProjectionExec: expr=[id@0 as id]
+02)--AggregateExec: mode=FinalPartitioned, gby=[id@0 as id, __grouping_id@1 as __grouping_id], aggr=[], ordering_mode=PartiallySorted([0])
+03)----CoalesceBatchesExec: target_batch_size=8192
+04)------RepartitionExec: partitioning=Hash([id@0, __grouping_id@1], 1), input_partitions=2


Maybe I'm not understanding something but how does "repartitioning" to a single partition change anything?

chenkovsky · 2025-09-03

It has a single partition but multiple record batches. The aggregation assumes that records in the same group are adjacent, which is not true in this case. Repartitioning solves this problem.

thinkharderdev · 2025-09-06

Sorry, I've been busy for the past few days, so I'm just getting back to this. I think I understand the underlying issue now: since id is a constant, we infer it as a singleton, which is why we get the issue.

Still, I'm concerned that we are solving this with a pretty blunt instrument. Adding a repartition to every aggregation with a grouping set can have a non-trivial cost, especially in a distributed query.

Looking into it a bit more, it seems that in this case we infer SortProperties::Singleton for the id expr in the final aggregation, which I think is incorrect.

chenkovsky · 2025-09-11

The underlying issue is that, in an aggregation with group_id, the partial aggregation and the final aggregation have different group columns. If the number of partitions is greater than 1, a repartition always happens, so the problem is masked.
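The adjacency assumption discussed in this thread can be illustrated with a minimal, self-contained Rust sketch. This is not DataFusion code: count_assuming_adjacent and count_regrouped are hypothetical helpers, and plain vectors of i64 keys stand in for the record batches of a single partition. The first function emits a group as soon as the key changes, so it is only correct when all rows of a group are adjacent; the second regroups every row by key first, which is roughly the effect the added repartition is relied on for here.

```rust
use std::collections::BTreeMap;

/// Counts rows per key while ASSUMING that all rows of a key are adjacent.
/// A group is emitted as soon as the key changes.
/// Hypothetical helper for illustration only, not DataFusion's implementation.
fn count_assuming_adjacent(batches: &[Vec<i64>]) -> Vec<(i64, usize)> {
    let mut out = Vec::new();
    let mut current: Option<(i64, usize)> = None;
    for &key in batches.iter().flatten() {
        current = match current {
            // Same key as the previous row: extend the open group.
            Some((k, n)) if k == key => Some((k, n + 1)),
            // Key changed: emit the open group and start a new one.
            Some(done) => {
                out.push(done);
                Some((key, 1))
            }
            // First row of the input.
            None => Some((key, 1)),
        };
    }
    // Emit the last open group, if any.
    if let Some(last) = current {
        out.push(last);
    }
    out
}

/// Counts rows per key without any ordering assumption by collecting
/// all rows of a key into one map entry before emitting anything.
/// Hypothetical helper for illustration only.
fn count_regrouped(batches: &[Vec<i64>]) -> Vec<(i64, usize)> {
    let mut counts: BTreeMap<i64, usize> = BTreeMap::new();
    for &key in batches.iter().flatten() {
        *counts.entry(key).or_insert(0) += 1;
    }
    counts.into_iter().collect()
}
```

With two batches [1, 2] and [1, 2] in one partition, count_assuming_adjacent returns four groups (each key appears twice with a count of 1), while count_regrouped returns one group per key with a count of 2.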

github-actions · 2025-11-27T02:09:01Z

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

fix: repartition for grouping set if target_partitions = 1

5d8dca6

github-actions bot added the core (Core DataFusion crate) and sqllogictest (SQL Logic Tests (.slt)) labels Jul 30, 2025

adriangb self-assigned this Jul 30, 2025

update

75bf986

chenkovsky changed the title ~~fix: repartition for grouping set if target_partitions = 1~~ fix: repartition for grouping set Jul 31, 2025

chenkovsky commented Jul 31, 2025

chenkovsky added 6 commits August 11, 2025 09:02

update

2486334

fmt

7930b35

Merge branch 'main' into fix/grouping_set_single_partition

d326c91

Merge branch 'main' into fix/grouping_set_single_partition

b74d801

Merge branch 'main' into fix/grouping_set_single_partition

c30e995

update

ce7cfe7

thinkharderdev reviewed Sep 3, 2025

chenkovsky mentioned this pull request Oct 7, 2025

feat: optimize grouping and introduced unparsing and substrait support #16161

Closed

github-actions bot added the Stale (PR has not had any activity for some time) label Nov 27, 2025

github-actions bot closed this Dec 5, 2025
