-
Notifications
You must be signed in to change notification settings - Fork 1.9k
remove repartition exec from coalesce batches optimizer #19239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
run benchmarks |
|
run benchmark tpch_mem10 |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR removes RepartitionExec from the CoalesceBatches optimizer by updating test expectations to reflect the optimizer change. The CoalesceBatchesExec node is no longer inserted after RepartitionExec operations in the physical execution plans.
- Removes
CoalesceBatchesExecoptimizer behavior after repartitioning operations - Updates all affected SQL logic test files to reflect the new physical plan structure
- Includes minor whitespace corrections in test files
Reviewed changes
Copilot reviewed 53 out of 55 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| window_limits.slt | Removed CoalesceBatchesExec from window aggregate plans |
| window.slt | Removed CoalesceBatchesExec from multiple window and aggregate operations |
| unnest.slt | Removed CoalesceBatchesExec from unnest aggregate plans |
| union.slt | Removed CoalesceBatchesExec from union and aggregate plans |
| tpch/plans/*.slt.part | Removed CoalesceBatchesExec from all TPC-H query plans (Q1-Q22) |
| timestamps.slt | Removed extra spaces in error message assertions |
| spark/math/hex.slt | Fixed line number formatting |
| subquery*.slt | Removed CoalesceBatchesExec from subquery aggregate plans |
| select.slt | Removed CoalesceBatchesExec from aggregate with limit plans |
| repartition.slt | Removed CoalesceBatchesExec from repartition aggregate plans |
| qualify.slt | Removed CoalesceBatchesExec from qualify window plans |
| push_down_filter.slt | Removed CoalesceBatchesExec from grouping sets plans |
| preserve_file_partitioning.slt | Removed CoalesceBatchesExec from partitioned file plans |
| predicates.slt | Removed CoalesceBatchesExec from join filter pushdown plans |
| order.slt | Removed CoalesceBatchesExec from ordered aggregate plans |
| limit.slt | Removed CoalesceBatchesExec from limited aggregate plans |
| joins.slt & join.slt.part | Removed CoalesceBatchesExec from various join plans |
| insert*.slt | Removed CoalesceBatchesExec from insert with window plans |
| explain_tree.slt | Removed CoalesceBatchesExec from tree visualization |
| distinct_on.slt | Removed CoalesceBatchesExec from distinct aggregate plans |
| cte.slt | Removed CoalesceBatchesExec from recursive CTE plans |
| count_star_rule.slt | Removed CoalesceBatchesExec from count aggregate plans |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great -- thank you so much @jizezhang 🙏 I am sorry it took so long to review.
I'll plan to merge this tomorrow. Whoever merges this should make sure to rerun the tests as I think there is a non trivial chance of conflicts
| target_batch_size, | ||
| )))) | ||
| } else if let Some(async_exec) = plan_any.downcast_ref::<AsyncFuncExec>() { | ||
| if let Some(async_exec) = plan_any.downcast_ref::<AsyncFuncExec>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like the only thing left is AsyncFuncExec and we could entirely remove the CoalesceBatches pass (and maybe even structure)
| 54)----------------------FilterExec: r_name@1 = AMERICA, projection=[r_regionkey@0] | ||
| 55)------------------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 | ||
| 56)--------------------------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/region.tbl]]}, projection=[r_regionkey, r_name], file_type=csv, has_header=false | ||
| 05)--------RepartitionExec: partitioning=Hash([o_year@0], 4), input_partitions=4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these complex query plans are so much easier to read now
|
run benchmark clickbench_partitioned |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
I merged up to resolve some conflicts |
|
run benchmarks |
|
(I don't think this PR makes any meaningful difference on performance, as expected) |
|
Thank you @jizezhang @Dandandan |
|
🤖 |
|
🤖: Benchmark completed Details
|
Which issue does this PR close?
BatchCoalescerintoRepartitionExec#18782.Rationale for this change
What changes are included in this PR?
Removes
RepartitionExecfromCoalesceBatchesoptimizer.Are these changes tested?
Yes
Are there any user-facing changes?
No