Remove coalesce batches rule and deprecate CoalesceBatchesExec#19622
Remove coalesce batches rule and deprecate CoalesceBatchesExec#19622alamb merged 18 commits intoapache:mainfrom
Conversation
CoalesceBatchesExec
|
Looks great, can you fix the clippy errors @feniljain ? |
|
run benchmarks |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
QQuery 23 in clickbench_partitioned just halved the time 😲 |
I think this is not correct, it is super flaky lately. @alamb do you know the source of this? Is it running OOM / swapping perhaps or slow / not predictable block storage? |
Ouuh
Did you mean to tag Andrew instead of bot? |
Yeah @alamb :D |
|
run benchmarks |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
QQuery 23 still seems to be leading ahead! |
I think it is the same flakiness |
datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs
Outdated
Show resolved
Hide resolved
I suspect this has to do with timing. Basically Q23 is like This can now takes advantage of dynamic filter pushdown which can prune some of the files. However, the updates of the dynamic filters is not deterministic so I think that accounts for the variance run to run -- depending on what rows have been seen already will potentially skip entire files I suspect this accounts for the non determinsim I will see if I can reproduce the results locally |
|
run benchmark clickbench_partitioned |
|
🤖 |
alamb
left a comment
There was a problem hiding this comment.
Thank you @feniljain -- this looks great to me
| @@ -440,9 +439,7 @@ async fn test_replace_multiple_input_repartition_with_extra_steps( | |||
| let repartition_rr = repartition_exec_round_robin(source); | |||
| let repartition_hash = repartition_exec_hash(repartition_rr); | |||
There was a problem hiding this comment.
not for this PR, but I think now actually we have fixed the double repartition as well, as described in https://datafusion.apache.org/blog/output/2025/12/15/avoid-consecutive-repartitions/
@gene-bordegaray is that correct? if so I will file a ticket to try and update these tests so they match what datafusion creates now (a single round robin hash)
|
🤖: Benchmark completed Details
|
|
Epic work @feniljain -- thanks again |
…e#19622) ## Which issue does this PR close? - Closes apache#19591 ## Rationale for this change Explained in issue itself ## What changes are included in this PR? - Removed coalesce batches rule - Deprecate `CoalesceBatchesExec` ## Are these changes tested? Yes ## Are there any user-facing changes? Yes, added a deprecation tag on `CoalesceBatchesExec`
Which issue does this PR close?
CoalesceBatchesoptimizer rule #19591Rationale for this change
Explained in issue itself
What changes are included in this PR?
CoalesceBatchesExecAre these changes tested?
Yes
Are there any user-facing changes?
Yes, added a deprecation tag on
CoalesceBatchesExec