Execute sort in parallel when a limit is used after sort #3527
Codecov Report
@@ Coverage Diff @@
## master #3527 +/- ##
==========================================
+ Coverage 85.75% 85.79% +0.03%
==========================================
Files 299 300 +1
Lines 55311 55356 +45
==========================================
+ Hits 47432 47492 +60
+ Misses 7879 7864 -15
alamb
left a comment
LGTM -- nice work @Dandandan
use std::sync::Arc;

/// Optimizer rule that makes sort parallel if a limit is used after sort (`ORDER BY LIMIT N`)
/// The plan will use `SortPreservingMergeExec` to merge the results
👍 FYI @tustvold -- more use of this operator 💯
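For readers unfamiliar with the idea behind the rule, here is a minimal, std-only sketch of "sort each partition in parallel, then do an order-preserving merge and stop at the limit". It is not DataFusion code: `top_n_parallel` is a hypothetical name, rows are plain `i64` values, the order is ascending, and truncating each sorted partition to `limit` rows is a choice made for this sketch, not something the PR is confirmed to do.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::thread;

/// Hypothetical illustration (not DataFusion's implementation):
/// returns the `limit` smallest values across all partitions, in ascending order.
fn top_n_parallel(partitions: Vec<Vec<i64>>, limit: usize) -> Vec<i64> {
    // Phase 1: sort every partition on its own thread.
    // Truncating to `limit` is safe because the global top N is always
    // contained in the union of the per-partition top N.
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|mut part| {
            thread::spawn(move || {
                part.sort_unstable();
                part.truncate(limit);
                part
            })
        })
        .collect();
    let runs: Vec<Vec<i64>> = handles
        .into_iter()
        .map(|h| h.join().expect("sort thread panicked"))
        .collect();

    // Phase 2: k-way merge of the sorted runs with a min-heap holding
    // (value, run index, position in run). `Reverse` turns the max-heap
    // into a min-heap.
    let mut heap = BinaryHeap::new();
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(&v) = run.first() {
            heap.push(Reverse((v, run_idx, 0usize)));
        }
    }

    let mut out = Vec::new();
    while out.len() < limit {
        match heap.pop() {
            Some(Reverse((v, run_idx, pos))) => {
                out.push(v);
                // Refill the heap from the run we just consumed from.
                if let Some(&next) = runs[run_idx].get(pos + 1) {
                    heap.push(Reverse((next, run_idx, pos + 1)));
                }
            }
            None => break, // fewer than `limit` rows in total
        }
    }
    out
}
```

Calling `top_n_parallel(vec![vec![5, 1, 9], vec![4, 8, 2], vec![7, 3, 6]], 4)` sorts the three partitions concurrently and merges only as far as the fourth row. The merge step never has to re-sort the data, which is the property that makes a sort-preserving merge attractive under a limit.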
.as_any()
.downcast_ref::<SortExec>()
.unwrap()
.preserve_partitioning();
👍
I wonder if https://docs.rs/datafusion/12.0.0/datafusion/physical_plan/trait.ExecutionPlan.html#tymethod.output_partitioning could be used here instead? But maybe it doesn't matter as we already know it is exactly a SortExec
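To make the trade-off in this comment concrete, here is a small sketch contrasting the two approaches. Everything in it is hypothetical: `Plan`, `Sort`, `Scan`, and `output_partition_count` are simplified stand-ins invented for illustration, not DataFusion's `ExecutionPlan`, `SortExec`, or `output_partitioning` signatures.

```rust
use std::any::Any;

/// Hypothetical, heavily simplified stand-in for a physical plan node.
trait Plan {
    fn as_any(&self) -> &dyn Any;
    /// Number of output partitions; available on every node type.
    fn output_partition_count(&self) -> usize;
}

/// Stand-in for a sort node with a "preserve partitioning" flag.
struct Sort {
    preserve_partitioning: bool,
    input_partitions: usize,
}

/// Stand-in for any other node type.
struct Scan {
    partitions: usize,
}

impl Plan for Sort {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn output_partition_count(&self) -> usize {
        // A non-preserving sort collapses its input into one partition.
        if self.preserve_partitioning {
            self.input_partitions
        } else {
            1
        }
    }
}

impl Plan for Scan {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn output_partition_count(&self) -> usize {
        self.partitions
    }
}

/// Option A: downcast to the concrete type and read its flag.
/// Returns `None` when the node is not a `Sort`.
fn preserves_via_downcast(plan: &dyn Plan) -> Option<bool> {
    plan.as_any()
        .downcast_ref::<Sort>()
        .map(|s| s.preserve_partitioning)
}

/// Option B: ask the trait, which works for any node type.
fn is_single_partition(plan: &dyn Plan) -> bool {
    plan.output_partition_count() == 1
}
```

In this sketch the two checks are not strictly equivalent: a sort that preserves partitioning over a single-partition input still reports one output partition. When the node is already known to be a sort, as in the reviewed code, reading the flag directly is the more precise check.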
"GlobalLimitExec: skip=0, fetch=10",
" SortExec: [the_min@2 DESC]",
" CoalescePartitionsExec",
" SortPreservingMergeExec: [the_min@2 DESC]",
👍
Benchmark runs are scheduled for baseline = 9b22100 and contender = 3a9e0d0. 3a9e0d0 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Closes #3526
Rationale for this change
Improves performance on `ORDER BY <expr> LIMIT N` queries.
vs master:
What changes are included in this PR?
Are there any user-facing changes?