-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Clean up SortExec creation and add doc comments #5889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| )) as _; | ||
| let sort2 = Arc::new( | ||
| SortExec::new(sort_exprs.clone(), repartition_exec, None) | ||
| .with_preserve_partitioning(true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes it clearer that the sort is being constructed to preserve the input partitioning
|
WHile looking at the code again, I realized that |
| } | ||
|
|
||
| /// Whether this `SortExec` preserves partitioning of the children | ||
| pub fn with_fetch(mut self, fetch: Option<usize>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if limit might be a more standard name for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the term comes from Spark (some earlier versions of the code had the word Limit). In any event I think fetch is at least consistent with the rest of the DataFusion codebase, such as in the implementation of sort:
* Clean up SortExec creation and add doc comments * Reduce API surface * restore sort bench * fix benchmark * Add with_fetch
Which issue does this PR close?
Rationale for this change
While reviewing #5881 I had to look up what exactly "new_with_partitioning" did and what all the arguments meant and found the whole thing quite obtuse. Let's make this clearer.
What changes are included in this PR?
preserve_partitioningdoesSortExec::new()which is infallable (try_new always returned true)SortExec::with_preserve_partitioningto set the preserve partitioning flagSortExec::with_fetchto set the fetchSortExec::try_new()andSortExec::new_with_partitioningas deprecatedAre these changes tested?
Yes, by clippy and existing tests
Are there any user-facing changes?
The old APIs are deprecated, but they will still work