-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Closed
Labels
enhancementNew feature or requestNew feature or requestperformanceMake DataFusion fasterMake DataFusion faster
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We can use the availability of a Limit to speed up sorts in DataFusion:
Describe the solution you'd like
We can start utilizing the limit in different places:
- lexsort_to_indices supports a
limitargument - this argument can be filled (if limit is smaller than batch size) - a
LocalLimitcan be added before SortPreservingMergeExec (after Execute sort in parallel when a limit is used after sort #3527), limiting the input for this - Probably
SortPreservingMergeStreamcan start utilizing optional limits too - only merging the first n sorted rows for each batch.
This issue is about adding the limit to lexsort_to_indices. This will accomplish the following:
- using the faster
partial_sort - reducing the
takeinput and thus size of output batches - reducing the memory needed for
SortPreservingMergeExec(and spill to disk when running OOM / shuffle data in Ballista)
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or requestperformanceMake DataFusion fasterMake DataFusion faster