What is the problem the feature request solves?
It is very common for the inputs to a join to be a scan followed by a filter (scan -> filter). The filter copies the surviving rows into new arrays, which can be expensive when the batch contains strings or complex types, and the filtered batch is discarded as soon as the join has consumed it.
I believe it would be more efficient for the join to use a selection vector (the indices of the rows that pass the filter) and read its inputs directly from the scanned batch, rather than materializing a filtered copy.
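To make the difference concrete, here is a minimal Rust sketch that uses plain `Vec<String>` instead of Arrow arrays so it has no dependencies. The functions `filter_copy`, `build_selection` and `read_selected` are hypothetical illustrations, not DataFusion or Arrow APIs. The point is that the selection-vector path records only row indices and leaves the scanned strings where they are.

```rust
/// Materializing path (hypothetical): clones every surviving string into a new
/// vector, analogous to a filter operator producing a compacted batch.
fn filter_copy(values: &[String], predicate: impl Fn(&str) -> bool) -> Vec<String> {
    values.iter().filter(|v| predicate(v.as_str())).cloned().collect()
}

/// Selection-vector path (hypothetical): records only the indices of the rows
/// that pass the predicate; the scanned values are not copied.
fn build_selection(values: &[String], predicate: impl Fn(&str) -> bool) -> Vec<u32> {
    values
        .iter()
        .enumerate()
        .filter(|(_, v)| predicate(v.as_str()))
        .map(|(i, _)| i as u32)
        .collect()
}

/// A consumer (e.g. one side of a join) reads the selected rows by borrowing
/// them from the original scanned batch through the selection vector.
fn read_selected<'a>(values: &'a [String], selection: &[u32]) -> Vec<&'a str> {
    selection.iter().map(|&i| values[i as usize].as_str()).collect()
}

fn main() {
    let scanned: Vec<String> = ["apple", "banana", "cherry", "date"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    // Example predicate standing in for the filter expression.
    let keep = |s: &str| s.len() > 4;

    let copied = filter_copy(&scanned, keep);
    let selection = build_selection(&scanned, keep);
    let viewed = read_selected(&scanned, &selection);

    println!("copied:    {:?}", copied);
    println!("selection: {:?}", selection);
    println!("viewed:    {:?}", viewed);
}
```

Both paths yield the same logical rows; only `filter_copy` allocates and copies string data, while `read_selected` borrows from the scanned batch.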
This issue tracks the work to build a small prototype that demonstrates this approach. If it is successful, we can discuss changes in upstream DataFusion to add a new ColumnarValue::ArrayWithSelectionVector variant, and then add a specialization in SortMergeJoin that takes advantage of it.
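One way to picture the proposed variant is the dependency-free Rust sketch below. `SketchColumnarValue` is a hypothetical, simplified stand-in: DataFusion's real `ColumnarValue` holds Arrow arrays and scalar values (its `Scalar` variant is omitted here), and the field names `values` and `selection` are assumptions, not a proposed API. The `sort_merge_join` function is likewise a toy inner merge join, not DataFusion's `SortMergeJoin`; it only shows that a join can resolve each logical row through the selection vector instead of reading a filtered copy.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for ColumnarValue, using i64 join keys.
#[derive(Debug)]
enum SketchColumnarValue {
    /// A plain, already-materialized array.
    Array(Vec<i64>),
    /// Proposed idea: the unfiltered scanned values plus the indices of the
    /// rows that survived the filter.
    ArrayWithSelectionVector { values: Vec<i64>, selection: Vec<u32> },
}

impl SketchColumnarValue {
    /// Number of logical (post-filter) rows.
    fn len(&self) -> usize {
        match self {
            Self::Array(v) => v.len(),
            Self::ArrayWithSelectionVector { selection, .. } => selection.len(),
        }
    }

    /// Value at logical row `i`, resolved through the selection vector if present.
    fn value(&self, i: usize) -> i64 {
        match self {
            Self::Array(v) => v[i],
            Self::ArrayWithSelectionVector { values, selection } => values[selection[i] as usize],
        }
    }
}

/// Toy inner sort-merge join over two inputs sorted ascending by key.
/// Returns pairs of logical row indices (left, right).
fn sort_merge_join(left: &SketchColumnarValue, right: &SketchColumnarValue) -> Vec<(usize, usize)> {
    let (n, m) = (left.len(), right.len());
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < n && j < m {
        let (l, r) = (left.value(i), right.value(j));
        match l.cmp(&r) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                // Find the run of equal keys on each side and emit their cross product.
                let i_end = (i..n).find(|&k| left.value(k) != l).unwrap_or(n);
                let j_end = (j..m).find(|&k| right.value(k) != r).unwrap_or(m);
                for a in i..i_end {
                    for b in j..j_end {
                        out.push((a, b));
                    }
                }
                i = i_end;
                j = j_end;
            }
        }
    }
    out
}

fn main() {
    // Scanned batch [1..=6] filtered to even keys: logical rows are [2, 4, 6].
    let left = SketchColumnarValue::ArrayWithSelectionVector {
        values: vec![1, 2, 3, 4, 5, 6],
        selection: vec![1, 3, 5],
    };
    let right = SketchColumnarValue::Array(vec![2, 3, 6, 6]);
    println!("{:?}", sort_merge_join(&left, &right));
}
```

Because a selection vector preserves row order, a sorted scan stays sorted after filtering, which is what lets a merge join consume it directly.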
Describe the potential solution
No response
Additional context
No response