perf: Optimize array_position for scalar needle #20532
    let arr_from = resolve_start_from(third_arg.first(), num_rows)?;

    let result = match haystack.data_type() {
        List(_) => {

Should we support FixedSizeList?

We could. The current code works with FixedSizeList because the input list will be coerced (relatively cheap but not free). We could consider adding native support for FixedSizeList -- we'd also be able to avoid walking the offsets buffer to find matches. That would be slightly faster, although I think it would only be a small win.
I'm inclined to skip it to avoid the code duplication, but maybe I'm underestimating the perf win or how widespread FixedSizeList values are in real-world workloads. wdyt?

It can be done later. It should work fine unless the function gets called without the planner, as in Comet, for example.

@comphead Awesome, thanks for the review!
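
To make the FixedSizeList point from the review thread concrete, here is a minimal sketch of what a native path could look like. It is not code from this PR: plain slices stand in for Arrow buffers, `fixed_size_position` is a hypothetical name, and null handling and the start-from argument are left out. Because every row has the same length, row boundaries come from arithmetic rather than from an offsets buffer.

```rust
/// Hypothetical sketch: 1-based position of the first match in each row of a
/// fixed-size list. Row `i` is `values[i * list_size..(i + 1) * list_size]`,
/// so no offsets buffer has to be consulted.
fn fixed_size_position(values: &[i64], list_size: usize, needle: i64) -> Vec<Option<u64>> {
    assert!(list_size > 0, "list_size must be positive");

    // One bulk comparison over the flat values.
    let matches: Vec<bool> = values.iter().map(|v| *v == needle).collect();

    // Split the match flags into equally sized rows and find the first hit.
    matches
        .chunks_exact(list_size)
        .map(|row| row.iter().position(|&m| m).map(|p| (p + 1) as u64))
        .collect()
}
```

As the thread notes, the gain over coercing to a regular list is probably small; the sketch only shows where the saving would come from.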
## Which issue does this PR close?

N/A

## Rationale for this change

In #20374, `array_has` with a scalar needle was optimized to reconstruct matches more efficiently. Unfortunately, that code was incorrect for sliced arrays: `values()` returns the entire value buffer (including elements outside the visible slice), so we need to skip the corresponding indexes in the result bitmap. We could fix this by just skipping indexes, but it seems more robust and efficient to arrange to not compare the needle against elements outside the visible range in the first place.

`array_position` has a similar behavior (introduced in #20532): it didn't have the buggy behavior, but it still did extra work for sliced arrays by comparing against elements outside the visible range.

Benchmarking the revised code, there is no performance regression for unsliced arrays.

## What changes are included in this PR?

* Fix `array_has` bug for sliced arrays with scalar needle
* Improve `array_has` and `array_position` to not compare against elements outside the visible range of a sliced array
* Add unit test for `array_has` bug
* Add unit test to increase confidence in `array_position` behavior for sliced arrays

## Are these changes tested?

Yes.

## Are there any user-facing changes?

No.
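
The sliced-array issue described above can be illustrated with a small model. This is a sketch under stated assumptions, not the DataFusion implementation: `values` is a plain slice standing in for the full child buffer, `offsets` is the visible offsets window of the (possibly sliced) list, `array_has_scalar` is a hypothetical name, and nulls are ignored. The key step is restricting the comparison to `values[first..last]` and shifting each row's range by `first`.

```rust
/// Sketch of `array_has` with a scalar needle on a possibly sliced list.
///
/// `offsets` is the *visible* offsets window: for a sliced list its first
/// entry can be greater than 0 and its last entry less than `values.len()`.
fn array_has_scalar(values: &[i64], offsets: &[usize], needle: i64) -> Vec<bool> {
    let (Some(&first), Some(&last)) = (offsets.first(), offsets.last()) else {
        return Vec::new();
    };

    // Compare only against the visible range, not the whole buffer.
    let matches: Vec<bool> = values[first..last].iter().map(|v| *v == needle).collect();

    // `matches` is indexed relative to `first`, so shift each row's range.
    offsets
        .windows(2)
        .map(|w| matches[w[0] - first..w[1] - first].iter().any(|&m| m))
        .collect()
}
```

With `values = [7, 1, 2, 7, 3, 7]` and the sliced offsets `[2, 4, 6]`, the element `1` sits outside the visible range, so searching for it reports no match in either visible row.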
## Which issue does this PR close?

- Closes #20769.

## Rationale for this change

`array_positions` previously compared the needle against each row's sub-array individually. When the needle is a scalar (the common case), we can do a single bulk `arrow_ord::cmp::not_distinct` comparison against the entire flat values buffer and then walk the result bitmap, which is significantly faster: the speedup on the `array_positions()` microbenchmarks ranges from 5x to 40x, depending on the size of the array.

The same pattern has already been applied to `array_position` (#20532), and previously to other array UDFs.

## What changes are included in this PR?

- Add benchmarks for `array_positions`.
- Implement bulk-comparison optimization
- Refactor `array_position`'s existing fast path slightly for consistency
- Code cleanup to use "haystack" and "needle" consistently, not vague terms like "list_array" and "element"
- Add unit tests for `array_positions` with sliced ListArrays, for peace of mind
- Add unit tests for sliced lists and sliced lists with nulls for the new `array_positions` fast path.

## Are these changes tested?

Yes.

## Are there any user-facing changes?

No.

## AI usage

Multiple AI tools were used to iterate on this PR. I have reviewed and understand the resulting code.

---------

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
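
The "bulk compare, then walk the result bitmap" idea for `array_positions` might look roughly like the following. This is an illustrative sketch only: plain slices replace the Arrow buffers, `Option` equality stands in for `not_distinct` (so a null needle matches null elements), and `array_positions_scalar` is a hypothetical name. Matching indices are collected once in ascending order and then handed out to rows in a single forward pass.

```rust
/// Sketch of `array_positions` with a scalar needle: for each row, the
/// 1-based positions of every element that is not distinct from `needle`.
/// `offsets` is the visible offsets window into `values`.
fn array_positions_scalar(
    values: &[Option<i64>],
    offsets: &[usize],
    needle: Option<i64>,
) -> Vec<Vec<u64>> {
    let (Some(&first), Some(&last)) = (offsets.first(), offsets.last()) else {
        return Vec::new();
    };

    // Bulk comparison over the visible range; keep absolute indices of hits.
    let mut hits = values[first..last]
        .iter()
        .enumerate()
        .filter(|(_, v)| **v == needle)
        .map(|(i, _)| i + first)
        .peekable();

    // Hits are ascending and rows are contiguous, so one forward pass suffices.
    let mut out = Vec::with_capacity(offsets.len() - 1);
    for w in offsets.windows(2) {
        let (start, end) = (w[0], w[1]);
        let mut row = Vec::new();
        while let Some(i) = hits.next_if(|&i| i < end) {
            row.push((i - start + 1) as u64);
        }
        out.push(row);
    }
    out
}
```

Each hit is visited once, so the cost after the comparison is proportional to the number of rows plus the number of matches rather than to a per-row comparison call.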
## Which issue does this PR close?

- Closes #20530

## Rationale for this change

The previous implementation of `array_position` used `compare_element_to_list` for every input row. When the needle is a scalar (quite common), we can do much better by searching over the entire flat haystack values array with a single call to `arrow_ord::cmp::not_distinct`. We can then iterate over the resulting set bits to determine per-row results.

This is ~5-10x faster than the previous implementation for typical inputs.

## What changes are included in this PR?

* Implement new fast path for `array_position` with scalar needle
* Improve docs for `array_position`
* Don't use `internal_err` to report a user-visible error

## Are these changes tested?

Yes, and benchmarked. Additional tests added in a separate PR (#20531)

## Are there any user-facing changes?

No.
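
As a rough illustration of the fast path described in the rationale, the sketch below models the flat haystack values and the list offsets with plain Rust slices. It is not the code from this PR: `array_position_scalar` is a hypothetical name, `Option` equality stands in for `arrow_ord::cmp::not_distinct` (which also treats two nulls as matching), and the optional start-from argument is omitted.

```rust
/// Returns the 1-based position of the first element in each row that is
/// not distinct from `needle`, or `None` when the row has no match.
///
/// `values` is the flat child buffer and `offsets` has one more entry than
/// there are rows, so row `i` spans `values[offsets[i]..offsets[i + 1]]`.
/// Both are plain-Rust stand-ins for the Arrow buffers, not the real API.
fn array_position_scalar(
    values: &[Option<i64>],
    offsets: &[usize],
    needle: Option<i64>,
) -> Vec<Option<u64>> {
    // Step 1: one bulk comparison over the whole flat buffer.
    let matches: Vec<bool> = values.iter().map(|v| *v == needle).collect();

    // Step 2: for each row, find the first set flag inside the row's range.
    offsets
        .windows(2)
        .map(|w| {
            let (start, end) = (w[0], w[1]);
            matches[start..end]
                .iter()
                .position(|&m| m)
                .map(|p| (p + 1) as u64)
        })
        .collect()
}
```

The point of the design is that the comparison kernel runs once over contiguous memory instead of once per row; the per-row work shrinks to scanning flags between two offsets.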