ARROW-8907: [Rust] Implement scalar comparison operations #7261
Currently, comparing an array to a scalar / literal value using the comparison operations defined in the comparison kernel here
https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs
is very inefficient because:
(1) an array with the scalar value repeated has to be created, taking time and wasting memory
(2) time is spent during comparison to load the same literal values over and over
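The difference between the two approaches can be sketched in plain Rust, without the arrow crate. This is only an illustration under simplifying assumptions: it works on `f32` slices, ignores null bitmaps and SIMD, and the names `eq_arrays` and `eq_scalar` are hypothetical stand-ins chosen for this sketch, not necessarily the function names added by this pull request.

```rust
/// Array-vs-array equality: both inputs must have the same length.
/// (Hypothetical helper for illustration; not the arrow kernel itself.)
fn eq_arrays(left: &[f32], right: &[f32]) -> Vec<bool> {
    assert_eq!(left.len(), right.len(), "inputs must have equal length");
    left.iter().zip(right.iter()).map(|(l, r)| l == r).collect()
}

/// Array-vs-scalar equality: the literal is passed once by value,
/// so no second array is allocated or read during the comparison.
fn eq_scalar(left: &[f32], right: f32) -> Vec<bool> {
    left.iter().map(|l| *l == right).collect()
}

fn main() {
    let values = vec![1.0_f32, 2.0, 3.0, 2.0];

    // Array path: the literal 2.0 has to be repeated values.len() times first.
    let literal = vec![2.0_f32; values.len()];
    let via_array = eq_arrays(&values, &literal);

    // Scalar path: same result without building the repeated array.
    let via_scalar = eq_scalar(&values, 2.0);

    assert_eq!(via_array, via_scalar);
    println!("{:?}", via_scalar); // prints [false, true, false, true]
}
```

The array path pays for point (1) when `literal` is built and for point (2) when each element of `literal` is loaded inside the loop; the scalar path avoids both.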
This pull request implements scalar comparison functions, and benchmarks indicate good performance gains:
eq Float32 time: [938.54 us 950.28 us 962.65 us]
eq scalar Float32 time: [836.47 us 838.47 us 840.78 us]
eq Float32 simd time: [75.836 us 76.389 us 77.185 us]
eq scalar Float32 simd time: [61.551 us 61.605 us 61.671 us]
The benchmark results above show that the scalar comparison function is about 12% faster for non-SIMD and about 20% faster for SIMD comparison operations.
And this is before accounting for creating the literal array.
In a more complex benchmark, the scalar comparison version is about 40% faster overall when we account for not having to create arrays of scalar / literal values.
Here are the benchmark results:
filter/filter with arrow SIMD (array) time: [647.77 us 675.12 us 706.69 us]
filter/filter with arrow SIMD (scalar) time: [402.19 us 404.23 us 407.22 us]
And here is the code for the benchmark:
https://github.com/yordan-pavlov/arrow-benchmark/blob/master/rust/arrow_benchmark/src/main.rs#L230
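For reference, the percentages quoted above can be reproduced from the reported times. The sketch below assumes that the middle value of each bracketed triple is the point estimate and that "faster" means the relative reduction in time compared with the array version; the helper name `time_reduction_pct` is made up for this example.

```rust
/// Relative reduction in time, in percent, of the scalar version
/// compared with the array (baseline) version.
fn time_reduction_pct(baseline_us: f64, scalar_us: f64) -> f64 {
    (baseline_us - scalar_us) / baseline_us * 100.0
}

fn main() {
    // (benchmark, array time in us, scalar time in us), middle values from above.
    let cases = [
        ("eq Float32", 950.28, 838.47),
        ("eq Float32 simd", 76.389, 61.605),
        ("filter with arrow SIMD", 675.12, 404.23),
    ];

    for (name, baseline, scalar) in cases.iter() {
        println!("{}: {:.1}%", name, time_reduction_pct(*baseline, *scalar));
    }
    // prints:
    // eq Float32: 11.8%
    // eq Float32 simd: 19.4%
    // filter with arrow SIMD: 40.1%
}
```

Expressed as a speedup ratio (baseline time divided by scalar time) the same measurements would give larger figures, so the definition matters when comparing against other benchmarks.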
In the future, these scalar comparison operations could be used to improve the performance of DataFusion for expressions that compare arrays to scalar / literal values, which happens often in real-world queries.
@paddyhoran @andygrove let me know what you think