perf: Optimize `array_has_any()` with scalar arg by neilconway · Pull Request #20385 · apache/datafusion

neilconway · 2026-02-16T22:05:06Z

Which issue does this PR close?

Closes #20384 (Optimize array_has_any() for scalar arg).
See #18181 (Improve performance of array_has) for related context.

Rationale for this change

When array_has_any is passed a scalar for either of its arguments, we can use a much faster algorithm: rather than doing O(N*M) comparisons for each row of the columnar arg, we can build a hash table on the scalar argument and probe it instead.
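To make the difference concrete, here is a minimal sketch of the two strategies over plain `Vec<i64>` rows. It is not the code in this PR: the function names are hypothetical, the rows are ordinary vectors rather than Arrow list arrays, and NULL handling is ignored.

```rust
use std::collections::HashSet;

/// Baseline: for every row, compare each row element against each scalar
/// element (the O(N*M) work per row described above).
fn has_any_nested(rows: &[Vec<i64>], scalar: &[i64]) -> Vec<bool> {
    rows.iter()
        .map(|row| row.iter().any(|v| scalar.contains(v)))
        .collect()
}

/// Hash-based variant: build the set once from the scalar argument,
/// then probe it for each row element.
fn has_any_hashed(rows: &[Vec<i64>], scalar: &[i64]) -> Vec<bool> {
    let set: HashSet<i64> = scalar.iter().copied().collect();
    rows.iter()
        .map(|row| row.iter().any(|v| set.contains(v)))
        .collect()
}
```

Both functions return one boolean per row; only the per-element lookup cost differs.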

What changes are included in this PR?

Add benchmark to cover the one-scalar-arg case
Implement optimization as described above

Note that we fall back to a linear scan when the scalar arg is at or below a size threshold (<= 8 elements), because benchmarks suggested that probing a HashSet is not profitable for very small arrays.
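A rough sketch of how such a size-based dispatch could look is below. The commit list mentions a `ScalarStringLookup::Set` variant; the `ScalarLookup` enum, the `Linear` variant, and the `LINEAR_SCAN_MAX` constant here are hypothetical names chosen for illustration and may not match the actual implementation.

```rust
use std::collections::HashSet;

/// Hypothetical cutoff mirroring the "<= 8 elements" note above.
const LINEAR_SCAN_MAX: usize = 8;

/// Lookup structure over the scalar argument's elements.
enum ScalarLookup<'a> {
    /// Small inputs: scan the slice directly.
    Linear(&'a [&'a str]),
    /// Larger inputs: probe a hash set built once up front.
    Set(HashSet<&'a str>),
}

impl<'a> ScalarLookup<'a> {
    fn new(values: &'a [&'a str]) -> Self {
        if values.len() <= LINEAR_SCAN_MAX {
            ScalarLookup::Linear(values)
        } else {
            ScalarLookup::Set(values.iter().copied().collect())
        }
    }

    fn contains(&self, needle: &str) -> bool {
        match self {
            ScalarLookup::Linear(values) => values.iter().any(|v| *v == needle),
            ScalarLookup::Set(set) => set.contains(needle),
        }
    }
}
```

The choice is made once per call, so the per-row loop only pays for a single `match` on each probe.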

Are these changes tested?

Yes. Tests pass, and the change has been benchmarked.

Are there any user-facing changes?

No.

neilconway · 2026-02-16T22:06:32Z

Benchmark results:

 group                                          base                                    target
  -----                                          ----                                    ------
  array_has_all/all_found_small_needle/10        1.10      7.3±2.65ms        ? ?/sec     1.00      6.6±0.08ms        ? ?/sec
  array_has_all/all_found_small_needle/100       1.00     14.9±0.04ms        ? ?/sec     1.02     15.2±0.09ms        ? ?/sec
  array_has_all/all_found_small_needle/500       1.00     52.0±0.11ms        ? ?/sec     1.02     53.0±0.17ms        ? ?/sec
  array_has_all/not_all_found/10                 1.00      6.3±0.05ms        ? ?/sec     1.00      6.3±0.03ms        ? ?/sec
  array_has_all/not_all_found/100                1.00     13.7±0.07ms        ? ?/sec     1.01     13.9±0.09ms        ? ?/sec
  array_has_all/not_all_found/500                1.00     46.5±0.22ms        ? ?/sec     1.03     47.8±0.33ms        ? ?/sec
  array_has_all_strings/all_found/10             1.18      5.3±0.02ms        ? ?/sec     1.00      4.5±0.01ms        ? ?/sec
  array_has_all_strings/all_found/100            1.00     14.9±0.11ms        ? ?/sec     1.00     14.8±0.05ms        ? ?/sec
  array_has_all_strings/all_found/500            1.00     56.9±0.24ms        ? ?/sec     1.00     57.1±1.13ms        ? ?/sec
  array_has_all_strings/not_all_found/10         1.00      3.9±0.03ms        ? ?/sec     1.19      4.6±0.01ms        ? ?/sec
  array_has_all_strings/not_all_found/100        1.00     13.4±0.03ms        ? ?/sec     1.00     13.3±0.19ms        ? ?/sec
  array_has_all_strings/not_all_found/500        1.00     67.5±0.12ms        ? ?/sec     1.01     67.9±0.25ms        ? ?/sec
  array_has_any/no_match/10                      1.00      7.4±0.08ms        ? ?/sec     1.00      7.4±0.04ms        ? ?/sec
  array_has_any/no_match/100                     1.00     23.7±0.06ms        ? ?/sec     1.00     23.8±0.07ms        ? ?/sec
  array_has_any/no_match/500                     1.00     96.0±0.14ms        ? ?/sec     1.01     97.0±0.19ms        ? ?/sec
  array_has_any/scalar_no_match/10               3.62      7.9±0.08ms        ? ?/sec     1.00      2.2±0.02ms        ? ?/sec
  array_has_any/scalar_no_match/100              1.17     24.3±0.11ms        ? ?/sec     1.00     20.8±0.09ms        ? ?/sec
  array_has_any/scalar_no_match/500              1.00     96.8±0.30ms        ? ?/sec     1.40    135.2±0.83ms        ? ?/sec
  array_has_any/scalar_some_match/10             6.53      6.9±0.08ms        ? ?/sec     1.00   1060.1±4.31µs        ? ?/sec
  array_has_any/scalar_some_match/100            1.37     14.8±0.10ms        ? ?/sec     1.00     10.8±0.12ms        ? ?/sec
  array_has_any/scalar_some_match/500            1.00     49.2±0.15ms        ? ?/sec     1.67     82.3±0.83ms        ? ?/sec
  array_has_any/some_match/10                    1.00      6.5±0.08ms        ? ?/sec     1.01      6.6±0.26ms        ? ?/sec
  array_has_any/some_match/100                   1.00     14.9±0.07ms        ? ?/sec     1.01     15.0±0.04ms        ? ?/sec
  array_has_any/some_match/500                   1.00     52.1±0.11ms        ? ?/sec     1.01     52.9±0.12ms        ? ?/sec
  array_has_any_scalar/i64_no_match/1            15.87     6.1±0.05ms        ? ?/sec     1.00    386.8±1.76µs        ? ?/sec
  array_has_any_scalar/i64_no_match/10           12.87     6.4±0.04ms        ? ?/sec     1.00    496.2±3.32µs        ? ?/sec
  array_has_any_scalar/i64_no_match/100          18.87     9.8±0.15ms        ? ?/sec     1.00    520.2±5.31µs        ? ?/sec
  array_has_any_scalar/i64_no_match/1000         114.06    69.1±0.79ms        ? ?/sec    1.00    605.9±9.78µs        ? ?/sec
  array_has_any_scalar/string_no_match/1         12.91     3.8±0.01ms        ? ?/sec     1.00    290.8±1.47µs        ? ?/sec
  array_has_any_scalar/string_no_match/10        7.09      5.7±0.09ms        ? ?/sec     1.00    801.0±7.75µs        ? ?/sec
  array_has_any_scalar/string_no_match/100       30.32    29.3±0.23ms        ? ?/sec     1.00   967.6±15.09µs        ? ?/sec
  array_has_any_scalar/string_no_match/1000      323.46   311.0±2.99ms        ? ?/sec    1.00   961.5±14.46µs        ? ?/sec
  array_has_any_strings/no_match/10              1.01      5.0±0.04ms        ? ?/sec     1.00      4.9±0.02ms        ? ?/sec
  array_has_any_strings/no_match/100             1.05     22.2±0.18ms        ? ?/sec     1.00     21.1±0.03ms        ? ?/sec
  array_has_any_strings/no_match/500             1.05    133.6±0.58ms        ? ?/sec     1.00    127.7±0.33ms        ? ?/sec
  array_has_any_strings/scalar_no_match/10       7.02      6.6±0.02ms        ? ?/sec     1.00    939.6±2.90µs        ? ?/sec
  array_has_any_strings/scalar_no_match/100      3.11     26.5±0.11ms        ? ?/sec     1.00      8.5±0.04ms        ? ?/sec
  array_has_any_strings/scalar_no_match/500      1.38    166.5±1.60ms        ? ?/sec     1.00    120.6±0.51ms        ? ?/sec
  array_has_any_strings/scalar_some_match/10     5.42      5.5±0.03ms        ? ?/sec     1.00   1010.9±5.30µs        ? ?/sec
  array_has_any_strings/scalar_some_match/100    1.93     15.5±0.05ms        ? ?/sec     1.00      8.0±0.09ms        ? ?/sec
  array_has_any_strings/scalar_some_match/500    1.00     57.0±0.18ms        ? ?/sec     1.15     65.4±0.93ms        ? ?/sec
  array_has_any_strings/some_match/10            1.01      4.3±0.01ms        ? ?/sec     1.00      4.3±0.01ms        ? ?/sec
  array_has_any_strings/some_match/100           1.07     14.6±0.06ms        ? ?/sec     1.00     13.6±0.03ms        ? ?/sec
  array_has_any_strings/some_match/500           1.04     53.7±0.17ms        ? ?/sec     1.00     51.8±0.20ms        ? ?/sec
  array_has_i64/found/10                         1.00    709.5±4.90µs        ? ?/sec     1.01    713.5±4.85µs        ? ?/sec
  array_has_i64/found/100                        1.00  1146.7±45.85µs        ? ?/sec     1.04  1190.1±80.82µs        ? ?/sec
  array_has_i64/found/500                        1.02      4.7±0.16ms        ? ?/sec     1.00      4.6±0.11ms        ? ?/sec
  array_has_i64/not_found/10                     1.01    696.9±5.06µs        ? ?/sec     1.00    692.3±2.63µs        ? ?/sec
  array_has_i64/not_found/100                    1.00  1124.8±41.63µs        ? ?/sec     1.12  1254.4±117.95µs        ? ?/sec
  array_has_i64/not_found/500                    1.00      4.6±0.18ms        ? ?/sec     1.00      4.6±0.08ms        ? ?/sec
  array_has_strings/found/10                     1.00   1238.9±7.20µs        ? ?/sec     1.00   1238.2±6.99µs        ? ?/sec
  array_has_strings/found/100                    1.01      3.2±0.05ms        ? ?/sec     1.00      3.1±0.03ms        ? ?/sec
  array_has_strings/found/500                    1.01     15.5±0.31ms        ? ?/sec     1.00     15.3±0.22ms        ? ?/sec
  array_has_strings/not_found/10                 1.01    776.4±4.59µs        ? ?/sec     1.00    771.5±3.34µs        ? ?/sec
  array_has_strings/not_found/100                1.02      6.5±0.10ms        ? ?/sec     1.00      6.4±0.04ms        ? ?/sec
  array_has_strings/not_found/500                1.00     16.8±0.16ms        ? ?/sec     1.01     16.9±0.11ms        ? ?/sec

The previous implementation tested the cost of building an array_has() `Expr` (!), not actually evaluating the array_has() operation itself. Refactor things along the way.

neilconway · 2026-02-18T14:12:29Z

Note that the commit fixing up the benchmarks is shared with #20374 -- I can also pull that out into a separate PR, because it's a prerequisite for any performance work on these functions.

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>

Jefffrey · 2026-02-20T10:54:49Z

 use crate::utils::make_scalar_function;

 use std::any::Any;
+use std::collections::HashSet;


Do we get performance gains if we use hashbrown instead?

Related issue: Consider disallowing std hashmap/hashset via clippy #19869

I'm happy to benchmark, although std HashSet uses hashbrown internally these days; have we found that using hashbrown directly leads to better performance in other circumstances?

Main point of reference is this reply:

Use a single HashMap implementation consistently across the code base #677 (comment)

Though it's a few years old at this point, so I don't know if things have changed since.

Jefffrey · 2026-02-20T11:59:04Z

+
+    let col_list: ArrayWrapper = col_arr.as_ref().try_into()?;
+    let all_col_strings = string_array_to_vec(col_list.values().as_ref());
+    let col_offsets: Vec<usize> = col_list.offsets().collect();


We could probably avoid this collect of offsets if we take advantage of peeking the iter

Interestingly, this turned out to be much slower:

  array_has_any_scalar/string_no_match/1
      time:   [97.474 µs 98.334 µs 99.302 µs]
      change: [−15.448% −14.766% −14.081%] (p = 0.00 < 0.05)
      Performance has improved.
  array_has_any_scalar/string_no_match/10
      time:   [298.73 µs 317.71 µs 343.54 µs]
      change: [+32.141% +60.866% +85.741%] (p = 0.00 < 0.05)
      Performance has regressed.
      Found 20 outliers among 100 measurements (20.00%)
        3 (3.00%) low mild
        17 (17.00%) high severe
  array_has_any_scalar/string_no_match/100
      time:   [437.14 µs 455.34 µs 480.03 µs]
      change: [+22.766% +39.379% +59.120%] (p = 0.00 < 0.05)
      Performance has regressed.
      Found 18 outliers among 100 measurements (18.00%)
        1 (1.00%) low mild
        17 (17.00%) high severe
  array_has_any_scalar/string_no_match/1000
      time:   [332.13 µs 351.25 µs 376.77 µs]
      change: [+28.480% +54.273% +78.992%] (p = 0.00 < 0.05)
      Performance has regressed.

I didn't dig into why; maybe dynamic dispatch because of the iterator adds a bunch of overhead? I'll leave this as-is for now.
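For readers following along, here is a small sketch of the two shapes being compared in this thread, written against plain slices instead of Arrow offset buffers. Both function names are hypothetical, and the sketch makes no claim about why one form benchmarked slower than the other.

```rust
use std::collections::HashSet;

/// Variant A: offsets already collected into a slice; each adjacent pair
/// `[start, end]` delimits one row of the flattened `values` buffer.
fn any_match_collected(values: &[&str], offsets: &[usize], set: &HashSet<&str>) -> Vec<bool> {
    offsets
        .windows(2)
        .map(|w| values[w[0]..w[1]].iter().any(|v| set.contains(*v)))
        .collect()
}

/// Variant B: walk the offsets lazily, peeking at the next one for the row end.
fn any_match_peeking(
    values: &[&str],
    offsets: impl Iterator<Item = usize>,
    set: &HashSet<&str>,
) -> Vec<bool> {
    let mut offsets = offsets.peekable();
    let mut out = Vec::new();
    while let Some(start) = offsets.next() {
        // The last offset only closes the final row; it starts no new one.
        let Some(&end) = offsets.peek() else { break };
        out.push(values[start..end].iter().any(|v| set.contains(*v)));
    }
    out
}
```

With offsets `[0, 2, 2, 5]`, both variants see three rows: `values[0..2]`, an empty row, and `values[2..5]`.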

Jefffrey · 2026-02-20T12:09:57Z

+    columnar_arg: &ColumnarValue,
+    scalar_values: &ArrayRef,
+) -> Result<ColumnarValue> {
+    let scalar_strings = string_array_to_vec(scalar_values.as_ref());


I wonder if we should defer creating this vec, since if we take the hashing path we effectively allocate a vec to allocate a hashset, which seems redundant

I tried this but it didn't seem to help the benchmarks, so I'll keep things as they were for now.
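As an illustration of the suggestion only, the two construction orders look roughly like this on plain string slices. The helper names are hypothetical and the real code works on Arrow string arrays, so this is a sketch of the idea rather than the code under review.

```rust
use std::collections::HashSet;

/// Eager variant: materialize a Vec first, then build the set from it
/// (two allocations when the hashing path is taken).
fn set_via_vec<'a>(items: impl Iterator<Item = &'a str>) -> HashSet<&'a str> {
    let all: Vec<&'a str> = items.collect();
    all.into_iter().collect()
}

/// Deferred variant: feed the iterator straight into the set, skipping the Vec.
fn set_direct<'a>(items: impl Iterator<Item = &'a str>) -> HashSet<&'a str> {
    items.collect()
}
```

Both produce the same set; per the reply above, skipping the intermediate Vec did not help the benchmarks in practice.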

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>

neilconway · 2026-02-20T16:17:27Z

@Jefffrey Thank you for the detailed code review! 🙏 I addressed all of your comments; please let me know if you have more feedback.

Jefffrey

How do the benchmarks look now?

Jefffrey · 2026-02-23T02:17:26Z

 use crate::utils::make_scalar_function;

 use std::any::Any;
+use std::collections::HashSet;


Main point of reference is this reply:

Use a single HashMap implementation consistently across the code base #677 (comment)

Though it's a few years old at this point, so I don't know if things have changed since.

neilconway · 2026-02-23T15:11:18Z

Benchmarks comparing the latest version (with hashbrown) versus main:

 group                                          base                                    target
  -----                                          ----                                    ------
  array_has_any/no_match/10                      1.00      7.4±0.12ms        ? ?/sec     1.00      7.4±0.12ms        ? ?/sec
  array_has_any/no_match/100                     1.01     23.9±1.06ms        ? ?/sec     1.00     23.6±0.28ms        ? ?/sec
  array_has_any/no_match/500                     1.00     96.6±1.07ms        ? ?/sec     1.02     98.5±4.88ms        ? ?/sec
  array_has_any/scalar_no_match/10               3.89      8.5±0.64ms        ? ?/sec     1.00      2.2±0.05ms        ? ?/sec
  array_has_any/scalar_no_match/100              1.14     24.3±0.58ms        ? ?/sec     1.00     21.4±1.29ms        ? ?/sec
  array_has_any/scalar_no_match/500              1.00     96.8±0.71ms        ? ?/sec     1.47    142.4±1.39ms        ? ?/sec
  array_has_any/scalar_some_match/10             6.62      7.0±0.24ms        ? ?/sec     1.00  1051.8±10.58µs        ? ?/sec
  array_has_any/scalar_some_match/100            1.21     14.8±0.34ms        ? ?/sec     1.00     12.2±1.14ms        ? ?/sec
  array_has_any/scalar_some_match/500            1.00     53.5±5.72ms        ? ?/sec     1.73     92.3±8.25ms        ? ?/sec
  array_has_any/some_match/10                    1.08      6.9±0.55ms        ? ?/sec     1.00      6.4±0.04ms        ? ?/sec
  array_has_any/some_match/100                   1.01     14.9±0.25ms        ? ?/sec     1.00     14.8±0.24ms        ? ?/sec
  array_has_any/some_match/500                   1.00     52.3±0.14ms        ? ?/sec     1.01     52.9±2.27ms        ? ?/sec
  array_has_any_scalar/i64_no_match/1            17.81     6.7±0.61ms        ? ?/sec     1.00    375.0±4.67µs        ? ?/sec
  array_has_any_scalar/i64_no_match/10           15.74     7.0±0.57ms        ? ?/sec     1.00   444.8±10.79µs        ? ?/sec
  array_has_any_scalar/i64_no_match/100          15.98    10.2±0.70ms        ? ?/sec     1.00   638.0±42.57µs        ? ?/sec
  array_has_any_scalar/i64_no_match/1000         134.73    73.0±7.67ms        ? ?/sec    1.00   542.0±15.41µs        ? ?/sec
  array_has_any_scalar/string_no_match/1         16.10     4.1±0.03ms        ? ?/sec     1.00    254.2±2.92µs        ? ?/sec
  array_has_any_scalar/string_no_match/10        13.87     5.8±0.44ms        ? ?/sec     1.00    418.3±8.81µs        ? ?/sec
  array_has_any_scalar/string_no_match/100       56.63    32.3±2.88ms        ? ?/sec     1.00   570.6±51.41µs        ? ?/sec
  array_has_any_scalar/string_no_match/1000      691.94  319.4±19.85ms        ? ?/sec    1.00   461.6±11.67µs        ? ?/sec
  array_has_any_strings/no_match/10              1.01      5.1±0.10ms        ? ?/sec     1.00      5.0±0.09ms        ? ?/sec
  array_has_any_strings/no_match/100             1.04     23.3±1.87ms        ? ?/sec     1.00     22.4±1.09ms        ? ?/sec
  array_has_any_strings/no_match/500             1.00    133.0±1.73ms        ? ?/sec     1.03   137.4±10.46ms        ? ?/sec
  array_has_any_strings/scalar_no_match/10       7.30      6.8±0.44ms        ? ?/sec     1.00   926.6±13.70µs        ? ?/sec
  array_has_any_strings/scalar_no_match/100      3.70     29.5±2.04ms        ? ?/sec     1.00      8.0±0.06ms        ? ?/sec
  array_has_any_strings/scalar_no_match/500      1.79    164.2±1.46ms        ? ?/sec     1.00     91.9±1.24ms        ? ?/sec
  array_has_any_strings/scalar_some_match/10     6.56      5.2±0.07ms        ? ?/sec     1.00    791.9±7.79µs        ? ?/sec
  array_has_any_strings/scalar_some_match/100    2.81     15.6±0.07ms        ? ?/sec     1.00      5.6±0.09ms        ? ?/sec
  array_has_any_strings/scalar_some_match/500    3.07     57.6±1.24ms        ? ?/sec     1.00     18.7±0.26ms        ? ?/sec
  array_has_any_strings/some_match/10            1.01      4.4±0.19ms        ? ?/sec     1.00      4.3±0.06ms        ? ?/sec
  array_has_any_strings/some_match/100           1.00     15.5±1.62ms        ? ?/sec     1.00     15.5±1.31ms        ? ?/sec
  array_has_any_strings/some_match/500           1.00     59.7±5.05ms        ? ?/sec     1.02     60.7±5.82ms        ? ?/sec

neilconway · 2026-02-23T15:16:04Z

@Jefffrey Got it; the default hashbrown hash function does seem like a better choice. Interestingly, the benchmarks are significantly better in some cases. This compares the feature branch with hashbrown (target) against the same branch with std HashSet (base):

  group                                          base                                   target
  -----                                          ----                                   ------
  array_has_any/no_match/10                      1.00      7.4±0.08ms        ? ?/sec    1.00      7.4±0.04ms        ? ?/sec
  array_has_any/no_match/100                     1.00     23.0±0.05ms        ? ?/sec    1.02     23.6±0.05ms        ? ?/sec
  array_has_any/no_match/500                     1.00     91.7±0.11ms        ? ?/sec    1.05     96.2±0.53ms        ? ?/sec
  array_has_any/scalar_no_match/10               1.00      2.1±0.02ms        ? ?/sec    1.00      2.1±0.00ms        ? ?/sec
  array_has_any/scalar_no_match/100              1.00     20.3±0.07ms        ? ?/sec    1.01     20.5±0.06ms        ? ?/sec
  array_has_any/scalar_no_match/500              1.00    134.4±1.10ms        ? ?/sec    1.00    134.6±0.37ms        ? ?/sec
  array_has_any/scalar_some_match/10             1.00  1038.6±12.05µs        ? ?/sec    1.00   1036.2±4.87µs        ? ?/sec
  array_has_any/scalar_some_match/100            1.00     10.7±0.09ms        ? ?/sec    1.00     10.7±0.07ms        ? ?/sec
  array_has_any/scalar_some_match/500            1.00     83.1±0.39ms        ? ?/sec    1.00     83.2±0.40ms        ? ?/sec
  array_has_any/some_match/10                    1.01      6.5±0.03ms        ? ?/sec    1.00      6.4±0.04ms        ? ?/sec
  array_has_any/some_match/100                   1.00     14.6±0.06ms        ? ?/sec    1.01     14.8±0.05ms        ? ?/sec
  array_has_any/some_match/500                   1.00     50.1±0.13ms        ? ?/sec    1.06     52.9±0.22ms        ? ?/sec
  array_has_any_scalar/i64_no_match/1            1.00    359.7±1.46µs        ? ?/sec    1.04    373.2±2.89µs        ? ?/sec
  array_has_any_scalar/i64_no_match/10           1.91    844.9±9.22µs        ? ?/sec    1.00    441.6±9.23µs        ? ?/sec
  array_has_any_scalar/i64_no_match/100          1.59  1003.3±34.17µs        ? ?/sec    1.00   629.3±21.51µs        ? ?/sec
  array_has_any_scalar/i64_no_match/1000         1.77   955.1±12.20µs        ? ?/sec    1.00   540.2±12.02µs        ? ?/sec
  array_has_any_scalar/string_no_match/1         1.01    256.7±1.83µs        ? ?/sec    1.00    255.1±1.92µs        ? ?/sec
  array_has_any_scalar/string_no_match/10        1.97   826.3±13.46µs        ? ?/sec    1.00    420.2±8.06µs        ? ?/sec
  array_has_any_scalar/string_no_match/100       1.65   910.6±19.59µs        ? ?/sec    1.00    552.9±17.14µs        ? ?/sec
  array_has_any_scalar/string_no_match/1000      1.90   874.5±12.71µs        ? ?/sec    1.00    459.8±8.70µs        ? ?/sec
  array_has_any_strings/no_match/10              1.00      5.0±0.01ms        ? ?/sec    1.00      5.0±0.02ms        ? ?/sec
  array_has_any_strings/no_match/100             1.01     22.2±0.05ms        ? ?/sec    1.00     22.0±0.03ms        ? ?/sec
  array_has_any_strings/no_match/500             1.00    128.7±0.18ms        ? ?/sec    1.03    132.1±1.15ms        ? ?/sec
  array_has_any_strings/scalar_no_match/10       1.00    863.4±2.22µs        ? ?/sec    1.07    920.9±1.92µs        ? ?/sec
  array_has_any_strings/scalar_no_match/100      1.00      7.3±0.02ms        ? ?/sec    1.10      8.0±0.02ms        ? ?/sec
  array_has_any_strings/scalar_no_match/500      1.00     87.1±0.14ms        ? ?/sec    1.05     91.4±0.14ms        ? ?/sec
  array_has_any_strings/scalar_some_match/10     1.00    769.2±2.00µs        ? ?/sec    1.03    790.9±3.03µs        ? ?/sec
  array_has_any_strings/scalar_some_match/100    1.00      4.1±0.17ms        ? ?/sec    1.04      4.3±0.22ms        ? ?/sec
  array_has_any_strings/scalar_some_match/500    1.00     16.9±0.08ms        ? ?/sec    1.08     18.2±0.07ms        ? ?/sec
  array_has_any_strings/some_match/10            1.00      4.3±0.02ms        ? ?/sec    1.00      4.3±0.01ms        ? ?/sec
  array_has_any_strings/some_match/100           1.01     14.3±0.05ms        ? ?/sec    1.00     14.1±0.04ms        ? ?/sec
  array_has_any_strings/some_match/500           1.00     53.5±0.11ms        ? ?/sec    1.00     53.6±0.07ms        ? ?/sec
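Since std's `HashSet` is already built on hashbrown, the difference above most plausibly comes from the default hasher: std defaults to `RandomState` (documented as SipHash-based, subject to change), while hashbrown ships its own faster default. The sketch below uses only std to show that the hasher is a type parameter of the set. The FNV-1a hasher is a toy stand-in and is not what hashbrown uses; `Fnv1a`, `FnvSet`, `build_default`, and `build_fnv` are hypothetical names.

```rust
use std::collections::HashSet;
use std::hash::{BuildHasherDefault, Hasher};

/// Toy FNV-1a hasher, used only to show that the hasher is pluggable.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for b in bytes {
            self.0 ^= u64::from(*b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Same table implementation as `HashSet<T>`, different hash function.
type FnvSet<T> = HashSet<T, BuildHasherDefault<Fnv1a>>;

/// Set using std's default `RandomState` hasher.
fn build_default(values: &[i64]) -> HashSet<i64> {
    values.iter().copied().collect()
}

/// Set using the toy FNV-1a hasher.
fn build_fnv(values: &[i64]) -> FnvSet<i64> {
    values.iter().copied().collect()
}
```

Both sets answer membership queries identically; only the cost of hashing each probe differs, which is the dimension the base/target comparison above is measuring.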

alamb · 2026-02-24T20:59:19Z

Thanks @Jefffrey @martin-g and @neilconway

github-actions Bot added the documentation Improvements or additions to documentation label Feb 16, 2026

neilconway force-pushed the neilc/optimize-array-has-any-scalar branch from cc2d735 to ef696bd Compare February 16, 2026 22:49

neilconway added 5 commits February 17, 2026 11:02

Revise benchmark for array_has()

266550b

The previous implementation tested the cost of building an array_has() `Expr` (!), not actually evaluating the array_has() operation itself. Refactor things along the way.

Add benchmark for array_has_any scalar fastpath

7b144b1

Optimize array_has_any() with one scalar arg

3260914

Tweak to avoid some allocations in string path

dacf850

cargo fmt

b3bbf3a

neilconway force-pushed the neilc/optimize-array-has-any-scalar branch from 83484af to b3bbf3a Compare February 17, 2026 16:02

martin-g reviewed Feb 19, 2026

Comment thread datafusion/functions-nested/benches/array_has.rs Outdated

Fix i64_no_match benchmark

9c4a653

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>

Jefffrey reviewed Feb 20, 2026

Simplify scalar list returning false

ef94157

github-actions Bot added the functions Changes to functions implementation label Feb 20, 2026

neilconway and others added 13 commits February 20, 2026 07:43

Simplify dispatch for scalar path, per code review

162f168

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>

Simplify NULL check, per Jeffrey

1c13b5d

Improve null check, per code review

c8f0b98

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>

.

04d3375

.

b7074e0

Tweak

52f1c32

.

fb696f2

Avoid vec alloc when building ScalarStringLookup::Set

8fb60df

Revert avoiding Vec for ScalarStringLookup::Set construction, cheap i…

c83c70d

…n practice

Defend against a possible sliced array

a14c7ea

Merge remote-tracking branch 'origin/main' into neilc/optimize-array-…

43a9b5d

…has-any-scalar

Don't use NEEDLE_SIZE constant for array_has_any benchmarks

8e9cc91

cargo fmt

3c98d4d

Tweak comment

c7abde1

Jefffrey approved these changes Feb 23, 2026

Switch to hashbrown from std HashSet

242f3c2

cargo fmt

1d3464e

alamb added the performance Make DataFusion faster label Feb 24, 2026

alamb added this pull request to the merge queue Feb 24, 2026

Merged via the queue into apache:main with commit 585bbf3 Feb 24, 2026
32 checks passed
