
perf: Optimize NULL handling in array_has#21471

Merged
mbutrovich merged 2 commits into apache:main from neilconway:neilc/perf-array-has-nulls
Apr 10, 2026

Conversation

@neilconway
Contributor

Which issue does this PR close?

Closes #21470.

Rationale for this change

array_has uses BooleanArray::builder to construct its results, which updates the NULL buffer incrementally (row by row). It is more efficient to build the result values with BooleanBufferBuilder and to construct the output NULL buffer separately, in bulk, via NullBuffer::union or similar.
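To illustrate the difference, here is a std-only Rust sketch that models a Boolean result as two bit-packed `Vec<u64>` bitmaps (values and validity) instead of using the real arrow-rs types. `contains_row_by_row` writes a validity bit and a value bit per row, roughly analogous to appending an `Option<bool>` to a builder; `contains_bulk` only produces value bits inside the loop and reuses the input validity bitmap as a whole. All function names, the bitmap layout, and the assumption of a non-NULL scalar needle are illustrative, not the PR's actual code.

```rust
// Std-only model: bit-packed bitmaps, LSB-first within each u64 word.

/// Number of u64 words needed to hold `len` bits.
fn words_for(len: usize) -> usize {
    (len + 63) / 64
}

fn set_bit(bits: &mut [u64], i: usize) {
    bits[i / 64] |= 1u64 << (i % 64);
}

fn get_bit(bits: &[u64], i: usize) -> bool {
    ((bits[i / 64] >> (i % 64)) & 1) == 1
}

/// Row-by-row style (hypothetical): each row writes a validity bit and a
/// value bit, analogous to appending an `Option<bool>` to a builder.
fn contains_row_by_row(rows: &[Vec<i64>], validity: &[u64], needle: i64) -> (Vec<u64>, Vec<u64>) {
    let mut values = vec![0u64; words_for(rows.len())];
    let mut out_validity = vec![0u64; words_for(rows.len())];
    for (i, row) in rows.iter().enumerate() {
        if get_bit(validity, i) {
            set_bit(&mut out_validity, i);
            if row.contains(&needle) {
                set_bit(&mut values, i);
            }
        }
        // NULL input row: leave the validity bit unset.
    }
    (values, out_validity)
}

/// Bulk style (hypothetical): the loop only produces value bits. Assuming a
/// non-NULL scalar needle, the output is NULL exactly where the input is,
/// so the input validity bitmap is reused in one step.
fn contains_bulk(rows: &[Vec<i64>], validity: &[u64], needle: i64) -> (Vec<u64>, Vec<u64>) {
    let mut values = vec![0u64; words_for(rows.len())];
    for (i, row) in rows.iter().enumerate() {
        // NULL rows are evaluated too; their value bits are masked by validity.
        if row.contains(&needle) {
            set_bit(&mut values, i);
        }
    }
    (values, validity.to_vec())
}

/// Decode (values, validity) into per-row results for comparison.
fn to_options(values: &[u64], validity: &[u64], len: usize) -> Vec<Option<bool>> {
    (0..len)
        .map(|i| if get_bit(validity, i) { Some(get_bit(values, i)) } else { None })
        .collect()
}
```

Both functions yield the same logical result; the bulk version simply moves NULL handling out of the per-row loop and into a word-level operation (a copy here; with several nullable inputs it would be a bitwise AND of their validity bitmaps).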

Benchmarks (ARM64):

  array_has_i64 (scalar needle path)
  - found/10: 51.7 µs -> 49.2 µs, -4.7%
  - not_found/10: 43.2 µs -> 40.0 µs, -7.2%
  - found/100: 162.0 µs -> 152.3 µs, -6.3%
  - not_found/100: 143.1 µs -> 135.8 µs, -5.2%
  - found/500: 623.4 µs -> 572.5 µs, -8.4%
  - not_found/500: 593.4 µs -> 560.5 µs, -5.5%

  array_has_all (row-converter path)
  - all_found_small_needle/10: 699.4 µs -> 641.1 µs, -8.1%
  - not_all_found/10: 533.3 µs -> 470.7 µs, -9.5%
  - all_found_small_needle/100: 5.94 ms -> 4.49 ms, -24.4%
  - not_all_found/100: 4.60 ms -> 3.83 ms, -16.7%
  - all_found_small_needle/500: 34.6 ms -> 31.7 ms, -8.3%
  - not_all_found/500: 30.4 ms -> 28.4 ms, -6.7%

  array_has_any (row-converter path)
  - some_match/10: 619.7 µs -> 563.2 µs, -9.5%
  - no_match/10: 1.11 ms -> 1.04 ms, -0.1%
  - scalar_some_match/10: 467.4 µs -> 444.6 µs, -4.2%
  - scalar_no_match/10: 1.15 ms -> 994.5 µs, -14.5%
  - some_match/100: 4.98 ms -> 4.35 ms, -12.6%
  - no_match/100: 10.47 ms -> 8.90 ms, -15.0%
  - scalar_some_match/100: 4.63 ms -> 4.19 ms, -9.4%
  - scalar_no_match/100: 10.40 ms -> 9.98 ms, -4.1%
  - some_match/500: 31.8 ms -> 28.9 ms, -9.1%
  - no_match/500: 66.1 ms -> 58.3 ms, -11.8%
  - scalar_some_match/500: 25.6 ms -> 23.5 ms, -8.3%
  - scalar_no_match/500: 58.8 ms -> 54.1 ms, -8.0%

  array_has_strings (scalar needle path)
  - found/10: 396.3 µs -> 364.7 µs, -8.5%
  - not_found/10: 69.9 µs -> 67.5 µs, -3.6%
  - found/100: 1.34 ms -> 1.25 ms, -7.0%
  - not_found/100: 2.53 ms -> 2.37 ms, -6.1%
  - found/500: 3.36 ms -> 3.11 ms, -7.5%
  - not_found/500: 8.59 ms -> 8.12 ms, -5.5%

  array_has_all_strings (string fast path)
  - all_found/10: 1.08 ms -> 1.01 ms, -7.7%
  - not_all_found/10: 659.0 µs -> 632.5 µs, -3.9%
  - all_found/100: 5.50 ms -> 5.24 ms, -4.6%
  - not_all_found/100: 4.77 ms -> 4.58 ms, -4.0%
  - all_found/500: 27.4 ms -> 26.2 ms, -4.6%
  - not_all_found/500: 30.0 ms -> 28.8 ms, -3.9%

  array_has_any_strings (string fast path)
  - some_match/10: 946.5 µs -> 872.9 µs, -6.7%
  - no_match/10: 1.20 ms -> 1.12 ms, -6.0%
  - scalar_some_match/10: 420.4 µs -> 375.7 µs, -8.7%
  - scalar_no_match/10: 344.0 µs -> 309.5 µs, -10.1%
  - some_match/100: 4.93 ms -> 4.76 ms, -3.4%
  - no_match/100: 8.76 ms -> 8.50 ms, -2.9%
  - scalar_some_match/100: 1.49 ms -> 1.40 ms, -6.0%
  - scalar_no_match/100: 2.94 ms -> 2.72 ms, -7.8%
  - some_match/500: 24.0 ms -> 23.4 ms, -2.7%
  - no_match/500: 57.0 ms -> 55.8 ms, -2.1%
  - scalar_some_match/500: 5.45 ms -> 5.07 ms, -7.0%
  - scalar_no_match/500: 34.5 ms -> 32.9 ms, -4.5%

  array_has_any_scalar (varying scalar size)
  - i64_no_match/1: 173.7 µs -> 162.6 µs, -4.3%
  - i64_no_match/10: 139.8 µs -> 125.7 µs, -9.5%
  - i64_no_match/100: 264.8 µs -> 253.7 µs, -4.6%
  - i64_no_match/1000: 155.4 µs -> 142.4 µs, -8.0%
  - string_no_match/1: 125.1 µs -> 107.5 µs, -14.3%
  - string_no_match/10: 164.8 µs -> 147.0 µs, -10.1%
  - string_no_match/100: 257.8 µs -> 243.4 µs, -5.8%
  - string_no_match/1000: 180.9 µs -> 164.5 µs, -8.9%

What changes are included in this PR?

  • Build array_has results with BooleanBufferBuilder and compute the output NULL buffer in bulk instead of row by row; minor code cleanup

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions github-actions Bot added the functions Changes to functions implementation label Apr 8, 2026
Contributor

@mbutrovich mbutrovich left a comment


Thanks for continually improving performance, @neilconway!

Do you mind filing a follow-up issue to see if the Spark versions of array functions need these optimizations? Or did you check that already?

@mbutrovich mbutrovich added this pull request to the merge queue Apr 10, 2026
Merged via the queue into apache:main with commit eaf0a41 Apr 10, 2026
31 checks passed
@neilconway neilconway deleted the neilc/perf-array-has-nulls branch April 10, 2026 20:42
alamb added a commit to apache/arrow-rs that referenced this pull request Apr 14, 2026
…on_many` (#9692)

## Which issue does this PR close?

- Closes #8809.

## Rationale for this change

Several DataFusion PRs
([#21464](apache/datafusion#21464),
[#21468](apache/datafusion#21468),
[#21471](apache/datafusion#21471),
[#21475](apache/datafusion#21475),
[#21477](apache/datafusion#21477),
[#21482](apache/datafusion#21482),
[#21532](apache/datafusion#21532)) optimize NULL
handling in scalar functions by replacing row-by-row null buffer
construction with bulk `NullBuffer::union`. When 3+ null buffers need
combining, they chain binary `union` calls, each allocating a new
`BooleanBuffer`.

`NullBuffer::union_many` reduces this to 1 allocation (clone + in-place
ANDs). For example, from
[#21482](apache/datafusion#21482):

Before:
```rust
[array.nulls(), from_array.nulls(), to_array.nulls(), stride.and_then(|s| s.nulls())]
    .into_iter()
    .fold(None, |acc, nulls| NullBuffer::union(acc.as_ref(), nulls))
```
After:
```rust
NullBuffer::union_many([
    array.nulls(),
    from_array.nulls(),
    to_array.nulls(),
    stride.and_then(|s| s.nulls()),
])
```

Per @alamb's
[suggestion](#9692 (comment)),
this PR also implements the general-purpose mutable bitwise operations
on `BooleanArray` from #8809, following the `PrimitiveArray::unary` /
`unary_mut` pattern. This builds on the
`BitAndAssign`/`BitOrAssign`/`BitXorAssign` operators added to
`BooleanBuffer` in #9567.

## What changes are included in this PR?

**`NullBuffer::union_many(impl IntoIterator<Item =
Option<&NullBuffer>>)`**: combines multiple null buffers in a single
allocation (clone + in-place `&=`). Used by DataFusion for bulk null
handling.

**`BooleanArray` bitwise operations** (6 new public methods):

Unary (`op: FnMut(u64) -> u64`):
- `bitwise_unary(&self, op)` — always allocates a new array
- `bitwise_unary_mut(self, op) -> Result<Self, Self>` — in-place if
uniquely owned, `Err(self)` if shared
- `bitwise_unary_mut_or_clone(self, op)` — in-place if uniquely owned,
allocates if shared

Binary (`op: FnMut(u64, u64) -> u64`):
- `bitwise_bin_op(&self, rhs, op)` — always allocates, unions null
buffers
- `bitwise_bin_op_mut(self, rhs, op) -> Result<Self, Self>` — in-place
if uniquely owned, `Err(self)` if shared, unions null buffers
- `bitwise_bin_op_mut_or_clone(self, rhs, op)` — in-place if uniquely
owned, allocates if shared, unions null buffers
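The ownership contract of the unary variants (always allocate; in place or `Err(self)`; in place or fall back to allocating) can be sketched without arrow-rs by letting `Rc<Vec<u64>>` stand in for a reference-counted buffer and using `Rc::get_mut` to detect unique ownership. The `Bits` type and its methods are hypothetical; the real methods operate on `BooleanArray`, and the binary variants would additionally union the two inputs' null buffers.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a Boolean array: bit-packed values in a
/// shared buffer (`Rc` plays the role of a reference-counted buffer).
#[derive(Clone, Debug, PartialEq)]
struct Bits {
    words: Rc<Vec<u64>>,
}

impl Bits {
    /// Always allocates a new buffer (cf. `bitwise_unary`).
    fn unary(&self, mut op: impl FnMut(u64) -> u64) -> Bits {
        Bits { words: Rc::new(self.words.iter().map(|w| op(*w)).collect()) }
    }

    /// Mutates in place when uniquely owned; returns `Err(self)` unchanged
    /// when the buffer is shared (cf. `bitwise_unary_mut`).
    fn unary_mut(mut self, mut op: impl FnMut(u64) -> u64) -> Result<Bits, Bits> {
        match Rc::get_mut(&mut self.words) {
            Some(words) => {
                for w in words.iter_mut() {
                    *w = op(*w);
                }
                Ok(self)
            }
            None => Err(self),
        }
    }

    /// In place if possible, otherwise falls back to allocating
    /// (cf. `bitwise_unary_mut_or_clone`).
    fn unary_mut_or_clone(self, mut op: impl FnMut(u64) -> u64) -> Bits {
        match self.unary_mut(&mut op) {
            Ok(bits) => bits,
            Err(shared) => shared.unary(op),
        }
    }
}
```

Returning `Err(self)` rather than silently copying lets callers decide whether an allocation is acceptable, which is the same trade-off `PrimitiveArray::unary_mut` exposes.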

Note: #8809 proposed the binary variants take a raw buffer and
`right_offset_in_bits`. This PR takes `&BooleanArray` instead, which
encapsulates both and matches existing patterns like
`BooleanArray::from_binary`.
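Taking `&BooleanArray` works because each array already carries its own bit offset, so the two sides of a binary op may start at different bit positions. The hypothetical helpers below sketch that idea on raw `u64` words: `read_u64_at` assembles 64 bits starting at any bit position, and `bin_op_with_offsets` applies `op` chunk by chunk, which is one way to handle misaligned left/right offsets. arrow-rs has its own bit-chunk iteration utilities; this sketch does not attempt to reproduce them.

```rust
/// Read 64 bits starting at an arbitrary bit offset (LSB-first layout).
/// Bits past the end of `words` read as zero.
fn read_u64_at(words: &[u64], bit_offset: usize) -> u64 {
    let idx = bit_offset / 64;
    let shift = bit_offset % 64;
    let lo = words.get(idx).copied().unwrap_or(0);
    if shift == 0 {
        return lo;
    }
    let hi = words.get(idx + 1).copied().unwrap_or(0);
    // Stitch the tail of one word to the head of the next.
    (lo >> shift) | (hi << (64 - shift))
}

/// Apply `op` to `len` bits of `lhs` (starting at `lhs_offset`) and `rhs`
/// (starting at `rhs_offset`), 64 bits at a time; the output starts at offset 0.
fn bin_op_with_offsets(
    lhs: &[u64],
    lhs_offset: usize,
    rhs: &[u64],
    rhs_offset: usize,
    len: usize,
    mut op: impl FnMut(u64, u64) -> u64,
) -> Vec<u64> {
    let chunks = (len + 63) / 64;
    let mut out: Vec<u64> = (0..chunks)
        .map(|c| op(read_u64_at(lhs, lhs_offset + c * 64), read_u64_at(rhs, rhs_offset + c * 64)))
        .collect();
    // Clear any bits past `len` in the final word.
    let rem = len % 64;
    if rem != 0 {
        if let Some(last) = out.last_mut() {
            *last &= (1u64 << rem) - 1;
        }
    }
    out
}
```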

## Are these changes tested?

Yes. 23 tests for the `BooleanArray` bitwise methods and 6 tests for
`union_many`, covering:
- Basic correctness (AND, OR, NOT)
- Null handling (both nullable, one nullable, no nulls, null union)
- Buffer ownership (uniquely owned → in-place, shared → `Err` /
fallback)
- Edge cases (empty arrays, sliced arrays with non-zero offset,
misaligned left/right offsets)

## Are there any user-facing changes?

Six new public methods on `BooleanArray` and one new public method on
`NullBuffer`.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Rich-T-kid pushed a commit to Rich-T-kid/datafusion that referenced this pull request Apr 21, 2026
## Which issue does this PR close?

- Closes apache#21470.


Labels

functions Changes to functions implementation


Development

Successfully merging this pull request may close these issues.

Optimize NULL handling for array_has

2 participants