perf: Optimize NULL handling in `find_in_set` by neilconway · Pull Request #21464 · apache/datafusion

neilconway · 2026-04-08T12:01:11Z

Which issue does this PR close?

Closes #21463 (Optimize NULL handling for find_in_set).

Rationale for this change

find_in_set uses PrimitiveArray::<T>::builder to construct its results, which means building the internal null buffer in an iterative fashion (via repeated append_null calls). It is more efficient to construct the null buffer directly, via NullBuffer::union (when multiple arguments might be NULL) or just cloning the input null buffer (when passed a single argument).
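To make the difference concrete, here is a minimal sketch using only the standard library. It does not use the arrow crate: `Validity`, `append`, `per_row`, and `union` are hypothetical stand-ins for the builder's per-row null tracking and for `NullBuffer::union`. It assumes Arrow's convention that a set bit means "valid", so taking the union of the NULLs amounts to a word-wise AND of the validity bits.

```rust
/// Hypothetical stand-in for a validity bitmap (not the arrow-rs `NullBuffer`).
/// Bit i set means row i is valid; bit i clear means row i is NULL.
#[derive(Clone, Debug, PartialEq)]
struct Validity {
    words: Vec<u64>,
    len: usize,
}

impl Validity {
    fn new() -> Self {
        Validity { words: Vec::new(), len: 0 }
    }

    /// Row-at-a-time construction, analogous to repeated
    /// `append_null` / `append_value` calls on a builder.
    fn append(&mut self, valid: bool) {
        if self.len % 64 == 0 {
            self.words.push(0);
        }
        if valid {
            self.words[self.len / 64] |= 1u64 << (self.len % 64);
        }
        self.len += 1;
    }

    fn from_bools(rows: &[bool]) -> Self {
        let mut v = Validity::new();
        for &r in rows {
            v.append(r);
        }
        v
    }

    fn is_valid(&self, i: usize) -> bool {
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }

    /// Bulk construction: a row is valid only if it is valid in both inputs,
    /// so the result is a word-wise AND that covers 64 rows per operation.
    fn union(a: &Validity, b: &Validity) -> Validity {
        assert_eq!(a.len, b.len);
        let words = a.words.iter().zip(&b.words).map(|(x, y)| x & y).collect();
        Validity { words, len: a.len }
    }
}

/// Per-row approach: inspect both inputs and append one bit for every row.
fn per_row(a: &Validity, b: &Validity) -> Validity {
    let mut out = Validity::new();
    for i in 0..a.len {
        out.append(a.is_valid(i) && b.is_valid(i));
    }
    out
}
```

Both paths produce the same bitmap. The bulk path touches one word per 64 rows and has no per-row branch, which is the kind of saving the benchmarks below measure.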

Benchmarks (ARM64):

  - find_in_set/string_len_8: 589.0 µs → 529.0 µs (-10.2%)
  - find_in_set/string_len_32: 736.2 µs → 660.5 µs (-10.3%)
  - find_in_set/string_len_1024: 7.5 ms → 7.4 ms (-1.3%)
  - find_in_set/string_view_len_8: 616.0 µs → 579.9 µs (-5.9%)
  - find_in_set/string_view_len_32: 748.0 µs → 701.9 µs (-6.2%)
  - find_in_set/string_view_len_1024: 7.6 ms → 7.6 ms (0.0%)
  - find_in_set_scalar/string_len_8: 76.7 µs → 48.0 µs (-37.4%)
  - find_in_set_scalar/string_len_32: 76.5 µs → 47.6 µs (-37.8%)
  - find_in_set_scalar/string_len_1024: 76.2 µs → 48.0 µs (-37.0%)
  - find_in_set_scalar/string_view_len_8: 81.9 µs → 55.7 µs (-32.0%)
  - find_in_set_scalar/string_view_len_32: 85.5 µs → 56.8 µs (-33.6%)
  - find_in_set_scalar/string_view_len_1024: 85.2 µs → 57.4 µs (-32.6%)

The change should be an improvement for both scalar and array cases. The relative improvement is larger in the scalar case because the scalar case is doing less work and so NULL handling was a larger fraction of the total runtime.

What changes are included in this PR?

Optimize NULL handling for both scalar and array arg cases

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

neilconway · 2026-04-08T12:39:12Z

Btw, I tried refactoring this to use an explicit `for` loop (so we can dispense with the dummy zero assignment), but it was a bit slower than the `map` approach:
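For readers unfamiliar with the trade-off, the two shapes being compared can be sketched roughly as follows. This is an illustration under assumptions, not the code in the PR: `position_in_set`, `values_with_map`, and `values_with_for` are hypothetical names, plain `Vec`s stand in for Arrow buffers, and the validity mask is recomputed here only to keep the sketch self-contained (the PR reuses the input null buffer instead).

```rust
/// Hypothetical helper: 1-based position of `needle` in a comma-separated
/// `set`, or 0 if it is absent.
fn position_in_set(needle: &str, set: &str) -> i32 {
    set.split(',')
        .position(|item| item == needle)
        .map_or(0, |idx| idx as i32 + 1)
}

/// `map` style: every row yields a value. NULL rows yield a placeholder 0
/// that the separately tracked validity mask hides from readers.
fn values_with_map(rows: &[Option<&str>], set: &str) -> (Vec<i32>, Vec<bool>) {
    let values: Vec<i32> = rows
        .iter()
        .map(|row| match row {
            Some(s) => position_in_set(s, set),
            None => 0, // dummy value, masked by the validity bit
        })
        .collect();
    let validity: Vec<bool> = rows.iter().map(|r| r.is_some()).collect();
    (values, validity)
}

/// Explicit `for` style: start from a zeroed buffer and write only the
/// valid rows, so no dummy assignment appears in the loop body.
fn values_with_for(rows: &[Option<&str>], set: &str) -> (Vec<i32>, Vec<bool>) {
    let mut values = vec![0i32; rows.len()];
    let mut validity = vec![false; rows.len()];
    for (i, row) in rows.iter().enumerate() {
        if let Some(s) = row {
            values[i] = position_in_set(s, set);
            validity[i] = true;
        }
    }
    (values, validity)
}
```

The two functions return identical results; which one is faster depends on the compiler and the data, so the numbers below remain the only evidence for this PR.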

  - find_in_set/string_len_8: 588.6 µs → 530.9 µs (-9.8%)
  - find_in_set/string_len_32: 735.2 µs → 669.5 µs (-8.9%)
  - find_in_set/string_len_1024: 7.5 ms → 7.4 ms (-1.3%)
  - find_in_set/string_view_len_8: 615.7 µs → 581.0 µs (-5.6%)
  - find_in_set/string_view_len_32: 743.8 µs → 708.9 µs (-4.7%)
  - find_in_set/string_view_len_1024: 7.6 ms → 7.6 ms (-0.0%)
  - find_in_set_scalar/string_len_8: 76.7 µs → 56.9 µs (-25.8%)
  - find_in_set_scalar/string_len_32: 76.7 µs → 56.8 µs (-25.9%)
  - find_in_set_scalar/string_len_1024: 76.4 µs → 56.9 µs (-25.5%)
  - find_in_set_scalar/string_view_len_8: 81.5 µs → 59.0 µs (-27.6%)
  - find_in_set_scalar/string_view_len_32: 85.3 µs → 59.4 µs (-30.4%)
  - find_in_set_scalar/string_view_len_1024: 85.0 µs → 59.2 µs (-30.4%)

mbutrovich

Thanks for continually improving performance, @neilconway!

@alamb

…on_many` (#9692)

## Which issue does this PR close?

- Closes #8809.

## Rationale for this change

Several DataFusion PRs ([#21464](apache/datafusion#21464), [#21468](apache/datafusion#21468), [#21471](apache/datafusion#21471), [#21475](apache/datafusion#21475), [#21477](apache/datafusion#21477), [#21482](apache/datafusion#21482), [#21532](apache/datafusion#21532)) optimize NULL handling in scalar functions by replacing row-by-row null buffer construction with bulk `NullBuffer::union`. When 3+ null buffers need combining, they chain binary `union` calls, each allocating a new `BooleanBuffer`. `NullBuffer::union_many` reduces this to 1 allocation (clone + in-place ANDs).

For example, from [#21482](apache/datafusion#21482):

Before:

```rust
[array.nulls(), from_array.nulls(), to_array.nulls(), stride.and_then(|s| s.nulls())]
    .into_iter()
    .fold(None, |acc, nulls| NullBuffer::union(acc.as_ref(), nulls))
```

After:

```rust
NullBuffer::union_many([
    array.nulls(),
    from_array.nulls(),
    to_array.nulls(),
    stride.and_then(|s| s.nulls()),
])
```

Per @alamb's [suggestion](#9692 (comment)), this PR also implements the general-purpose mutable bitwise operations on `BooleanArray` from #8809, following the `PrimitiveArray::unary` / `unary_mut` pattern. This builds on the `BitAndAssign`/`BitOrAssign`/`BitXorAssign` operators added to `BooleanBuffer` in #9567.

## What changes are included in this PR?

**`NullBuffer::union_many(impl IntoIterator<Item = Option<&NullBuffer>>)`**: combines multiple null buffers in a single allocation (clone + in-place `&=`). Used by DataFusion for bulk null handling.

**`BooleanArray` bitwise operations** (6 new public methods):

Unary (`op: FnMut(u64) -> u64`):

- `bitwise_unary(&self, op)`: always allocates a new array
- `bitwise_unary_mut(self, op) -> Result<Self, Self>`: in-place if uniquely owned, `Err(self)` if shared
- `bitwise_unary_mut_or_clone(self, op)`: in-place if uniquely owned, allocates if shared

Binary (`op: FnMut(u64, u64) -> u64`):

- `bitwise_bin_op(&self, rhs, op)`: always allocates, unions null buffers
- `bitwise_bin_op_mut(self, rhs, op) -> Result<Self, Self>`: in-place if uniquely owned, `Err(self)` if shared, unions null buffers
- `bitwise_bin_op_mut_or_clone(self, rhs, op)`: in-place if uniquely owned, allocates if shared, unions null buffers

Note: #8809 proposed the binary variants take a raw buffer and `right_offset_in_bits`. This PR takes `&BooleanArray` instead, which encapsulates both and matches existing patterns like `BooleanArray::from_binary`.

## Are these changes tested?

Yes. 23 tests for the `BooleanArray` bitwise methods and 6 tests for `union_many`, covering:

- Basic correctness (AND, OR, NOT)
- Null handling (both nullable, one nullable, no nulls, null union)
- Buffer ownership (uniquely owned → in-place, shared → `Err` / fallback)
- Edge cases (empty arrays, sliced arrays with non-zero offset, misaligned left/right offsets)

## Are there any user-facing changes?

Six new public methods on `BooleanArray` and one new public method on `NullBuffer`.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

.

f623175

github-actions Bot added the functions Changes to functions implementation label Apr 8, 2026

neilconway added 2 commits April 8, 2026 08:18

Refactor and simplify

6146fe4

Revert "Refactor and simplify"

1bc1a15

This reverts commit 6146fe4.

neilconway mentioned this pull request Apr 10, 2026

perf: Optimize NULL handling in array_remove #21532

Merged

neilconway changed the title ~~perf: Optimize find_in_set NULL handling~~ perf: Optimize NULL handling in find_in_set Apr 10, 2026

mbutrovich mentioned this pull request Apr 10, 2026

Add mutable bitwise operations to BooleanArray and NullBuffer::union_many apache/arrow-rs#9692

Merged

neilconway mentioned this pull request Apr 10, 2026

perf: Optimize Utf8View string concat #21535

Merged

mbutrovich approved these changes Apr 10, 2026

mbutrovich added this pull request to the merge queue Apr 10, 2026

Merged via the queue into apache:main with commit d61be49 Apr 10, 2026
32 checks passed

neilconway deleted the neilc/perf-find-in-set-nulls branch April 10, 2026 20:45

mbutrovich mentioned this pull request Apr 14, 2026

feat: add support for array_position expression apache/datafusion-comet#3172

Merged

mbutrovich merged 3 commits into apache:main from neilconway:neilc/perf-find-in-set-nulls
