perf: Optimize NULL handling in `lcm`, `gcd` by neilconway · Pull Request #21468 · apache/datafusion

neilconway · 2026-04-08T13:42:13Z

Which issue does this PR close?

Closes #21467 (Optimize NULL handling for lcm, gcd).

Rationale for this change

This PR implements three distinct optimizations:

1. `lcm` was computing the result NULL buffer iteratively. This is relatively slow. Switching to Arrow's `try_binary` kernel makes the implementation more concise and also improves performance by computing the result NULL buffer via the bitwise union of the input NULL buffers.
2. The `gcd` scalar arg path was doing similarly; switching to Arrow's `try_unary` yields a similar speedup.
3. For the `gcd` scalar path, computing the GCD can only fail in a few edge cases (e.g., `gcd(i64::MIN, i64::MIN)`). It is cheap to check for these edge cases; for most `gcd` inputs, we can use Arrow's `unary` kernel instead of `try_unary`. The former is more efficient because it allows LLVM to vectorize the code more effectively.
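To make the first two points concrete, here is a minimal std-only sketch of the difference between building a result validity bitmap row by row and building it with word-wise bitwise operations. It is a model, not the Arrow implementation: the bitmaps are plain `u64` slices, the function names `validity_row_by_row` and `validity_bulk` are hypothetical, and both inputs are assumed to have the same number of words. It follows the Arrow convention that a set bit means the row is valid, so the union of the NULLs is the bitwise AND of the validity words.

```rust
/// Row-by-row construction: one bit test and one conditional bit set per row.
/// Bit i set means row i is valid (non-NULL).
fn validity_row_by_row(left: &[u64], right: &[u64], len: usize) -> Vec<u64> {
    let mut out = vec![0u64; left.len()];
    for i in 0..len {
        let (word, bit) = (i / 64, i % 64);
        let l = (left[word] >> bit) & 1;
        let r = (right[word] >> bit) & 1;
        // The output row is valid only if both input rows are valid.
        if l & r == 1 {
            out[word] |= 1u64 << bit;
        }
    }
    out
}

/// Bulk construction: one AND per 64 rows, with no per-row branching.
fn validity_bulk(left: &[u64], right: &[u64]) -> Vec<u64> {
    left.iter().zip(right).map(|(l, r)| l & r).collect()
}
```

Both functions produce the same bitmap; the bulk version simply does 64 rows of work per operation, which is the kind of saving the description attributes to letting the kernel compute the NULL buffer.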

Benchmarks (ARM64):

  - gcd array and scalar: 2.9ms → 2.2ms (-25%)
  - lcm both array: 2.7ms → 2.0ms (-26%)

What changes are included in this PR?

Add benchmark for lcm
Improve SLT test coverage
Move Rust unit test for lcm to SLT
Optimize lcm and gcd NULL handling
Optimize gcd to avoid overhead for edge cases

Are these changes tested?

Yes. Benchmark results above. I inspected the generated code for the gcd case to confirm that LLVM is able to generate better code for the unary case than for the try_unary case.
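The difference between the two kernel shapes can be illustrated with a small std-only sketch. The functions `map_infallible` and `map_fallible` are hypothetical stand-ins for the general shape of an infallible and a fallible element-wise kernel; they are not Arrow's `unary` and `try_unary`, and whether a compiler actually vectorizes either loop depends on the operation passed in.

```rust
/// Infallible map: every element produces a value, so the loop body has
/// no early exit and no error state to track.
fn map_infallible(values: &[i64], op: impl Fn(i64) -> i64) -> Vec<i64> {
    values.iter().map(|&v| op(v)).collect()
}

/// Fallible map: each element may fail, and the first error stops the
/// iteration, so every step carries a branch on the result.
fn map_fallible<E>(
    values: &[i64],
    op: impl Fn(i64) -> Result<i64, E>,
) -> Result<Vec<i64>, E> {
    values.iter().map(|&v| op(v)).collect()
}
```

The per-element error check in the fallible shape is the kind of control flow that can make straight-line vectorization harder, which is consistent with the observation above about the generated code.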

Are there any user-facing changes?

No.

mbutrovich

Thanks @neilconway! 🚀

mbutrovich · 2026-04-10T15:55:45Z

+    let prim = arr.as_primitive::<Int64Type>();
    match scalar {
+        Some(scalar_value) if scalar_value != 0 && scalar_value != i64::MIN => {
+            // The gcd result divides both inputs' absolute values. When the


This is a slick optimization.
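A minimal sketch of why the guard `scalar_value != 0 && scalar_value != i64::MIN` is enough to make the scalar path infallible. This is an illustration under stated assumptions, not the code from the PR: the helper names `gcd_u64`, `checked_gcd`, and `gcd_with_safe_scalar` are hypothetical, and the gcd is computed on unsigned magnitudes with Euclid's algorithm. The only result that cannot be represented as an `i64` is 2^63, and the result always divides the scalar's magnitude, so a scalar whose magnitude is between 1 and `i64::MAX` rules that result out.

```rust
/// Euclid's algorithm on unsigned magnitudes; gcd(0, 0) is 0.
fn gcd_u64(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        (a, b) = (b, a % b);
    }
    a
}

/// Fallible signed gcd: returns None only when the result is 2^63,
/// which does not fit in an i64.
fn checked_gcd(a: i64, b: i64) -> Option<i64> {
    i64::try_from(gcd_u64(a.unsigned_abs(), b.unsigned_abs())).ok()
}

/// Infallible gcd against a fixed scalar. The caller must guarantee that
/// `scalar` is neither 0 nor i64::MIN.
fn gcd_with_safe_scalar(a: i64, scalar: i64) -> i64 {
    debug_assert!(scalar != 0 && scalar != i64::MIN);
    // The result divides |scalar|, and 1 <= |scalar| <= i64::MAX,
    // so the cast below cannot wrap.
    gcd_u64(a.unsigned_abs(), scalar.unsigned_abs()) as i64
}
```

With the guard checked once per batch, the per-row function has no failure case, even when a row holds `i64::MIN`.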

@alamb

…on_many` (#9692)

## Which issue does this PR close?

- Closes #8809.

## Rationale for this change

Several DataFusion PRs ([#21464](apache/datafusion#21464), [#21468](apache/datafusion#21468), [#21471](apache/datafusion#21471), [#21475](apache/datafusion#21475), [#21477](apache/datafusion#21477), [#21482](apache/datafusion#21482), [#21532](apache/datafusion#21532)) optimize NULL handling in scalar functions by replacing row-by-row null buffer construction with bulk `NullBuffer::union`. When 3+ null buffers need combining, they chain binary `union` calls, each allocating a new `BooleanBuffer`. `NullBuffer::union_many` reduces this to 1 allocation (clone + in-place ANDs). For example, from [#21482](apache/datafusion#21482):

Before:

```rust
[array.nulls(), from_array.nulls(), to_array.nulls(), stride.and_then(|s| s.nulls())]
    .into_iter()
    .fold(None, |acc, nulls| NullBuffer::union(acc.as_ref(), nulls))
```

After:

```rust
NullBuffer::union_many([
    array.nulls(),
    from_array.nulls(),
    to_array.nulls(),
    stride.and_then(|s| s.nulls()),
])
```

Per @alamb's [suggestion](#9692 (comment)), this PR also implements the general-purpose mutable bitwise operations on `BooleanArray` from #8809, following the `PrimitiveArray::unary` / `unary_mut` pattern. This builds on the `BitAndAssign`/`BitOrAssign`/`BitXorAssign` operators added to `BooleanBuffer` in #9567.

## What changes are included in this PR?

**`NullBuffer::union_many(impl IntoIterator<Item = Option<&NullBuffer>>)`**: combines multiple null buffers in a single allocation (clone + in-place `&=`). Used by DataFusion for bulk null handling.

**`BooleanArray` bitwise operations** (6 new public methods):

Unary (`op: FnMut(u64) -> u64`):

- `bitwise_unary(&self, op)`: always allocates a new array
- `bitwise_unary_mut(self, op) -> Result<Self, Self>`: in-place if uniquely owned, `Err(self)` if shared
- `bitwise_unary_mut_or_clone(self, op)`: in-place if uniquely owned, allocates if shared

Binary (`op: FnMut(u64, u64) -> u64`):

- `bitwise_bin_op(&self, rhs, op)`: always allocates, unions null buffers
- `bitwise_bin_op_mut(self, rhs, op) -> Result<Self, Self>`: in-place if uniquely owned, `Err(self)` if shared, unions null buffers
- `bitwise_bin_op_mut_or_clone(self, rhs, op)`: in-place if uniquely owned, allocates if shared, unions null buffers

Note: #8809 proposed the binary variants take a raw buffer and `right_offset_in_bits`. This PR takes `&BooleanArray` instead, which encapsulates both and matches existing patterns like `BooleanArray::from_binary`.

## Are these changes tested?

Yes. 23 tests for the `BooleanArray` bitwise methods and 6 tests for `union_many`, covering:

- Basic correctness (AND, OR, NOT)
- Null handling (both nullable, one nullable, no nulls, null union)
- Buffer ownership (uniquely owned → in-place, shared → `Err` / fallback)
- Edge cases (empty arrays, sliced arrays with non-zero offset, misaligned left/right offsets)

## Are there any user-facing changes?

Six new public methods on `BooleanArray` and one new public method on `NullBuffer`.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
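The "clone + in-place ANDs" idea behind combining several null buffers with one allocation can be sketched without Arrow. This is a std-only model under stated assumptions, not the arrow-rs implementation: validity bitmaps are plain `u64` slices of equal length with a set bit meaning valid, `None` stands for an input with no NULLs, and the function here only borrows the name `union_many` for illustration.

```rust
/// Combine any number of optional validity bitmaps (bit set = valid).
/// Returns None when no input carries NULLs.
fn union_many<'a>(
    inputs: impl IntoIterator<Item = Option<&'a [u64]>>,
) -> Option<Vec<u64>> {
    let mut acc: Option<Vec<u64>> = None;
    // `flatten` skips the inputs that have no NULLs at all.
    for words in inputs.into_iter().flatten() {
        match acc.as_mut() {
            // First bitmap seen: this clone is the only allocation.
            None => acc = Some(words.to_vec()),
            // Later bitmaps are folded in place with a word-wise AND.
            Some(out) => {
                for (o, w) in out.iter_mut().zip(words) {
                    *o &= *w;
                }
            }
        }
    }
    acc
}
```

A chained binary union would instead allocate a fresh bitmap at each step, which is the overhead the commit message describes.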


neilconway added 2 commits April 8, 2026 09:09

Add benchmark for lcm

68054b5

.

c3e1367

github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Apr 8, 2026

neilconway changed the title ~~perf: Optimize lcd, gcd NULL handling~~ perf: Optimize NULL handling in lcd, gcd Apr 9, 2026

mbutrovich changed the title ~~perf: Optimize NULL handling in lcd, gcd~~ perf: Optimize NULL handling in lcm, gcd Apr 10, 2026

mbutrovich mentioned this pull request Apr 10, 2026

Add mutable bitwise operations to BooleanArray and NullBuffer::union_many apache/arrow-rs#9692

Merged

mbutrovich approved these changes Apr 10, 2026

View reviewed changes

mbutrovich added this pull request to the merge queue Apr 10, 2026

Merged via the queue into apache:main with commit ad7c57a Apr 10, 2026
39 checks passed

mbutrovich mentioned this pull request Apr 10, 2026

feat: Add immediate mode option for native shuffle apache/datafusion-comet#3845

Draft

neilconway deleted the neilc/perf-lcm-gcd-nulls branch April 10, 2026 20:44

mbutrovich merged 2 commits into apache:main from neilconway:neilc/perf-lcm-gcd-nulls
