Skip to content

Add mutable operations to BooleanBuffer (Bit*Assign)#9567

Merged
alamb merged 5 commits intoapache:mainfrom
Dandandan:add_mutable
Mar 18, 2026
Merged

Add mutable operations to BooleanBuffer (Bit*Assign)#9567
alamb merged 5 commits intoapache:mainfrom
Dandandan:add_mutable

Conversation

@Dandandan
Copy link
Copy Markdown
Contributor

@Dandandan Dandandan commented Mar 17, 2026

Which issue does this PR close?

Rationale for this change

I want to avoid allocating a new buffer when doing &.
We can use &= this way.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions Bot added the arrow Changes to the arrow crate label Mar 17, 2026
@Dandandan Dandandan changed the title Add mutable operations to BooleanBuffer Add mutable operations to BooleanBuffer (Bit*Assign) Mar 17, 2026
Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me -- thanks @Dandandan

I do think we need some tests and some additional documentation would help

Here is a proposal:

Comment thread arrow-buffer/src/buffer/boolean.rs Outdated
self.buffer.claim(pool);
}

/// Apply a bitwise binary operation in-place, avoiding allocation if the
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need some tests for this code

Suggestions for "Add mutable operations to BooleanBuffer (Bit*Assign)"
Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Dandandan

@alamb alamb merged commit e3926a9 into apache:main Mar 18, 2026
26 checks passed
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Mar 18, 2026

🚀

@mbutrovich
Copy link
Copy Markdown
Contributor

Thanks @Dandandan and @alamb. This will be helpful for me. I am increasingly refactoring some operations to "un-fuse" them actually, where we have complicated operations in a for loop that limits vectorization. I'm finding it faster to break them into stages, apply predicates, and build up boolean selection vectors to do a final output at the end. I'm hoping to adopt this API in the future.

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Mar 18, 2026

That is great @mbutrovich

I also personally would love to move towards more buffer reuse

Here is a ticket with some additional ideas

It would be great to know if any of those APIs would be useful to comet as ewell

alamb added a commit that referenced this pull request Apr 14, 2026
…on_many` (#9692)

## Which issue does this PR close?

- Closes #8809.

## Rationale for this change

Several DataFusion PRs
([#21464](apache/datafusion#21464),
[#21468](apache/datafusion#21468),
[#21471](apache/datafusion#21471),
[#21475](apache/datafusion#21475),
[#21477](apache/datafusion#21477),
[#21482](apache/datafusion#21482),
[#21532](apache/datafusion#21532)) optimize NULL
handling in scalar functions by replacing row-by-row null buffer
construction with bulk `NullBuffer::union`. When 3+ null buffers need
combining, they chain binary `union` calls, each allocating a new
`BooleanBuffer`.

`NullBuffer::union_many` reduces this to 1 allocation (clone + in-place
ANDs). For example, from
[#21482](apache/datafusion#21482):

Before:
```rust
[array.nulls(), from_array.nulls(), to_array.nulls(), stride.and_then(|s| s.nulls())]
    .into_iter()
    .fold(None, |acc, nulls| NullBuffer::union(acc.as_ref(), nulls))
```
After:
```rust
NullBuffer::union_many([
    array.nulls(),
    from_array.nulls(),
    to_array.nulls(),
    stride.and_then(|s| s.nulls()),
])
```

Per @alamb's
[suggestion](#9692 (comment)),
this PR also implements the general-purpose mutable bitwise operations
on `BooleanArray` from #8809, following the `PrimitiveArray::unary` /
`unary_mut` pattern. This builds on the
`BitAndAssign`/`BitOrAssign`/`BitXorAssign` operators added to
`BooleanBuffer` in #9567.

## What changes are included in this PR?

**`NullBuffer::union_many(impl IntoIterator<Item =
Option<&NullBuffer>>)`**: combines multiple null buffers in a single
allocation (clone + in-place `&=`). Used by DataFusion for bulk null
handling.

**`BooleanArray` bitwise operations** (6 new public methods):

Unary (`op: FnMut(u64) -> u64`):
- `bitwise_unary(&self, op)` — always allocates a new array
- `bitwise_unary_mut(self, op) -> Result<Self, Self>` — in-place if
uniquely owned, `Err(self)` if shared
- `bitwise_unary_mut_or_clone(self, op)` — in-place if uniquely owned,
allocates if shared

Binary (`op: FnMut(u64, u64) -> u64`):
- `bitwise_bin_op(&self, rhs, op)` — always allocates, unions null
buffers
- `bitwise_bin_op_mut(self, rhs, op) -> Result<Self, Self>` — in-place
if uniquely owned, `Err(self)` if shared, unions null buffers
- `bitwise_bin_op_mut_or_clone(self, rhs, op)` — in-place if uniquely
owned, allocates if shared, unions null buffers

Note: #8809 proposed the binary variants take a raw buffer and
`right_offset_in_bits`. This PR takes `&BooleanArray` instead, which
encapsulates both and matches existing patterns like
`BooleanArray::from_binary`.

## Are these changes tested?

Yes. 23 tests for the `BooleanArray` bitwise methods and 6 tests for
`union_many`, covering:
- Basic correctness (AND, OR, NOT)
- Null handling (both nullable, one nullable, no nulls, null union)
- Buffer ownership (uniquely owned → in-place, shared → `Err` /
fallback)
- Edge cases (empty arrays, sliced arrays with non-zero offset,
misaligned left/right offsets)

## Are there any user-facing changes?

Six new public methods on `BooleanArray` and one new public method on
`NullBuffer`.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants