Add mutable operations to BooleanBuffer (Bit*Assign)#9567
Conversation
alamb
left a comment
There was a problem hiding this comment.
The code looks good to me -- thanks @Dandandan
I do think we need some tests and some additional documentation would help
Here is a proposal:
| self.buffer.claim(pool); | ||
| } | ||
|
|
||
| /// Apply a bitwise binary operation in-place, avoiding allocation if the |
There was a problem hiding this comment.
I think we need some tests for this code
Suggestions for "Add mutable operations to BooleanBuffer (Bit*Assign)"
|
🚀 |
|
Thanks @Dandandan and @alamb. This will be helpful for me. I am increasingly refactoring some operations to "un-fuse" them actually, where we have complicated operations in a for loop that limits vectorization. I'm finding it faster to break them into stages, apply predicates, and build up boolean selection vectors to do a final output at the end. I'm hoping to adopt this API in the future. |
|
That is great @mbutrovich I also personally would love to move towards more buffer reuse Here is a ticket with some additional ideas It would be great to know if any of those APIs would be useful to comet as ewell |
…on_many` (#9692) ## Which issue does this PR close? - Closes #8809. ## Rationale for this change Several DataFusion PRs ([#21464](apache/datafusion#21464), [#21468](apache/datafusion#21468), [#21471](apache/datafusion#21471), [#21475](apache/datafusion#21475), [#21477](apache/datafusion#21477), [#21482](apache/datafusion#21482), [#21532](apache/datafusion#21532)) optimize NULL handling in scalar functions by replacing row-by-row null buffer construction with bulk `NullBuffer::union`. When 3+ null buffers need combining, they chain binary `union` calls, each allocating a new `BooleanBuffer`. `NullBuffer::union_many` reduces this to 1 allocation (clone + in-place ANDs). For example, from [#21482](apache/datafusion#21482): Before: ```rust [array.nulls(), from_array.nulls(), to_array.nulls(), stride.and_then(|s| s.nulls())] .into_iter() .fold(None, |acc, nulls| NullBuffer::union(acc.as_ref(), nulls)) ``` After: ```rust NullBuffer::union_many([ array.nulls(), from_array.nulls(), to_array.nulls(), stride.and_then(|s| s.nulls()), ]) ``` Per @alamb's [suggestion](#9692 (comment)), this PR also implements the general-purpose mutable bitwise operations on `BooleanArray` from #8809, following the `PrimitiveArray::unary` / `unary_mut` pattern. This builds on the `BitAndAssign`/`BitOrAssign`/`BitXorAssign` operators added to `BooleanBuffer` in #9567. ## What changes are included in this PR? **`NullBuffer::union_many(impl IntoIterator<Item = Option<&NullBuffer>>)`**: combines multiple null buffers in a single allocation (clone + in-place `&=`). Used by DataFusion for bulk null handling. **`BooleanArray` bitwise operations** (6 new public methods): Unary (`op: FnMut(u64) -> u64`): - `bitwise_unary(&self, op)` — always allocates a new array - `bitwise_unary_mut(self, op) -> Result<Self, Self>` — in-place if uniquely owned, `Err(self)` if shared - `bitwise_unary_mut_or_clone(self, op)` — in-place if uniquely owned, allocates if shared Binary (`op: FnMut(u64, u64) -> u64`): - `bitwise_bin_op(&self, rhs, op)` — always allocates, unions null buffers - `bitwise_bin_op_mut(self, rhs, op) -> Result<Self, Self>` — in-place if uniquely owned, `Err(self)` if shared, unions null buffers - `bitwise_bin_op_mut_or_clone(self, rhs, op)` — in-place if uniquely owned, allocates if shared, unions null buffers Note: #8809 proposed the binary variants take a raw buffer and `right_offset_in_bits`. This PR takes `&BooleanArray` instead, which encapsulates both and matches existing patterns like `BooleanArray::from_binary`. ## Are these changes tested? Yes. 23 tests for the `BooleanArray` bitwise methods and 6 tests for `union_many`, covering: - Basic correctness (AND, OR, NOT) - Null handling (both nullable, one nullable, no nulls, null union) - Buffer ownership (uniquely owned → in-place, shared → `Err` / fallback) - Edge cases (empty arrays, sliced arrays with non-zero offset, misaligned left/right offsets) ## Are there any user-facing changes? Six new public methods on `BooleanArray` and one new public method on `NullBuffer`. --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Which issue does this PR close?
BooleanArraythat potentially reuse the underlying allocation #8809Rationale for this change
I want to avoid allocating a new buffer when doing
&.We can use
&=this way.What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?