Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 29 additions & 6 deletions rust/arrow/src/compute/kernels/comparison.rs
Original file line number Diff line number Diff line change
Expand Up @@ -47,9 +47,20 @@ macro_rules! compare_op {
let null_bit_buffer =
combine_option_bitmap($left.data_ref(), $right.data_ref(), $left.len())?;

let mut result = BooleanBufferBuilder::new($left.len());
let byte_capacity = bit_util::ceil($left.len(), 8);
let actual_capacity = bit_util::round_upto_multiple_of_64(byte_capacity);
let mut buffer = MutableBuffer::new(actual_capacity);
buffer.resize(byte_capacity);
let data = buffer.raw_data_mut();

for i in 0..$left.len() {
result.append($op($left.value(i), $right.value(i)))?;
if $op($left.value(i), $right.value(i)) {
// SAFETY: this is safe as `data` has at least $left.len() elements.
// and `i` is bound by $left.len()
unsafe {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functions that use this macro should now be declared unsafe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will revert the change with the asserts for now. I think the value functions should be marked unsafe that don't perform bound checks, and other functions that can trigger UB. The macro's themselves should be "relatively" safe I think, as long as the .len() value is correct.
I think a better solution for the future would be to have a safe iterator that doesn't do bound checking, so I think it's better to move the particular change out of this PR for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, @Dandandan . Note that all primitives and strings have iterators and FromIterator: https://github.com/apache/arrow/blob/master/rust/arrow/src/array/iterator.rs , but they are for Option<T>, not T.

I agree with you that we should mark that fn value as unsafe and offer an iterator over T (besides the one over Option<T>). That UB is really obvious and it is also a security vulnerability causing an escalation of privileges as it allows privileged access to the application's memory via out of bounds accesses.

I usually see the iterator over T when they can mask the result or OR / AND the null bitmaps, while Option<T> is used when that is not possible / useful.

bit_util::set_bit_raw(data, i);
}
}
}

let data = ArrayData::new(
Expand All @@ -58,7 +69,7 @@ macro_rules! compare_op {
None,
null_bit_buffer,
0,
vec![result.finish()],
vec![buffer.freeze()],
vec![],
);
Ok(BooleanArray::from(Arc::new(data)))
Expand All @@ -68,9 +79,21 @@ macro_rules! compare_op {
macro_rules! compare_op_scalar {
($left: expr, $right:expr, $op:expr) => {{
let null_bit_buffer = $left.data().null_buffer().cloned();
let mut result = BooleanBufferBuilder::new($left.len());

let byte_capacity = bit_util::ceil($left.len(), 8);
let actual_capacity = bit_util::round_upto_multiple_of_64(byte_capacity);
let mut buffer = MutableBuffer::new(actual_capacity);
buffer.resize(byte_capacity);
let data = buffer.raw_data_mut();

for i in 0..$left.len() {
result.append($op($left.value(i), $right))?;
if $op($left.value(i), $right) {
// SAFETY: this is safe as `data` has at least $left.len() elements
// and `i` is bound by $left.len()
unsafe {
bit_util::set_bit_raw(data, i);
}
}
}

let data = ArrayData::new(
Expand All @@ -79,7 +102,7 @@ macro_rules! compare_op_scalar {
None,
null_bit_buffer,
0,
vec![result.finish()],
vec![buffer.freeze()],
vec![],
);
Ok(BooleanArray::from(Arc::new(data)))
Expand Down