I noticed this issue while working on #9610. A test like the following fails because the top-level null buffer gets dropped without being applied to the data itself.
#[test]
fn test_variant_get_perfectly_shredded_binary_view_preserves_top_level_nulls() {
    let metadata =
        BinaryViewArray::from_iter_values(std::iter::repeat_n(EMPTY_VARIANT_METADATA_BYTES, 3));
    let typed_value: ArrayRef = Arc::new(BinaryViewArray::from(vec![
        Some(b"Apache" as &[u8]),
        Some(b"masked-null" as &[u8]),
        Some(b"Parquet-variant" as &[u8]),
    ]));
    let variant_array: ArrayRef = VariantArray::from_parts(
        metadata,
        None,
        Some(typed_value),
        Some(NullBuffer::from(vec![true, false, true])),
    )
    .into();
    let result = variant_get(
        &variant_array,
        GetOptions::new().with_as_type(Some(FieldRef::from(Field::new(
            "result",
            DataType::BinaryView,
            true,
        )))),
    )
    .unwrap();
    let result = result.as_binary_view();
    assert_eq!(result.len(), 3);
    assert_eq!(result.null_count(), 1); // This will be 0
}
Some types do behave correctly, such as Binary, because they go through the canonicalization code path, which produces the correct result:
#[test]
fn test_variant_get_perfectly_shredded_binary_preserves_top_level_nulls() {
    let metadata =
        BinaryViewArray::from_iter_values(std::iter::repeat_n(EMPTY_VARIANT_METADATA_BYTES, 3));
    let typed_value: ArrayRef = Arc::new(BinaryArray::from(vec![
        Some(b"Apache" as &[u8]),
        Some(b"masked-null" as &[u8]),
        Some(b"Parquet-variant" as &[u8]),
    ]));
    let typed_value_dt = typed_value.data_type().clone();
    let variant_array: ArrayRef = VariantArray::from_parts(
        metadata,
        None,
        Some(typed_value),
        Some(NullBuffer::from(vec![true, false, true])),
    )
    .into();
    let result = variant_get(
        &variant_array,
        GetOptions::new().with_as_type(Some(FieldRef::from(Field::new(
            "result",
            DataType::Binary,
            true,
        )))),
    )
    .unwrap();
    let result = result.as_binary::<i32>();
    assert_eq!(result.len(), 3);
    assert_eq!(result.null_count(), 1);
    assert_eq!(result.data_type(), &typed_value_dt);
}
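
To make the expected behavior concrete, here is a minimal std-only sketch of what "applying the null buffer to the data" means for the rows in the tests above. It is not the arrow-rs implementation and uses no arrow types: `apply_top_level_nulls` and `null_count` are hypothetical helpers, the typed values are modeled as a slice of `Option<&[u8]>`, and the top-level null buffer is modeled as a `&[bool]` where `true` means valid.

```rust
/// Hypothetical helper (not part of arrow-rs): mask a column of optional
/// values with a top-level validity mask, where `true` means "valid".
/// A row stays non-null only if both the mask and the value itself are valid.
fn apply_top_level_nulls<'a>(
    values: &[Option<&'a [u8]>],
    validity: &[bool],
) -> Vec<Option<&'a [u8]>> {
    assert_eq!(values.len(), validity.len(), "length mismatch");
    values
        .iter()
        .zip(validity)
        .map(|(value, &is_valid)| if is_valid { *value } else { None })
        .collect()
}

/// Hypothetical helper: count the null rows, analogous to the
/// `null_count()` assertion in the tests above.
fn null_count(values: &[Option<&[u8]>]) -> usize {
    values.iter().filter(|v| v.is_none()).count()
}
```

With the three typed values from the tests and the mask `[true, false, true]`, the middle row becomes null and the null count is 1, which is what the BinaryView test asserts. Skipping the masking step leaves all three rows valid, which matches the reported null count of 0.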