Perfectly shredded arrays with top-level null values lose nullability when typed_value is extracted #9701

@AdamGS

I noticed this while working on #9610. A test like the following fails because the top-level null buffer is dropped instead of being applied to the extracted typed_value data:

    #[test]
    fn test_variant_get_perfectly_shredded_binary_view_preserves_top_level_nulls() {
        let metadata =
            BinaryViewArray::from_iter_values(std::iter::repeat_n(EMPTY_VARIANT_METADATA_BYTES, 3));
        let typed_value: ArrayRef = Arc::new(BinaryViewArray::from(vec![
            Some(b"Apache" as &[u8]),
            Some(b"masked-null" as &[u8]),
            Some(b"Parquet-variant" as &[u8]),
        ]));
        let variant_array: ArrayRef = VariantArray::from_parts(
            metadata,
            None,
            Some(typed_value),
            Some(NullBuffer::from(vec![true, false, true])),
        )
        .into();

        let result = variant_get(
            &variant_array,
            GetOptions::new().with_as_type(Some(FieldRef::from(Field::new(
                "result",
                DataType::BinaryView,
                true,
            )))),
        )
        .unwrap();

        let result = result.as_binary_view();
        assert_eq!(result.len(), 3);
        assert_eq!(result.null_count(), 1); // currently fails: null_count() is 0
    }

Some types, such as Binary, do behave correctly because they go through the canonicalization code path, which produces the correct result. For example, this test passes:

    #[test]
    fn test_variant_get_perfectly_shredded_binary_preserves_top_level_nulls() {
        let metadata =
            BinaryViewArray::from_iter_values(std::iter::repeat_n(EMPTY_VARIANT_METADATA_BYTES, 3));
        let typed_value: ArrayRef = Arc::new(BinaryArray::from(vec![
            Some(b"Apache" as &[u8]),
            Some(b"masked-null" as &[u8]),
            Some(b"Parquet-variant" as &[u8]),
        ]));
        let typed_value_dt = typed_value.data_type().clone();
        let variant_array: ArrayRef = VariantArray::from_parts(
            metadata,
            None,
            Some(typed_value),
            Some(NullBuffer::from(vec![true, false, true])),
        )
        .into();

        let result = variant_get(
            &variant_array,
            GetOptions::new().with_as_type(Some(FieldRef::from(Field::new(
                "result",
                DataType::Binary,
                true,
            )))),
        )
        .unwrap();

        let result = result.as_binary::<i32>();
        assert_eq!(result.len(), 3);
        assert_eq!(result.null_count(), 1);
        assert_eq!(result.data_type(), &typed_value_dt);
    }
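
To make the expected behaviour concrete, here is a minimal sketch of the masking step, modelled on plain `bool` slices rather than real Arrow buffers. `union_validity` and `null_count` are hypothetical helpers written for illustration only; they are not part of arrow-rs or the variant code. I believe `NullBuffer::union` plays the equivalent role for real arrays, but treat that as an assumption to verify against the arrow-rs docs.

```rust
/// Combine two optional validity masks (`true` = valid, `false` = null).
/// `None` stands for "no null buffer", i.e. every slot is valid.
/// Hypothetical helper: a model of the masking, not an arrow-rs API.
fn union_validity(outer: Option<&[bool]>, inner: Option<&[bool]>) -> Option<Vec<bool>> {
    match (outer, inner) {
        // Neither side has nulls, so the result has none either.
        (None, None) => None,
        // Only one side has a mask: it carries over unchanged.
        (Some(mask), None) | (None, Some(mask)) => Some(mask.to_vec()),
        // Both sides have a mask: a slot is valid only if valid in both.
        (Some(a), Some(b)) => {
            assert_eq!(a.len(), b.len(), "validity masks must have equal length");
            Some(a.iter().zip(b).map(|(x, y)| *x && *y).collect())
        }
    }
}

/// Count null slots in an optional validity mask.
fn null_count(validity: Option<&[bool]>) -> usize {
    validity.map_or(0, |mask| mask.iter().filter(|valid| !**valid).count())
}
```

With the tests above, the top-level mask is `[true, false, true]` and the typed_value has no nulls of its own, so `union_validity(Some(&outer[..]), None)` keeps the one top-level null. Dropping the outer mask, which is what the BinaryView path effectively does today, corresponds to passing `None` for both sides and getting a null count of 0.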
