Skip to content

[C++] Don't recursively produce nulls when appending nulls to a FixedSizeListBuilder #41188

@felipecrv

Description

@felipecrv

Describe the enhancement requested

The spec says that inner values of a Fixed-Size List are unspecified [1] if the list value is null. This exists (I suppose) to preserve the invariant that we can always set a value on any array to null by flipping a single bit on the bitmap [2]. So there was never a guarantee that a null fixed-size list value is aligned to a block of corresponding nulls at the child values array.

And yet, when FixedSizeListBuilder::AppendNull() is called, it populates the inner values array with nulls.

This means that an array that wouldn't necessarily require an internal validity bitmap like [[1, 2], null, [5, 6]] ends up getting 1 extra validity bitmap. The problem is even worse if the fixed-size lists are nested deeply — [[[1, 2], [3, 4]], null, [[9, 10], [11, 12]]] would require 3 validity bitmaps, but only the top-level one is necessary.

I propose that we ask the values builder to append empty values to the child array instead of nulls.

In summary, this is my proposal:

 Status FixedSizeListBuilder::AppendNull() {
   RETURN_NOT_OK(Reserve(1));
   UnsafeAppendToBitmap(false);
-  return value_builder_->AppendNulls(list_size_);
+  return value_builder_->AppendEmptyValues(list_size_);
 }

 Status FixedSizeListBuilder::AppendNulls(int64_t length) {
   RETURN_NOT_OK(Reserve(length));
   UnsafeAppendToBitmap(length, false);
-  return value_builder_->AppendNulls(list_size_ * length);
+  return value_builder_->AppendEmptyValues(list_size_ * length);
 }

Risks: there is a risk that previously written code assumes that scalar values in the child array of a fixed-size list are null if the list value itself is null, but that's a very fragile assumption to begin with, because (1) the spec doesn't promise this, and more importantly (2) other ways of building fixed-size list arrays don't behave like this.

[1] https://arrow.apache.org/docs/format/Columnar.html#fixed-size-list-layout
[2] on the types that support validity bitmaps, which are almost all types

Component(s)

C++

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions