[C++] Handling "logical" nulls: add GetLogicalNullCount / update IsNull() to check for logical null #34361

@jorisvandenbossche

Description

There are some data types where the nulls are not stored "physically" using a validity bitmap on the parent ArrayData, but through nulls in child data:

(sidenote: Dictionary arrays could be considered here as well, but they are a bit of a mixed bag: there is a top-level null count through nulls in the indices, but the dictionary itself can also contain nulls. So nulls can be encoded in two different ways)

The format specification has a "null_count" (in the IPC FieldNode in the RecordBatch message, and in the C Data Interface), and in those cases this refers to the "physical" null count. The C++ implementation follows this: the base Array::null_count() (implemented by ArrayData::GetNullCount()) looks at the validity buffer (typically the first buffer) to count the unset bits, or directly returns 0 if there is no validity buffer.
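To make the "physical" counting concrete, here is a rough sketch of the behaviour described above. It is a toy model, not Arrow's actual code: `ToyArrayData` and `PhysicalNullCount` are hypothetical names, and the bitmap layout (one bit per slot, least-significant bit first, set bit meaning valid) is assumed for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for ArrayData; not Arrow's real class.
struct ToyArrayData {
  int64_t length = 0;
  // Optional validity bitmap: one bit per slot, LSB first, set bit = valid.
  std::optional<std::vector<uint8_t>> validity;
};

// Counts the unset validity bits, or returns 0 when there is no bitmap.
int64_t PhysicalNullCount(const ToyArrayData& data) {
  if (!data.validity) return 0;
  // The bitmap must cover every slot.
  assert(static_cast<int64_t>(data.validity->size()) * 8 >= data.length);
  int64_t nulls = 0;
  for (int64_t i = 0; i < data.length; ++i) {
    const bool valid = ((*data.validity)[i / 8] >> (i % 8)) & 1;
    if (!valid) ++nulls;
  }
  return nulls;
}
```

An array without a validity bitmap always reports 0 here, which is exactly why types that keep their nulls in child data look null-free from the outside.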

However, in practice you often want to know if there are actual "logical" nulls (not considering those leads to bugs, for example #34315).

@felipecrv @westonpace and I had some discussion about this on Zulip (https://ursalabs.zulipchat.com/#narrow/stream/180245-dev/topic/Null.20count.20of.20a.20UnionArray/near/336538844), and I think our current idea would be:

  • Add a GetLogicalNullCount to complement the existing Array::null_count() / ArrayData::GetNullCount() (changing null_count() itself might be too much of a breaking change, and would also make it inconsistent with how null_count is used in the specs)

  • Change Array::IsNull(i) to consider logical nulls instead of just physical nulls
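As a rough illustration of the two proposals, the sketch below models a sparse union whose nulls live only in its children. Everything in it is hypothetical (`ToyChild`, `ToySparseUnion`, `IsLogicalNull`, `LogicalNullCount`), and it assumes for simplicity that a slot's type id is directly the index of the selected child; it is not how the real implementation would have to look.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model of a sparse union; not Arrow's real classes.
struct ToyChild {
  std::vector<bool> valid;  // per-slot validity of this child
};

struct ToySparseUnion {
  std::vector<int8_t> type_ids;    // child selected by each slot (assumed = child index)
  std::vector<ToyChild> children;  // every child is as long as the union itself
};

// The union has no validity bitmap of its own, so the physical count is 0.
int64_t PhysicalNullCount(const ToySparseUnion&) { return 0; }

// A slot is logically null when the child it selects is null at that slot.
bool IsLogicalNull(const ToySparseUnion& u, int64_t i) {
  assert(i >= 0 && i < static_cast<int64_t>(u.type_ids.size()));
  const ToyChild& child = u.children[u.type_ids[i]];
  return !child.valid[i];
}

// Logical null count: number of slots that are logically null.
int64_t LogicalNullCount(const ToySparseUnion& u) {
  int64_t nulls = 0;
  for (int64_t i = 0; i < static_cast<int64_t>(u.type_ids.size()); ++i) {
    if (IsLogicalNull(u, i)) ++nulls;
  }
  return nulls;
}
```

With type ids {0, 1, 0, 1} and children that are null at the selected slots 1 and 2, this reports a physical count of 0 but a logical count of 2, which is the gap the two bullet points are meant to close.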
