[C++] Should FieldRef::Get* for a StructArray return the "raw" child array or the "flattened" field array? #14946

@jorisvandenbossche

Description

Related to the discussion in #14697 (comment) (but also to the python equivalent APIs like in #14781 (comment)).

Currently, the FieldRef/FieldPath methods for getting a field from a StructArray (i.e. FieldRef::GetOne/GetAll or FieldPath::Get(), when passed a StructArray or a RecordBatch with a struct-typed field) return the raw child array.
Basically, under the hood, it does struct_array.data().child_data[idx].

On the other hand, if you use the same FieldRef/FieldPath object in a compute context, for example by passing it to the struct_field kernel, the result you get is the "flattened" field array (I find the term "flatten" somewhat confusing). This combines the top-level validity bitmap of the struct array with the validity bitmap of the child array, so that the result correctly indicates which values are missing.
Basically, under the hood this kernel is doing struct_array.GetFlattenedField(idx) instead.
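
To make the difference concrete, here is a minimal sketch that does not use Arrow at all. ToyArray, ToyStructArray, RawChild, FlattenedChild and MakeExample are all hypothetical names made up for illustration, and validity is modelled as a std::vector<bool> rather than a real bitmap (offsets and slicing are ignored). It only shows how the two access patterns can disagree about which slots are null.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an Arrow array: values plus one validity flag per slot.
// (Hypothetical type, not part of the Arrow API.)
struct ToyArray {
  std::vector<int> values;
  std::vector<bool> valid;
};

// Toy stand-in for a StructArray with a single child field.
struct ToyStructArray {
  std::vector<bool> valid;  // top-level (struct) validity
  ToyArray child;           // child array, with its own validity
};

// Roughly what struct_array.data().child_data[idx] gives you:
// the child as stored, ignoring the parent's validity.
ToyArray RawChild(const ToyStructArray& s) { return s.child; }

// Roughly what GetFlattenedField(idx) gives you: a slot is valid only
// if both the struct slot and the child slot are valid.
ToyArray FlattenedChild(const ToyStructArray& s) {
  assert(s.valid.size() == s.child.valid.size());
  ToyArray out = s.child;
  for (std::size_t i = 0; i < out.valid.size(); ++i) {
    out.valid[i] = s.valid[i] && s.child.valid[i];
  }
  return out;
}

// Example data: the struct is null at slot 1, the child is null at slot 2.
ToyStructArray MakeExample() {
  ToyStructArray s;
  s.valid = {true, false, true};
  s.child.values = {1, 2, 3};
  s.child.valid = {true, true, false};
  return s;
}
```

With this example data, RawChild reports slot 1 as valid (value 2) even though the struct itself is null there, while FlattenedChild reports it as null. Both agree that slot 2 is null.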

So I am wondering whether those two usage patterns of FieldRefs should be consistent. That probably depends on the intended use case for calling FieldRef::GetOne / FieldPath::Get directly (currently, I don't see those used in actual code, except as helpers in some testing code).

cc @pitrou @westonpace @lidavidm

Component(s)

C++
