[C++][Parquet] Write Arrow relies on unspecified behavior for nested types #25665

@asfimport

The WriteArrow implementations in parquet/column_writer.cc check null counts/required data at certain points and pass the null bitmap through for encoding. This only works for nested data types if a null slot on a parent implies a null slot on the leaf. This relationship is not required by the specifications.
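To illustrate the mismatch, here is a minimal sketch that does not use the Arrow API. It assumes a nullable struct with a single nullable child, and models each validity bitmap as a `std::vector<bool>`. The names `Validity`, `OwnNullCount`, and `EffectiveNullCount` are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a validity bitmap: true means the slot is non-null.
using Validity = std::vector<bool>;

// Null count as seen from the child's own bitmap only.
std::size_t OwnNullCount(const Validity& child) {
  std::size_t nulls = 0;
  for (bool valid : child) nulls += valid ? 0 : 1;
  return nulls;
}

// Null count the leaf column needs when written: a slot is null if either
// the parent or the child is null. Assumes both bitmaps have the same length.
std::size_t EffectiveNullCount(const Validity& parent, const Validity& child) {
  std::size_t nulls = 0;
  for (std::size_t i = 0; i < child.size(); ++i) {
    nulls += (parent[i] && child[i]) ? 0 : 1;
  }
  return nulls;
}
```

With parent validity `{true, false, true}` and child validity `{true, true, false}`, the child's own bitmap reports one null while the leaf effectively has two, because slot 1 is null on the parent but not on the child.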

Most paths for creating arrays follow this pattern, so hitting this bug would be unusual, but we should still fix it.

If the column is nested, every branch that relies on reading nullness should generate a new null bitmap from the definition levels and base its decisions on that bitmap.
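A rough sketch of that idea, again without the Arrow API: it assumes struct-only nesting (no repeated fields), so there is exactly one definition level per leaf slot, and a slot holds a value only when its level equals the column's maximum definition level. `LeafValidity` and `ValidityFromDefLevels` are hypothetical names, not functions in column_writer.cc.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical result type: per-slot validity plus the derived null count.
struct LeafValidity {
  std::vector<bool> valid;
  std::int64_t null_count = 0;
};

// Derive leaf validity from definition levels instead of trusting the
// leaf array's own bitmap. A level below max_def_level means some ancestor
// (or the leaf itself) is null at that slot.
LeafValidity ValidityFromDefLevels(const std::vector<std::int16_t>& def_levels,
                                   std::int16_t max_def_level) {
  LeafValidity out;
  out.valid.reserve(def_levels.size());
  for (std::int16_t level : def_levels) {
    const bool is_valid = (level == max_def_level);
    out.valid.push_back(is_valid);
    if (!is_valid) ++out.null_count;
  }
  return out;
}
```

For a nullable struct with one nullable child the maximum definition level is 2. Levels `{2, 0, 1}` then give validity `{true, false, false}` and a null count of 2, regardless of what the child's own bitmap says for the slot where the parent is null.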

Reporter: Micah Kornfield / @emkornfield
Assignee: Micah Kornfield / @emkornfield

PRs and other links:

Note: This issue was originally created as ARROW-9603. Please see the migration documentation for further details.
