[C++][Parquet] Support min_value and max_value Statistics #14870

@pitrou

Describe the enhancement requested

The Statistics structure in Parquet files provides two ways of specifying lower and upper bounds for a column chunk or data page:

  • min and max are legacy fields for compatibility with older writers, with ill-defined comparison semantics in most cases except for signed integers
  • min_value and max_value are "new" fields (introduced in 2017! - see apache/parquet-format@041708d and apache/parquet-format@bef5438) with well-defined comparison semantics depending on the logical type
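
To illustrate why the comparison semantics matter, here is a minimal sketch (not Parquet C++ code) showing that the "minimum" of the same byte-array values changes depending on whether bytes are compared as signed or unsigned. The helper names `SignedBytesLess`, `UnsignedBytesLess` and `MinOf` are hypothetical and exist only for this example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Lexicographic comparison treating every byte as a signed 8-bit value.
// Assumption for illustration: a writer with unspecified semantics could
// end up ordering binary values this way.
bool SignedBytesLess(const std::string& a, const std::string& b) {
  return std::lexicographical_compare(
      a.begin(), a.end(), b.begin(), b.end(), [](char x, char y) {
        return static_cast<signed char>(x) < static_cast<signed char>(y);
      });
}

// Lexicographic comparison treating every byte as an unsigned 8-bit value.
bool UnsignedBytesLess(const std::string& a, const std::string& b) {
  return std::lexicographical_compare(
      a.begin(), a.end(), b.begin(), b.end(), [](char x, char y) {
        return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
      });
}

// Hypothetical helper: smallest element under the given ordering.
// Precondition: `values` is not empty.
template <typename Less>
std::string MinOf(const std::vector<std::string>& values, Less less) {
  return *std::min_element(values.begin(), values.end(), less);
}
```

With the values `"a"` (byte 0x61) and `"\xC3\xA9"` (UTF-8 for "é"), the signed ordering picks `"\xC3\xA9"` as the minimum because 0xC3 is negative as a signed byte, while the unsigned ordering picks `"a"`. A reader that does not know which ordering the writer used cannot safely prune on such bounds.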

Currently Parquet C++ supports only the legacy fields min and max. We should add support for reading and writing the newer ones, with the appropriate semantics on the write path.
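
On the read path, one possible policy could look like the sketch below. It is only an illustration under stated assumptions, not the proposed implementation: `RawStatistics`, `SortOrder` and `UsableBounds` are hypothetical names, and the struct merely mirrors the four bound fields described above. The assumed policy is to prefer the newer fields and to fall back to the legacy ones only when the column's sort order is signed.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical mirror of the four bound fields of the Statistics structure.
// Values are kept as raw encoded bytes.
struct RawStatistics {
  std::optional<std::string> min;        // legacy field
  std::optional<std::string> max;        // legacy field
  std::optional<std::string> min_value;  // newer field
  std::optional<std::string> max_value;  // newer field
};

// Hypothetical classification of how values of a column are ordered.
enum class SortOrder { kSigned, kUnsigned, kUnknown };

using Bounds = std::pair<std::string, std::string>;

// Returns the (min, max) pair a reader may rely on, or nullopt if none of
// the stored bounds has trustworthy semantics under the assumed policy.
std::optional<Bounds> UsableBounds(const RawStatistics& stats, SortOrder order) {
  // Without a known ordering, no bound can be interpreted.
  if (order == SortOrder::kUnknown) return std::nullopt;
  // The newer fields have well-defined semantics, so prefer them.
  if (stats.min_value && stats.max_value) {
    return Bounds{*stats.min_value, *stats.max_value};
  }
  // The legacy fields are only unambiguous for signed comparison.
  if (order == SortOrder::kSigned && stats.min && stats.max) {
    return Bounds{*stats.min, *stats.max};
  }
  return std::nullopt;
}
```

Under this policy, a file that carries only legacy `min`/`max` for an unsigned-ordered column yields no usable bounds, which is the conservative choice.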

Component(s)

Parquet
