[C++][Parquet] Fix parsing stats from min_value/max_value #34138

@wgtmac

Describe the bug, including details regarding any error messages, version, and platform.

The code below reads only the deprecated stats.min/stats.max fields and never checks stats.min_value/stats.max_value. When reading a Parquet file whose writer populated only min_value/max_value, the min and max statistics in the page header are silently dropped, so we cannot read them at all.

// Extracts encoded statistics from V1 and V2 data page headers
template <typename H>
EncodedStatistics ExtractStatsFromHeader(const H& header) {
  EncodedStatistics page_statistics;
  if (!header.__isset.statistics) {
    return page_statistics;
  }
  const format::Statistics& stats = header.statistics;
  if (stats.__isset.max) {
    page_statistics.set_max(stats.max);
  }
  if (stats.__isset.min) {
    page_statistics.set_min(stats.min);
  }
  if (stats.__isset.null_count) {
    page_statistics.set_null_count(stats.null_count);
  }
  if (stats.__isset.distinct_count) {
    page_statistics.set_distinct_count(stats.distinct_count);
  }
  return page_statistics;
}
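
One possible direction for a fix, sketched under assumptions: prefer min_value/max_value when they are set and fall back to the deprecated min/max otherwise. MockStatistics, MockEncodedStatistics, and ExtractMinMax are hypothetical stand-ins so that the snippet compiles on its own; they are not the real format::Statistics or EncodedStatistics types, and the mock uses a field named isset where the Thrift-generated struct uses __isset. An actual patch may also need to consider the column's sort order before trusting the deprecated fields.

```cpp
#include <string>

// Hypothetical stand-in for the Thrift-generated format::Statistics.
// The real struct exposes these flags through a member named __isset.
struct MockStatistics {
  struct IsSet {
    bool max = false;
    bool min = false;
    bool max_value = false;
    bool min_value = false;
  } isset;
  std::string max;        // deprecated field
  std::string min;        // deprecated field
  std::string max_value;  // replacement field
  std::string min_value;  // replacement field
};

// Hypothetical, simplified stand-in for EncodedStatistics.
struct MockEncodedStatistics {
  std::string max;
  std::string min;
  bool has_max = false;
  bool has_min = false;

  void set_max(const std::string& v) {
    max = v;
    has_max = true;
  }
  void set_min(const std::string& v) {
    min = v;
    has_min = true;
  }
};

// Prefer the newer fields; fall back to the deprecated ones only
// when the newer ones are absent.
MockEncodedStatistics ExtractMinMax(const MockStatistics& stats) {
  MockEncodedStatistics out;
  if (stats.isset.max_value) {
    out.set_max(stats.max_value);
  } else if (stats.isset.max) {
    out.set_max(stats.max);
  }
  if (stats.isset.min_value) {
    out.set_min(stats.min_value);
  } else if (stats.isset.min) {
    out.set_min(stats.min);
  }
  return out;
}
```

With this ordering, a header that carries only min_value/max_value still yields min/max statistics, while files written with only the deprecated fields keep working as before.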

Component(s)

C++, Parquet
