Is your feature request related to a problem or challenge? Please describe what you are trying to do.
StatisticsConverter::try_new cannot extract Parquet statistics for nested fields. Internally it calls parquet_column() which explicitly bails on nested types:
    if field.data_type().is_nested() {
        // Nested fields are not supported and require non-trivial logic
        // to correctly walk the parquet schema accounting for the
        // logical type rules - <https://github.com/apache/parquet-format/blob/master/LogicalTypes.md>
        //
        // For example a ListArray could correspond to anything from 1 to 3 levels
        // in the parquet schema
        return None;
    }
There is no alternative constructor that accepts a Parquet leaf column index directly, so callers that have already resolved the mapping (like DataFusion's row filter logic) have no way to use StatisticsConverter for nested fields.
I'd like to propose adding a StatisticsConverter::from_column_index(parquet_column_index, arrow_field, parquet_schema) constructor that bypasses schema resolution and accepts a pre-resolved leaf column index directly.
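To make the shape of the proposal concrete, here is a rough, self-contained sketch. Every type in it (DataType, Field, ParquetSchema, and this StatisticsConverter) is a simplified stand-in written for illustration, not the parquet or arrow crate's real type, and the bounds check in from_column_index is my assumption about what validation the constructor might do. The real try_new also returns a Result; the toy version is simplified to show only the nested-field bail-out.

```rust
// Stand-in types for illustration only; NOT the arrow/parquet crate types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Struct(Vec<Field>),
}

impl DataType {
    fn is_nested(&self) -> bool {
        matches!(self, DataType::Struct(_))
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

/// Toy Parquet schema: the dotted paths of the leaf columns, in order.
struct ParquetSchema {
    leaf_paths: Vec<String>,
}

#[derive(Debug)]
struct StatisticsConverter<'a> {
    parquet_column_index: Option<usize>,
    arrow_field: &'a Field,
}

impl<'a> StatisticsConverter<'a> {
    /// Loosely mirrors today's behavior: resolve by name, give up on nested fields.
    fn try_new(field: &'a Field, schema: &ParquetSchema) -> Self {
        let parquet_column_index = if field.data_type.is_nested() {
            None
        } else {
            schema.leaf_paths.iter().position(|p| p == &field.name)
        };
        Self { parquet_column_index, arrow_field: field }
    }

    /// Hypothetical proposed constructor: the caller has already resolved the
    /// leaf column, so no schema walk is needed. The bounds check is assumed.
    fn from_column_index(
        parquet_column_index: usize,
        arrow_field: &'a Field,
        schema: &ParquetSchema,
    ) -> Result<Self, String> {
        if parquet_column_index >= schema.leaf_paths.len() {
            return Err(format!(
                "column index {} out of range for {} leaf columns",
                parquet_column_index,
                schema.leaf_paths.len()
            ));
        }
        Ok(Self { parquet_column_index: Some(parquet_column_index), arrow_field })
    }
}

/// Usage: a struct column `s` with one leaf `s.x` stored as Parquet leaf 1.
/// Returns the index found by each constructor.
fn example() -> (Option<usize>, Option<usize>) {
    let leaf = Field { name: "x".to_string(), data_type: DataType::Int32 };
    let nested = Field {
        name: "s".to_string(),
        data_type: DataType::Struct(vec![leaf.clone()]),
    };
    let schema = ParquetSchema {
        leaf_paths: vec!["id".to_string(), "s.x".to_string()],
    };

    // Name-based resolution bails out on the nested field.
    let by_name = StatisticsConverter::try_new(&nested, &schema);
    // A caller that already knows the leaf index can pass it directly.
    let by_index = StatisticsConverter::from_column_index(1, &leaf, &schema)
        .expect("index 1 is in range");

    (by_name.parquet_column_index, by_index.parquet_column_index)
}
```

The point of the sketch is only the split in responsibilities: try_new keeps owning name-based resolution, while from_column_index trusts the caller's mapping and checks nothing beyond the index range.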
The parquet_column() excerpt quoted above is from arrow-rs/parquet/src/arrow/mod.rs, lines 478 to 486 in d3c7900.