Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The Parquet type system includes logical types without a direct Arrow equivalent, such as JSON, Variant, and UUID.
However, Arrow has the concept of "extension" types, which add extra semantics to an existing Arrow physical type, and the arrow-rs parquet reader automatically maps the relevant Parquet types to canonical Arrow extension types if the `arrow_canonical_extension_types` feature is enabled.
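Conceptually, this automatic mapping is a fixed lookup from a Parquet logical type to a canonical Arrow extension type name. The sketch below is illustrative and is not the arrow-rs implementation: `LogicalKind` and `canonical_extension_name` are hypothetical names, while `arrow.json`, `arrow.uuid` and `arrow.parquet.variant` are the canonical extension names defined by the Arrow specification.

```rust
// Hypothetical stand-in for a subset of Parquet logical types
// (this is NOT the parquet crate's `LogicalType` enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogicalKind {
    String,
    Json,
    Uuid,
    Variant,
}

/// A fixed mapping: changing it means editing this function and recompiling.
fn canonical_extension_name(kind: LogicalKind) -> Option<&'static str> {
    match kind {
        LogicalKind::Json => Some("arrow.json"),
        LogicalKind::Uuid => Some("arrow.uuid"),
        LogicalKind::Variant => Some("arrow.parquet.variant"),
        // Types with a direct Arrow equivalent need no extension type.
        LogicalKind::String => None,
    }
}

fn main() {
    for kind in [
        LogicalKind::String,
        LogicalKind::Json,
        LogicalKind::Uuid,
        LogicalKind::Variant,
    ] {
        println!("{:?} -> {:?}", kind, canonical_extension_name(kind));
    }
}
```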
Right now, however, that mapping of Parquet LogicalType --> Arrow (canonical) ExtensionType is hard-coded, which is unfortunate because it means:
- Users cannot override the mapping (for example, if they want to write their own implementation of Parquet logical types)
- The code has `#[cfg(...)]` sprinkled in it; see #8409 (Support parquet canonical extension type roundtrip) for an example

Describe the solution you'd like
@paleolimbot suggested on https://github.com/apache/arrow-rs/pull/8409/files#r2371071848 that we could maintain some sort of registry that is more ergonomic to configure and would allow user-defined extension types.
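A minimal sketch of what such a registry could look like, assuming a hypothetical `ExtensionRegistry` type keyed by a hypothetical `LogicalKind` enum (neither exists in arrow-rs or the parquet crate). It starts from the canonical mappings and lets a user replace one at runtime; `my_crate.variant` is a made-up extension name used only for illustration.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for Parquet logical types (not the parquet crate's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum LogicalKind {
    Json,
    Uuid,
    Variant,
}

/// Hypothetical registry from Parquet logical type to Arrow extension type name.
/// Unlike a hard-coded `match`, entries can be added or replaced at runtime.
#[derive(Debug, Default, Clone)]
struct ExtensionRegistry {
    mappings: HashMap<LogicalKind, String>,
}

impl ExtensionRegistry {
    /// Start from the canonical Arrow extension types.
    fn with_canonical() -> Self {
        let mut registry = Self::default();
        registry.register(LogicalKind::Json, "arrow.json");
        registry.register(LogicalKind::Uuid, "arrow.uuid");
        registry.register(LogicalKind::Variant, "arrow.parquet.variant");
        registry
    }

    /// Add or override a mapping, returning the previous extension name, if any.
    fn register(&mut self, kind: LogicalKind, extension_name: &str) -> Option<String> {
        self.mappings.insert(kind, extension_name.to_string())
    }

    /// Extension type name to attach when reading a column of this logical type.
    fn lookup(&self, kind: LogicalKind) -> Option<&str> {
        self.mappings.get(&kind).map(String::as_str)
    }
}

fn main() {
    let mut registry = ExtensionRegistry::with_canonical();
    // A user-defined Variant implementation (hypothetical) replaces the canonical mapping.
    let previous = registry.register(LogicalKind::Variant, "my_crate.variant");
    println!("replaced {:?}", previous);
    println!("now {:?}", registry.lookup(LogicalKind::Variant));
}
```

Keeping the canonical entries as defaults would preserve today's behavior while making each mapping overridable.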
Describe alternatives you've considered
Quoting @paleolimbot on https://github.com/apache/arrow-rs/pull/8409/files#r2371071848:
...and maintain a registry of those in the reader/writer options. Then you don't need compile time flags to support the extensions (something like DataFusion or a derivative could wire it all together at runtime).
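To illustrate wiring a registry through reader options at runtime, the sketch below assumes a hypothetical `ReaderOptions` struct holding a shared, hypothetical `ExtensionRegistry`; none of these types are existing arrow-rs APIs. The only real convention it relies on is that Arrow records an extension type's name in field metadata under the `ARROW:extension:name` key.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Arrow's field-metadata key for an extension type's name.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Hypothetical registry keyed by a Parquet logical type name such as "JSON".
#[derive(Debug, Default)]
struct ExtensionRegistry {
    by_logical_type: HashMap<String, String>,
}

impl ExtensionRegistry {
    fn register(&mut self, logical_type: &str, extension_name: &str) {
        self.by_logical_type
            .insert(logical_type.to_string(), extension_name.to_string());
    }
}

/// Hypothetical reader options: the registry is ordinary runtime data,
/// so no compile-time feature flag decides which extensions are supported.
#[derive(Debug, Clone, Default)]
struct ReaderOptions {
    extensions: Arc<ExtensionRegistry>,
}

impl ReaderOptions {
    fn with_extensions(registry: ExtensionRegistry) -> Self {
        Self { extensions: Arc::new(registry) }
    }

    /// Field metadata a reader would attach for a column with this logical type.
    fn field_metadata(&self, logical_type: &str) -> HashMap<String, String> {
        let mut metadata = HashMap::new();
        if let Some(name) = self.extensions.by_logical_type.get(logical_type) {
            metadata.insert(EXTENSION_NAME_KEY.to_string(), name.clone());
        }
        metadata
    }
}

fn main() {
    // An application (e.g. a query engine) wires up the registry at startup.
    let mut registry = ExtensionRegistry::default();
    registry.register("JSON", "arrow.json");
    registry.register("VARIANT", "my_crate.variant"); // hypothetical user extension
    let options = ReaderOptions::with_extensions(registry);

    println!("{:?}", options.field_metadata("VARIANT"));
    println!("{:?}", options.field_metadata("UUID")); // not registered: empty map
}
```

Sharing the registry behind an `Arc` keeps cloning the options cheap when they are passed to many readers.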
Additional context