Describe the bug
The representation of Dictionary in the data types enum seems to exclude field metadata, so extension types are dropped when they go through arrow-rs structures:
arrow-rs/arrow-schema/src/datatype.rs, line 359 in a7f3ba8:
Dictionary(Box<DataType>, Box<DataType>),
The definitions of RunEndEncoded and other variants seem to use a FieldRef, and I'm wondering whether it was a deliberate choice not to do the same for Dictionary or whether it has just never come up.
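To make the difference concrete, here is a minimal sketch in plain Python. Every class and function in it (`Primitive`, `Field`, `Dictionary`, `RunEndEncoded`, `dictionary_from_fields`, `child_metadata`) is a hypothetical stand-in written for this illustration, not the arrow-rs API. It only models the two variant shapes: one that stores bare data types and one that stores fields.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


# Simplified stand-ins for illustration only; these are NOT the arrow-rs types.
@dataclass
class Primitive:
    name: str  # e.g. "int32" or "binary"


@dataclass
class Field:
    """A named type plus key/value metadata (where extension info lives)."""
    name: str
    data_type: "DataType"
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Dictionary:
    # Models the shape Dictionary(Box<DataType>, Box<DataType>): bare types only.
    key_type: "DataType"
    value_type: "DataType"


@dataclass
class RunEndEncoded:
    # Models a variant that holds fields, so child metadata has a place to live.
    run_ends: Field
    values: Field


DataType = Union[Primitive, Dictionary, RunEndEncoded]


def dictionary_from_fields(key: Field, value: Field) -> Dictionary:
    """Build the bare-type variant; only data_type fits, metadata has no slot."""
    return Dictionary(key.data_type, value.data_type)


def child_metadata(dt: DataType) -> Dict[str, str]:
    """Return whatever metadata the values child still carries."""
    if isinstance(dt, RunEndEncoded):
        return dt.values.metadata
    return {}  # a bare DataType child has nothing to report


# Storage type "binary" is an assumption made for this illustration.
wkb = Field(
    "values",
    Primitive("binary"),
    {"ARROW:extension:name": "geoarrow.wkb", "ARROW:extension:metadata": "{}"},
)
keys = Field("keys", Primitive("int32"))

as_dictionary = dictionary_from_fields(keys, wkb)
as_ree = RunEndEncoded(Field("run_ends", Primitive("int32")), wkb)

print(child_metadata(as_dictionary))  # extension metadata is gone
print(child_metadata(as_ree))         # extension metadata is still attached
```

In this model the extension name survives only in the field-based variant, which is the behavior difference this issue is asking about.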
To Reproduce
I used arro3 to reproduce:
import arro3.core as a3
import geoarrow.pyarrow as ga
import nanoarrow as na
import pyarrow as pa
c_schema = na.c_schema(pa.dictionary(pa.int32(), ga.wkb()))
c_schema.metadata is None
#> True
c_schema.dictionary.metadata
#> <nanoarrow._schema.SchemaMetadata>
#> - b'ARROW:extension:name': b'geoarrow.wkb'
#> - b'ARROW:extension:metadata': b'{}'
c_schema2 = na.c_schema(a3.DataType.dictionary(pa.int32(), ga.wkb()))
c_schema2.metadata is None
#> True
c_schema2.dictionary.metadata is None
#> True
Expected behavior
I would have expected the metadata to round-trip through the arrow-rs data type representation.
Additional context
Parquet readers occasionally return dictionary-encoded arrays on read, so the representation is not entirely under the user's control.