(tested on pyarrow-13.0.0, Linux x64)
When writing a dictionary-encoded column with large_string values to a Parquet file and reading it back, the dictionary values come back as plain string instead of large_string.
Repro is in Python, but I see the same behavior in C++:
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> import pyarrow.parquet as pq
>>> strings = pc.dictionary_encode(pa.array(["foo", "bar", "foo"], pa.large_string()))
>>> table = pa.table([strings], ["strings"])
>>> table.schema
strings: dictionary<values=large_string, indices=int32, ordered=0>
>>> pq.write_table(table, "table.parquet")
>>> pq.read_table("table.parquet").schema
strings: dictionary<values=string, indices=int32, ordered=0>
I'd expect it to be read back as dictionary<values=large_string, indices=int32, ordered=0>.
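Until this is fixed, a possible workaround on the read side is to decode the column, widen it to large_string, and re-encode it. This is only a sketch against the repro above, assuming casts from dictionary<values=string> to string and from string to large_string are available in this pyarrow version:
>>> table = pq.read_table("table.parquet")
>>> dense = table.column("strings").cast(pa.string()).cast(pa.large_string())
>>> restored = dense.dictionary_encode()
>>> table = table.set_column(0, "strings", restored)
>>> table.schema
strings: dictionary<values=large_string, indices=int32, ordered=0>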
Component(s)
C++, Parquet, Python