Describe the enhancement requested
In the Parquet format specification ( https://github.com/apache/parquet-format/blob/master/Encodings.md#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8 ), dictionary encoding falls back to PLAIN encoding.
However, several implementations treat Parquet 2.0 as supporting fallback to encodings other than PLAIN, and our own code carries a TODO for this:
```cpp
// Only PLAIN encoding is supported for fallback in V1
// TODO(majetideepak): Use user specified encoding for V2
if (dictionary_fallback) {
  thrift_encodings.push_back(ToThrift(Encoding::PLAIN));
}
```
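One possible way to resolve that TODO is to honor a user-specified encoding for V2 while keeping PLAIN for V1. The following is a minimal sketch under stated assumptions: `Encoding`, `WriterVersion`, `FallbackEncoding`, and `AppendFallback` are hypothetical stand-ins rather than the Arrow C++ API, and refusing a dictionary encoding as the fallback target is an added assumption.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-ins for Parquet's encodings and writer versions;
// these are NOT parquet::Encoding / parquet::ParquetVersion.
enum class Encoding { PLAIN, PLAIN_DICTIONARY, RLE, RLE_DICTIONARY, DELTA_BINARY_PACKED, DELTA_BYTE_ARRAY };
enum class WriterVersion { V1, V2 };

// Encoding used for data pages written after the dictionary is abandoned.
// V1 keeps the current behavior (PLAIN only). V2 uses the user-specified
// encoding if one was configured, otherwise PLAIN.
Encoding FallbackEncoding(WriterVersion version, std::optional<Encoding> user_encoding) {
  if (version == WriterVersion::V1 || !user_encoding.has_value()) {
    return Encoding::PLAIN;
  }
  // Assumption: a dictionary encoding makes no sense as a fallback target,
  // so it is treated as "not specified".
  if (*user_encoding == Encoding::PLAIN_DICTIONARY || *user_encoding == Encoding::RLE_DICTIONARY) {
    return Encoding::PLAIN;
  }
  return *user_encoding;
}

// Same shape as the quoted snippet: record the fallback encoding in the
// column chunk's encoding list only if fallback actually happened.
std::vector<Encoding> AppendFallback(std::vector<Encoding> encodings, bool dictionary_fallback,
                                     WriterVersion version, std::optional<Encoding> user_encoding) {
  if (dictionary_fallback) {
    encodings.push_back(FallbackEncoding(version, user_encoding));
  }
  return encodings;
}
```

With this shape, V1 output is unchanged, and V2 only differs when the user has explicitly asked for a non-dictionary encoding.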
Both parquet-mr and arrow-rs already support falling back to encodings other than PLAIN:
- parquet-mr: https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/factory/DefaultV2ValuesWriterFactory.java
- arrow-rs: https://github.com/apache/arrow-rs/blob/master/parquet/src/column/writer/mod.rs#L1028-L1040
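For V2, both linked writers appear to choose the fallback encoding from the column's physical type. The sketch below approximates the mapping in the arrow-rs code at the linked lines (BOOLEAN to RLE, INT32/INT64 to DELTA_BINARY_PACKED, BYTE_ARRAY/FIXED_LEN_BYTE_ARRAY to DELTA_BYTE_ARRAY, everything else PLAIN). That mapping is recalled rather than copied, so treat it as an assumption to verify against the source. `PhysicalType`, `Encoding`, `WriterVersion`, and `DefaultFallbackEncoding` are hypothetical names, not the Arrow C++ API.

```cpp
// Hypothetical enums mirroring Parquet's physical types and a subset of its
// encodings; illustrative only, not parquet::Type / parquet::Encoding.
enum class PhysicalType { BOOLEAN, INT32, INT64, INT96, FLOAT, DOUBLE, BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY };
enum class Encoding { PLAIN, RLE, DELTA_BINARY_PACKED, DELTA_BYTE_ARRAY };
enum class WriterVersion { V1, V2 };

// Default non-dictionary encoding once the dictionary is abandoned.
// V1 keeps PLAIN, matching the spec text and the current C++ behavior.
// V2 uses an assumed type-based mapping modeled on the arrow-rs writer.
Encoding DefaultFallbackEncoding(PhysicalType type, WriterVersion version) {
  if (version == WriterVersion::V1) {
    return Encoding::PLAIN;
  }
  switch (type) {
    case PhysicalType::BOOLEAN:
      return Encoding::RLE;
    case PhysicalType::INT32:
    case PhysicalType::INT64:
      return Encoding::DELTA_BINARY_PACKED;
    case PhysicalType::BYTE_ARRAY:
    case PhysicalType::FIXED_LEN_BYTE_ARRAY:
      return Encoding::DELTA_BYTE_ARRAY;
    default:
      // INT96, FLOAT, DOUBLE: none of the encodings above apply, keep PLAIN.
      return Encoding::PLAIN;
  }
}
```

A user-specified encoding, if any, would presumably take precedence over this default.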
So, should we support that?
Component(s)
C++, Parquet