[Parquet][C++] More elaborate dictionary fallback for Parquet 2.0 #15165

@mapleFU

Describe the enhancement requested

In the Parquet format specification ( https://github.com/apache/parquet-format/blob/master/Encodings.md#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8 ), dictionary encoding falls back to PLAIN encoding.

However, many implementations indicate that Parquet 2.0 should support falling back to other encodings. Our C++ code has a TODO for this:

    // Only PLAIN encoding is supported for fallback in V1
    // TODO(majetideepak): Use user specified encoding for V2
    if (dictionary_fallback) {
      thrift_encodings.push_back(ToThrift(Encoding::PLAIN));
    }

Both parquet-mr and arrow-rs already support falling back to other encodings:

  1. parquet-mr: https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/factory/DefaultV2ValuesWriterFactory.java
  2. arrow-rs: https://github.com/apache/arrow-rs/blob/master/parquet/src/column/writer/mod.rs#L1028-L1040

So, should the C++ implementation support that as well?
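
To make the question concrete, here is a minimal sketch of what the TODO seems to ask for: keep PLAIN for V1 and honour a user-specified fallback encoding for V2. The names `ChooseFallbackEncoding`, `DataPageVersion`, and the reduced `Encoding` enum are hypothetical stand-ins written for this illustration, not the existing Parquet C++ API, and defaulting to PLAIN when the user specifies nothing is an assumption.

```cpp
#include <optional>

// Hypothetical stand-ins for the real Parquet C++ types (illustration only).
enum class Encoding { PLAIN, DELTA_BINARY_PACKED, DELTA_BYTE_ARRAY };
enum class DataPageVersion { V1, V2 };

// Returns the encoding to switch to once dictionary encoding is abandoned.
// V1 always falls back to PLAIN; V2 uses the user-specified encoding if one
// was given, and PLAIN otherwise (assumed default).
Encoding ChooseFallbackEncoding(DataPageVersion version,
                                std::optional<Encoding> user_fallback) {
  if (version == DataPageVersion::V1) {
    return Encoding::PLAIN;
  }
  return user_fallback.value_or(Encoding::PLAIN);
}
```

With something like this, the metadata code above would push `ToThrift(ChooseFallbackEncoding(...))` instead of hard-coding `Encoding::PLAIN`; whether the fallback should be user-specified or chosen per physical type is part of the open question.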

Component(s)

C++, Parquet
