[Java] Multi-batch dictionary bug in ArrowFile{Reader|Writer} #38168

@manolama

Describe the bug, including details regarding any error messages, version, and platform.

Arrow 13:

I ran into an issue with the Java ArrowFileWriter: only the first dictionary batch is flushed to the file, and subsequent dictionary batches are skipped. Similarly, on reading, only the first dictionary block is read, and it is then reused for all subsequent data blocks, resulting in out-of-bounds exceptions or incorrect data. It seems the intent was to use delta encoding per https://arrow.apache.org/docs/format/Columnar.html#dictionary-messages, but the isDelta flag is false on writes and I wasn't able to find an API to change that setting.
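To make the two symptoms concrete, here is a minimal plain-Java sketch of the failure mode. It does not use any Arrow classes: `DictionaryBatchSketch`, `Block`, and both decode methods are hypothetical names made up for illustration, and the model assumes each data batch is preceded by its own dictionary batch and that the input list is non-empty. It only contrasts a reader that keeps the first dictionary with one that applies every dictionary batch according to its isDelta flag (append when true, replace when false).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of dictionary-encoded batches; not the Arrow Java API.
final class DictionaryBatchSketch {

    /** Stand-in for one dictionary batch followed by one data batch. */
    static final class Block {
        final boolean isDelta;          // true: append entries, false: replace the dictionary
        final List<String> entries;     // entries carried by this block's dictionary batch
        final int[] indices;            // dictionary-encoded values of the data batch

        Block(boolean isDelta, List<String> entries, int... indices) {
            this.isDelta = isDelta;
            this.entries = entries;
            this.indices = indices;
        }
    }

    /** Behavior described in the report: the first dictionary is reused for every data batch. */
    static List<String> decodeFirstDictionaryOnly(List<Block> blocks) {
        List<String> dictionary = blocks.get(0).entries;
        List<String> out = new ArrayList<>();
        for (Block block : blocks) {
            for (int index : block.indices) {
                // Throws IndexOutOfBoundsException when a later batch uses a larger dictionary.
                out.add(dictionary.get(index));
            }
        }
        return out;
    }

    /** Applies every dictionary batch before decoding the data batch that follows it. */
    static List<String> decodeHonoringFlag(List<Block> blocks) {
        List<String> dictionary = new ArrayList<>();
        List<String> out = new ArrayList<>();
        for (Block block : blocks) {
            if (!block.isDelta) {
                dictionary.clear();     // non-delta batch replaces the current dictionary
            }
            dictionary.addAll(block.entries);
            for (int index : block.indices) {
                out.add(dictionary.get(index));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Second block replaces the dictionary (isDelta = false), as in the report.
        List<Block> blocks = Arrays.asList(
            new Block(false, Arrays.asList("a", "b"), 0, 1),
            new Block(false, Arrays.asList("c", "d"), 0, 1));
        System.out.println("honoring each dictionary: " + decodeHonoringFlag(blocks));
        System.out.println("first dictionary only:    " + decodeFirstDictionaryOnly(blocks));
    }
}
```

In this model, same-sized dictionaries give silently wrong values from the first-dictionary-only path, while a larger second dictionary (for example three entries with index 2 in use) makes that path throw, which matches the two outcomes described above.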

Component(s)

Java
