[Python] Support reading Parquet binary/string columns directly as DictionaryArray #19660

@asfimport

Description

This requires PARQUET-1324 and probably quite a bit of extra work.

Implementing this properly requires normalizing dictionaries across row groups. When reading a new row group, the reader should use a fast path that compares the current dictionary with the previous one. It also needs to handle the case where a column chunk "fell back" to PLAIN encoding mid-stream, i.e. its later pages are PLAIN-encoded rather than dictionary-encoded.
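
To make the normalization step concrete, here is a minimal pure-Python sketch. It is not pyarrow's implementation: the `DictionaryUnifier` class and its method names are hypothetical, and dictionaries are modeled as plain lists of `str` (binary columns would work the same way with `bytes`). Each row group's indices are remapped onto one column-wide dictionary. When a row group's dictionary equals the previous one, the existing remapping is reused instead of re-hashing every value, and PLAIN-encoded values from a fallback are interned directly.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class DictionaryUnifier:
    """Hypothetical helper: maps per-row-group dictionary indices onto a single
    column-wide dictionary so every chunk can share one dictionary type."""

    def __init__(self) -> None:
        self.values: List[str] = []          # unified dictionary
        self._index: Dict[str, int] = {}     # value -> unified index
        self._prev_dict: Optional[Tuple[str, ...]] = None
        self._prev_remap: List[int] = []     # row-group index -> unified index

    def _intern(self, value: str) -> int:
        # Return the unified index for value, appending it if unseen.
        idx = self._index.get(value)
        if idx is None:
            idx = len(self.values)
            self.values.append(value)
            self._index[value] = idx
        return idx

    def add_dictionary_chunk(
        self, dictionary: Sequence[str], indices: Sequence[int]
    ) -> List[int]:
        key = tuple(dictionary)
        if key != self._prev_dict:
            # Slow path: a new dictionary, so intern each entry and build a remap.
            self._prev_remap = [self._intern(v) for v in key]
            self._prev_dict = key
        # Fast path when unchanged: reuse the previous remap as-is.
        remap = self._prev_remap
        return [remap[i] for i in indices]

    def add_plain_values(self, values: Sequence[str]) -> List[int]:
        # The chunk fell back to PLAIN encoding: intern each value directly.
        return [self._intern(v) for v in values]


u = DictionaryUnifier()
rg0 = u.add_dictionary_chunk(["a", "b"], [0, 1, 1])
rg1 = u.add_dictionary_chunk(["a", "b"], [1, 0])   # same dictionary: fast path
rg2 = u.add_dictionary_chunk(["c", "a"], [0, 1])   # "c" gets a new unified index
rg3 = u.add_plain_values(["b", "d"])               # PLAIN fallback pages
print(rg0, rg1, rg2, rg3, u.values)
```

Here the fast path still compares the dictionaries element by element. A real reader could compare buffers more cheaply, but the point is the same: an unchanged dictionary skips the hash-table work. The unified indices and `u.values` could then be wrapped in a single dictionary array, for example with `pyarrow.DictionaryArray.from_arrays(indices, dictionary)`.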

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-3325. Please see the migration documentation for further details.