If the goal is to hash this data into a categorical-type array anyway, it would be better to offer the option to "push down" the hashing into the Parquet read hot path, rather than first fully materializing a dense vector of ByteArray values, which could use a lot of memory after decompression.
Reporter: Wes McKinney / @wesm
Assignee: Hatem Helal / @hatemhelal
Related issues:
PRs and other links:
Note: This issue was originally created as ARROW-3769. Please see the migration documentation for further details.