If the goal is to hash this data into a categorical-type array anyway, it would be better to offer the option to "push down" the hashing into the Parquet read hot path, rather than first fully materializing a dense vector of ByteArray values, which could use a lot of memory after decompression.
Reporter: Wes McKinney / @wesm
Assignee: Hatem Helal / @hatemhelal
Related issues:
PRs and other links:
Note: This issue was originally created as ARROW-3769. Please see the migration documentation for further details.