
[C++] Implement alternative DictionaryBuilder that always yields int32 indices #22445

@asfimport


One problem with the current DictionaryBuilder<T> in some applications is that, if it is used to produce a series of arrays to form a ChunkedArray, it may yield constituent chunks having different index widths. For example:

chunk 0: int8 indices
chunk 1: int16 indices
chunk 2: int16 indices
chunk 3: int32 indices
chunk 4: int32 indices
chunk 5: int32 indices
chunk 6: int32 indices

This is a problem for these applications, because every chunk of a ChunkedArray must have the same type, and the index type is part of the dictionary type. I'm running into this issue in the context of ARROW-3772, where we are looking to decode Parquet data directly to DictionaryArray without going through an intermediate dense decoded stage.
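To make the chunk listing above concrete, here is a minimal self-contained sketch of how an adaptive index width can drift from chunk to chunk. It does not use Arrow; `NarrowestIndexType`, `AdaptiveChunkTypes`, and `UniformIndexType` are hypothetical helpers, and the model (each chunk gets the narrowest signed type that fits the dictionary size reached when the chunk is finished) is a simplifying assumption rather than a description of the actual builder internals.

```cpp
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical helper: name of the narrowest signed integer type that can
// hold every index into a dictionary with `dict_size` entries
// (valid indices are 0 .. dict_size - 1).
std::string NarrowestIndexType(int64_t dict_size) {
  const int64_t max_index = dict_size - 1;
  if (max_index <= std::numeric_limits<int8_t>::max()) return "int8";
  if (max_index <= std::numeric_limits<int16_t>::max()) return "int16";
  if (max_index <= std::numeric_limits<int32_t>::max()) return "int32";
  return "int64";
}

// Assumed model: `dict_sizes[i]` is the dictionary size reached by the time
// chunk i is finished, and each chunk adapts its index width independently.
std::vector<std::string> AdaptiveChunkTypes(
    const std::vector<int64_t>& dict_sizes) {
  std::vector<std::string> types;
  types.reserve(dict_sizes.size());
  for (int64_t size : dict_sizes) {
    types.push_back(NarrowestIndexType(size));
  }
  return types;
}

// True when all chunks share one index type, which is what assembling them
// into a single chunked column requires.
bool UniformIndexType(const std::vector<std::string>& types) {
  for (const std::string& type : types) {
    if (type != types.front()) return false;
  }
  return true;
}
```

Under this model, a dictionary that grows across chunks yields mixed index types, while forcing int32 for every chunk trivially keeps them uniform.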

I'm not sure what to call the class (DictionaryInt32Builder or something similar), but it would have more or less the same API as DictionaryBuilder while using Int32Builder for the indices instead of AdaptiveIntBuilder.
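The shape of such a builder can be sketched with a toy, Arrow-free class. Everything here is hypothetical: `ToyInt32DictionaryBuilder` and `ToyDictionaryChunk` are illustrative names, not Arrow types, and the choice to keep the dictionary across `Finish()` calls is an assumption made so that later chunks reuse earlier indices. The only point is that the indices are stored as `int32_t` in every chunk, regardless of dictionary size.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-in for a finished dictionary-encoded chunk (not an Arrow type).
struct ToyDictionaryChunk {
  std::vector<int32_t> indices;         // always 32-bit, never adapted
  std::vector<std::string> dictionary;  // distinct values seen so far
};

// Hypothetical toy builder: same append/finish flow as a dictionary builder,
// but with a fixed int32 index type.
class ToyInt32DictionaryBuilder {
 public:
  // Appends one value, adding it to the dictionary on first sight.
  void Append(const std::string& value) {
    auto it = memo_.find(value);
    if (it == memo_.end()) {
      const int32_t new_index = static_cast<int32_t>(dictionary_.size());
      it = memo_.emplace(value, new_index).first;
      dictionary_.push_back(value);
    }
    indices_.push_back(it->second);
  }

  // Emits the pending indices as one chunk. The dictionary is retained
  // (an assumption of this sketch), so indices stay stable across chunks.
  ToyDictionaryChunk Finish() {
    ToyDictionaryChunk chunk{std::move(indices_), dictionary_};
    indices_.clear();  // leave the moved-from vector in a known empty state
    return chunk;
  }

 private:
  std::unordered_map<std::string, int32_t> memo_;  // value -> index
  std::vector<std::string> dictionary_;            // index -> value
  std::vector<int32_t> indices_;                   // pending chunk indices
};
```

Because the index type is fixed at compile time, every chunk produced by repeated `Finish()` calls has the same layout, at the cost of using four bytes per index even when the dictionary is small.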

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-6042. Please see the migration documentation for further details.
