[C++] Clarify ChunkedArray chunking strategy and policy #23326

@asfimport

See discussion on ARROW-6784 and #5686. Among the questions:

  • Do Arrow users control the chunking, or is it an internal implementation detail they should not manage?
  • If users control it, how do they control it? For example, if I call Take with a ChunkedArray as the indices, does the chunking of the output follow the chunking of the indices? Or should we attempt to preserve the mapping of data to chunks from the input table/chunked array?
  • If it's an implementation detail, what is the optimal chunk size? And when is it worth reshaping (concatenating, slicing) input data to attain this optimal size? 
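
To make the alternatives concrete, here is a minimal sketch of two of the policies in question. It does not use the Arrow API: `Chunk`, `Chunked`, `GetValue`, `TakeFollowingIndices`, and `Rechunk` are hypothetical names, and a chunked array is modelled as a plain vector of vectors. `TakeFollowingIndices` lets the caller's index chunking decide the output chunking, while `Rechunk` treats chunking as an implementation detail and reshapes to a target size (the target itself is an assumed parameter, since the optimal value is the open question).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy stand-in for a chunked array (NOT the Arrow classes): a list of chunks.
using Chunk = std::vector<int64_t>;
using Chunked = std::vector<Chunk>;

// Resolve a logical index by walking the chunk boundaries.
int64_t GetValue(const Chunked& data, std::size_t logical_index) {
  for (const Chunk& chunk : data) {
    if (logical_index < chunk.size()) return chunk[logical_index];
    logical_index -= chunk.size();
  }
  throw std::out_of_range("index out of range");
}

// User-controlled policy: one output chunk per index chunk, so the
// chunking of the indices determines the chunking of the result.
Chunked TakeFollowingIndices(const Chunked& data, const Chunked& indices) {
  Chunked out;
  out.reserve(indices.size());
  for (const Chunk& index_chunk : indices) {
    Chunk result;
    result.reserve(index_chunk.size());
    for (int64_t i : index_chunk) {
      result.push_back(GetValue(data, static_cast<std::size_t>(i)));
    }
    out.push_back(std::move(result));
  }
  return out;
}

// Implementation-detail policy: ignore the input layout and reshape into
// chunks of `target` values (the last chunk may be shorter).
// Assumes target > 0.
Chunked Rechunk(const Chunked& data, std::size_t target) {
  Chunked out;
  Chunk current;
  for (const Chunk& chunk : data) {
    for (int64_t value : chunk) {
      current.push_back(value);
      if (current.size() == target) {
        out.push_back(std::move(current));
        current.clear();  // reset the moved-from vector to a known empty state
      }
    }
  }
  if (!current.empty()) out.push_back(std::move(current));
  return out;
}
```

In this toy model, taking from data chunked as `{10, 11, 12}, {13, 14}` with indices chunked as `{4, 0}, {2}, {3, 1}` yields three output chunks under the first policy, whereas `Rechunk` with a target of 2 regroups the same values into chunks of sizes 2, 2, and 1 regardless of how the indices were chunked. The third option raised above, preserving the data's own chunk mapping, is not sketched here.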

Reporter: Neal Richardson / @nealrichardson
Assignee: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-7012. Please see the migration documentation for further details.
