
[C++] The C++ API for writing datasets could be improved #30891

@asfimport

Description


I was working on dataset-write testing in the C++ API today and ran into a number of things that were not very intuitive. The Python and R interfaces hide all of these, so this only affects people who use the C++ API directly.

  • If no partitioning is specified, the write segfaults. It should fall back to a default (no-op) partitioning instead.
  • The min_rows_per_group option should probably default to something higher than 0.
  • It is not clear how to specify the format. You have to do it indirectly: create a format, then set the file write options, which sets the format privately.
  • There is no default for basename_template.
  • There is no default for filesystem (it should be the local filesystem).
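The defaults requested above can be illustrated with a small, self-contained sketch. It is plain standard C++ and does not use Arrow: `WriteOptions`, `FileFormat`, `Partitioning`, `NoOpPartitioning`, and `WithDefaults` are hypothetical stand-ins for the Arrow dataset write options, and the filesystem is reduced to a string. Because the issue does not name a value for min_rows_per_group, the fallback is passed in as a hypothetical `default_min_rows` parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins; these are NOT Arrow's classes.
struct Partitioning {
  virtual ~Partitioning() = default;
  virtual std::string type_name() const = 0;
};

// A no-op partitioning: every row goes to the base directory.
struct NoOpPartitioning : Partitioning {
  std::string type_name() const override { return "default"; }
};

struct FileFormat {
  explicit FileFormat(std::string ext) : extension(std::move(ext)) {}
  std::string extension;  // used to build a default basename
};

struct WriteOptions {
  std::shared_ptr<FileFormat> format;          // set directly, not via file write options
  std::shared_ptr<Partitioning> partitioning;  // nullptr means "not specified"
  std::string filesystem;                      // empty means "not specified"
  std::string basename_template;               // empty means "not specified"
  std::uint64_t min_rows_per_group = 0;        // 0 means "not specified"
};

// Fills in the defaults proposed in the issue instead of failing later
// (e.g. dereferencing a null partitioning during the write).
WriteOptions WithDefaults(WriteOptions opts, std::uint64_t default_min_rows) {
  // The issue asks for a default higher than 0.
  assert(default_min_rows > 0);
  if (!opts.partitioning) {
    opts.partitioning = std::make_shared<NoOpPartitioning>();
  }
  if (opts.filesystem.empty()) {
    opts.filesystem = "local";
  }
  if (opts.basename_template.empty() && opts.format) {
    // "{i}" is replaced by a file counter when files are written.
    opts.basename_template = "part-{i}." + opts.format->extension;
  }
  if (opts.min_rows_per_group == 0) {
    opts.min_rows_per_group = default_min_rows;
  }
  return opts;
}
```

Treating 0 or an empty string as "not specified" keeps the sketch short, but it means a caller can never explicitly request 0 rows per group; wrapping those fields in `std::optional` would be one way to keep that distinction.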

Reporter: Weston Pace / @westonpace

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-15409. Please see the migration documentation for further details.
