Open
Description
I was working on write dataset testing in the C++ API today and ran into a number of things that were not very intuitive. All of these are abstracted away or hidden by the Python and R interfaces, so this only applies to anyone using the C++ API directly.
- If no partitioning is specified, the write will segfault. Instead, it should use a default (no-op) partitioning.
- The min_rows_per_group option should probably default to something higher than 0
- It's not clear how to specify the format (you do it by creating a format, then setting the file write options, which sets the format privately)
- There is no default for basename_template
- There is no default for filesystem (should be local filesystem)
Reporter: Weston Pace / @westonpace
Related issues:
- [C++] Improve Dataset Write Option Defaults (is duplicated by)
PRs and other links:
Note: This issue was originally created as ARROW-15409. Please see the migration documentation for further details.