GH-15256: [C++][Dataset] Add support for writing with Partitioning::Default() #33674
Shouldn't this be given a more descriptive name than "default"?

"flat"? "nothing"?

"Flat" sounds good to me. cc @westonpace

I'm fine with default. I think I'd prefer "none" over "flat". "flat" implies to me that something is still happening. E.g. there is still some kind of partitioning. We currently have

I'm OK with
There also is still some kind of partitioning, I think? I.e. a single flat directory? I would interpret "No" partitioning as a single file.

@jorisvandenbossche you might be thinking of FilenamePartitioning (which I forgot to mention) which gives you: This partitioning is only going to split up files when there are too many rows. So, if you set ...and there will be no meaningful information in the filenames.

No, I was thinking about the latter.

@pitrou can be tiebreaker then :). I don't like

Can we use

I suppose all partitioning schemes, given an empty schema, should behave exactly the same. That might be a better solution. For example, someone working with Spark will always want to use the hive partitioning scheme. Sometimes there might not be any partitioning columns. They still would think they are working with "the hive scheme with no columns". I'm not sure how much this scenario is tested.

From Python that is certainly tested, since if you don't pass any partitioning columns in
The downside of that is that also for other schemes like HivePartitioning files also get broken into chunks in addition to the hive-like directories, so that is not a distinguishing feature. Maybe the original "Default" partitioning is a decent name in the end, since "default" is ambiguous enough to avoid such conflicting interpretations of "flat" or "no" .. ;)

@kou Thanks for the PR! This has been open for some months now without activity, so I'm going to close it out!

I didn't mean to reopen. @kou can reopen if desired. However, I do think it would be good to resolve this issue.

@westonpace OK! We need to find a consensus approach to resolve this.

Yes. That will work.

OK. I'll do it.
329d1f5 to 0842f39
@westonpace Could you review this? CI failures are unrelated:
westonpace
left a comment
Thanks! We might need #34872 to make sure the tests run.
…ing::Default() It writes all data into one directory.
Co-Authored-By: Weston Pace <weston.pace@gmail.com>
2d5e254 to 87408a4
CI failures are unrelated.

Benchmark runs are scheduled for baseline = c219863 and contender = 8d8d21f. 8d8d21f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
…ing::Default() (apache#33674)

### What changes are included in this PR?

It writes all data into one directory.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* Closes: apache#15256

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
What changes are included in this PR?
It writes all data into one directory.
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes.
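
For readers skimming the thread, here is a minimal pure-Python sketch of the two points discussed above: with no partitioning fields every scheme produces the same empty path prefix, so all files land in one directory, and splitting into several files is driven only by the row limit. It does not use or reproduce Arrow's implementation. The helper names (`partition_prefix`, `written_paths`), the `max_rows_per_file` parameter, the `part-{i}.parquet` file naming, and the single global file counter are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def partition_prefix(scheme: str, fields: Sequence[str], row: Dict[str, object]) -> str:
    """Return the path prefix a scheme puts in front of the file's base name.

    Hypothetical helper: it only mimics the general shape of hive-style,
    directory-style, and filename-style prefixes.
    """
    values = [str(row[name]) for name in fields]
    if scheme == "hive":
        return "".join(f"{name}={value}/" for name, value in zip(fields, values))
    if scheme == "directory":
        return "".join(f"{value}/" for value in values)
    if scheme == "filename":
        return "".join(f"{value}_" for value in values)
    raise ValueError(f"unknown scheme: {scheme}")


def written_paths(
    scheme: str,
    fields: Sequence[str],
    rows: Sequence[Dict[str, object]],
    max_rows_per_file: int,
) -> List[str]:
    """List the files a write would produce: group rows by prefix, then chunk."""
    # Count rows per prefix; dicts preserve insertion order.
    rows_per_prefix: Dict[str, int] = {}
    for row in rows:
        prefix = partition_prefix(scheme, fields, row)
        rows_per_prefix[prefix] = rows_per_prefix.get(prefix, 0) + 1

    paths: List[str] = []
    counter = 0  # assumed: one counter shared by all prefixes
    for prefix, count in rows_per_prefix.items():
        n_files = -(-count // max_rows_per_file)  # ceiling division
        for _ in range(n_files):
            paths.append(f"{prefix}part-{counter}.parquet")
            counter += 1
    return paths


if __name__ == "__main__":
    rows = [{"year": 2009}, {"year": 2009}, {"year": 2010}]
    # No partitioning fields: every scheme yields the same flat file list.
    for scheme in ("hive", "directory", "filename"):
        print(scheme, written_paths(scheme, [], rows, max_rows_per_file=2))
    # One partitioning field: only now do the schemes differ.
    for scheme in ("hive", "directory", "filename"):
        print(scheme, written_paths(scheme, ["year"], rows, max_rows_per_file=2))
```

In this sketch the empty-field case is what the thread calls "default" (or "none" / "flat"): the files carry no partition information and differ only by their chunk index.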