
[C++][Dataset] arrow::dataset::Partitioning::Default() can't be used for writing dataset #15256

@kou

Description


Describe the bug, including details regarding any error messages, version, and platform.

arrow::dataset::Partitioning::Default() can't be used for writing a dataset because arrow::dataset::DefaultPartitioning::Format() isn't implemented:

    Result<PartitionPathFormat> Format(const compute::Expression& expr) const override {
      return Status::NotImplemented("formatting paths from ", type_name(),
                                    " Partitioning");
    }

It's required in WriteBatch():

    ARROW_ASSIGN_OR_RAISE(destination,
                          write_options.partitioning->Format(partition_expression));

Is it expected that arrow::dataset::Partitioning::Default() can't be used for writing a dataset?
If it is expected, how about removing arrow::dataset::Partitioning::Default(), since it's useless for writing?
If it is not expected, how about implementing arrow::dataset::DefaultPartitioning::Format() like the following?

diff --git a/cpp/src/arrow/dataset/partition.cc b/cpp/src/arrow/dataset/partition.cc
index 46cdf9023c..13add35fb8 100644
--- a/cpp/src/arrow/dataset/partition.cc
+++ b/cpp/src/arrow/dataset/partition.cc
@@ -90,8 +90,7 @@ std::shared_ptr<Partitioning> Partitioning::Default() {
     }
 
     Result<PartitionPathFormat> Format(const compute::Expression& expr) const override {
-      return Status::NotImplemented("formatting paths from ", type_name(),
-                                    " Partitioning");
+      return PartitionPathFormat{"", ""};
     }
 
     Result<PartitionedBatches> Partition(
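
To illustrate why returning an empty PartitionPathFormat seems like a reasonable fix, here is a minimal standalone sketch that does not use Arrow at all. It assumes PartitionPathFormat carries a directory and a filename prefix, and that the writer joins them onto the base directory. FormatResult, FormatNotImplemented, FormatEmpty, JoinPath, and DestinationFor are hypothetical names made up for this model, and std::optional stands in for Result<T>:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Standalone model, NOT Arrow code. The struct name mirrors the issue for
// readability; the two fields are an assumption about what the writer needs.
struct PartitionPathFormat {
  std::string directory;  // sub-directory under the base directory
  std::string filename;   // prefix prepended to the file's base name
};

// Hypothetical stand-in for Result<PartitionPathFormat>:
// std::nullopt models Status::NotImplemented.
using FormatResult = std::optional<PartitionPathFormat>;

// Models the current behavior: Format() fails.
FormatResult FormatNotImplemented() { return std::nullopt; }

// Models the proposed behavior: Format() succeeds with empty components.
FormatResult FormatEmpty() { return PartitionPathFormat{"", ""}; }

// Hypothetical path join that skips empty components.
std::string JoinPath(const std::string& base, const std::string& child) {
  if (child.empty()) return base;
  if (base.empty()) return child;
  return base.back() == '/' ? base + child : base + "/" + child;
}

// Models the step in WriteBatch() that turns a format into a destination.
// A failed format propagates as a failure, so nothing can be written.
std::optional<std::string> DestinationFor(const std::string& base_dir,
                                          const FormatResult& format,
                                          const std::string& basename) {
  assert(!basename.empty());
  if (!format) return std::nullopt;
  return JoinPath(JoinPath(base_dir, format->directory),
                  format->filename + basename);
}
```

In this model, DestinationFor("/tmp/out", FormatNotImplemented(), "part-0.parquet") yields no destination, while the same call with FormatEmpty() yields a file directly under the base directory, which is what an unpartitioned write would presumably want.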

Component(s)

C++
