Feature Request / Improvement
As discussed in #5626, it would be nice to support multi-argument transforms in Iceberg, especially the bucket transform.
I wrote up a spec change proposal doc for this improvement: https://docs.google.com/document/d/1aDoZqRgvDOOUVAGhvKZbp5vFstjsAMY4EFCyjlxpaaw/edit?usp=sharing
Quoted background from the doc:
Iceberg uses a transform to produce a partition value from a source value. The currently supported transforms are: Years, Months, Days, Hours, Identity, Void, Truncate, and Bucket. Because the current spec requires each partition field to reference exactly one source column ID in the table's schema, these transforms accept only one argument as input. However, it is possible, and quite common, to produce a partition value from multiple arguments, especially with the Bucket transform. Other transforms might also require multiple arguments in the future. This document proposes adding multi-argument transform support to Iceberg, especially for the bucket transform.
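For readers less familiar with the bucket transform: the Iceberg spec defines the single-argument bucket as `(murmur3_x86_32_hash(v) & Integer.MAX_VALUE) % N`, where int and long values are hashed as 8-byte little-endian longs and strings as their UTF-8 bytes. The Java sketch below implements that, then shows one *hypothetical* way to extend it to several arguments: hash each argument, pack the 32-bit hashes in argument order, and hash the packed bytes. The combination step (`multiArgBucket`) and the class name are illustrative assumptions, not the scheme from the proposal doc and not an Iceberg API. Only int, long, and string inputs are handled, and a real implementation should be checked against the hash test vectors in the spec's appendix.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical illustration class, not part of Iceberg.
public class MultiArgBucketSketch {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    // Murmur3 x86 32-bit hash with seed 0 (the hash family the spec names for bucket).
    static int murmur3(byte[] data) {
        int h1 = 0;
        int len = data.length;
        int blocks = len / 4;
        for (int i = 0; i < blocks; i++) {
            int off = i * 4;
            // Read each 4-byte block as a little-endian int.
            int k1 = (data[off] & 0xff)
                    | (data[off + 1] & 0xff) << 8
                    | (data[off + 2] & 0xff) << 16
                    | (data[off + 3] & 0xff) << 24;
            h1 ^= Integer.rotateLeft(k1 * C1, 15) * C2;
            h1 = Integer.rotateLeft(h1, 13) * 5 + 0xe6546b64;
        }
        // Mix in the 1 to 3 trailing bytes, if any.
        int tail = blocks * 4;
        int k1 = 0;
        switch (len & 3) {
            case 3: k1 ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2: k1 ^= (data[tail + 1] & 0xff) << 8;  // fall through
            case 1:
                k1 ^= data[tail] & 0xff;
                h1 ^= Integer.rotateLeft(k1 * C1, 15) * C2;
                break;
            default:
                break;
        }
        // Finalization mix (fmix32).
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Single-argument hash: int/long as an 8-byte little-endian long, string as UTF-8 bytes.
    static int hashValue(Object value) {
        if (value == null) {
            throw new IllegalArgumentException("nulls are not handled in this sketch");
        }
        if (value instanceof Integer || value instanceof Long) {
            long v = ((Number) value).longValue();
            byte[] bytes = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(v).array();
            return murmur3(bytes);
        }
        if (value instanceof String) {
            return murmur3(((String) value).getBytes(StandardCharsets.UTF_8));
        }
        throw new IllegalArgumentException("unsupported type in this sketch: " + value.getClass());
    }

    // Bucket formula from the spec: clear the sign bit, then take the remainder.
    static int bucket(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    // HYPOTHETICAL multi-argument bucket: hash each argument, pack the 32-bit
    // hashes little-endian in argument order, hash the packed bytes, then bucket.
    static int multiArgBucket(List<?> args, int numBuckets) {
        ByteBuffer buf = ByteBuffer.allocate(4 * args.size()).order(ByteOrder.LITTLE_ENDIAN);
        for (Object arg : args) {
            buf.putInt(hashValue(arg));
        }
        return bucket(murmur3(buf.array()), numBuckets);
    }

    public static void main(String[] args) {
        System.out.println("bucket[16](34) = " + bucket(hashValue(34L), 16));
        System.out.println("bucket[16](34, iceberg) = " + multiArgBucket(List.of(34L, "iceberg"), 16));
    }
}
```

Hashing each argument first gives every argument a fixed 4-byte slot, so ("ab", "c") and ("a", "bc") do not collide simply because their concatenations match; argument order still matters. Whatever combination the spec adopts would need to be pinned down exactly so that every engine computes the same bucket for the same row.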
I also wrote a proof of concept (PoC) showing how a multi-argument bucket transform could be supported in Spark: #8259. Some parts are not modified yet, such as UpdatePartitionSpec and the TableMetadata-related code.
I'd like to get feedback from the community before going much further.
If we reach consensus that we should support multi-argument transforms, and the spec changes stabilize after review, I will update my code accordingly and extend support to the Flink engine.
Query engine
Spark