
multi-arg transform support #8258

@advancedxy

Feature Request / Improvement

As discussed in #5626, it would be nice for Iceberg to support multi-arg transforms, especially for the bucket transform.

I wrote up a spec change proposal doc for this improvement: https://docs.google.com/document/d/1aDoZqRgvDOOUVAGhvKZbp5vFstjsAMY4EFCyjlxpaaw/edit?usp=sharing

Quoted background from the doc:

Iceberg uses a transform to produce a partition value from a source value. The currently supported transforms are Years, Months, Days, Hours, Identity, Void, Truncate, and Bucket. Because the current spec requires each partition field to reference a single source column id in the table’s schema, these transforms accept only one argument as input. However, it is possible and quite common to use multiple arguments to produce a partition value, especially for the Bucket transform, and other transforms might require multiple arguments in the future. This document proposes adding multi-arg transform support to Iceberg, especially for the bucket transform.
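
To make the idea in the background above concrete, here is a rough sketch of what a multi-arg bucket could compute. It is only an illustration, not the proposed spec: the names `column_hash` and `bucket_multi` are hypothetical, CRC32 is a stand-in for the 32-bit Murmur3 hash that the Iceberg spec uses for bucketing, and the way the per-column hashes are combined (a `31 * h + next` fold) is my assumption rather than anything taken from the proposal doc. The final step mirrors the single-arg rule in the current spec, `(hash & Integer.MAX_VALUE) % N`.

```python
import zlib
from typing import Sequence


def column_hash(value: object) -> int:
    # Stand-in hash (assumption): CRC32 over the value's repr.
    # The Iceberg spec uses 32-bit Murmur3 on a type-specific encoding.
    return zlib.crc32(repr(value).encode("utf-8"))


def bucket_multi(values: Sequence[object], num_buckets: int) -> int:
    # Hypothetical combination rule: fold the per-column hashes into one
    # 32-bit value, then apply the single-arg rule (hash & MAX_INT) % N.
    combined = 0
    for value in values:
        combined = (31 * combined + column_hash(value)) & 0xFFFFFFFF
    return (combined & 0x7FFFFFFF) % num_buckets


# Example: bucket a row by two source columns into 16 buckets.
print(bucket_multi(["us-east", 42], 16))
```

With a single value this reduces to the one-argument form, so existing single-column bucket fields would behave the same under this particular combination rule.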


I also wrote a PoC of how a multi-arg bucket transform could be supported in Spark: #8259. Some places are not modified yet, such as UpdatePartitionSpec and the TableMetadata-related code.
I'd like to get feedback from the community before going much further.

If we reach consensus that multi-arg transforms should be supported and the spec changes stabilize after review, I will update my code accordingly and extend support to the Flink engine.

Query engine

Spark
