
Add persistent IDs to partition fields #280

@rdblue

Description


Partition fields are assigned IDs so that they can be stored in manifest files. IDs are assigned in PartitionSpec#partitionType(), which numbers each field by its position in the spec, starting at 1000.

This scheme reuses the same IDs in every partition spec. Because each manifest file is written for a single partition spec, the reuse is harmless inside manifests, even when a table has multiple specs. It does cause problems in the entries and files metadata tables. These tables read data files from manifests written for different specs, so the partition struct can have a different schema in each manifest while the same IDs are reused for different fields.

For example, if part of a table is partitioned by days(ts) and another part by hours(ts), both fields appear in the entries table's partition struct with ID 1000.
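A minimal sketch of the collision, assuming a simplified model where a spec is just a list of field names such as "days(ts)". The class PerSpecPartitionIds and its methods are hypothetical and are not Iceberg's API. IDs are assigned by position within each spec, and merging two specs the way a metadata table would shows the same ID pointing at two different fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of per-spec assignment: an ID depends only on the
// field's position within its own spec, so every spec restarts at 1000.
public class PerSpecPartitionIds {
  static final int PARTITION_DATA_ID_START = 1000;

  /** Assigns IDs to one spec's fields by position: 1000, 1001, ... */
  static Map<String, Integer> assignIds(List<String> specFields) {
    Map<String, Integer> ids = new LinkedHashMap<>();
    for (int i = 0; i < specFields.size(); i++) {
      ids.put(specFields.get(i), PARTITION_DATA_ID_START + i);
    }
    return ids;
  }

  /** Reports fields from different specs that ended up with the same ID. */
  static List<String> collisions(List<List<String>> specs) {
    Map<Integer, String> seen = new LinkedHashMap<>();
    List<String> clashes = new ArrayList<>();
    for (List<String> spec : specs) {
      for (Map.Entry<String, Integer> e : assignIds(spec).entrySet()) {
        // putIfAbsent returns the field already holding this ID, if any
        String previous = seen.putIfAbsent(e.getValue(), e.getKey());
        if (previous != null && !previous.equals(e.getKey())) {
          clashes.add(previous + " and " + e.getKey() + " share ID " + e.getValue());
        }
      }
    }
    return clashes;
  }

  public static void main(String[] args) {
    List<List<String>> specs = List.of(List.of("days(ts)"), List.of("hours(ts)"));
    // prints "days(ts) and hours(ts) share ID 1000"
    collisions(specs).forEach(System.out::println);
  }
}
```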

A simple solution is to assign partition field IDs from a single sequence, starting at 1000, that is shared by all of a table's specs, and to store the last assigned ID in table metadata. This would ensure that partition tuples are read correctly in metadata tables when a table has multiple partition specs. In the example above, days(ts) would be assigned ID 1000, and when the second partition spec is added, hours(ts) would be assigned ID 1001.
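The proposal can be sketched as a table-wide counter, again under a simplified model. The class TablePartitionIds, the field lastAssignedFieldId, and the method addSpec are hypothetical stand-ins for the table metadata and spec-creation path, not existing Iceberg code. Each new spec continues numbering from the last ID recorded for the table, so days(ts) and hours(ts) no longer share an ID.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposal: table metadata remembers the last
// assigned partition field ID, and every new spec continues from it.
public class TablePartitionIds {
  static final int PARTITION_DATA_ID_START = 1000;

  // Stands in for the "last assigned ID" that would be kept in table metadata.
  private int lastAssignedFieldId = PARTITION_DATA_ID_START - 1;

  /** Adds a spec, giving each of its fields the next unused table-wide ID. */
  Map<String, Integer> addSpec(List<String> fields) {
    Map<String, Integer> ids = new LinkedHashMap<>();
    for (String field : fields) {
      lastAssignedFieldId += 1;
      ids.put(field, lastAssignedFieldId);
    }
    return ids;
  }

  int lastAssignedFieldId() {
    return lastAssignedFieldId;
  }

  public static void main(String[] args) {
    TablePartitionIds table = new TablePartitionIds();
    System.out.println(table.addSpec(List.of("days(ts)")));  // {days(ts)=1000}
    System.out.println(table.addSpec(List.of("hours(ts)"))); // {hours(ts)=1001}
    System.out.println(table.lastAssignedFieldId());         // 1001
  }
}
```

This sketch gives every field of a new spec a fresh ID, which is exactly what the days(ts)/hours(ts) example needs.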

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
