Description
Partition fields are assigned IDs that identify them when partition data is stored in manifest files. ID assignment is done in PartitionSpec#partitionType(), which assigns an ID to each field starting at 1000.
This assignment scheme reuses IDs across partition specs. Because a manifest file is written for a single partition spec, the reuse doesn't cause problems within a manifest, even when multiple specs exist. It does cause problems in the entries and files metadata tables: those tables read across manifest files, so the data file partition struct may have a different schema from one manifest to the next while reusing the same field IDs.
For example, if part of a table is partitioned by days(ts) and another part is partitioned by hours(ts), both of these will show up in the entries table's partition struct with ID 1000.
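The collision can be sketched in a few lines. This is a minimal illustration, not the Iceberg implementation: the class `PerSpecIdCollision` and its `assignIds` helper are hypothetical, and the sketch assumes IDs are handed out sequentially within a spec.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for per-spec ID assignment; not Iceberg code.
class PerSpecIdCollision {
    // Every spec numbers its own fields starting from the same value.
    static final int PARTITION_DATA_ID_START = 1000;

    // Assumption: fields within one spec get sequential IDs.
    static List<Integer> assignIds(List<String> transforms) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < transforms.size(); i++) {
            ids.add(PARTITION_DATA_ID_START + i);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Integer> daySpecIds = assignIds(List.of("days(ts)"));
        List<Integer> hourSpecIds = assignIds(List.of("hours(ts)"));
        // Both specs give their first field the same ID.
        System.out.println("days(ts)  -> " + daySpecIds.get(0));
        System.out.println("hours(ts) -> " + hourSpecIds.get(0));
    }
}
```

Because each call starts counting from the same constant, the two unrelated fields are indistinguishable by ID once rows from both manifests are combined.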
A simple solution is to assign partition field IDs starting at 1000 across all table specs and keep the last assigned ID in table metadata. This would ensure that partition tuples will be read correctly in metadata tables when a table has multiple partition specs. In the example above, days(ts) would be assigned ID 1000, and when the second partition spec is added, hours(ts) is assigned ID 1001.
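A rough sketch of the proposed scheme follows. The class `TableWideIdAssigner`, its `addSpec` method, and the `lastAssignedId` field are hypothetical names used for illustration; in the proposal the last assigned ID would live in table metadata. As a simplification, the sketch gives every field of a newly added spec a fresh ID, which the description above does not spell out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical table-wide assigner; not Iceberg code.
class TableWideIdAssigner {
    // Stand-in for the "last assigned ID" kept in table metadata.
    // Starts at 999 so the first field assigned gets 1000.
    private int lastAssignedId = 999;

    // Assigns a fresh ID to each field of a newly added spec.
    Map<String, Integer> addSpec(List<String> transforms) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (String transform : transforms) {
            lastAssignedId += 1;
            ids.put(transform, lastAssignedId);
        }
        return ids;
    }

    int lastAssignedId() {
        return lastAssignedId;
    }

    public static void main(String[] args) {
        TableWideIdAssigner table = new TableWideIdAssigner();
        System.out.println(table.addSpec(List.of("days(ts)")));
        System.out.println(table.addSpec(List.of("hours(ts)")));
        System.out.println("last assigned: " + table.lastAssignedId());
    }
}
```

Keeping the counter on the table rather than on each spec is what prevents reuse: a later spec always continues from where the previous one stopped.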