Every query in Iceberg starts with the metadata. This is the JSON file that's created at each commit on an Iceberg table.
There are two versions (number three is underway):
- V1: Describes Iceberg tables.
- V2: Everything from version 1, plus support for merge-on-read deletes.
What I would suggest is reading both V1 and V2 and merging them into a common structure in memory. This includes merging some fields:
- `schemas` is optional in V1, and `schema` is removed in V2. V1 kept only the current schema, while V2 preserves all the historical schemas as well. When reading a V1 table, the schema from `schema` is added to `schemas`, and `current-schema-id` is set to the newly added schema.
- The same applies to `partition-specs`.
- When we read a V1 table, we'll add a `main` ref to the `refs` dict, pointing to the current snapshot.
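
To make the merge concrete, here is a minimal sketch in Python of the three steps above, operating on the already-parsed JSON. The function name `normalize_metadata` is hypothetical, and the field names not mentioned above (`format-version`, `partition-spec`, `default-spec-id`, `current-snapshot-id`, and the shape of a ref) are my reading of the spec, so treat them as assumptions and check them against the spec before relying on this. Using id 0 for a schema or spec that has none, and treating -1 as "no snapshot", are assumptions too.

```python
import copy


def normalize_metadata(raw: dict) -> dict:
    """Return a V2-shaped copy of a parsed table-metadata dict (sketch only)."""
    meta = copy.deepcopy(raw)  # leave the caller's dict untouched
    if meta.get("format-version") != 1:
        return meta  # anything other than V1 is passed through unchanged

    # V1 may only carry the current schema in `schema`; move it into `schemas`.
    schema = meta.pop("schema", None)
    if not meta.get("schemas") and schema is not None:
        schema.setdefault("schema-id", 0)  # assumption: id 0 when none is given
        meta["schemas"] = [schema]
        meta["current-schema-id"] = schema["schema-id"]

    # Same idea for the partition spec (assumed to be a bare list of fields in V1).
    spec_fields = meta.pop("partition-spec", None)
    if not meta.get("partition-specs") and spec_fields is not None:
        meta["partition-specs"] = [{"spec-id": 0, "fields": spec_fields}]
        meta["default-spec-id"] = 0

    # Add a `main` ref pointing to the current snapshot, if there is one.
    refs = meta.setdefault("refs", {})
    snapshot_id = meta.get("current-snapshot-id")
    if "main" not in refs and snapshot_id is not None and snapshot_id != -1:
        refs["main"] = {"snapshot-id": snapshot_id, "type": "branch"}

    return meta
```

After this, the rest of the reader only has to deal with one shape: `schemas`, `partition-specs` and `refs` are the fields to look at, regardless of which version was on disk.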
There are also example manifests available from the Java repository: https://github.com/apache/iceberg/tree/master/core/src/test/resources
P.S. On a tangent, but related: I'm also thinking of creating a JSON Schema. Would that be helpful for Rust?