Skip to content

Conversation

@Fokko
Copy link
Contributor

@Fokko Fokko commented Aug 30, 2022

The API wasn't consistent everywhere. Now the ids will just initialize at 1, so the user doesn't have to do this.

@Fokko Fokko marked this pull request as draft August 30, 2022 19:58
@Fokko
Copy link
Contributor Author

Fokko commented Aug 30, 2022

Waiting for #5627

The API wasn't consistent everywhere. Now the ids will just initialize
at 1, so the user doesn't have to do this.
@Fokko Fokko force-pushed the fd-update-python-docs branch from 75d7e3a to dd28be8 Compare September 1, 2022 20:02
@Fokko Fokko marked this pull request as ready for review September 1, 2022 20:03
@Fokko Fokko force-pushed the fd-update-python-docs branch from 0f32ba3 to a0806cd Compare September 2, 2022 16:36
"""A list of schemas, stored as objects with schema-id."""

current_schema_id: int = Field(alias="current-schema-id", default=DEFAULT_SCHEMA_ID)
current_schema_id: int = Field(alias="current-schema-id", default=INITIAL_SCHEMA_ID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These probably shouldn't have defaults because they need to be explicitly set to some ID that exists in the list of schemas, specs, or sort orders.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that defaulting the ID when creating a new schema, spec, or order is fine. But I don't think it is a good idea to default it here. At this point, we no longer have users constructing metadata by hand and we want to make sure that we're setting the ID correctly. If we re-create a schema for a new table metadata object, then we should also set the current schema ID to that schema's ID rather than relying on the same default in two places. That way if we ever change the default assignment we don't break tables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. I've removed this set explicitly when we get a v1 metadata.


spec_id: int = Field(alias="spec-id")
fields: Tuple[PartitionField, ...] = Field(default_factory=tuple)
spec_id: int = Field(alias="spec-id", default=INITIAL_PARTITION_SPEC_ID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer to handle ID assignment manually rather than defaulting. Defaulting seems to bring in complexity because if we forget to pass along an ID somewhere, it would cause problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that we don't should really expose this to the user. For example, when we create a new table, we re-assign the IDs anyway (using the assign fresh IDs logic).
If we follow the Java API, and we have something similar to updateSpec: https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/Table.java#L165-L171 Then we can just take the next ID. What do you think of this?

Copy link
Contributor

@rdblue rdblue Sep 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds reasonable to me. I think we just need to make sure that reassignment is correct!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will definitely involve a lot of testing 👍🏻

@Fokko Fokko force-pushed the fd-update-python-docs branch from 61147bd to 8fa8d88 Compare September 5, 2022 20:29
@Fokko Fokko changed the title Python: Update docs and fine-tune the API Python: Fine-tune the API Sep 8, 2022
@Fokko
Copy link
Contributor Author

Fokko commented Sep 8, 2022

Split out the changes to the docs to #5727

@Fokko Fokko force-pushed the fd-update-python-docs branch from 446d94e to 6f49ae3 Compare September 19, 2022 09:23
@Fokko Fokko force-pushed the fd-update-python-docs branch from 6f49ae3 to 0614692 Compare September 19, 2022 13:58
@Fokko
Copy link
Contributor Author

Fokko commented Sep 19, 2022

@rdblue I've resolved the merge conflicts, would you have time for another pass? Thanks!

Args:
order_id (int): The id of the sort-order. To keep track of historical sorting
order_id (int): An unique id of the sort-order of a table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need "of a table" -- that assumes the context that uses the sort order.

location=None,
partition_spec=PartitionSpec(
spec_id=1, fields=(PartitionField(source_id=1, field_id=1000, transform=TruncateTransform(width=3), name="id"),)
PartitionField(source_id=1, field_id=1000, transform=TruncateTransform(width=3), name="id"), spec_id=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the mock causes the result to not match the request. We should start testing against the REST catalog servlet as soon as we can.

@rdblue rdblue merged commit b8a796e into apache:master Sep 20, 2022
@rdblue
Copy link
Contributor

rdblue commented Sep 20, 2022

Looks good. There were a couple minor things, but those aren't blockers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants