Correct schema behavior #247
Merged
I think it would be an invalid state if the snapshot is `None` but a `snapshot_id` is set. Should we throw?

Maybe we could consider a `schema_for(snapshot_id)` API similar to https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java#L368.

I think there's a difference between the Java and Python implementations in the case where the snapshot has a schema ID but, for whatever reason, the schema with that ID cannot be found. The Java `schemaFor` implementation throws, but here we fall back to the latest schema. I think we should probably throw rather than assume the latest in that case, because it implies there is some bad metadata, and it's safer to fail than to coerce to the latest schema. I think the latest schema should only be used when there is no schema ID on the snapshot, and in the original case where no `snapshot_id` is set. What do you think?
Great catch, @amogh-jahagirdar. I don't feel strongly about this one. Typically, I would not fail in these situations, but I agree that raising a warning might be appropriate here.
I know there are thoughts of pruning old schemas, which might lead to this situation, but I would expect this to happen regularly.
I've updated the code with a warning, let me know what you think!
I think the warning makes sense for the missing schema ID case, but what about the case where the `snapshot_id` is set but cannot be found (if line 948 returns `None`)? I think the only option there would be to throw, because that means there was some established `snapshot_id` but we can't find it anymore.
Oof, that's a good one. I think we should check if the snapshot-id is valid earlier in the process. I've added a check now, but I'll follow up with another PR to make this more strict.
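
For reference, here is a minimal sketch of the behavior this thread seems to converge on. It is not the code from this PR: `Snapshot`, `TableMetadata`, and `resolve_schema` are hypothetical stand-ins, and a plain string stands in for a real schema object. It assumes an unknown `snapshot_id` raises, while a missing schema for a known snapshot only warns and falls back to the current schema.

```python
import warnings
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Snapshot:
    # Hypothetical stand-in for a table snapshot.
    snapshot_id: int
    schema_id: Optional[int] = None


@dataclass
class TableMetadata:
    # Hypothetical stand-in for table metadata; schemas are plain strings here.
    current_schema_id: int
    schemas: Dict[int, str]
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)


def resolve_schema(metadata: TableMetadata, snapshot_id: Optional[int] = None) -> str:
    latest = metadata.schemas[metadata.current_schema_id]

    # No snapshot requested: the current schema is the right answer.
    if snapshot_id is None:
        return latest

    # A snapshot_id was given but is unknown: fail instead of guessing.
    snapshot = metadata.snapshots.get(snapshot_id)
    if snapshot is None:
        raise ValueError(f"Snapshot not found: {snapshot_id}")

    # The snapshot carries no schema ID: use the current schema.
    if snapshot.schema_id is None:
        return latest

    # The schema ID is set but the schema is gone (for example, pruned):
    # warn and fall back to the current schema.
    schema = metadata.schemas.get(snapshot.schema_id)
    if schema is None:
        warnings.warn(
            f"Schema {snapshot.schema_id} not found for snapshot {snapshot_id}; "
            "using the current schema"
        )
        return latest

    return schema
```

Swapping the `warnings.warn` branch for a `raise` would give the stricter, Java-like behavior suggested earlier in the thread.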