-
Notifications
You must be signed in to change notification settings - Fork 3k
Bump Nessie to 0.28.0, improve commit conflict detection #4594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| "is no longer valid.", client.getRef().getName()), e); | ||
| } | ||
| String metadataLocation = null; | ||
| Reference reference = client.getRef().getReference(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming these changes were done to not do additional API calls. These changes won't be necessary after #4593
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling Supplier.get() a couple of times isn't necessary either.
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java
Outdated
Show resolved
Hide resolved
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java
Outdated
Show resolved
Hide resolved
|
|
||
| this.snapshotLog = Lists.newArrayList(base.snapshotLog); | ||
| this.previousFileLocation = base.metadataFileLocation; | ||
| this.previousFileLocation = noPreviousFile ? null : base.metadataFileLocation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a little bit annoying that this needs to be set, so I wonder whether it would be better to have a builder method that allows overriding this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to override that too for something I'm working on. But I have just used the builder's withMetadataLocation. In my case, the value of base.metadataFIleLocation is already null so just calling the function works fine.
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java
Show resolved
Hide resolved
| } catch (NessieNotFoundException ex) { | ||
| throw new RuntimeException( | ||
| String.format("Cannot commit: Reference '%s' no longer exists", client.getRef().getName()), ex); | ||
| String.format("Cannot commit: Reference '%s' no longer exists", updateableReference.getName()), ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would probably revert all of those changes as I don't think they are necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling Supplier.get() a couple of times isn't necessary either.
b4c5b88 to
0fd33b4
Compare
| TableMetadata.Builder builder = TableMetadata.buildFrom(deserialized, true) | ||
| .setCurrentSchema(table.getSchemaId()) | ||
| .setDefaultSortOrder(table.getSortOrderId()) | ||
| .setDefaultPartitionSpec(table.getSpecId()) | ||
| .withMetadataLocation(metadataLocation); | ||
| .withMetadataLocation(metadataLocation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you don't need to pass true in here. If base doesn't have that field set, it will just be null. Otherwise it will get overridden after this builder is called with .build().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is that the buildFrom/Builder.<init>() sets the final field previousFileLocation, which is then used in Builder.build() to add a MetadataLogEntry. Builder.previousFileLocation and Builder.metadataLocation are different.
d36118d to
bfc6655
Compare
| } | ||
| } | ||
|
|
||
| private static void toJson(TableMetadata metadata, JsonGenerator generator) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to change accessibility on these methods, why is the custom JsonGenerator needed in the Nessie code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sigh
fair enough
| TableMetadata.Builder builder = TableMetadata.buildFrom(TableMetadataParser.read(io(), metadataLocation)) | ||
| TableMetadata deserialized; | ||
| if (table.getMetadata() != null) { | ||
| deserialized = TableMetadataParser.fromJson(io(), metadataLocation, table.getMetadata().getMetadata()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we have this path in to request metadata out of a Nessie specific object which stores the metadata as a Json String that needs to be parsed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think i see, the IcebergTable class has a JsonNode? And our current public method only accepts a string, it doesn't allow directly injecting the node. I kind of think we should probably stick to that "string" api rather that opening up another method to the public but I may be paranoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Not yet, but in the future, yes.)
It's already a JSON node tree (as a generic JsonNode) - so the text->nodes parsing already happened.
As I didn't want to serialize the JsonNode to a String to get a TableMetadata, I've updated the TableMetadataParser.
| if (table.getMetadata() != null) { | ||
| deserialized = TableMetadataParser.fromJson(io(), metadataLocation, table.getMetadata().getMetadata()); | ||
| } else { | ||
| deserialized = TableMetadataParser.read(io(), metadataLocation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise we just read from the location on disk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right
| return new Builder(base); | ||
| } | ||
|
|
||
| public static Builder buildFrom(TableMetadata base, boolean noPreviousFile) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new public api here needs a java doc, I'm not sure I follow what noPreviousFile is expected to do here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a "smart thing" in the builder. If it's present, then it add it as a history item. That behavior is okay, when buildFrom() is only used when it is used within a "modify TableMetadata operation". But the Nessie catalog only uses it to modify the TableMetadata as it is - so it's not a change, which requires a new history-item.
Without the noPreviousFile==true, a couple of catalog tests fail, because those assert on the number of history items.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snazy I've been thinking on this and I'm wondering why can't implement a builder like .withPreviousFileLocation to set that to previousFileLocation to null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WFM, updated the PR
| JsonNode newMetadata; | ||
| try { | ||
| TokenBuffer tokenBuffer = new TokenBuffer(JsonUtil.factory().getCodec(), false); | ||
| TableMetadataParser.toJson(metadata, tokenBuffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we could avoid this usage if we use the "toJson(metadata) method and then pass that into the token buffer. I know it's a bit more wasteful but we are only doing this once per operation and it let's us keep the toJson(metadata, generator) method package private. Again I may be being paranoid here but ideally we don't open up any more methods than we absolutely have to.
Enhances Iceberg commit conflict detection by maintaining the commit-id from which a table metadata has been loaded, to use it as the "expected hash" in a Nessie commit. Makes Nessie specific properties available in `TableMetadata` properties: * `nessie.content.id` - the Nessie `Content.getId()` * `nessie.commit.id` - the commit ID used to retrieve the table metadata * `nessie.reference.name` - the reference name from which the table metadata has been loaded Also fixes an issue via `org.apache.iceberg.nessie.NessieTableOperations#loadTableMetadata` that caused too many "previous files", because the `TableMetadata.buildFrom()` assumed that it's only called for ongoing modifications.
| } catch (NessieNotFoundException ex) { | ||
| throw new RuntimeException( | ||
| String.format("Cannot commit: Reference '%s' no longer exists", client.getRef().getName()), ex); | ||
| String.format("Cannot commit: Reference '%s' no longer exists", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a formatting change
| private String buildCommitMsg(TableMetadata base, TableMetadata metadata) { | ||
| if (isSnapshotOperation(base, metadata)) { | ||
| return String.format("Iceberg %s against %s", metadata.currentSnapshot().operation(), tableName()); | ||
| return String.format("Iceberg %s against %s", metadata.currentSnapshot().operation(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a formatting change
Enhances Iceberg commit conflict detection by maintaining the commit-id from which a
table metadata has been loaded, to use it as the "expected hash" in a Nessie commit.
Makes Nessie specific properties available in
TableMetadataproperties:nessie.content.id- the NessieContent.getId()nessie.commit.id- the commit ID used to retrieve the table metadatanessie.reference.name- the reference name from which the table metadata has been loadedAlso fixes an issue via
org.apache.iceberg.nessie.NessieTableOperations#loadTableMetadatathat caused too many "previous files", because the
TableMetadata.buildFrom()assumed thatit's only called for ongoing modifications.