-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: Add SnapshotRef entity to API #4019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Add SnapshotRef entity to API #4019
Conversation
| private final Long maxRefAgeMs; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For updating retention logic I think we will need a timestamp field per SnapshotRef. I'm thinking we defer that for that PR, but let me know your thoughts!
d4e1f76 to
cf9b8e6
Compare
| import org.apache.iceberg.relocated.com.google.common.base.MoreObjects; | ||
| import org.apache.iceberg.relocated.com.google.common.base.Preconditions; | ||
|
|
||
| public class SnapshotRef implements Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be a good example where we could use https://immutables.github.io/. The advantage would be that you don't have to manually write + test your builder and you can get JSON serialization for free. However, you can still use the SnapshotReferenceParser from #3883 and not rely in the immutables lib for JSON serialization.
Thoughts @amogh-jahagirdar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
borderline here, more on the -1 side.
I think we already had a few discussions around using immutable, and we came to the conclusion that it does not help with our JSON serialization, because for TableMeadata related objects it still needs to be integrated with the parser, and for ScanTask related objects the generated serializer and deserializer cannot be leveraged by the systems that need to use it including Presto and Trino.
So for consistency with other objects in the API module I would say let's just keep the dependency tree clean and not use it. But I am definitely open to it if there are other benefits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, my suggestion was rather meant to use Immutables without having to use Immutables Json serialization functionality as there is already the parser available that takes care of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, but with that we are then just saving a few lines of code. I don't see much additional benefit, and we break a few code patterns in the codebase including:
- it uses
getSomeValueinstead ofsomeValueas method name for getters - we cannot have flexible logic inside the builder to check for specific
ValidationExceptionand has to do it outside the generated builder - we have separated API module code patterns with some using Immutable and some do not
that's why I was asking for what are the other benefits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @jackye1995 here. Immutables doesn't seem like a good fit. Maybe if we were starting the project that way, but I doubt it is worth rewriting so much existing code. And if we start using it for new classes, then we end up with slight differences without much benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, but with that we are then just saving a few lines of code. I don't see much additional benefit, and we break a few code patterns in the codebase including:
- it uses
getSomeValueinstead ofsomeValueas method name for getters- we cannot have flexible logic inside the builder to check for specific
ValidationExceptionand has to do it outside the generated builder- we have separated API module code patterns with some using Immutable and some do not
that's why I was asking for what are the other benefits.
Just for clarification as there seem to be some misinterpretation of how Immutables work (and maybe I did a terrible job in explaining it previously):
IMO the biggest benefit is that you get truly immutable objects that are always in a consistent state, thus allowing you to implicitly enforce good engineering practices through those. All the other things (generated builders, JSON support, ...) are additional bonuses that you don't have to use if not needed
- you can choose any prefix for those methods, so it can be with or without
get - you can still have that flexibility. At the time when I proposed the first PR with Immutables it wasn't clear that this is something that we wanted to have but it doesn't mean that you can't have that with Immutables
- I'm not sure why it would matter that much if some classes would use Immutables and some wouldn't. As long as you're using the existing API to construct those instances, it shouldn't make a difference for users of those classes. You can have static methods and hide the
ImmutableXyzBuilderbehind it like it's being done in.iceberg/api/src/main/java/org/apache/iceberg/catalog/Namespace.java
Lines 37 to 43 in 063281a
public static Namespace of(String... levels) { Preconditions.checkArgument(null != levels, "Cannot create Namespace from null array"); if (levels.length == 0) { return empty(); } return ImmutableNamespace.builder().levels(levels).build(); - Introducing Immutables doesn't mean that we would have to rewrite everything. We should only do so where it makes sense and where we'd like to make sure things are truly immutable
I hope that helps a bit. Anyway, I didn't want to hijack this PR with a discussion around Immutables but rather just propose something that's helpful, so let's move this discussion to a different place :)
cf9b8e6 to
ba41e8e
Compare
jackye1995
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me, this is already reviewed a few times so I am directly approving.
jackye1995
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me, this is already reviewed a few times so I am directly approving.
jackye1995
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me, this is already reviewed a few times in the old PR
rdblue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. The only criticism I have is that this implements equals and hashCode without really using them. I'd probably defer choosing what equality means until later, but I don't have a strong opinion about it.
Co-authored-by: wangzeyu <1249369293@qq.com> Co-authored-by: Jack Ye <yzhaoqin@amazon.com>
ba41e8e to
c899602
Compare
Co-authored-by: wangzeyu <1249369293@qq.com> Co-authored-by: Jack Ye <yzhaoqin@amazon.com> (cherry picked from commit 5b7a6b3)
Co-authored-by: wangzeyu 1249369293@qq.com
Co-authored-by: Jack Ye yzhaoqin@amazon.com
This is a smaller PR as part of breaking up the changes in https://github.com/apache/iceberg/pull/3883/files. This change includes the SnapshotRef entity in the Iceberg API.
Will include serializing/deserializing reference metadata, and reference/table API changes in separate PRs following this.