Skip to content

Conversation

@amogh-jahagirdar
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar commented Feb 1, 2022

Co-authored-by: wangzeyu 1249369293@qq.com
Co-authored-by: Jack Ye yzhaoqin@amazon.com

This is a smaller PR as part of breaking up the changes in https://github.com/apache/iceberg/pull/3883/files. This change includes the SnapshotRef entity in the Iceberg API.

Will include serializing/deserializing reference metadata, and reference/table API changes in separate PRs following this.

@amogh-jahagirdar amogh-jahagirdar changed the title Core: Add SnapshotRef to API and add parser and serializer Core: Add SnapshotRef entity to API and add serializer/deserializer utility class. Feb 1, 2022
@amogh-jahagirdar amogh-jahagirdar changed the title Core: Add SnapshotRef entity to API and add serializer/deserializer utility class. Core: Add SnapshotRef entity to API and serializer/deserializer utility class. Feb 1, 2022
@amogh-jahagirdar amogh-jahagirdar changed the title Core: Add SnapshotRef entity to API and serializer/deserializer utility class. Core: Add SnapshotRef entity to API and add snapshot ref serializer/deserializer utility class. Feb 1, 2022
Comment on lines +34 to +35
private final Long maxRefAgeMs;

Copy link
Contributor Author

@amogh-jahagirdar amogh-jahagirdar Feb 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For updating retention logic I think we will need a timestamp field per SnapshotRef. I'm thinking we defer that for that PR, but let me know your thoughts!

@amogh-jahagirdar amogh-jahagirdar force-pushed the snapshot-ref-parser-changes branch 2 times, most recently from d4e1f76 to cf9b8e6 Compare February 2, 2022 00:07
@amogh-jahagirdar amogh-jahagirdar changed the title Core: Add SnapshotRef entity to API and add snapshot ref serializer/deserializer utility class. Core: Add SnapshotRef entity to API Feb 2, 2022
import org.apache.iceberg.relocated.com.google.common.base.MoreObjects;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;

public class SnapshotRef implements Serializable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be a good example where we could use https://immutables.github.io/. The advantage would be that you don't have to manually write + test your builder and you can get JSON serialization for free. However, you can still use the SnapshotReferenceParser from #3883 and not rely in the immutables lib for JSON serialization.

Thoughts @amogh-jahagirdar?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

borderline here, more on the -1 side.

I think we already had a few discussions around using immutable, and we came to the conclusion that it does not help with our JSON serialization, because for TableMeadata related objects it still needs to be integrated with the parser, and for ScanTask related objects the generated serializer and deserializer cannot be leveraged by the systems that need to use it including Presto and Trino.

So for consistency with other objects in the API module I would say let's just keep the dependency tree clean and not use it. But I am definitely open to it if there are other benefits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, my suggestion was rather meant to use Immutables without having to use Immutables Json serialization functionality as there is already the parser available that takes care of that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, but with that we are then just saving a few lines of code. I don't see much additional benefit, and we break a few code patterns in the codebase including:

  1. it uses getSomeValue instead of someValue as method name for getters
  2. we cannot have flexible logic inside the builder to check for specific ValidationException and has to do it outside the generated builder
  3. we have separated API module code patterns with some using Immutable and some do not

that's why I was asking for what are the other benefits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jackye1995 here. Immutables doesn't seem like a good fit. Maybe if we were starting the project that way, but I doubt it is worth rewriting so much existing code. And if we start using it for new classes, then we end up with slight differences without much benefit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, but with that we are then just saving a few lines of code. I don't see much additional benefit, and we break a few code patterns in the codebase including:

  1. it uses getSomeValue instead of someValue as method name for getters
  2. we cannot have flexible logic inside the builder to check for specific ValidationException and has to do it outside the generated builder
  3. we have separated API module code patterns with some using Immutable and some do not

that's why I was asking for what are the other benefits.

Just for clarification as there seem to be some misinterpretation of how Immutables work (and maybe I did a terrible job in explaining it previously):
IMO the biggest benefit is that you get truly immutable objects that are always in a consistent state, thus allowing you to implicitly enforce good engineering practices through those. All the other things (generated builders, JSON support, ...) are additional bonuses that you don't have to use if not needed

  1. you can choose any prefix for those methods, so it can be with or without get
  2. you can still have that flexibility. At the time when I proposed the first PR with Immutables it wasn't clear that this is something that we wanted to have but it doesn't mean that you can't have that with Immutables
  3. I'm not sure why it would matter that much if some classes would use Immutables and some wouldn't. As long as you're using the existing API to construct those instances, it shouldn't make a difference for users of those classes. You can have static methods and hide the ImmutableXyzBuilder behind it like it's being done in
    public static Namespace of(String... levels) {
    Preconditions.checkArgument(null != levels, "Cannot create Namespace from null array");
    if (levels.length == 0) {
    return empty();
    }
    return ImmutableNamespace.builder().levels(levels).build();
    .
  4. Introducing Immutables doesn't mean that we would have to rewrite everything. We should only do so where it makes sense and where we'd like to make sure things are truly immutable

I hope that helps a bit. Anyway, I didn't want to hijack this PR with a discussion around Immutables but rather just propose something that's helpful, so let's move this discussion to a different place :)

@amogh-jahagirdar amogh-jahagirdar force-pushed the snapshot-ref-parser-changes branch from cf9b8e6 to ba41e8e Compare February 2, 2022 18:01
Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, this is already reviewed a few times so I am directly approving.

Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, this is already reviewed a few times so I am directly approving.

Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, this is already reviewed a few times in the old PR

Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. The only criticism I have is that this implements equals and hashCode without really using them. I'd probably defer choosing what equality means until later, but I don't have a strong opinion about it.

Co-authored-by: wangzeyu <1249369293@qq.com>
Co-authored-by: Jack Ye <yzhaoqin@amazon.com>
@amogh-jahagirdar amogh-jahagirdar force-pushed the snapshot-ref-parser-changes branch from ba41e8e to c899602 Compare February 2, 2022 22:39
@jackye1995 jackye1995 merged commit 5b7a6b3 into apache:master Feb 2, 2022
@jackye1995
Copy link
Contributor

jackye1995 commented Feb 2, 2022

Thanks for the work, merged! And thanks for the review @rdblue @nastra

sririshindra pushed a commit to sririshindra/iceberg that referenced this pull request Feb 16, 2022
Co-authored-by: wangzeyu <1249369293@qq.com>
Co-authored-by: Jack Ye <yzhaoqin@amazon.com>
(cherry picked from commit 5b7a6b3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants