Skip to content

Conversation

@rzhang10
Copy link
Contributor

@rzhang10 rzhang10 commented May 9, 2022

This PR enhances the API of NestedField to include default values, they are represented as Java Object in memory, this PR is the first PR to implement the spec PR #4301 .

The API changes will be the basis for subsequent PRs to implement the java in-mem to/from json round trip serialization and deserialization, and to support different engines to read/write default values in various formats, according to the semantics in the spec.

@rdblue @wmoustafa

Comment on lines +438 to +439
public static NestedField required(int id, String name, Type type, String doc,
Object initialDefault, Object writeDefault) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it acceptable to always expect doc if we need to declare a default value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a way around this so it should be okay.

return this;
}
return new NestedField(true, id, name, type, doc);
return new NestedField(true, id, name, type, doc, initialDefault, writeDefault);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the API always expect intialDefault and writeDefault? Can there be an option to provide initialDefault only, and set writeDefault to the same value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, I'd really like to hide the complexity of both defaults as you're suggesting. Unfortunately, I don't see a good way to do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, but I think we could handle that in the table API.

@rdblue
Copy link
Contributor

rdblue commented May 18, 2022

@wmoustafa, @rzhang10, overall this looks fine to me. The only issue I have is that if we were to merge this now, we'd be updating a public API when there is no implementation behind it. I think we should hold off on merging this until it is time to expose the ability to set these.

Before we do this, there are related PRs that we can get done:

  • Add the JSON value parser
  • Add as much as possible to Parquet, Avro, and ORC readers, like being able to read with a fake map of default values.

@abmo-x
Copy link
Contributor

abmo-x commented Oct 21, 2022

Hi @rdblue @wmoustafa @rzhang10
we at Apple are looking to leverage this feature. wondering when this will be merged and what the roadmap for Default value support looks like.

I am happy to contribute towards any pending or todo items. Please let me know how I can help. Thanks!

@wmoustafa
Copy link
Contributor

@abmo-x, thanks for reaching out. There are a couple of PRs that got merged and one that is currently up for review (#6004).

@lirui-apache
Copy link
Contributor

Hey guys, I have some questions about the API change here.

  1. Do we need to differentiate the case when default value is not specified, and the case when default value is explicitly set to null? I guess it won't be necessary for optional columns, but not sure about required columns.
  2. NestedField is a Serializable. So I think we need to make sure the default value objects, e.g. StructLike, are also serializables right?

@github-actions
Copy link

github-actions bot commented Aug 9, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 9, 2024
@github-actions
Copy link

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Aug 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants