Skip to content

Conversation

@manirajv06
Copy link
Contributor

…r Variant #11479

Sanitize the Variant fields based on its type.

@github-actions github-actions bot added the API label Mar 22, 2025
@manirajv06
Copy link
Contributor Author

manirajv06 commented Mar 22, 2025

@aihuaxu

Since this is first PR, need to discuss few things before in hand to ensure the direction is correct. Hence, marking the pr as draft for now.

Added a new method sanitizeVariant(VariantValue value, long now, int today) to sanitize the variant fields only for BoundPredicate expression code path by calling from sanitize(Type type, Object value, long now, int today). Need to do the same for UnboundPredicate by calling another (not yet added) variant of sanitize variant sanitizeVariant(Literal<?> literal, long now, int today) by calling from https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java#L551 . It requires us to add a new class VariantLiteral similar to any other Literal for other types like int, float etc. Is this correct? Can you confirm this?

Also, need to add the test as well.

@manirajv06
Copy link
Contributor Author

@RussellSpitzer @rdblue Can you please provide pointers on this? Infact, this is a follow up work triggered from this review discussion.

@RussellSpitzer
Copy link
Member

@manirajv06 Can you explain a little more why we need a Variant literal? That doesn't sound right to me as users shouldn't be able to make a Variant literal

@manirajv06
Copy link
Contributor Author

manirajv06 commented Mar 25, 2025

@RussellSpitzer Thanks for the quick reply. We do have literal for every data type being supported. Not only this, there are two variants of sanitize() in ExpressionUtil to handle more or less same data types, with one being responsible to handle Literal based argument. So, I thought we might to do the same even for Variant type as well. Please help me to understand if i am wrong.

If not required, then are we good with sanitizeVariant(VariantValue value, long now, int today) itself? Can I go ahead and add test for the same?

@RussellSpitzer
Copy link
Member

I think that's good for now unless there is a use case I haven't thought of ... I only just starting thinking about this today

@aihuaxu
Copy link
Contributor

aihuaxu commented Mar 26, 2025

@aihuaxu

Since this is first PR, need to discuss few things before in hand to ensure the direction is correct. Hence, marking the pr as draft for now.

Added a new method sanitizeVariant(VariantValue value, long now, int today) to sanitize the variant fields only for BoundPredicate expression code path by calling from sanitize(Type type, Object value, long now, int today). Need to do the same for UnboundPredicate by calling another (not yet added) variant of sanitize variant sanitizeVariant(Literal<?> literal, long now, int today) by calling from https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java#L551 . It requires us to add a new class VariantLiteral similar to any other Literal for other types like int, float etc. Is this correct? Can you confirm this?

Also, need to add the test as well.

Sorry for the late reply. I'm not exactly sure if unbound is needed. Agree that we can start with the simple case. Note in sanitize(Type type, Object value, long now, int today), the variant can be primitive, array or object. I guess it would be cleaner to use visitor pattern to handle that (update: I was referring to something like ParquetVariantVisitor.java but we can't use that since it's for Parquet module and this is in API. Then that could be too much to add a new one. Recursive implementation also works.)

@github-actions
Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Apr 27, 2025
@github-actions
Copy link

github-actions bot commented May 4, 2025

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this May 4, 2025
@manirajv06
Copy link
Contributor Author

@aihuaxu @RussellSpitzer Sorry for the delay.
Have changes ready sitting locally, need to push forward. Can you reopen this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants