[BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input #11609
|
I have a couple of questions regarding the behaviour of logical types:
|
|
Hm so there are several ways of manually building a Row instance that provide different levels of runtime type-checking. Lines 354 to 357 in 34c58c4
So we can have runtime type-checking for debugging, but then turn it off for performance. I'm not sure how |
|
@TheNeuralBit |
|
@TheNeuralBit withFieldValue should replace addValues for most users. addValues is difficult and error-prone, whereas withFieldValues allows building a row based on named fields instead of positional fields. |
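To illustrate why positional values are easier to get wrong than named ones, here is a small plain-Java sketch. It does not use Beam's Row API: ToyRow, fromPositions and fromNames are hypothetical stand-ins that only loosely mimic the positional and named styles discussed above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy row builder for illustration only; this is not Beam's Row. */
final class ToyRow {
  private ToyRow() {}

  /** Positional style: values are matched to field names purely by their order. */
  static Map<String, Object> fromPositions(List<String> fieldNames, List<Object> values) {
    if (fieldNames.size() != values.size()) {
      throw new IllegalArgumentException("Expected " + fieldNames.size() + " values");
    }
    Map<String, Object> row = new LinkedHashMap<>();
    for (int i = 0; i < fieldNames.size(); i++) {
      row.put(fieldNames.get(i), values.get(i));
    }
    return row;
  }

  /** Named style: each value is tied to a field name, so argument order cannot matter. */
  static Map<String, Object> fromNames(List<String> fieldNames, Map<String, Object> named) {
    Map<String, Object> row = new LinkedHashMap<>();
    for (String name : fieldNames) {
      if (!named.containsKey(name)) {
        throw new IllegalArgumentException("Missing value for field " + name);
      }
      row.put(name, named.get(name));
    }
    return row;
  }
}
```

In this sketch, swapping two values of the same type in the positional list produces a wrong row without any error, while the named variant yields the same row regardless of the order in which values are supplied.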
|
LGTM |
|
@reuvenlax @TheNeuralBit |
We always convert logical types to their base type when serializing with SchemaCoder, and convert back to the input type when deserializing. Other than that I think the only time it should get called is when constructing a Row instance (unless you use attachValues).
Would this just be so that we're guaranteed to call |
TheNeuralBit
left a comment
Thanks! LGTM other than a couple of minor comments.
sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/logicaltypes/LogicalTypesTest.java
Outdated
Since these tests are really checking Row's verification, I think they would be better in RowTest. Could you move them there?
Moved the tests to RowTest.java
case in point!
How can I write a FixedBytes test that exercises the behaviour of appending zeros? To test this behaviour, the input value must have length < expectedLength. But if the input value's length is less than the expected length, an IllegalArgumentException is thrown while building the Row.
In that case, there is no need to handle this: beam/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/FixedBytes.java Line 77 in 5e15717
Even if attachValues is used while building the Row and the provided input value is invalid (wrong length), the value cannot be converted to its base type during serialization in SchemaCoder because it does not have the expected length, and an IllegalArgumentException will be thrown.
Can we support this feature: depending on the type of the input value provided while building the Row, we can call |
|
|
Whoops sorry I completely misread that. We accept short byte arrays, and zero-pad them. We do not accept long byte arrays so that we don't have to truncate them. |
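The rule described in this comment (zero-pad short inputs, reject long ones) can be sketched in plain Java. This is a simplified stand-in and not Beam's FixedBytes source; the class name FixedLengthBytes and the method padToLength are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Beam: fixed-length byte handling as described above. */
final class FixedLengthBytes {
  private FixedLengthBytes() {}

  /**
   * Returns a copy of {@code input} that is exactly {@code length} bytes long.
   * Shorter inputs are padded with trailing zeros; longer inputs are rejected
   * so that nothing is ever silently truncated.
   */
  static byte[] padToLength(byte[] input, int length) {
    if (input.length > length) {
      throw new IllegalArgumentException(
          "Expected at most " + length + " bytes, got " + input.length);
    }
    // Arrays.copyOf zero-fills the extra positions when the new length is larger.
    return Arrays.copyOf(input, length);
  }
}
```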
|
Ahh ok. I'm sorry for being so dense, I see what you're saying now. In beam/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/FixedBytes.java Lines 67 to 70 in 5e15717
This is where the exception checked in your new tests is thrown. So we never actually get into This seems to be a holdover. Previously Row stored logical type values as their base type, so we probably called |
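A simplified illustration of why a padding branch can never be reached once an exact-length check runs first. The class and method names below are hypothetical, and this is not the code from FixedBytes.java; it only mirrors the ordering discussed in this comment.

```java
import java.util.Arrays;

/** Hypothetical model of a check-then-pad ordering; not Beam's implementation. */
final class StrictThenPad {
  private StrictThenPad() {}

  /** Strict validation: the value must already have exactly the expected length. */
  static void checkExactLength(byte[] value, int expectedLength) {
    if (value.length != expectedLength) {
      throw new IllegalArgumentException(
          "Expected " + expectedLength + " bytes, got " + value.length);
    }
  }

  /** Padding step: only does real work for inputs shorter than expectedLength. */
  static byte[] pad(byte[] value, int expectedLength) {
    return value.length < expectedLength ? Arrays.copyOf(value, expectedLength) : value;
  }

  /** Runs the strict check first, so pad() never receives a short input. */
  static byte[] convert(byte[] value, int expectedLength) {
    checkExactLength(value, expectedLength);
    return pad(value, expectedLength); // on this path the value is returned unchanged
  }
}
```

With this ordering, a 5-byte value for an expected length of 10 fails in checkExactLength, which is why a test for the zero-padding path cannot be written through convert.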
|
I'd be +1 for just dropping the padding logic. I don't think it should be the responsibility of the LogicalType to coerce values like this. What do you think @reuvenlax? |
Even before the code which changed beam/sdks/java/core/src/main/java/org/apache/beam/sdk/values/Row.java Lines 658 to 660 in 9f0cb64
I think this issue has been present since the introduction of FixedBytes.
|
|
If there are no comments, can we close this PR? |
@TheNeuralBit @reuvenlax I have taken care of this for |
…ical types with Invalid input (apache#11609) * [BEAM-9887] Expected Exception when building Row with logical types with Invalid input * Fix failed BeamComplexTypeTest.testNullDatetimeFields Test to handle null values
The schema.logicaltypes.FixedBytes logical type expects an argument: the length of the byte[]. When an invalid input value (with length < expectedLength) is provided while building the Row with the FixedBytes logical type, an IllegalArgumentException is expected. But the exception is not thrown.
The accompanying example prints "[1, 2, 3, 4, 5]" with length 5 to the console, whereas the expected length of the FixedBytes value is 10.
This PR fixes the issue.
Negative and positive tests for the FixedBytes logical type are added.