ARROW-1837: [Java][Integration] Fix unsigned round trip integration tests #4432
Codecov Report
Coverage Diff:

| | master | #4432 | +/- |
|---|---|---|---|
| Coverage | 88.26% | 89.46% | +1.19% |
| Files | 846 | 645 | -201 |
| Lines | 103357 | 89543 | -13814 |
| Branches | 1253 | 0 | -1253 |
| Hits | 91233 | 80110 | -11123 |
| Misses | 11877 | 9433 | -2444 |
| Partials | 247 | 0 | -247 |
Continue to review full report at Codecov.
Awesome, thanks @emkornfield! @pravindra or @siddharthteotia, could you review the Java changes?
Branch updated from dcea791 to fb29381.
Rebased. Java reviewers (@jacques-n, @pravindra, or @siddharthteotia), could you take a look? Thanks!
praveenbingo left a comment:
Thanks @emkornfield
Isn't this different from the earlier behavior? Previously we would have thrown an exception for zero batches, whereas now it looks like we proceed silently.
Yes, this is intentional. It was discussed on the mailing list; we think it is reasonable to have a schema without batches.
Would it not be better to do that in a different method? I am not sure whether existing clients depend on this for validation.
Maybe. The fact that there are no unit tests for this code path makes me think it is only used in integration tests. Are you aware of any other use cases? Also, this change seems like the correct behavior (I don't see why having no batches should be an exception). I suppose it would be better to validate at the file level that there wasn't any corruption, but I don't think the spec makes provisions for this.
Not sure of the exact use cases, but I have been dealing with a lot of breaking changes while upgrading Dremio to the latest Arrow, and I am wary of anything that looks like a change in behavior :)
Agreed that the new behavior seems like the right one to implement. Sounds good for now.
Understood. At some point we should discuss API/behavior stability on the mailing list.
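A minimal sketch of the behavior agreed on in this thread, assuming a simplified model in which each side of an integration comparison is a schema string plus a list of batches. The class `ZeroBatchValidation` and its `validate` method are hypothetical and are not Arrow's integration-test code: a schema with zero batches passes, while a mismatch in the number of batches still fails.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch (not Arrow's API): each side of the comparison is modeled
// as a schema string plus a list of batches, where a batch is a list of values.
final class ZeroBatchValidation {

  // Throws IllegalArgumentException on any mismatch; returns normally otherwise.
  static void validate(String expectedSchema, List<List<Long>> expectedBatches,
                       String actualSchema, List<List<Long>> actualBatches) {
    if (!Objects.equals(expectedSchema, actualSchema)) {
      throw new IllegalArgumentException("Schemas differ");
    }
    // Zero batches is not an error in itself; only a count mismatch is.
    if (expectedBatches.size() != actualBatches.size()) {
      throw new IllegalArgumentException("Different number of batches: "
          + expectedBatches.size() + " vs " + actualBatches.size());
    }
    // Compare batch contents pairwise.
    for (int i = 0; i < expectedBatches.size(); i++) {
      if (!expectedBatches.get(i).equals(actualBatches.get(i))) {
        throw new IllegalArgumentException("Batch " + i + " differs");
      }
    }
  }
}
```

With this model, `validate("a: uint16", Collections.emptyList(), "a: uint16", Collections.emptyList())` returns normally instead of throwing, which is the "schema without batches" case discussed above.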
Same here.
See above.
I am not sure what this means :) Why is the method public if it is not meant to be used externally?
This was copy and paste; the real intent was only to use it in integration tests. I've updated it.
Why public if this should not be used externally?
This was copy and paste; the real intent was only to use it in integration tests. I've updated it.
Isn't there a new pattern for these where we optionally check `isSet`?
That is for the primitive types. `getObject` is supposed to return null if the bit in the validity bitmap isn't set (and that wasn't changed by the PR you are referring to).
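To illustrate the distinction drawn above between primitive accessors and `getObject`, here is a hypothetical sketch (not Arrow's vector classes) of a nullable column of 2-byte unsigned values. It assumes an Arrow-style validity bitmap in which bit i lives in byte i / 8, numbered from the least significant bit. `get` leaves the null check to the caller, while `getObject` returns null for unset slots.

```java
// Hypothetical sketch, not Arrow's API: a nullable column of 2-byte unsigned
// values stored as char, plus a validity bitmap (bit i in byte i / 8, LSB first).
final class NullableCharValues {
  private final byte[] validity;
  private final char[] values;

  NullableCharValues(int capacity) {
    this.validity = new byte[(capacity + 7) / 8];
    this.values = new char[capacity];
  }

  // Returns 1 if the slot holds a value, 0 if it is null.
  int isSet(int index) {
    return (validity[index >> 3] >> (index & 7)) & 1;
  }

  // Stores a value and marks the slot as valid.
  void set(int index, char value) {
    validity[index >> 3] |= (byte) (1 << (index & 7));
    values[index] = value;
  }

  // Primitive accessor: the caller is expected to check isSet first.
  char get(int index) {
    return values[index];
  }

  // Object accessor: returns null when the validity bit is not set.
  Character getObject(int index) {
    return isSet(index) == 0 ? null : values[index];
  }
}
```

For example, after `set(9, (char) 65535)` on a fresh instance, `getObject(0)` is null and `getObject(9)` returns 65535 as a `Character`.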
You can return char here; it is an unsigned 2-byte value.
Good point. Removed the method entirely.
Actually, I just changed it to `get`.
Branch updated from fb29381 to 94f30a2.
@jacques-n @praveenbingo I think I have responded to or fixed all your feedback. Thank you for the review.
- Show that unsigned values can be round-tripped between Java and C++ in integration tests. This doesn't fully fix the problem: the `UInt*` APIs are mostly wrong because they can't represent the full range of unsigned values (the return types are all too small, since Java only has signed types).
- While I was at it, I fixed the issue with files that have a schema but no batches.
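The limitation described above comes from Java's lack of unsigned integer primitives other than `char`: the storage types `byte`, `int`, and `long` are signed, so 255 stored in a `byte` reads back as -1. A sketch of how callers can recover the full unsigned range with standard JDK conversions; the `UnsignedWidening` class and its method names are hypothetical, only the JDK calls (`Byte.toUnsignedInt`, `Integer.toUnsignedLong`, `Long.parseUnsignedLong`, `Long.toUnsignedString`) are real.

```java
import java.math.BigInteger;

// Hypothetical helper names; the conversions are standard JDK methods.
final class UnsignedWidening {
  // 1-byte unsigned: 0..255 fits in an int.
  static int uint1(byte stored) { return Byte.toUnsignedInt(stored); }

  // 2-byte unsigned: char is already unsigned, so widening to int is lossless.
  static int uint2(char stored) { return stored; }

  // 4-byte unsigned: 0..4294967295 fits in a long.
  static long uint4(int stored) { return Integer.toUnsignedLong(stored); }

  // 8-byte unsigned: no wider primitive exists, so fall back to BigInteger.
  // Clear the sign bit, then set bit 63 back as an ordinary magnitude bit.
  static BigInteger uint8(long stored) {
    BigInteger value = BigInteger.valueOf(stored & Long.MAX_VALUE);
    return stored < 0 ? value.setBit(63) : value;
  }

  // Text round trip (e.g. decimal strings in a test file): the 64-bit
  // pattern is preserved even though the long itself prints as negative.
  static long parseUInt8(String decimal) { return Long.parseUnsignedLong(decimal); }
  static String formatUInt8(long stored) { return Long.toUnsignedString(stored); }
}
```

For instance, `parseUInt8("18446744073709551615")` yields the raw value -1L, and `formatUInt8` turns it back into the same decimal string, which is the kind of bit-preserving round trip the integration tests need.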
Branch updated from 94f30a2 to d8ad3d8.
Will fix the Python issue later today.
praveenbingo left a comment:
Minor comment; looks good otherwise.

       * @param index position of the element.
       * @return value stored at the index.
       */
      public static long get(final ArrowBuf buffer, final int index) {

Any reason to return a long rather than a char?
Copy-paste bug; will fix.
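A sketch of the fix suggested in this thread: the 2-byte accessor returns `char` instead of `long`. It assumes `java.nio.ByteBuffer` as a stand-in for `ArrowBuf` and a little-endian value layout; `UInt2Access` and `TYPE_WIDTH` are hypothetical names, not the signature that was actually merged.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: ByteBuffer stands in for ArrowBuf, layout assumed little-endian.
final class UInt2Access {
  static final int TYPE_WIDTH = 2; // bytes per 2-byte unsigned value

  // Returns the index-th value as char, which covers the full 0..65535 range.
  static char get(final ByteBuffer buffer, final int index) {
    // duplicate() shares the bytes but has its own byte order and position,
    // so the caller's buffer is left untouched by the order() call.
    return buffer.duplicate().order(ByteOrder.LITTLE_ENDIAN).getChar(index * TYPE_WIDTH);
  }
}
```

Returning `char` keeps the accessor's type as narrow as the data while still being unsigned; callers that need arithmetic can widen it to `int` without any masking.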
praveenbingo left a comment:
LGTM. Thanks.
Thanks, everybody!
Author: Micah Kornfield <emkornfield@gmail.com>
Author: emkornfield <emkornfield@gmail.com>

Closes apache#4432 from emkornfield/fix_integration_tests and squashes the following commits:

- 226c4af <Micah Kornfield> fixes
- 27e4738 <emkornfield> Add missing comma to integration test
- d8ad3d8 <Micah Kornfield> Address PR feedback
- a6a23e9 <Micah Kornfield> ARROW-1837: Fix unsigned round trip integration tests