Skip to content

Conversation

@emkornfield
Copy link
Contributor

  • Show unsigned values can be round-tripped between java and C++
    in integration tests. This doesn't fully fix the problem because
    the UInt* APIs are mostly wrong because they can't represent the
    full range of unsigned values (return types are all too small
    because java only has signed types).

  • While I was at it, I fixed the issue with no batches.

@codecov-io
Copy link

codecov-io commented May 31, 2019

Codecov Report

Merging #4432 into master will increase coverage by 1.19%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4432      +/-   ##
==========================================
+ Coverage   88.26%   89.46%   +1.19%     
==========================================
  Files         846      645     -201     
  Lines      103357    89543   -13814     
  Branches     1253        0    -1253     
==========================================
- Hits        91233    80110   -11123     
+ Misses      11877     9433    -2444     
+ Partials      247        0     -247
Impacted Files Coverage Δ
cpp/src/arrow/util/thread-pool-test.cc 97.66% <0%> (-0.94%) ⬇️
cpp/src/arrow/type.cc 95.72% <0%> (-0.24%) ⬇️
cpp/src/plasma/client.cc 96.96% <0%> (ø) ⬆️
cpp/src/arrow/type.h 94.37% <0%> (ø) ⬆️
cpp/src/plasma/protocol.cc 93.76% <0%> (ø) ⬆️
cpp/src/plasma/test/client_tests.cc 100% <0%> (ø) ⬆️
cpp/src/arrow/python/arrow_to_pandas.cc 91.59% <0%> (ø) ⬆️
cpp/src/arrow/type_traits.h 85.71% <0%> (ø) ⬆️
R/array.R
go/arrow/math/uint64_amd64.go
... and 201 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 40f876b...226c4af. Read the comment docs.

@wesm
Copy link
Member

wesm commented May 31, 2019

Awesome, thanks @emkornfield! @pravindra or @siddharthteotia could you review the Java changes?

@wesm wesm force-pushed the fix_integration_tests branch from dcea791 to fb29381 Compare June 3, 2019 15:20
@wesm
Copy link
Member

wesm commented Jun 3, 2019

Rebased. Java reviewers (@jacques-n, @pravindra, or @siddharthteotia), could you take a look? Thanks

Copy link
Contributor

@praveenbingo praveenbingo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @emkornfield

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this not different from the behavior earlier - we would have thrown exception earlier for zero batches vs what it looks like proceeding silently now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is intentional. This was discussed on the ML, we think it is reasonable to have a schema without batches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it not be better to do that in a different method? I am not sure if existing clients depend on this for validation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. Given the fact that there are no unit tests for this code path makes me think this this is only used in integration tests. Are you familiar with any additional use cases? Also this change seems like the correct behavior (I don't see why no batches is an exception) but I suppose it would be better to validate at the file level that there wasn't any corruption (but I don't think there are provisions for this in the spec)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure of exact use cases, but have been dealing with a lot of breaking changes while upgrading dremio to use latest arrow and am wary of anything that looks like change in behavior :)

Agree the new behavior seems like the right one to implement. Sounds good for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood at some point we should discuss on the ML about API/behavior stability

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure what this means :) why is the method public but not usable externally..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was copy and paste the real intent was only to use for integration tests. I've updated it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why public if this should not be used externally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was copy and paste the real intent was only to use for integration tests. I've updated it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't there some new pattern about these where we optionally check isSet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is for the primitive types. getObject is supposed to return null if the bitmap isn't set (and wasn't changed with the PR you are referring to).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can return char here. it is an unsigned 2 byte value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Removed the method entirely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually just changed it to get

@emkornfield emkornfield force-pushed the fix_integration_tests branch from fb29381 to 94f30a2 Compare June 4, 2019 04:49
@emkornfield
Copy link
Contributor Author

@jacques-n @praveenbingo I think I responded/fixed all your feedback. Thank you for the review.

- Show unsigned values can be round-tripped between java and C++
  in integration tests.  This doesn't fully fix the problem because
  the UInt* APIs are mostly wrong because they can't represent the
  full range of unsigned values (return types are all too small
  because java only has signed types).

- While I was at it, I fixed the issue with no batches.
@emkornfield emkornfield force-pushed the fix_integration_tests branch from 94f30a2 to d8ad3d8 Compare June 4, 2019 04:53
@emkornfield
Copy link
Contributor Author

Will fix the python issue later today

Copy link
Contributor

@praveenbingo praveenbingo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment..looks good otherwise.

* @param index position of the element.
* @return value stored at the index.
*/
public static long get(final ArrowBuf buffer, final int index) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason to not return char and returning a long..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy paste bug will fix

Copy link
Contributor

@praveenbingo praveenbingo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@wesm wesm closed this in 746805c Jun 5, 2019
@wesm
Copy link
Member

wesm commented Jun 5, 2019

Thanks everybody

pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
…ests

- Show unsigned values can be round-tripped between java and C++
  in integration tests.  This doesn't fully fix the problem because
  the UInt* APIs are mostly wrong because they can't represent the
  full range of unsigned values (return types are all too small
  because java only has signed types).

- While I was at it, I fixed the issue with no batches.

Author: Micah Kornfield <emkornfield@gmail.com>
Author: emkornfield <emkornfield@gmail.com>

Closes apache#4432 from emkornfield/fix_integration_tests and squashes the following commits:

226c4af <Micah Kornfield> fixes
27e4738 <emkornfield> Add missing comma to integration test
d8ad3d8 <Micah Kornfield> Address PR feedback
a6a23e9 <Micah Kornfield> ARROW-1837:  Fix unsigned round trip integration tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants