@yuruiz yuruiz commented May 20, 2019

  • setup necessary dev environment for JNI development on JAVA and C++ codebase
  • implemented JNI interface to enable reading arrow record batch from ORC files
  • implemented a naive arrow buffer reference manager to ensure c++ memory release

Contributor

@emkornfield emkornfield left a comment

Left some initial comments. I'm not super familiar with JNI, so I apologize if some of them don't make sense.

Member

@wesm wesm left a comment

I had a couple of questions about JNI code organization and build flags for C++.

@yuruiz yuruiz changed the title ARROW-4714:[WIP][C++][JAVA]Providing JNI interface to Read ORC file via Arrow C++ ARROW-4714:[C++][JAVA]Providing JNI interface to Read ORC file via Arrow C++ May 29, 2019
Author

yuruiz commented May 30, 2019

Hi @praveenbingo @emkornfield, all your comments have been addressed. Do you guys have any other comments regarding the latest changes?

Contributor

@emkornfield emkornfield left a comment

Took another pass through, this seems much cleaner. Will try to take a more careful look tomorrow.

Contributor

It would be good to throw when we truncate the memory size (or at least have a debug mode that does this). Maybe not for this CL.

Author

Not sure I fully understand you here. Under what circumstances would we truncate the memory size?

Contributor

If getSize() returns a value greater than MAX_INT.

Author

I don't think we should solve such an issue here. If an Arrow buffer size could go beyond MAX_INT, it would also cause issues in IPC serialization/deserialization.
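
For illustration only, the kind of check suggested earlier in this thread could look roughly like the sketch below. It assumes a hypothetical helper named `checkedIntSize` (not part of this PR) that narrows a `long` size to `int` and throws instead of silently truncating. A plain `(int)` cast of `Integer.MAX_VALUE + 1L` would wrap around to `Integer.MIN_VALUE`.

```java
// Hypothetical helper, not part of the PR: narrows a long buffer size to int
// and fails loudly instead of silently truncating.
final class BufferSizes {
  private BufferSizes() {}

  /** Returns size as an int, or throws if it is negative or does not fit. */
  static int checkedIntSize(long size) {
    if (size < 0 || size > Integer.MAX_VALUE) {
      throw new IllegalArgumentException(
          "Buffer size " + size + " does not fit in a Java int");
    }
    // Safe: the range check above guarantees no information is lost.
    return (int) size;
  }
}
```

A debug-only variant could wrap the same range check in an `assert`, so it only runs when the JVM is started with `-ea`.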

@wesm wesm changed the title ARROW-4714:[C++][JAVA]Providing JNI interface to Read ORC file via Arrow C++ ARROW-4714: [C++][JAVA] Providing JNI interface to Read ORC file via Arrow C++ May 30, 2019
@emkornfield
Contributor

@praveenbingo or @pravindra can one of you take another look at the JNI stuff?

Contributor

Why the change back? kInitModuleId is the form that should be used for constants (also, static shouldn't be required).

Author

A non-static data member cannot be constexpr.
Here I am just changing the name to make it comply with the Arrow C++ naming convention.

Contributor

Which naming convention are you referring to? As far as I know, constants are generally of the form kInitModuleId (non-static members follow the convention you have here).

Author

@yuruiz yuruiz Jun 6, 2019

I am a little confused about the naming convention in Arrow C++. In type.h I can find a lot of static constexpr members that follow my current convention, like "type_id", so I thought this might be a more consistent convention?

Contributor

Yeah, I think at times we weren't consistent with our own conventions.

Theoretically, we follow the Google style guide with only a few exceptions.

Author

At least for now, let's follow the init_module_id_ style to be consistent with the rest of the codebase.

Contributor

I might have confused the point. Old code has both styles; new code should (and mostly does) follow the kInitModuleId style. At this point it would be counter-productive to change it back (I'd like to merge once CI passes), but in the future please use what is prescribed in the style guide.

Author

Sounds good! Thank you!

@praveenbingo
Contributor

> @praveenbingo or @pravindra can one of you take another look at the JNI stuff?

Will do a pass tomorrow, my time.

Contributor

I don't think this is the right implementation of this method. Can you please check the implementation in BufferLedger?

Author

Yeah, I know; for now this is just a naive implementation. The BufferLedger implementation currently relies on the allocator and AllocationManager. Some minor refactoring is required to fully integrate natively allocated memory into the current system. I will make those changes in a separate PR.

Contributor

Throwing an unsupported-operation exception might be better then, until this is fully implemented?

Author

If we throw an exception here, we can't pass the basic unit tests, since retain() is called when loading ORC memory into an ArrowRecordBatch. So I would prefer to leave it as is, to allow basic validation and to let anyone interested play around with it. I can add a warning here stating that the API is not stable and is subject to change in the future.
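
As a rough illustration of the retain/release idea discussed in this thread, the sketch below counts references to a natively allocated buffer and runs a release callback once the count drops to zero. It is not the PR's implementation and not BufferLedger; the class `NativeBufferRef` and the `releaseNative` callback (a stand-in for whatever JNI call frees the C++ memory) are hypothetical names used only for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the PR's class: counts references to a natively
// allocated buffer and runs a release callback when the count reaches zero.
final class NativeBufferRef {
  // Starts at 1: the creator holds the first reference.
  private final AtomicInteger refCount = new AtomicInteger(1);
  // Stand-in for the JNI call that would free the C++ buffer.
  private final Runnable releaseNative;

  NativeBufferRef(Runnable releaseNative) {
    this.releaseNative = releaseNative;
  }

  /** Adds one reference; refuses to revive an already released buffer. */
  void retain() {
    int prev;
    do {
      prev = refCount.get();
      if (prev <= 0) {
        throw new IllegalStateException("buffer already released");
      }
    } while (!refCount.compareAndSet(prev, prev + 1));
  }

  /** Drops one reference; returns true if this call freed the buffer. */
  boolean release() {
    int remaining = refCount.decrementAndGet();
    if (remaining < 0) {
      throw new IllegalStateException("released more often than retained");
    }
    if (remaining == 0) {
      releaseNative.run();
      return true;
    }
    return false;
  }

  int getRefCount() {
    return refCount.get();
  }
}
```

The compare-and-set loop in `retain()` is one way to avoid handing out a new reference after the native memory has already been freed; a fully integrated version would presumably defer to the allocator's own accounting instead.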

codecov-io commented Jun 4, 2019

Codecov Report

Merging #4348 into master will increase coverage by 0.78%.
The diff coverage is n/a.

@@            Coverage Diff            @@
##           master   #4348      +/-   ##
=========================================
+ Coverage   88.42%   89.2%   +0.78%     
=========================================
  Files         784     696      -88     
  Lines      100287   91686    -8601     
  Branches     1251       0    -1251     
=========================================
- Hits        88678   81793    -6885     
+ Misses      11373    9893    -1480     
+ Partials      236       0     -236
Impacted Files Coverage Δ
python/pyarrow/compat.py 57.14% <0%> (-32.97%) ⬇️
python/pyarrow/flight.py 80% <0%> (-20%) ⬇️
cpp/src/arrow/json/chunked-builder.cc 79.91% <0%> (-9.63%) ⬇️
python/pyarrow/util.py 47.45% <0%> (-8.48%) ⬇️
python/pyarrow/error.pxi 53.19% <0%> (-6.39%) ⬇️
cpp/src/arrow/compute/kernels/filter.h 93.75% <0%> (-6.25%) ⬇️
cpp/src/arrow/compute/kernels/filter.cc 95.23% <0%> (-4.77%) ⬇️
cpp/src/parquet/schema-internal.h 96.55% <0%> (-3.45%) ⬇️
cpp/src/arrow/status.cc 33.69% <0%> (-3.27%) ⬇️
python/pyarrow/tests/conftest.py 71.42% <0%> (-3.07%) ⬇️
... and 336 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f2cfca1...1bbb4c7.

Contributor

Shouldn't these methods declare that they can throw exceptions? It looks like we are throwing exceptions on an invalid id.

Author

The exceptions are declared in OrcReader.java. I am not sure whether I can declare a throws clause on native methods.

In any case, these methods are package-private and should not be invoked directly by external users.

Contributor

Native methods can declare exceptions. I am not sure of the behavior when a native method throws a checked exception that it does not declare in its signature. Again, I am OK with following up on this later with a test.

Contributor

Ah, it looks like you are throwing runtime exceptions now, which is better. Thanks.
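
To make the point from this thread concrete: in Java, a `native` method declaration may carry a `throws` clause like any other method, and argument validation can use unchecked exceptions so that callers need no extra declarations. The sketch below is a hypothetical example; the class `NativeReaderSketch` and the methods `seek` and `requireValidId` are illustrative names, not the PR's API, and no native library is loaded here.

```java
import java.io.IOException;

// Hypothetical wrapper, not the PR's OrcReader: shows a native method with a
// throws clause and an id check that uses an unchecked exception.
final class NativeReaderSketch {
  // Declaration only. No native library is loaded in this sketch, so actually
  // calling this method would fail with UnsatisfiedLinkError.
  native boolean seek(long readerId, int rowNumber) throws IOException;

  /** Validates the id on the Java side before crossing into native code. */
  static long requireValidId(long readerId) {
    if (readerId < 0) {
      // Unchecked, so it does not need to appear in any method signature.
      throw new IllegalArgumentException("Invalid reader id: " + readerId);
    }
    return readerId;
  }
}
```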

Contributor

There are a lot of local references to Java objects being created. Can you please test with a large file, say 100 columns or more, and see if this goes through? From what I remember, the JVM needs native methods to inform it via EnsureLocalCapacity() if more than 16 local references are needed.

Reference: https://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/functions.html

I am ok if you want to create a followup JIRA for this.

Author

Contributor

@praveenbingo praveenbingo left a comment

LGTM. Thanks a ton @yuruiz for all of the work!

@emkornfield Please go ahead if this looks good to you.

@emkornfield
Contributor

@yuruiz this looks ok to me with two small comments. Could you also open up JIRAs to add this code to CI?

Author

yuruiz commented Jun 6, 2019

> @yuruiz this looks ok to me with two small comments. Could you also open up JIRAs to add this code to CI?

https://issues.apache.org/jira/browse/ARROW-5519

@emkornfield
Contributor

+1, LGTM. Thank you.
@praveenbingo, thank you for your help reviewing.

pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
…Arrow C++

- setup necessary dev environment for JNI development on JAVA and C++ codebase
- implemented JNI interface to enable reading arrow record batch from ORC files
- implemented a naive arrow buffer reference manager to ensure c++ memory release

Author: Yurui Zhou <yurui.zyr@alibaba-inc.com>

Closes apache#4348 from yuruiz/JniOrcReader and squashes the following commits:

41592bf <Yurui Zhou> minor doc fix
44b5420 <Yurui Zhou> make sure lookup operation are performed under lock
706c8dc <Yurui Zhou> resolve comments
de8529c <Yurui Zhou> resolve comments
fc80175 <Yurui Zhou> resolve comments
9b04b76 <Yurui Zhou> fix style issues and add proper docs
9b13d7f <Yurui Zhou> replace nullptr with NULLPTR macro
dd981af <Yurui Zhou> fix lint and clang-format
44505df <Yurui Zhou> Fix cmake format
f2a0c04 <Yurui Zhou> destruct schema reader when finish reading
4f89e34 <Yurui Zhou> Make sure resources are properly released.
26d74db <Yurui Zhou> fix minor style check error
ce30933 <Yurui Zhou> Add Arrow Jni Reader Unittests
7a80fbd <Yurui Zhou> Minor refactor
e4c0630 <Yurui Zhou> remove redundant code
e932aa8 <Yurui Zhou> Move jni code to src/jni and change build flag to arrow_jni
1b6a704 <Yurui Zhou> Interface refactor and performance optimization
3604c24 <Yurui Zhou> Resolve merge conflicts
1c0e0b2 <Yurui Zhou> Fix minor build errors
e0d9c1f <Yurui Zhou> implement JNI interface on both size
a1e80a6 <Yurui Zhou> Add arrow-orc setup

6 participants