Skip to content

Conversation

@xhochy
Copy link
Member

@xhochy xhochy commented May 17, 2018

jpype is used for the tests but the actual implementation does not reference any specific Python<->Java bridge. It may work with others but this is not tested.

@xhochy
Copy link
Member Author

xhochy commented May 17, 2018

@wesm @pitrou I'm wondering a bit on which CI entry to use to test this. Currently it is not part the CI matrix and the path to the Java JAR is hardcoded. This needs to be fixed before we merge anything.

@pitrou
Copy link
Member

pitrou commented May 17, 2018

What exactly is "arrow-tools-0.10.0-SNAPSHOT-jar-with-dependencies.jar"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my enlightenment, this works only with JPype or with any Java/Python bridge (are there any others?)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may also work with https://www.py4j.org/ but I have not tested this yet

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking a bit more, it might not work with Py4J as the JVM memory addresses returned there are probably in another process.

@xhochy
Copy link
Member Author

xhochy commented May 17, 2018

What exactly is "arrow-tools-0.10.0-SNAPSHOT-jar-with-dependencies.jar"?

We currently build several Java archives in the build process. All except the mentioned one only carry their code. This one is also built by default and contains some basic tooling to handle arrow files but it also is self-contained with all dependencies, i.e. loading this JAR, you have all Arrow Java libraries and their dependencies in RAM. This saves a lot of work that would otherwise go into setting up the Java classpath for the tests.

This means that these tests here must run after we have run the Java build.

@xhochy
Copy link
Member Author

xhochy commented May 17, 2018

My main motivation for this is that I would like to use #1759 from Python to have a fast JDBC access.

@xhochy xhochy force-pushed the ARROW-2608 branch 2 times, most recently from 79ca4fd to a4e6ba2 Compare May 19, 2018 16:17
@codecov-io
Copy link

codecov-io commented May 19, 2018

Codecov Report

Merging #2062 into master will increase coverage by 0.02%.
The diff coverage is 91.61%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2062      +/-   ##
==========================================
+ Coverage   86.28%   86.31%   +0.02%     
==========================================
  Files         242      243       +1     
  Lines       41041    41208     +167     
==========================================
+ Hits        35412    35567     +155     
- Misses       5629     5641      +12
Impacted Files Coverage Δ
python/pyarrow/__init__.py 57.89% <ø> (ø) ⬆️
python/pyarrow/array.pxi 68.2% <58.33%> (-0.27%) ⬇️
python/pyarrow/io.pxi 62.85% <75%> (+0.09%) ⬆️
python/pyarrow/types.pxi 63.18% <92.85%> (+4.18%) ⬆️
python/pyarrow/tests/test_jvm.py 97.01% <97.01%> (ø)
cpp/src/arrow/util/thread-pool-test.cc 98.87% <0%> (-0.57%) ⬇️
cpp/src/arrow/python/builtin_convert.cc 92.05% <0%> (+0.24%) ⬆️
cpp/src/plasma/store.cc 89.17% <0%> (+0.46%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bd25a65...8192624. Read the comment docs.

@wesm
Copy link
Member

wesm commented May 20, 2018

What do you think about segregating JVM interop into a pyarrow.jvm module (vs. adding symbols to the Array.* namespace)? I will endeavor to review other aspects of this patch more closely this coming week

@xhochy
Copy link
Member Author

xhochy commented May 21, 2018

Moving it to a separate module makes sense. At the moment, we don't have an explicit runtime dependency on jpype but I'm not sure if we will be able to keep this forever.

@BryanCutler
Copy link
Member

This looks really cool! I'm not an expert in this area, but I thought processes had different address spaces, so is jpype running the JVM in a thread? Just wondering if this kind of technique is possible with py4j.. Thanks!

@wesm
Copy link
Member

wesm commented May 24, 2018

@BryanCutler py4j is out-of-process / socket-based whereas jpype is in-progress / JNI-based, which is why this is possible

@icexelloss
Copy link
Contributor

I am curious about this too ... :)

@wesm My understanding is that we are just communicating memory address between Python and Java process, would that still not work if we are communicating the memory address through py4j socket?

@wesm
Copy link
Member

wesm commented May 24, 2018

With py4j, the Java heap is in a different process; it's not possible for another process to get access to those bytes. With jpype, the Java heap (in an embedded JVM) is in the same process / virtual address space as Python

@icexelloss
Copy link
Contributor

Ahh I see. Thanks for the explanation!

@xhochy
Copy link
Member Author

xhochy commented Jun 22, 2018

@wesm @pitrou Any requirements here on doing a first merge? I would like to develop this further but first bring in this PR.

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, this looks like a good start. I left a comment about documentation but that can be handled more generally later, so feel free to merge

# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JPype is not mentioned in this file; this might bear mentioning in one or more of the function docstrings

@xhochy
Copy link
Member Author

xhochy commented Jun 26, 2018

+1, Travis failure is in Plasma and thus unrelated.

@xhochy xhochy closed this in 4481b07 Jun 26, 2018
@xhochy xhochy deleted the ARROW-2608 branch June 26, 2018 08:20
@wesm
Copy link
Member

wesm commented Jun 26, 2018

@xhochy so now that Java is being built in our C++ builds, our logs are full of linter warnings in Java. I've brought this up on the mailing list and JIRA several times; I'm inclined to start fixing the warnings myself. Any ideas?

@xhochy
Copy link
Member Author

xhochy commented Jun 26, 2018

@wesm you mean the warnings like

[WARNING] src/main/java/org/apache/arrow/tools/Integration.java:[26] (imports) CustomImportOrder: Wrong lexicographical order for 'org.apache.arrow.vector.ipc.ArrowFileReader' import. Should be before 'org.apache.arrow.vector.ipc.message.ArrowBlock'.

We should definitely fix them all and set them to error once new things will be introduced in new PRs. This will be hard work for us, can you bring this up on the ML first, maybe some Java-savy folks can pick this up?

@wesm
Copy link
Member

wesm commented Jun 26, 2018

Yes, I'll start a thread now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants