ARROW-2608: [Java/Python] Add pyarrow.{Array,Field}.from_jvm / jvm_buffer #2062
|
What exactly is "arrow-tools-0.10.0-SNAPSHOT-jar-with-dependencies.jar"? |
python/pyarrow/io.pxi
Outdated
Just for my enlightenment: does this work only with JPype, or with any Java/Python bridge (are there any others)?
It may also work with https://www.py4j.org/, but I have not tested this yet.
Thinking a bit more, it might not work with Py4J as the JVM memory addresses returned there are probably in another process.
We currently build several Java archives in the build process. All except the one mentioned carry only their own code. This one is also built by default and contains some basic tooling for handling Arrow files, but it is also self-contained with all dependencies, i.e. by loading this JAR, you get all Arrow Java libraries and their dependencies. This saves a lot of work that would otherwise go into setting up the Java classpath for the tests. It also means that the tests here must run after the Java build has run. |
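A minimal sketch of how a test setup could load such a self-contained JAR with JPype. The helper names `classpath_option` and `start_jvm_with_jar` are hypothetical and not taken from this PR; the location of the JAR is left to the caller.

```python
import os


def classpath_option(jar_path):
    """Build the JVM option that puts a single JAR on the classpath."""
    return "-Djava.class.path=" + os.path.abspath(jar_path)


def start_jvm_with_jar(jar_path):
    """Start an embedded JVM with only the given self-contained JAR loaded.

    Hypothetical helper: assumes JPype is installed and that the Java build
    has already produced the JAR.
    """
    # Imported lazily so classpath_option() stays usable without JPype.
    import jpype

    if not os.path.exists(jar_path):
        raise FileNotFoundError(
            "{} not found; run the Java build first".format(jar_path))
    jpype.startJVM(jpype.getDefaultJVMPath(), classpath_option(jar_path))
```

Because the JAR bundles its dependencies, a single classpath entry is enough; with the other archives, every dependency would have to be listed separately.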
|
My main motivation for this is that I would like to use #1759 from Python to get fast JDBC access. |
79ca4fd to
a4e6ba2
Compare
Codecov Report
@@ Coverage Diff @@
## master #2062 +/- ##
==========================================
+ Coverage 86.28% 86.31% +0.02%
==========================================
Files 242 243 +1
Lines 41041 41208 +167
==========================================
+ Hits 35412 35567 +155
- Misses 5629 5641 +12
Continue to review full report at Codecov.
|
|
What do you think about segregating JVM interop into a |
|
Moving it to a separate module makes sense. At the moment, we don't have an explicit runtime dependency on |
|
This looks really cool! I'm not an expert in this area, but I thought processes had different address spaces, so is jpype running the JVM in a thread? Just wondering if this kind of technique is possible with py4j. Thanks! |
|
@BryanCutler py4j is out-of-process / socket-based whereas jpype is in-process / JNI-based, which is why this is possible |
|
I am curious about this too ... :) @wesm My understanding is that we are just communicating a memory address between the Python and Java processes. Would that still not work if we communicated the memory address through the py4j socket? |
|
With py4j, the Java heap is in a different process; it's not possible for another process to get access to those bytes. With jpype, the Java heap (in an embedded JVM) is in the same process / virtual address space as Python |
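To illustrate the point about address spaces, here is a small sketch using only `ctypes`. The buffer is a stand-in for memory owned by an embedded JVM, and `read_bytes_at` is a hypothetical helper; no JVM or Arrow API is involved.

```python
import ctypes


def read_bytes_at(address, size):
    """Copy `size` bytes starting at a raw address in *this* process."""
    return ctypes.string_at(address, size)


# Stand-in for a buffer owned by an embedded JVM (assumption for illustration).
payload = ctypes.create_string_buffer(b"arrow")
address = ctypes.addressof(payload)  # a plain integer

# Same process, same virtual address space: the integer can be dereferenced.
same_process_view = read_bytes_at(address, 5)
```

Sending `address` over a socket to another process would transfer only the integer. In the receiving process that integer refers to its own virtual address space, so it would not point at these bytes.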
|
Ahh I see. Thanks for the explanation! |
wesm left a comment
+1, this looks like a good start. I left a comment about documentation but that can be handled more generally later, so feel free to merge
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| # KIND, either express or implied. See the License for the | ||
| # specific language governing permissions and limitations | ||
| # under the License. |
JPype is not mentioned in this file; this might bear mentioning in one or more of the function docstrings
|
+1, Travis failure is in Plasma and thus unrelated. |
|
@xhochy so now that Java is being built in our C++ builds, our logs are full of linter warnings in Java. I've brought this up on the mailing list and JIRA several times; I'm inclined to start fixing the warnings myself. Any ideas? |
|
@wesm you mean the warnings like We should definitely fix them all and then turn them into errors so that new PRs cannot introduce new ones. This will be hard work for us; can you bring this up on the ML first? Maybe some Java-savvy folks can pick this up. |
|
Yes, I'll start a thread now |
jpype is used for the tests, but the actual implementation does not reference any specific Python<->Java bridge. It may work with others, but this is not tested.