-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-12965: [Java] C Data Interface implementation #11067
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
|
I skimmed through it and it looks good, thanks a lot for this! What we did in Rust was to use the c data interface that C++ exposes in Python to make calls from within the process, see here. I think it would be beneficial to add integration tests against pyarrow. In Rust we found a couple of memory leaks and double frees during development by testing against pyarrow / c++. I would also setup an environment to run those tests against e.g. valgrind, since in FFI is very easy to trigger UB. |
Thanks Jorge. We have @tomersolomon1 working on integration tests against pyarrow as a follow-up to this PR. The approach that we are trying with integration tests is to use jpype (as suggested by @pitrou) and run the tests against the same datasets used in the IPC integration tests. It's too early to say if that makes sense. I think that eventually the same set of tests should be used for all languages.
I'm not familiar with using valgrind in Java but I will definitely check. FWIW the tests do fail in case of a memory leak for memory allocated with a buffer allocator ( Somewhat unrelated but I know that you ran tests with valgrind for arrow2, did you also ran it for arrow-rs? I thought that there is a memory leak in arrow-rs because no one ported your export API fixes yet. Just out of curiosity about my memory leak assumption. |
cd8db17 to
09aee82
Compare
7de8004 to
3fed9e0
Compare
java/ffi/README.md
Outdated
| install: | ||
| - Java 8 or later | ||
| - Maven 3.3 or later | ||
| - A C++ compiler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a requirement for the C++ compiler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated to A C++11 enabled compiler. I don't know of any specific requirements and unsure how to check. My setup has gcc 9.3.0.
java/ffi/pom.xml
Outdated
| </parent> | ||
| <modelVersion>4.0.0</modelVersion> | ||
|
|
||
| <artifactId>arrow-ffi</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I am a little curious what does ffi stand for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been considering if abi could be able to describe what happens there better than ffi. I am not sure but in Java it seems that when it comes to foreign it might implie something related to project panama
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another naming candidate is bridge (from the cpp implementation). I will change to whatever is chosen.
Personally I don't associate FFI with project panama, it's a well known term way before project panama started. But I understand your point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep the naming doesn't seem to be blocking anyway. Though I was thinking that we might be interested in integrating functionalities from Panama into Arrow once we decide to upgrade to higher Java version (currently we are on 1.8). At that time it might be clearer to have distinction between the module ffi or other parts using Panama FFI. Another point is that we already have other modules using JNI which is also literally just a form of FFI. I think if we are introducing C data interface we may want users aim to data rather than function. Anyway, didn't want to block anything, just my 2 cents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think that arrow-ffi may not be good. It seems that FFI is too generic. If we use FFI, users may think that they can call any functions implemented in not Java. (General FFI library can do them.)
How about arrow-c-data and org.apache.arrow.c.Data?
java/ffi/pom.xml
Outdated
| </dependency> | ||
| <dependency> | ||
| <groupId>org.apache.arrow</groupId> | ||
| <artifactId>arrow-memory-netty</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use arrow-memory-unsafe instead? as we are in the process of reducing the dependency on netty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it necessary to have a compile time dependency on a specific implementation at all? I think netty is still the more mature allocator, but trying to keep as much code independent would be best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emkornfield it's a test dependency (test scope)
That sounds reasonable to me. You can start from hand-written data generation in PyArrow (or Java) if that's easier, though. |
| ArrowArray.Snapshot snapshot = src.snapshot(); | ||
| checkState(snapshot.release != NULL, "Cannot import released ArrowArray"); | ||
| recursionLevel = parent.recursionLevel + 1; | ||
| checkState(recursionLevel < MAX_IMPORT_RECURSION_LEVEL, "Recursion level in ArrowArray struct exceeded"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be <=?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| * </pre> | ||
| */ | ||
| public class ArrowArray implements BaseStruct { | ||
| private static final int SIZE_OF = 80; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The size is 80 only in 64-bit system?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
The PR message lists some questions and I would really appreciate it if someone could answer them explicitly. Support for 32bit is one of the questions. I wasn't sure if arrow targets it. If required in this PR then I will make the required changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification. Maybe we need some comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, for sanity the import/export functions should error out on 32-bit systems (I assume this shouldn't be difficult to do?).
| * @return A new ArrowSchema instance | ||
| */ | ||
| public static ArrowSchema allocateNew(BufferAllocator allocator) { | ||
| return new ArrowSchema(allocator.buffer(ArrowSchema.SIZE_OF)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to fill zeros to the newly allocated buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pitrou any guidelines here? I think that at least the release callback should be zeroed for correctness in case of failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. Is there a finalizer here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a newly allocated ArrowSchema at consumer side. On the ImportField/ImportSchema methods it is released (similar to the cpp implementation).
The question is whether the consumer should zero all fields before the producer fills this structure. My assumption was that it's not a requirement.
However, for the case of a failed producer it might be better to zero the release field before the producer fills it to avoid crash on release at consumer side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the producer returns an error, then the consumer shouldn't release the half-filled ArrowSchema. So you shouldn't worry about this case.
|
|
||
| void put(Dictionary dictionary) { | ||
| Dictionary previous = map.put(dictionary.getEncoding().getId(), dictionary); | ||
| if (previous != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior is a little different from the C++ API?
/// \brief Add a dictionary to the memo with a particular id. Returns
/// KeyError if that dictionary already exists
Status AddDictionary(int64_t id, const std::shared_ptr<ArrayData>& dictionary);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dictionaries in Java are implemented very differently than the cpp implementation. The approach we took (after struggling with alternatives) is having this FFIDictionatyProvider as a reusable owner of imported dictionaries. That is, you can create it once and import ArrowArray instances to it. On import of batches it is actually expected to import the same IDs multiple times (possibly with slightly different values due to deltas I believe but unsure).
See the mailing list thread for background https://lists.apache.org/x/thread.html/rd2aecfe5ad71a6f81240ed5c6f706b1a6b2f4a95b8dd5db515e5fceb@%3Cdev.arrow.apache.org%3E
| } | ||
| } | ||
| // the new ref count should be >= 0 | ||
| Preconditions.checkState(refCnt >= 0, "RefCnt has gone negative"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check can be moved forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Please check if that's what you meant.
| Preconditions.checkState(decrement >= 1, "ref count decrement should be greater than or equal to 1"); | ||
| // decrement the ref count | ||
| final int refCnt; | ||
| synchronized (this) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better performance, the scope of this lock can be reduced?
Only the following statements need the lock?
struct.release();
struct.close();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| @Override | ||
| public ArrowBuf deriveBuffer(ArrowBuf sourceBuffer, long index, long length) { | ||
| final long derivedBufferAddress = sourceBuffer.memoryAddress() + index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we directly use ArrowBuf#slice(long index, long length)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like slice is using deriveBuffer so it's not possible
|
|
||
| static String asString(ArrowType arrowType) { | ||
| if (arrowType instanceof ExtensionType) { | ||
| arrowType = ((ExtensionType) arrowType).storageType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we need a recursive call, if the storage type of the extension type is another extension type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
java/ffi/src/main/cpp/jni_wrapper.cc
Outdated
|
|
||
| jint JNI_VERSION = JNI_VERSION_1_6; | ||
|
|
||
| class JniPendingException : public std::runtime_error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was getting error arrow/java/ffi/src/main/cpp/jni_wrapper.cc:49:78: error: expected class-name before ‘(’ tokenuntil explicitly including stdexcept in this file. (gcc 10.1.1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
| #define ARROW_FLAG_NULLABLE 2 | ||
| #define ARROW_FLAG_MAP_KEYS_SORTED 4 | ||
|
|
||
| struct ArrowSchema { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to duplicate this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine to duplicate or copy these declarations around. They are in the spec:
https://arrow.apache.org/docs/format/CDataInterface.html#structure-definitions
| ``` | ||
| mkdir -p ./target/build/ | ||
| pushd ./target/build/ | ||
| cmake ../.. | ||
| make | ||
| popd | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to previous comment... I am worried about if we introduce more complexity for keeping the independency of ffi cpp code. Would it be better to somehow reorganize related source files into cpp folder (maybe under jni/)? We can still include ffi's own lib into its jar file during mvn install.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of the questions in the PR message. I am looking for guidance on where this code should be, should it be enabled by default and if not which flag should be used, should it be built as part of arrow cpp or independently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is entirely independent from Arrow C++, so it should live in the Arrow Java source tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to integrate it into the CI. Any guidelines on whether this should be behind a feature flag or not and whether the flag should be enabled by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is entirely independent from Arrow C++, so it should live in the Arrow Java source tree.
@pitrou I believe (after trying) that this adds a lot of complexity that is already solved in the cpp side of the project (cross-platform builds, linting, packaging, etc.). I would like to propose adopting the suggestion by @zhztheplayer to move it to cpp/jni.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how you intend to do it exactly, but please kind in mind that the whole point of the C data interface is that you don't need Arrow C++ to use it. So even if this were in the Arrow C++ source tree, it should be completely standalone and buildable independently.
java/ffi/README.md
Outdated
|
|
||
| ``` | ||
| cd java | ||
| mvn -Parrow-ffi install |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So clean build is not allowed here right? As we already put all cpp build files to target/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to be a separate build directory outside target so it should be fine now. I had to add a .gitignore file with that folder otherwise the license check fails.
java/ffi/src/main/cpp/jni_wrapper.cc
Outdated
| if (private_data->vm_->GetEnv(reinterpret_cast<void**>(&env), JNI_VERSION) != JNI_OK) { | ||
| ThrowPendingException("JNIEnv was not attached to current thread"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to use vm_->AttachCurrentThread instead? As the exported data might be imported and released from other threads that are not attached to JVM. We may allow this GetEnv routine in other modules but that was because in those modules we don't have to deal with native-created threads.
https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html#AttachCurrentThread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done by adding a new JNIEnvGuard class.
| @Override | ||
| public ArrowBuf retain(ArrowBuf srcBuffer, BufferAllocator targetAllocator) { | ||
| retain(); | ||
| return srcBuffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we make a trade-off between returning the source buffer or creating a new buffer here? I am not sure which way is better but it seems that in this way the source buffer will be shared to the target vector along with its internal states (read / write indexes) during buffer loading, although an allocator was assigned to that vector.
Also, here we may fail to register any buffer to the targetAllocator. But maybe it doesn't really hurt and be too tricky to improve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, here we may fail to register any buffer to the targetAllocator. But maybe it doesn't really hurt and be too tricky to improve.
I derived a new ArrowBuf but I can't think of any way to associate it with the given targetAllocator given that this is foreign memory and given the release management in the C Data Interface. FYI this was originally copied from the ORC implementation in arrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough to me. We don't have to associate it to the targetAllocator here.
| * @return Array of pointer values as longs | ||
| */ | ||
| public static long[] toJavaArray(long arrayPtr, int size) { | ||
| if (size == 0 || arrayPtr == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please note that an empty array and a null are different things
|
For the record, here's an example of testing against PyArrow in the Go PR: |
| if (refCnt == 0) { | ||
| // refcount of this reference manager has dropped to 0 | ||
| // release the underlying memory | ||
| synchronized (this) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we need to check refCnt == 0 again here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the counter is atomic, in what scenario is it needed?
I believe that if the counter is decremented to zero then it must be released. Incrementing the counter after (or during) shouldn't change the decision to release. I saw no special code for handling the increment-after-release case in other reference managers so I assume it's not possible by design. However, if desired I can add it just in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we need to check refCnt == 0 again here?
Literally speaking refCnt is a local variable, it doesn't change.
I think even we check using bufRefCnt.get(), that is still not enough. We don't lock writes from other places so the value might happen to be written from 0 to 1 to 0 during this block. That is an ABA problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@roee88 BufferLedger has a check at line 185 and it seems that we don't have it here
arrow/java/memory/memory-core/src/main/java/org/apache/arrow/memory/BufferLedger.java
Lines 178 to 186 in 3bbec3f
| @Override | |
| public void retain(int increment) { | |
| Preconditions.checkArgument(increment > 0, "retain(%s) argument is not positive", increment); | |
| if (BaseAllocator.DEBUG) { | |
| historicalLog.recordEvent("retain(%d)", increment); | |
| } | |
| final int originalReferenceCount = bufRefCnt.getAndAdd(increment); | |
| Preconditions.checkArgument(originalReferenceCount > 0); | |
| } |
Since release(int decrement) and retain(int increment) are both public so I think one could call retain after calling release manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
@zhztheplayer can you please have a quick look at the OrcReferenceManager implementation to see if a ticket needs to be opened for a fix there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we may be able to move all the checks to a debug option (as well as in BufferLedger)? I am not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If by "check" you mean it writes a nice error message, then that sounds reasonable indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Error message is not created here. We may need some.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a message. Can you check if the current implementation is correct? I think that we can omit the synchronized block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It now looks correct to me. We don't seem to need the synchronization block so long as the block will be entered only once. Although in my opinion people can catch exceptions from #retain by their own. That's a rare case so we either remove the sync block or expand it to the same scope as in 'BufferLedger'.
a787a54 to
cca1a84
Compare
|
Is this PR waiting for review or are updates needed from the author? |
|
@emkornfield We already addressed all of the review comments provided so far, but this PR was waiting for CI integration for unit testing and jar packaging. These were added in the last commit so a review of that part is required. |
Signed-off-by: roee88 <roee88@gmail.com>
Signed-off-by: roee88 <roee88@gmail.com>
Signed-off-by: roee88 <roee88@gmail.com>
Signed-off-by: roee88 <roee88@gmail.com>
* Add testing of the ffi module to the JNI tests * Add packaging of ffi module to java jars packaging Signed-off-by: roee88 <roee88@gmail.com> Co-authored-by: Doron Chen <CDORON@il.ibm.com>
Signed-off-by: roee88 <roee88@gmail.com>
69016b3 to
99d7556
Compare
|
@github-actions crossbow submit java-jars |
|
Revision: 99d7556 Submitted crossbow builds: ursacomputing/crossbow @ actions-913
|
java/ffi/pom.xml
Outdated
| </parent> | ||
| <modelVersion>4.0.0</modelVersion> | ||
|
|
||
| <artifactId>arrow-ffi</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think that arrow-ffi may not be good. It seems that FFI is too generic. If we use FFI, users may think that they can call any functions implemented in not Java. (General FFI library can do them.)
How about arrow-c-data and org.apache.arrow.c.Data?
ci/scripts/java_build.sh
Outdated
| fi | ||
|
|
||
| if [ "${ARROW_JAVA_FFI}" = "ON" ]; then | ||
| ${mvn} -Darrow.ffi.cpp.build.dir=${ffi_build_dir} -Parrow-ffi install |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cpp confuses me with C++ implementation.
How about jni instead of cpp such as -Darrow.ffi.jni.build.dir?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the previous suggestion about also renaming the package to c and the class to Data, I thought that this property should now be called arrow.c.jni.dist.dir. (arrow.c to be similar to arrow.vector.* and arrow.memory.* properties that start with the package/module name and jni.dist.dir to indicate that this is the directory with the shared libs). Is that OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, c may mislead C GLib implementation.
Generally, we use "GLib" or "C GLib" for C GLib implementation. Most of people will not think about C GLib implementation by c here.
And c means Arrow's C data interface here. (It doesn't mean C language is used for implementation.) It's natural that we use c here.
So I'm OK with arrow.c.jni.dist.dir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Package name:
org.apache.arrow.c - Class name:
Data - Maven profile name:
arrow-c-data - Shared library name:
libarrow_cdata_jni - Maven property for dir with the shared library:
arrow.c.jni.dist.dir - CI script for building the shared library:
java_cdata_build.sh - CI flag to enable building the shared library:
ARROW_JAVA_CDATA=ON
Anything that should be different before I make the changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed accordingly
docker-compose.yml
Outdated
| /arrow/ci/scripts/java_ffi_build.sh /arrow /build/java/ffi/build /build/java/ffi && | ||
| /arrow/ci/scripts/cpp_build.sh /arrow /build && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /arrow/ci/scripts/java_ffi_build.sh /arrow /build/java/ffi/build /build/java/ffi && | |
| /arrow/ci/scripts/cpp_build.sh /arrow /build && | |
| /arrow/ci/scripts/cpp_build.sh /arrow /build && | |
| /arrow/ci/scripts/java_ffi_build.sh /arrow /build/java/ffi/build /build/java/ffi && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
java/ffi/CMakeLists.txt
Outdated
| project(arrow_ffi_java) | ||
|
|
||
| # Find java/jni | ||
| include(FindJava) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this?
I think that find_package(Java) read the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
java/ffi/CMakeLists.txt
Outdated
| # Find java/jni | ||
| include(FindJava) | ||
| include(UseJava) | ||
| include(FindJNI) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this?
I think that find_package(JNI) read the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
java/ffi/CMakeLists.txt
Outdated
| target_link_libraries(arrow_ffi_jni ${JAVA_JVM_LIBRARY}) | ||
| add_dependencies(arrow_ffi_jni ${PROJECT_NAME}) | ||
|
|
||
| install(TARGETS arrow_ffi_jni DESTINATION ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_LIBDIR}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that ${CMAKE_INSTALL_PREFIX}/ is needless here:
| install(TARGETS arrow_ffi_jni DESTINATION ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_LIBDIR}) | |
| install(TARGETS arrow_ffi_jni DESTINATION ${CMAKE_INSTALL_LIBDIR}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
java/ffi/README.md
Outdated
| mkdir -p build | ||
| pushd build | ||
| cmake .. | ||
| make |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use cmake --build . here. It doesn't depend on make nor ninja.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: roee88 <roee88@gmail.com>
* Package name: org.apache.arrow.c * Class name: Data * Maven profile name: arrow-c-data * Shared library name: libarrow_cdata_jni * Maven property for dir with the shared library: arrow.c.jni.dist.dir * CI script for building the shared library: java_cdata_build.sh * CI flag to enable building the shared library: ARROW_JAVA_CDATA=ON Signed-off-by: roee88 <roee88@gmail.com>
Signed-off-by: roee88 <roee88@gmail.com>
Signed-off-by: roee88 <roee88@gmail.com>
Signed-off-by: roee88 <roee88@gmail.com>
|
@kou can you please crossbow submit java-jars again? |
|
@github-actions crossbow submit java-jars |
|
Revision: 4fbb16f Submitted crossbow builds: ursacomputing/crossbow @ actions-918
|
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
CI failures are unrelated.
|
@roee88 Could you update description of this pull request for the latest code? Our merge tool uses the description as commit message. I'll merge this after you update the description. |
| public static VectorSchemaRoot importVectorSchemaRoot(BufferAllocator allocator, ArrowSchema schema, ArrowArray array, | ||
| CDataDictionaryProvider provider) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In all other methods the array parameter is before the schema parameter. In this method it's different and the original reason is that array is optional. However, while writing code that uses it I realized that it's easy to get it wrong due to the inconsistency. @kou is it okay if I swap it for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
Please ping me when you think that this pull request is ready to merge. I'll merge this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kou
Ready. Thanks 👍
Signed-off-by: Doron Chen <cdoron@il.ibm.com> Co-authored-by: CDORON@il.ibm.com <cdoron@lnx-arrow.sl.cloud9.ibm.com>
|
@github-actions crossbow submit java-jars |
|
Revision: bff6410 Submitted crossbow builds: ursacomputing/crossbow @ actions-924
|
|
Benchmark runs are scheduled for baseline = e2b1dd9 and contender = e379ee1. e379ee1 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Experimental C Data Interface support in Arrow Java for 64bit systems.
Roundtrip example with dictionaries:
```java
void roundtrip(FieldVector vector, DictionaryProvider provider) {
// Consumer allocates empty structures
try (ArrowSchema consumerArrowSchema = ArrowSchema.allocateNew(allocator);
ArrowArray consumerArrowArray = ArrowArray.allocateNew(allocator)) {
// Producer creates structures from existing memory pointers
try (ArrowSchema arrowSchema = ArrowSchema.wrap(consumerArrowSchema.memoryAddress());
ArrowArray arrowArray = ArrowArray.wrap(consumerArrowArray.memoryAddress())) {
// Producer exports vector into the C Data Interface structures
Data.exportVector(allocator, vector, provider, arrowArray, arrowSchema);
}
// Consumer imports vector (and dictionaries)
try (CDataDictionaryProvider dictionaries = new CDataDictionaryProvider();
FieldVector imported = Data.importVector(allocator, consumerArrowArray, consumerArrowSchema, dictionaries)) {
// Do something with the imported vector
}
}
}
```
Closes apache#11067 from roee88/java-c-data-interface
Lead-authored-by: Roee Shlomo <roee88@gmail.com>
Co-authored-by: roee88 <roee88@gmail.com>
Co-authored-by: Doron Chen <cdoron@il.ibm.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Experimental C Data Interface support in Arrow Java for 64bit systems.
Roundtrip example with dictionaries: