Description
The DisposableScannerAdaptor is a JNI bridge from Java to the C++ datasets API. When it scans record batches, it collects all of the buffers from all of the arrays and returns a list of buffer handles to Java. The Java side then puts these into an ArrowRecordBatch.
Unfortunately, if an array has a non-zero offset (i.e., it is a slice of a larger array), the bridge does not return only the portion of each buffer that the slice references. It returns the entire underlying buffer instead. The resulting Java record batch is incorrect: the length is wrong (so the memory is not fully freed) and the values are wrong.
I'm not familiar enough with the Java implementation to suggest a good fix. Working out buffer offsets from array offsets is tricky because the logic depends on the data type. Also, I'm fairly sure the Java side would then have to take ownership of the entire buffer, which could be tricky because multiple batches could share ownership of that buffer.
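To illustrate why the buffer offsets depend on the data type, here is a minimal sketch in plain C++ with no Arrow dependency. It computes the byte range of each buffer that a sliced array actually references, assuming the standard Arrow columnar layout with 32-bit offsets for binary/utf8. `ByteRange` and the function names are hypothetical helpers, not Arrow APIs, and nested types (list, struct, ...) are not covered.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not part of the Arrow API): the part of an
// underlying buffer that a sliced array actually references.
struct ByteRange {
  int64_t begin;   // byte position where the slice starts
  int64_t length;  // number of bytes the slice covers
};

// Validity bitmap: one bit per slot. A slice can start mid-byte, so whole
// bytes are covered and the bit offset (offset % 8) must still be tracked.
ByteRange ValidityRange(int64_t offset, int64_t length) {
  const int64_t first_byte = offset / 8;
  const int64_t end_byte = (offset + length + 7) / 8;
  return {first_byte, end_byte - first_byte};
}

// Fixed-width values (int32, double, ...): byte_width bytes per slot.
ByteRange FixedWidthRange(int64_t offset, int64_t length, int64_t byte_width) {
  return {offset * byte_width, length * byte_width};
}

// Binary/utf8 offsets buffer: length + 1 int32 entries starting at `offset`.
ByteRange OffsetsRange(int64_t offset, int64_t length) {
  constexpr int64_t kOffsetWidth = 4;  // sizeof(int32_t)
  return {offset * kOffsetWidth, (length + 1) * kOffsetWidth};
}

// Binary/utf8 data buffer: the range can only be found by reading the
// offsets themselves. The stored offsets remain absolute positions, so a
// consumer given only this range would also need them rebased by `begin`.
ByteRange DataRange(const std::vector<int32_t>& offsets, int64_t offset,
                    int64_t length) {
  assert(offset + length < static_cast<int64_t>(offsets.size()));
  const int32_t start = offsets[offset];
  const int32_t end = offsets[offset + length];
  return {start, end - start};
}
```

For example, a utf8 array `["abc", "de", "fghi", "jkl"]` has offsets `{0, 3, 5, 9, 12}`. The slice with offset 1 and length 2 references offsets bytes [4, 16) and data bytes [3, 9) ("defghi"), yet the slice's offsets still read 3, 5, 9. Handing Java just the sub-ranges is therefore not enough for variable-length types unless the offsets are rewritten or the full data buffer is shared.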
As a temporary fix for ARROW-13554, I am going to copy the array if it has an offset. This means the transfer is no longer zero-copy, so I'm creating this issue to solve the problem properly.
Reporter: Weston Pace / @westonpace
Related issues:
- [Java] Dataset JNI bridge should use the C data interface (is superseded by)
Note: This issue was originally created as ARROW-15275. Please see the migration documentation for further details.