[C++][JNI] DisposableScannerAdaptor does not handle arrays with offsets #30767

@asfimport

Description

The DisposableScannerAdaptor is a JNI bridge from Java to the C++ datasets API. When it scans record batches, it collects all of the buffers from all of the arrays and returns a list of buffer handles to Java, which puts them into an ArrowRecordBatch on the Java end.

Unfortunately, if an array has a non-zero offset (for example, because it is a slice), the bridge does not return the sliced portion of each buffer; it returns the entire underlying buffer. The Java record batch is then incorrect: the length is wrong (and so it doesn't fully free the memory) and the values are incorrect.

I'm not familiar enough with the Java implementation to suggest a good fix. Figuring out the buffer offsets from array offsets is a bit tricky since the logic depends on the data type. Also, I'm pretty sure the Java side now has to take ownership of the entire buffer which could be tricky because multiple batches could share ownership of the buffer.

As a temporary fix for ARROW-13554 I am going to copy the array if it has an offset. This means the transfer is not zero-copy so I'm creating this issue to solve this properly.

Reporter: Weston Pace / @westonpace

Note: This issue was originally created as ARROW-15275. Please see the migration documentation for further details.
