[Python, Java] UnionArray round trip not working

I'm currently working on making pyarrow.serialization data available from the Java side. One problem I ran into is that the Java implementation seems unable to read UnionArrays generated from C++. To make this easily reproducible, I created a clean Python implementation for creating UnionArrays: https://github.com/apache/arrow/pull/1216

The data is generated with the following script:

```python
import pyarrow as pa

binary = pa.array([b'a', b'b', b'c', b'd'], type='binary')
int64 = pa.array([1, 2, 3], type='int64')
types = pa.array([0, 1, 0, 0, 1, 1, 0], type='int8')
value_offsets = pa.array([0, 0, 2, 1, 1, 2, 3], type='int32')

result = pa.UnionArray.from_arrays([binary, int64], types, value_offsets)

batch = pa.RecordBatch.from_arrays([result], ["test"])

sink = pa.BufferOutputStream()
writer = pa.RecordBatchStreamWriter(sink, batch.schema)

writer.write_batch(batch)

sink.close()

b = sink.get_result()

with open("union_array.arrow", "wb") as f:
    f.write(b)

# Sanity check: Read the batch in again

with open("union_array.arrow", "rb") as f:
    b = f.read()
    reader = pa.RecordBatchStreamReader(pa.BufferReader(b))

batch = reader.read_next_batch()

print("union array is", batch.column(0))
```
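For readers unfamiliar with the dense layout, here is a minimal pure-Python sketch of how the `types` and `value_offsets` buffers from the script select values from the two children. It does not use pyarrow; `decode_dense_union` is a hypothetical helper written for illustration, and it assumes the type ids map directly to child indices (0 for the binary child, 1 for the int64 child), as they do in the script.

```python
def decode_dense_union(children, type_ids, value_offsets):
    """Resolve each slot of a dense union to its logical value.

    Hypothetical helper for illustration only; not part of pyarrow.
    Assumes type id == index of the child in `children`.
    """
    if len(type_ids) != len(value_offsets):
        raise ValueError("type_ids and value_offsets must have equal length")
    # Slot i is stored in child type_ids[i], at position value_offsets[i].
    return [children[t][o] for t, o in zip(type_ids, value_offsets)]


# Same values as in the script above, as plain Python lists.
binary = [b"a", b"b", b"c", b"d"]
int64 = [1, 2, 3]
types = [0, 1, 0, 0, 1, 1, 0]
value_offsets = [0, 0, 2, 1, 1, 2, 3]

decoded = decode_dense_union([binary, int64], types, value_offsets)
print(decoded)  # [b'a', 1, b'c', b'b', 2, 3, b'd']
```

Note that the children have lengths 4 and 3 while the union itself has 7 slots, which is only valid for a dense union.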

I attached the file generated by that script. Then when I run the following code in Java:

```Java

RootAllocator allocator = new RootAllocator(1000000000);

ByteArrayInputStream in = new ByteArrayInputStream(Files.readAllBytes(Paths.get("union_array.arrow")));

ArrowStreamReader reader = new ArrowStreamReader(in, allocator);

reader.loadNextBatch()
```

I get the following error:

```Java

|  java.lang.IllegalArgumentException thrown: Could not load buffers for field test: Union(Sparse, [22, 5])<0: Binary, 1: Int(64, true)>. error message: can not truncate buffer to a larger size 7: 0
|        at VectorLoader.loadBuffers (VectorLoader.java:83)
|        at VectorLoader.load (VectorLoader.java:62)
|        at ArrowReader$1.visit (ArrowReader.java:125)
|        at ArrowReader$1.visit (ArrowReader.java:111)
|        at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
|        at ArrowReader.loadNextBatch (ArrowReader.java:137)
|        at (#7:1)
```

It seems like Java is not picking up that the UnionArray is Dense instead of Sparse. After changing the default in java/vector/src/main/codegen/templates/UnionVector.java from Sparse to Dense, I get this:

```Java

jshell> reader.getVectorSchemaRoot().getSchema()
$9 ==> Schema<list: Union(Dense, [0])<: Struct<list: List<item: Union(Dense, [0])<: Int(64, true)>>>>>
```

but then reading doesn't work:

```Java

jshell> reader.loadNextBatch()
|  java.lang.IllegalArgumentException thrown: Could not load buffers for field list: Union(Dense, [1])<: Struct<list: List<$data$: Union(Dense, [5])<: Int(64, true)>>>>. error message: can not truncate buffer to a larger size 1: 0
|        at VectorLoader.loadBuffers (VectorLoader.java:83)
|        at VectorLoader.load (VectorLoader.java:62)
|        at ArrowReader$1.visit (ArrowReader.java:125)
|        at ArrowReader$1.visit (ArrowReader.java:111)
|        at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
|        at ArrowReader.loadNextBatch (ArrowReader.java:137)
|        at (#8:1)
```
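To illustrate why misreading the mode matters, the sketch below checks the buffers from the Python script against both layouts. It assumes the rule from the Arrow columnar format that every child of a sparse union has the same length as the union, while a dense union carries one offset per slot into children of arbitrary length. `check_union_layout` is a hypothetical helper, and it does not model what `VectorLoader` actually does; it only shows that the data from the script is consistent when read as Dense and inconsistent when read as Sparse.

```python
def check_union_layout(mode, child_lengths, type_ids, value_offsets=None):
    """Return a list of layout problems; an empty list means consistent.

    Hypothetical helper for illustration only. Assumes type id == child index.
    """
    problems = []
    n = len(type_ids)
    if mode == "sparse":
        # Sparse: every child must have exactly one entry per union slot.
        for i, length in enumerate(child_lengths):
            if length != n:
                problems.append(f"child {i} has length {length}, expected {n}")
    elif mode == "dense":
        # Dense: one offset per slot, each pointing inside the selected child.
        if value_offsets is None or len(value_offsets) != n:
            problems.append("dense union needs one offset per slot")
        else:
            for slot, (t, o) in enumerate(zip(type_ids, value_offsets)):
                if not 0 <= o < child_lengths[t]:
                    problems.append(f"slot {slot}: offset {o} out of range for child {t}")
    else:
        raise ValueError(f"unknown union mode: {mode!r}")
    return problems


types = [0, 1, 0, 0, 1, 1, 0]
value_offsets = [0, 0, 2, 1, 1, 2, 3]
child_lengths = [4, 3]  # binary child, int64 child

dense_problems = check_union_layout("dense", child_lengths, types, value_offsets)
sparse_problems = check_union_layout("sparse", child_lengths, types)
print(dense_problems)   # []
print(sparse_problems)  # ['child 0 has length 4, expected 7', 'child 1 has length 3, expected 7']
```

The 7 in the first error message equals the number of union slots, which would be consistent with a reader expecting sparse children of that length, though this has not been confirmed against the Java code.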

Any help with this is appreciated!

**Reporter**: [Philipp Moritz](https://issues.apache.org/jira/browse/ARROW-1692) / @pcmoritz
**Assignee**: [Ryan Murray](https://issues.apache.org/jira/browse/ARROW-1692) / @rymurr
#### Related issues:
- [[Integration] Add integration tests for Union types](https://github.com/apache/arrow/issues/16222) (is blocked by)
- [[Java] getMinorTypeForArrowType returns sparse minor type for dense union types](https://github.com/apache/arrow/issues/25377) (is duplicated by)
#### Original Issue Attachments:
- [union_array.arrow](https://issues.apache.org/jira/secure/attachment/12893161/union_array.arrow)
#### PRs and other links:
- [GitHub Pull Request #7290](https://github.com/apache/arrow/pull/7290)

<sub>**Note**: *This issue was originally created as [ARROW-1692](https://issues.apache.org/jira/browse/ARROW-1692). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>
