
[SPARK-55056][SQL][PYTHON][TEST] Add tests using Arrow to deserialize nested array with empty outer array #54880

Closed
Yicong-Huang wants to merge 2 commits into apache:master from Yicong-Huang:SPARK-55056-test


@Yicong-Huang
Contributor

@Yicong-Huang Yicong-Huang commented Mar 18, 2026

What changes were proposed in this pull request?

Add tests to verify that writing triple-nested arrays (and nested arrays with maps) with an empty outer array no longer triggers a SIGSEGV.

Why are the changes needed?

SPARK-55056 reported a segmentation fault when deserializing triple-nested arrays with an empty outer array via Arrow IPC. The root cause was in arrow-java: `ListVector.getBufferSizeFor(0)` returned 0, causing the offset buffer to be omitted for empty vectors, which violates the Arrow spec (the offset buffer must have N+1 entries even when N=0).

This has been fixed upstream in arrow-java 19.0.0 (apache/arrow-java#343), which Spark adopted in SPARK-56000 (PR #54820). These tests confirm the fix works correctly without any Spark-side workaround.
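
For readers unfamiliar with the offset-buffer invariant mentioned above, here is a minimal pure-Python sketch of it. This is an illustration only, not the arrow-java implementation or the tests added in this PR: the helper names `list_offsets`, `offset_buffer_size`, and `nested_offsets` are hypothetical, validity bitmaps and nulls are ignored, and 4-byte offsets are assumed (as in Arrow's regular List layout).

```python
from typing import List, Sequence

OFFSET_WIDTH = 4  # assumption: int32 offsets, as in Arrow's regular List layout


def list_offsets(values: Sequence[Sequence]) -> List[int]:
    """Return the N+1 offsets that describe N variable-length lists."""
    offsets = [0]  # the leading 0 is present even when N == 0
    for item in values:
        offsets.append(offsets[-1] + len(item))
    return offsets


def offset_buffer_size(value_count: int) -> int:
    """Bytes needed for the offset buffer of a list vector with value_count entries."""
    return (value_count + 1) * OFFSET_WIDTH


def nested_offsets(rows: Sequence[Sequence], depth: int) -> List[List[int]]:
    """Offsets for each list level of a depth-nested array column, outermost first."""
    levels = []
    current = list(rows)
    for _ in range(depth):
        levels.append(list_offsets(current))
        # Flatten one level: the children of this level are the entries of the next.
        current = [child for item in current for child in item]
    return levels


# One row whose value is an empty outer array of a triple-nested array column.
print(nested_offsets([[]], depth=3))
print(offset_buffer_size(0))
```

With a single row holding an empty outer array, the two inner list levels have N=0 entries, yet each still carries one offset, so the expected buffer size for an empty level is 4 bytes rather than 0. That is the case the description says `ListVector.getBufferSizeFor(0)` used to get wrong.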

Does this PR introduce any user-facing change?

No (test only).

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@Yicong-Huang Yicong-Huang changed the title from "[SPARK-55056][SQL][PYTHON][TEST] Add tests for nested array with empty outer array" to "[SPARK-55056][SQL][PYTHON][TEST] Add tests using Arrow to deserialize nested array with empty outer array" on Mar 18, 2026
@Yicong-Huang
Contributor Author

cc @viirya @ueshin

@HyukjinKwon
Member

Merged to master.

terana pushed a commit to terana/spark that referenced this pull request Mar 23, 2026
… nested array with empty outer array

Closes apache#54880 from Yicong-Huang/SPARK-55056-test.

Authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>