We use the Java Arrow stream reader to read Arrow-format shuffle data. However, if a struct vector contains duplicate field names, Java Arrow throws the following error:
[info] Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 311.0 failed 1 times, most recent failure: Lost task 1.0 in stage 311.0 (TID 882) (192.168.86.44 executor driver): java.lang.IllegalArgumentException: not all nodes and buffers were consumed. nodes: [ArrowFieldNode [length=4, nullCount=0]] buffers: [ArrowBuf[9855], address:4929620864, capacity:28, ArrowBuf[9857], address:4929620928, capacity:1, ArrowBuf[9859], address:4929620992, capacity:32]
[info] at org.apache.comet.shaded.arrow.vector.VectorLoader.load(VectorLoader.java:89)
[info] at org.apache.comet.shaded.arrow.vector.ipc.ArrowReader.loadRecordBatch(ArrowReader.java:220)
[info] at org.apache.comet.shaded.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:161)
[info] at org.apache.comet.vector.StreamReader.nextBatch(StreamReader.scala:41)
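One possible reading of the trace above, stated as an assumption rather than a confirmed root cause: if the reader-side struct keeps its children in a name-keyed container, two children with the same name collapse into one, so the loader consumes fewer field nodes than the batch carries and the rest are reported as "not consumed". The sketch below models only that counting mismatch with plain Java collections. `DuplicateFieldSketch`, `childrenKept`, and `unconsumedNodes` are hypothetical names for illustration and are not part of Arrow or Comet.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model only: this is NOT Arrow's implementation.
public class DuplicateFieldSketch {

    // Assumption: children are stored by name, so a later child
    // with the same name replaces the earlier one.
    static int childrenKept(List<String> childNames) {
        Map<String, Integer> byName = new LinkedHashMap<>();
        for (int i = 0; i < childNames.size(); i++) {
            byName.put(childNames.get(i), i);
        }
        return byName.size();
    }

    // The writer emits one field node per declared child; the reader
    // consumes one node per child it actually kept. Only child nodes
    // are counted here, not the node of the struct itself.
    static int unconsumedNodes(List<String> childNames) {
        int nodesInBatch = childNames.size();
        return nodesInBatch - childrenKept(childNames);
    }

    public static void main(String[] args) {
        // Struct with duplicate child names: one node is left over.
        System.out.println(unconsumedNodes(Arrays.asList("a", "a")));
        // Struct with distinct child names: every node is consumed.
        System.out.println(unconsumedNodes(Arrays.asList("a", "b")));
    }
}
```

In this model the duplicate-name struct leaves exactly one child node unread, which would be consistent with the single leftover `ArrowFieldNode` and its three buffers in the error message, but the actual cause still needs to be confirmed against the Arrow Java code.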
Describe the bug
Found this bug when fixing Spark SQL test failures for #651.
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response