[iceberg] testMergeOnReadMerge failure creating empty StructArray #2086

@andygrove

Describe the bug

We see this failure in https://github.com/apache/datafusion-comet/actions/runs/16789537430/job/47548305180?pr=1987

TestSparkExecutorCache > testMergeOnReadMerge() > catalogName = testhive, implementation = org.apache.iceberg.spark.SparkCatalog, config = {type=hive, io-impl=org.apache.iceberg.spark.TestSparkExecutorCache$CustomFileIO, default-namespace=default} FAILED
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent failure: Lost task 0.0 in stage 16.0 (TID 20) (localhost executor driver): org.apache.comet.CometNativeException: Invalid argument error: use StructArray::try_new_with_length or StructArray::new_empty_fields to create a struct array with no fields so that the length can be set correctly
    	at org.apache.comet.Native.executePlan(Native Method)
    	at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$2(CometExecIterator.scala:155)
    	at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$2$adapted(CometExecIterator.scala:154)
    	at org.apache.comet.vector.NativeUtil.getNextBatch(NativeUtil.scala:157)
    	at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1(CometExecIterator.scala:154)
    	at org.apache.comet.Tracing$.withTrace(Tracing.scala:31)
    	at org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:152)
    	at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:203)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.cometcolumnartorow_nextBatch_0$(Unknown Source)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
    	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
    	at org.apache.spark.sql.execution.datasources.v2.MergeRowsExec$MergeRowIterator.hasNext(MergeRowsExec.scala:188)
    	at scala.collection.Iterator$$anon$6.hasNext(Iterator.scala:477)
    	at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583)
    	at org.apache.spark.sql.comet.execution.shuffle.CometUnsafeShuffleWriter.write(CometUnsafeShuffleWriter.java:218)
    	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
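
The error text points at a general property of columnar struct arrays: the row count is normally inferred from the child arrays, so a struct with no fields needs its length passed explicitly. The sketch below illustrates that idea in plain Rust. `ToyStructArray` and its methods are hypothetical stand-ins written for this note. They are not arrow-rs's actual types or implementation, and the sketch does not claim to identify where Comet constructs the failing array.

```rust
/// Simplified stand-in for a columnar struct array (hypothetical, not arrow-rs).
#[derive(Debug, PartialEq)]
pub struct ToyStructArray {
    /// One child column per struct field (i64 values for brevity).
    columns: Vec<Vec<i64>>,
    /// Row count, stored explicitly so it survives having zero columns.
    len: usize,
}

impl ToyStructArray {
    /// Infers the row count from the first child column.
    /// Fails when there are no columns, because there is nothing to infer from.
    pub fn try_new(columns: Vec<Vec<i64>>) -> Result<Self, String> {
        let len = match columns.first() {
            Some(first) => first.len(),
            None => {
                return Err("no fields: pass the row count explicitly".to_string());
            }
        };
        Self::try_new_with_length(columns, len)
    }

    /// Takes the row count from the caller, so zero columns is valid.
    /// Every child column must still match the requested length.
    pub fn try_new_with_length(columns: Vec<Vec<i64>>, len: usize) -> Result<Self, String> {
        if columns.iter().any(|c| c.len() != len) {
            return Err("child column length does not match row count".to_string());
        }
        Ok(Self { columns, len })
    }

    /// Number of rows in the struct array.
    pub fn len(&self) -> usize {
        self.len
    }

    /// Number of child columns (struct fields).
    pub fn num_columns(&self) -> usize {
        self.columns.len()
    }
}
```

In this model, `ToyStructArray::try_new(vec![])` returns an error, while `ToyStructArray::try_new_with_length(vec![], 3)` yields a three-row array with no columns. That mirrors the distinction the error message draws between the inferring constructor and the explicit-length ones.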

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

Labels: bug