[Bug] ArrayIndexOutOfBoundsException after switching to parquet format from orc #4796

@xccui

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

b508c19

Compute Engine

Flink 1.18.1

Minimal reproduce step

Write data with nested array-row schemas to a table that uses the parquet file format. The same data was written without errors when the table used the orc file format.
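The report does not include the actual table definition, so the following is only a sketch of what such a setup might look like. The table name, column names, and the primary key are hypothetical; `file.format` is the Paimon table option that selects the file format. The helper just builds the Flink SQL DDL text so the orc and parquet variants can be compared side by side. Whether other table options are needed to trigger the failure is not stated in the report.

```java
public class ParquetReproSketch {

    // Builds a CREATE TABLE statement for a table with an ARRAY<ROW<...>> column.
    // All identifiers (repro_table, id, items, name, qty) are hypothetical placeholders;
    // the real schema from the report is unknown.
    static String createTableDdl(String fileFormat) {
        return String.join("\n",
                "CREATE TABLE repro_table (",
                "  id INT,",
                "  items ARRAY<ROW<name STRING, qty INT>>,",
                "  PRIMARY KEY (id) NOT ENFORCED",
                ") WITH (",
                "  'file.format' = '" + fileFormat + "'",
                ")");
    }

    public static void main(String[] args) {
        // Variant that reportedly worked.
        System.out.println(createTableDdl("orc"));
        System.out.println();
        // Variant that reportedly fails with ArrayIndexOutOfBoundsException.
        System.out.println(createTableDdl("parquet"));
    }
}
```

The printed statements could be passed to a Flink `TableEnvironment` with a Paimon catalog configured; that wiring is omitted here because it depends on the environment.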

What doesn't meet your expectations?

java.io.IOException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException
	at org.apache.paimon.flink.sink.StoreSinkWriteImpl.prepareCommit(StoreSinkWriteImpl.java:234)
	at org.apache.paimon.flink.sink.TableWriteOperator.prepareCommit(TableWriteOperator.java:127)
	at org.apache.paimon.flink.sink.RowDataStoreWriteOperator.prepareCommit(RowDataStoreWriteOperator.java:198)
	at org.apache.paimon.flink.sink.PrepareCommitOperator.emitCommittables(PrepareCommitOperator.java:104)
	at org.apache.paimon.flink.sink.PrepareCommitOperator.endInput(PrepareCommitOperator.java:92)
	at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.endOperatorInput(StreamOperatorWrapper.java:96)
	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.endInput(RegularOperatorChain.java:97)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:68)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:562)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:858)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:807)
	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953)
	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:932)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException
	at java.base/java.util.concurrent.FutureTask.report(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.get(Unknown Source)
	at org.apache.paimon.compact.CompactFutureManager.obtainCompactResult(CompactFutureManager.java:67)
	at org.apache.paimon.compact.CompactFutureManager.innerGetCompactionResult(CompactFutureManager.java:53)
	at org.apache.paimon.mergetree.compact.MergeTreeCompactManager.getCompactionResult(MergeTreeCompactManager.java:223)
	at org.apache.paimon.mergetree.MergeTreeWriter.trySyncLatestCompaction(MergeTreeWriter.java:328)
	at org.apache.paimon.mergetree.MergeTreeWriter.prepareCommit(MergeTreeWriter.java:276)
	at org.apache.paimon.operation.AbstractFileStoreWrite.prepareCommit(AbstractFileStoreWrite.java:218)
	at org.apache.paimon.operation.MemoryFileStoreWrite.prepareCommit(MemoryFileStoreWrite.java:155)
	at org.apache.paimon.table.sink.TableWriteImpl.prepareCommit(TableWriteImpl.java:253)
	at org.apache.paimon.flink.sink.StoreSinkWriteImpl.prepareCommit(StoreSinkWriteImpl.java:229)
	... 16 more
Caused by: java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException
	at org.apache.paimon.reader.RecordReaderIterator.<init>(RecordReaderIterator.java:40)
	at org.apache.paimon.reader.RecordReader.toCloseableIterator(RecordReader.java:210)
	at org.apache.paimon.mergetree.compact.ChangelogMergeTreeRewriter.rewriteOrProduceChangelog(ChangelogMergeTreeRewriter.java:133)
	at org.apache.paimon.mergetree.compact.ChangelogMergeTreeRewriter.upgrade(ChangelogMergeTreeRewriter.java:200)
	at org.apache.paimon.mergetree.compact.MergeTreeCompactTask.upgrade(MergeTreeCompactTask.java:124)
	at org.apache.paimon.mergetree.compact.MergeTreeCompactTask.rewrite(MergeTreeCompactTask.java:146)
	at org.apache.paimon.mergetree.compact.MergeTreeCompactTask.doCompact(MergeTreeCompactTask.java:105)
	at org.apache.paimon.compact.CompactTask.call(CompactTask.java:49)
	at org.apache.paimon.compact.CompactTask.call(CompactTask.java:34)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	... 1 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
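
The trace wraps the failure several levels deep: IOException, then ExecutionException, then RuntimeException, then ArrayIndexOutOfBoundsException. As a debugging aid only (this is not Paimon code, and the class name is hypothetical), a minimal sketch of walking a cause chain to reach the innermost exception:

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class RootCauseSketch {

    // Follows getCause() until there is no further cause and returns the innermost throwable.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Rebuild the same nesting order seen in the trace above.
        Throwable wrapped =
                new IOException(
                        new ExecutionException(
                                new RuntimeException(
                                        new ArrayIndexOutOfBoundsException())));
        System.out.println(rootCause(wrapped).getClass().getName());
        // prints "java.lang.ArrayIndexOutOfBoundsException"
    }
}
```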

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
