
[Bug]: Optimizing Status is incorrectly set #951

@zhongqishang

Description


What happened?

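When the Flink optimizer hits the following exception, the table's Optimizing Status in AMS is never updated, so no further optimizations are scheduled for the table. The Failed result for the task is reported back successfully (see the last two log lines), yet the status stays unchanged.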

2022-12-19 16:59:35,458 ERROR com.netease.arctic.table.TableMetaStore                      [] - run with catalog ugi request failed. UGI is hive/hiveserver.ld-hadoop.com@LD-HADOOP.COM (auth:KERBEROS)
java.lang.IllegalStateException: Cannot return length while appending to an open file.
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkState(Preconditions.java:508) ~[arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at org.apache.iceberg.orc.OrcFileAppender.length(OrcFileAppender.java:102) ~[arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at org.apache.iceberg.io.DataWriter.length(DataWriter.java:76) ~[arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at com.netease.arctic.optimizer.operator.executor.IcebergExecutor.optimizeDataFiles(IcebergExecutor.java:156) ~[arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at com.netease.arctic.table.TableMetaStore.lambda$doAs$0(TableMetaStore.java:315) ~[arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_141]
	at javax.security.auth.Subject.doAs(Subject.java:360) [?:1.8.0_141]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873) [arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at com.netease.arctic.table.TableMetaStore.doAs(TableMetaStore.java:313) [arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at com.netease.arctic.io.ArcticHadoopFileIO.doAs(ArcticHadoopFileIO.java:177) [arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at com.netease.arctic.optimizer.operator.executor.IcebergExecutor.execute(IcebergExecutor.java:74) [arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at com.netease.arctic.optimizer.operator.BaseTaskExecutor.execute(BaseTaskExecutor.java:137) [arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at com.netease.arctic.optimizer.flink.FlinkExecuteFunction.processElement(FlinkExecuteFunction.java:74) [arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at com.netease.arctic.optimizer.flink.FlinkExecuteFunction.processElement(FlinkExecuteFunction.java:40) [arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:418) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:513) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.api.operators.StreamSourceContexts$SwitchingOnClose.collect(StreamSourceContexts.java:103) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at com.netease.arctic.optimizer.flink.FlinkConsumer.run(FlinkConsumer.java:54) [arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:323) [flink-dist_2.12-1.14.6.jar:1.14.6]
2022-12-19 16:59:35,459 ERROR com.netease.arctic.optimizer.operator.BaseTaskExecutor       [] - failed to execute task OptimizeTaskId(type:Minor, traceId:2196e765-6109-4f8f-98a8-cb01bdf8f8d3)
java.lang.IllegalStateException: Cannot return length while appending to an open file.
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkState(Preconditions.java:508) ~[arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at org.apache.iceberg.orc.OrcFileAppender.length(OrcFileAppender.java:102) ~[arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at org.apache.iceberg.io.DataWriter.length(DataWriter.java:76) ~[arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at com.netease.arctic.optimizer.operator.executor.IcebergExecutor.optimizeDataFiles(IcebergExecutor.java:156) ~[arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at com.netease.arctic.table.TableMetaStore.lambda$doAs$0(TableMetaStore.java:315) ~[arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_141]
	at javax.security.auth.Subject.doAs(Subject.java:360) ~[?:1.8.0_141]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873) ~[arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at com.netease.arctic.table.TableMetaStore.doAs(TableMetaStore.java:313) ~[arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at com.netease.arctic.io.ArcticHadoopFileIO.doAs(ArcticHadoopFileIO.java:177) ~[arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at com.netease.arctic.optimizer.operator.executor.IcebergExecutor.execute(IcebergExecutor.java:74) ~[arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at com.netease.arctic.optimizer.operator.BaseTaskExecutor.execute(BaseTaskExecutor.java:137) [arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at com.netease.arctic.optimizer.flink.FlinkExecuteFunction.processElement(FlinkExecuteFunction.java:74) [arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at com.netease.arctic.optimizer.flink.FlinkExecuteFunction.processElement(FlinkExecuteFunction.java:40) [arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:418) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:513) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.api.operators.StreamSourceContexts$SwitchingOnClose.collect(StreamSourceContexts.java:103) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at com.netease.arctic.optimizer.flink.FlinkConsumer.run(FlinkConsumer.java:54) [arctic-optimizer-0.4.0-SNAPSHOT-jar-with-dependencies.jar:?]
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67) [flink-dist_2.12-1.14.6.jar:1.14.6]
	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:323) [flink-dist_2.12-1.14.6.jar:1.14.6]
2022-12-19 16:59:35,475 INFO  com.netease.arctic.optimizer.operator.BaseTaskReporter       [] - start reporting result: OptimizeTaskId(type:Minor, traceId:2196e765-6109-4f8f-98a8-cb01bdf8f8d3), table=TableIdentifier(catalog:hive_prod, database:ods_iceberg, tableName:ods_xxxxx), status=Failed, attemptId=1609883167, newFileSize=0, reportTime=0, costTime=0
2022-12-19 16:59:35,484 INFO  com.netease.arctic.optimizer.flink.FlinkReporter             [] - report success OptimizeTaskId(type:Minor, traceId:2196e765-6109-4f8f-98a8-cb01bdf8f8d3)
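Per the stack trace, the exception comes from a precondition in Iceberg's ORC appender. `length()` may only be called after the file is closed, but `IcebergExecutor.optimizeDataFiles` calls `DataWriter.length()` on a writer that is still open. The Java sketch below reproduces that pattern with a hypothetical `ToyOrcAppender`. It also shows one way a rolling writer could avoid the problem: track an estimated size on the caller side and read `length()` only after `close()`. All class names and the size model are assumptions for illustration, not Arctic's or Iceberg's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a columnar (ORC-like) appender that only knows
// its final length after close(), mirroring the check that fails in the log.
class ToyOrcAppender {
    private final List<String> rows = new ArrayList<>();
    private boolean closed = false;
    private long finalLength = -1;

    void add(String row) {
        if (closed) {
            throw new IllegalStateException("Cannot add to a closed appender");
        }
        rows.add(row);
    }

    // Same precondition message as org.apache.iceberg.orc.OrcFileAppender.length().
    long length() {
        if (!closed) {
            throw new IllegalStateException("Cannot return length while appending to an open file.");
        }
        return finalLength;
    }

    void close() {
        if (!closed) {
            closed = true;
            // Hypothetical size model: total characters written.
            finalLength = rows.stream().mapToLong(String::length).sum();
        }
    }
}

public class RollingWriterSketch {
    // Hypothetical rolling logic: decides when to roll using a caller-side
    // size estimate instead of asking the open appender for its length.
    static List<Long> writeRolling(List<String> rows, long targetBytes) {
        List<Long> fileLengths = new ArrayList<>();
        ToyOrcAppender current = new ToyOrcAppender();
        long estimated = 0;
        for (String row : rows) {
            current.add(row);
            estimated += row.length();
            if (estimated >= targetBytes) {
                current.close();
                fileLengths.add(current.length()); // safe: appender is closed
                current = new ToyOrcAppender();
                estimated = 0;
            }
        }
        current.close();
        if (estimated > 0) {
            fileLengths.add(current.length());
        }
        return fileLengths;
    }

    public static void main(String[] args) {
        ToyOrcAppender open = new ToyOrcAppender();
        open.add("abc");
        try {
            open.length(); // reproduces the failure mode from the log
        } catch (IllegalStateException e) {
            System.out.println("open appender: " + e.getMessage());
        }
        System.out.println(writeRolling(Arrays.asList("aaaa", "bbbb", "cc"), 6)); // prints [8, 2]
    }
}
```

Estimating the size on the caller side trades accuracy for never touching an open appender. A real fix could instead use whatever in-progress size estimate the underlying writer exposes.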

Affects Versions

master

What engines are you seeing the problem on?

AMS
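The last two log lines show that the Failed result for the Minor task reached AMS. The expected behaviour is therefore that AMS moves the table out of its optimizing state once all of that table's tasks have reported. The Java sketch below illustrates that transition. It assumes hypothetical status names (`IDLE`, `PENDING`, `MINOR_OPTIMIZING`) and a hypothetical `TableOptimizingTracker` class; the real AMS status model and class names may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical status values; the real AMS enum may use different names.
enum OptimizingStatus { IDLE, PENDING, MINOR_OPTIMIZING }

enum TaskStatus { RUNNING, SUCCESS, FAILED }

// Hypothetical per-table tracker: once every task has reported, a FAILED
// result must move the table out of MINOR_OPTIMIZING instead of leaving it stuck.
class TableOptimizingTracker {
    private OptimizingStatus status = OptimizingStatus.IDLE;
    private final Map<String, TaskStatus> tasks = new HashMap<>();

    void startMinor(String... traceIds) {
        for (String id : traceIds) {
            tasks.put(id, TaskStatus.RUNNING);
        }
        status = OptimizingStatus.MINOR_OPTIMIZING;
    }

    void report(String traceId, TaskStatus result) {
        if (!tasks.containsKey(traceId)) {
            return; // ignore reports for unknown tasks
        }
        tasks.put(traceId, result);
        if (tasks.containsValue(TaskStatus.RUNNING)) {
            return; // wait until every task has reported
        }
        // On failure fall back to PENDING so the table can be re-planned;
        // on success go to IDLE (the commit step is omitted in this sketch).
        status = tasks.containsValue(TaskStatus.FAILED)
                ? OptimizingStatus.PENDING
                : OptimizingStatus.IDLE;
        tasks.clear();
    }

    OptimizingStatus status() {
        return status;
    }
}

public class OptimizingStatusSketch {
    public static void main(String[] args) {
        TableOptimizingTracker t = new TableOptimizingTracker();
        t.startMinor("2196e765-6109-4f8f-98a8-cb01bdf8f8d3");
        t.report("2196e765-6109-4f8f-98a8-cb01bdf8f8d3", TaskStatus.FAILED);
        System.out.println(t.status()); // prints PENDING
    }
}
```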

How to reproduce

No response

Relevant log output

No response

Anything else


Code of Conduct

  • I agree to follow this project's Code of Conduct
