Skip to content

Middlemanager fails startup due to corrupt task files #7886

@xvrl

Description

@xvrl

Affected Version

0.15.0-SNAPSHOT (git sha d99f77a)

Description

A middle-manager shutdown may leave empty task files in
${druid.indexer.task.baseTaskDir}/completedTasks/. This may be an issue on it's own, but it could also happen for reasons beyond our control.

Those empty (corrupt) files cause the middlemanager to fail on a subsequent startup, due to https://github.com/apache/incubator-druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/worker/WorkerTaskManager.java#L430 re-throwingh a JsonProcessingException

The exception message also looks incorrect, saying the files would be ignored, but instead it causes the entire startup sequence to interrupt, requiring user intervention to remove corrupt files in order to resume startup.

java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_212]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_212]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_212]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_212]
	at org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:443) ~[druid-core-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
	at org.apache.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:339) ~[druid-core-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
	at org.apache.druid.guice.LifecycleModule$2.start(LifecycleModule.java:140) ~[druid-core-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
	at org.apache.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:106) [druid-services-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
	at org.apache.druid.cli.ServerRunnable.run(ServerRunnable.java:57) [druid-services-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
	at org.apache.druid.cli.Main.main(Main.java:118) [druid-services-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
Caused by: org.apache.druid.java.util.common.ISE: Failed to read completed task from disk at [/var/druid/task/completedTasks/index_kafka_metrics_opencensus_ebffe52caf33afb_lcgaddpn]. Ignored.
	at org.apache.druid.indexing.worker.WorkerTaskManager.initCompletedTasks(WorkerTaskManager.java:430) ~[druid-indexing-service-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
	at org.apache.druid.indexing.worker.WorkerTaskManager.start(WorkerTaskManager.java:135) ~[druid-indexing-service-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
	at org.apache.druid.indexing.worker.WorkerTaskMonitor.start(WorkerTaskMonitor.java:94) ~[druid-indexing-service-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
	... 10 more
Caused by: com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input
 at [Source: var/druid/task/completedTasks/index_kafka_<datasource>_ebffe52caf33afb_lcgaddpn; line: 1, column: 1]
	at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148) ~[jackson-databind-2.6.7.jar:2.6.7]
	at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3781) ~[jackson-databind-2.6.7.jar:2.6.7]
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3721) ~[jackson-databind-2.6.7.jar:2.6.7]
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2620) ~[jackson-databind-2.6.7.jar:2.6.7]
	at org.apache.druid.indexing.worker.WorkerTaskManager.initCompletedTasks(WorkerTaskManager.java:421) ~[druid-indexing-service-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
	at org.apache.druid.indexing.worker.WorkerTaskManager.start(WorkerTaskManager.java:135) ~[druid-indexing-service-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
	at org.apache.druid.indexing.worker.WorkerTaskMonitor.start(WorkerTaskMonitor.java:94) ~[druid-indexing-service-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]

Metadata

Metadata

Assignees

Labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions