Affected Version
0.15.0-SNAPSHOT (git sha d99f77a)
Description
A middle-manager shutdown may leave empty task files in
${druid.indexer.task.baseTaskDir}/completedTasks/. This may be an issue on it's own, but it could also happen for reasons beyond our control.
Those empty (corrupt) files cause the middlemanager to fail on a subsequent startup, due to https://github.com/apache/incubator-druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/worker/WorkerTaskManager.java#L430 re-throwingh a JsonProcessingException
The exception message also looks incorrect, saying the files would be ignored, but instead it causes the entire startup sequence to interrupt, requiring user intervention to remove corrupt files in order to resume startup.
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_212]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_212]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_212]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_212]
at org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:443) ~[druid-core-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
at org.apache.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:339) ~[druid-core-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
at org.apache.druid.guice.LifecycleModule$2.start(LifecycleModule.java:140) ~[druid-core-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
at org.apache.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:106) [druid-services-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
at org.apache.druid.cli.ServerRunnable.run(ServerRunnable.java:57) [druid-services-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
at org.apache.druid.cli.Main.main(Main.java:118) [druid-services-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
Caused by: org.apache.druid.java.util.common.ISE: Failed to read completed task from disk at [/var/druid/task/completedTasks/index_kafka_metrics_opencensus_ebffe52caf33afb_lcgaddpn]. Ignored.
at org.apache.druid.indexing.worker.WorkerTaskManager.initCompletedTasks(WorkerTaskManager.java:430) ~[druid-indexing-service-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
at org.apache.druid.indexing.worker.WorkerTaskManager.start(WorkerTaskManager.java:135) ~[druid-indexing-service-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
at org.apache.druid.indexing.worker.WorkerTaskMonitor.start(WorkerTaskMonitor.java:94) ~[druid-indexing-service-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
... 10 more
Caused by: com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input
at [Source: var/druid/task/completedTasks/index_kafka_<datasource>_ebffe52caf33afb_lcgaddpn; line: 1, column: 1]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3781) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3721) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2620) ~[jackson-databind-2.6.7.jar:2.6.7]
at org.apache.druid.indexing.worker.WorkerTaskManager.initCompletedTasks(WorkerTaskManager.java:421) ~[druid-indexing-service-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
at org.apache.druid.indexing.worker.WorkerTaskManager.start(WorkerTaskManager.java:135) ~[druid-indexing-service-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
at org.apache.druid.indexing.worker.WorkerTaskMonitor.start(WorkerTaskMonitor.java:94) ~[druid-indexing-service-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
Affected Version
0.15.0-SNAPSHOT (git sha d99f77a)
Description
A middle-manager shutdown may leave empty task files in
${druid.indexer.task.baseTaskDir}/completedTasks/. This may be an issue on it's own, but it could also happen for reasons beyond our control.Those empty (corrupt) files cause the middlemanager to fail on a subsequent startup, due to https://github.com/apache/incubator-druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/worker/WorkerTaskManager.java#L430 re-throwingh a JsonProcessingException
The exception message also looks incorrect, saying the files would be ignored, but instead it causes the entire startup sequence to interrupt, requiring user intervention to remove corrupt files in order to resume startup.