Druid Historical may never retry loading a segment that previously failed to load #10649

@zhangyue19921010

Affected Version

All versions, when using druid.coordinator.loadqueuepeon.type=http

Description

When a Historical node fails to load a segment the first time, it may not attempt to load that segment again until the corresponding LRU cache entry is invalidated, or until the streaming index task fails because of the completionTimeout limit.

Here are the Coordinator logs:

2020-12-07T06:49:17,343 ERROR [Coordinator-Exec--0] org.apache.druid.server.coordinator.HttpLoadQueuePeon - Server[http://druid-dev-8-historical-0.druid-dev-8-historical.druid-dev-8.svc.cluster.local:8083] Failed segment[xxxx__load__segment__test_2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z_2020-12-07T05:39:35.003Z_12] request[SegmentChangeRequestLoad] with cause [Stopping load queue peon.].

...
2020-12-07T06:52:49,509 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Assigning 'primary' for segment [xxxx__load__segment__test_2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z_2020-12-07T05:39:35.003Z_12] to server [druid-dev-8-historical-0.druid-dev-8-historical.druid-dev-8.svc.cluster.local:8083] in tier [_default_tier]

....

2020-12-07T06:52:53,515 ERROR [Master-PeonExec--0] org.apache.druid.server.coordinator.HttpLoadQueuePeon - Server[http://druid-dev-8-historical-0.druid-dev-8-historical.druid-dev-8.svc.cluster.local:8083] Failed segment[xxxx__load__segment__test_2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z_2020-12-07T05:39:35.003Z_12] request[SegmentChangeRequestLoad] with cause [Exception loading segment[xxxx__load__segment__test_2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z_2020-12-07T05:39:35.003Z_12]].

...

2020-12-07T06:53:24,647 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Assigning 'primary' for segment [xxxx__load__segment__test_2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z_2020-12-07T05:39:35.003Z_12] to server [druid-dev-8-historical-0.druid-dev-8-historical.druid-dev-8.svc.cluster.local:8083] in tier [_default_tier]

...

2020-12-07T06:53:24,652 ERROR [Master-PeonExec--0] org.apache.druid.server.coordinator.HttpLoadQueuePeon - Server[http://druid-dev-8-historical-0.druid-dev-8-historical.druid-dev-8.svc.cluster.local:8083] Failed segment[xxxx__load__segment__test_2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z_2020-12-07T05:39:35.003Z_12] request[SegmentChangeRequestLoad] with cause [Exception loading segment[xxxx__load__segment__test_2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z_2020-12-07T05:39:35.003Z_12]].

...

2020-12-07T06:53:59,732 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Assigning 'primary' for segment [xxxx__load__segment__test_2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z_2020-12-07T05:39:35.003Z_12] to server [druid-dev-8-historical-0.druid-dev-8-historical.druid-dev-8.svc.cluster.local:8083] in tier [_default_tier]

...

2020-12-07T06:53:59,737 ERROR [Master-PeonExec--0] org.apache.druid.server.coordinator.HttpLoadQueuePeon - Server[http://druid-dev-8-historical-0.druid-dev-8-historical.druid-dev-8.svc.cluster.local:8083] Failed segment[xxxx__load__segment__test_2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z_2020-12-07T05:39:35.003Z_12] request[SegmentChangeRequestLoad] with cause [Exception loading segment[xxxx__load__segment__test_2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z_2020-12-07T05:39:35.003Z_12]].

...
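
To see how often the Coordinator hits the same failure, the excerpts above can be tallied per segment. The following is a minimal sketch, assuming the log line format shown in this issue; the class name `FailedLoadCounter` and its method are hypothetical helpers, not part of Druid.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts failed load requests per segment in Coordinator log lines. */
public class FailedLoadCounter {
    // Matches the HttpLoadQueuePeon error format seen in the logs above (an assumption
    // about the format, taken only from these excerpts).
    private static final Pattern FAILED_LOAD = Pattern.compile(
        "Failed segment\\[([^\\]]+)\\] request\\[SegmentChangeRequestLoad\\]");

    static Map<String, Integer> countFailures(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = FAILED_LOAD.matcher(line);
            if (m.find()) {
                // group(1) is the segment id between the first pair of brackets.
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Shortened, made-up segment ids for illustration only.
        List<String> lines = Arrays.asList(
            "ERROR HttpLoadQueuePeon - Server[http://h:8083] Failed segment[ds_a_12] "
                + "request[SegmentChangeRequestLoad] with cause [Exception loading segment[ds_a_12]].",
            "INFO LoadRule - Assigning 'primary' for segment [ds_a_12] to server [h:8083]",
            "ERROR HttpLoadQueuePeon - Server[http://h:8083] Failed segment[ds_a_12] "
                + "request[SegmentChangeRequestLoad] with cause [Exception loading segment[ds_a_12]].");
        System.out.println(countFailures(lines));
    }
}
```

A segment whose count keeps growing across Coordinator runs, always against the same server, matches the pattern reported here.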

Here are the Historical logs:

2020-12-07T06:52:53,393 INFO [SimpleDataSegmentChangeHandler-0] org.apache.druid.storage.s3.S3DataSegmentPuller - Loaded 67610584 bytes from [CloudObjectLocation{bucket='pqm-druid-dev', path='rtstorage/segments/xxxx__load__segment__test/2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z/2020-12-07T05:39:35.003Z/13/affbed9a-c609-42f7-9c6a-6089ef5efac5/index.zip'}] to [/var/druid/segment-cache/xxxx__load__segment__test/2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z/2020-12-07T05:39:35.003Z/13]
2020-12-07T06:52:53,437 INFO [SimpleDataSegmentChangeHandler-0] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Announcing segment[xxxx__load__segment__test_2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z_2020-12-07T05:39:35.003Z_13] at existing path[/druid/segments/druid-dev-8-historical-0.druid-dev-8-historical.druid-dev-8.svc.cluster.local:8083/druid-dev-8-historical-0.druid-dev-8-historical.druid-dev-8.svc.cluster.local:8083_historical__default_tier_2020-12-07T06:52:52.295Z_f39ed4961cac496898fdbcacb6e922ed1693]
2020-12-07T06:52:53,447 INFO [SimpleDataSegmentChangeHandler-1] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment xxxx__load__segment__test_2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z_2020-12-07T05:39:35.003Z_12
2020-12-07T06:52:53,507 WARN [SimpleDataSegmentChangeHandler-1] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - No path to unannounce segment[xxxx__load__segment__test_2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z_2020-12-07T05:39:35.003Z_12]
2020-12-07T06:52:53,507 INFO [SimpleDataSegmentChangeHandler-1] org.apache.druid.server.SegmentManager - Told to delete a queryable on dataSource[xxxx__load__segment__test] for interval[2020-12-07T03:00:00.000Z/2020-12-07T04:00:00.000Z] and version[2020-12-07T05:39:35.003Z] that I don't have.
2020-12-07T06:52:53,507 INFO [SimpleDataSegmentChangeHandler-1] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[/var/druid/segment-cache/xxxx__load__segment__test/2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z/2020-12-07T05:39:35.003Z/12]
2020-12-07T06:52:53,509 WARN [SimpleDataSegmentChangeHandler-1] org.apache.druid.segment.loading.StorageLocation - SegmentDir[/var/druid/segment-cache/xxxx__load__segment__test/2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z/2020-12-07T05:39:35.003Z/12] is not found under this location[/var/druid/segment-cache]
2020-12-07T06:52:53,509 WARN [SimpleDataSegmentChangeHandler-1] org.apache.druid.server.coordination.SegmentLoadDropHandler - Unable to delete segmentInfoCacheFile[/var/druid/segment-cache/info_dir/xxxx__load__segment__test_2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z_2020-12-07T05:39:35.003Z_12]
2020-12-07T06:52:53,512 ERROR [SimpleDataSegmentChangeHandler-1] org.apache.druid.server.coordination.SegmentLoadDropHandler - Failed to load segment for dataSource: xxxxx
org.apache.druid.segment.loading.SegmentLoadingException: Exception loading segment[xxxx__load__segment__test_2020-12-07T03:00:00.000Z_2020-12-07T04:00:00.000Z_2020-12-07T05:39:35.003Z_12]
	at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:263) ~[druid-server-0.17.1.jar:0.17.1]
	at org.apache.druid.server.coordination.SegmentLoadDropHandler.addSegment(SegmentLoadDropHandler.java:307) ~[druid-server-0.17.1.jar:0.17.1]
	at org.apache.druid.server.coordination.SegmentLoadDropHandler$1.lambda$addSegment$1(SegmentLoadDropHandler.java:513) ~[druid-server-0.17.1.jar:0.17.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_221]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_221]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_221]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_221]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_221]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_221]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_221]
Caused by: java.lang.NullPointerException
	at org.apache.druid.common.utils.SerializerUtils.readString(SerializerUtils.java:61) ~[druid-core-0.17.1.jar:0.17.1]
	at org.apache.druid.segment.IndexIO$V9IndexLoader.deserializeColumn(IndexIO.java:677) ~[druid-processing-0.17.1.jar:0.17.1]
	at org.apache.druid.segment.IndexIO$V9IndexLoader.load(IndexIO.java:617) ~[druid-processing-0.17.1.jar:0.17.1]
	at org.apache.druid.segment.IndexIO.loadIndex(IndexIO.java:194) ~[druid-processing-0.17.1.jar:0.17.1]
	at org.apache.druid.segment.loading.MMappedQueryableSegmentizerFactory.factorize(MMappedQueryableSegmentizerFactory.java:48) ~[druid-processing-0.17.1.jar:0.17.1]
	at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:150) ~[druid-server-0.17.1.jar:0.17.1]
	at org.apache.druid.server.SegmentManager.getAdapter(SegmentManager.java:198) ~[druid-server-0.17.1.jar:0.17.1]
	at org.apache.druid.server.SegmentManager.loadSegment(SegmentManager.java:157) ~[druid-server-0.17.1.jar:0.17.1]
	at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:259) ~[druid-server-0.17.1.jar:0.17.1]
	... 9 more
2020-12-07T06:52:53,518 INFO [SimpleDataSegmentChangeHandler-0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment xxxx__load__segment__test_2020-12-07T02:00:00.000Z_2020-12-07T03:00:00.000Z_2020-12-07T02:16:46.090Z_17
2020-12-07T06:52:53,519 INFO [SimpleDataSegmentChangeHandler-0] org.apache.druid.storage.s3.S3DataSegmentPuller - Pulling index at path[CloudObjectLocation{bucket='pqm-druid-dev', path='rtstorage/segments/xxxx__load__segment__test/2020-12-07T02:00:00.000Z_2020-12-07T03:00:00.000Z/2020-12-07T02:16:46.090Z/17/587cf37e-73ca-4628-8c65-d90e290b65fc/index.zip'}] to outDir[/var/druid/segment-cache/xxxx__load__segment__test/2020-12-07T02:00:00.000Z_2020-12-07T03:00:00.000Z/2020-12-07T02:16:46.090Z/17]

Here is the Kafka ingestion task log.
The task keeps logging "Still waiting for Handoff for Segments" and eventually fails.

2020-12-07T05:54:06,004 INFO [[index_kafka_xxxx__load__segment__test_ed12482207579a5_mkdnhpfh]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Dropped segment[xxxx__load__segment__test_2020-12-07T02:00:00.000Z_2020-12-07T03:00:00.000Z_2020-12-07T02:16:46.090Z_28].
2020-12-07T05:55:05,951 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T05:56:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T05:57:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T05:58:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T05:59:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:00:05,947 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:01:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:02:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:03:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:04:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:05:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:06:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:07:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:08:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:09:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:10:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:11:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:12:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:13:05,949 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:14:05,949 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:15:05,950 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:16:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:17:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:18:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:19:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:20:05,948 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-12-07T02:00:00.000Z/2020-12-07T03:00:00.000Z, version='2020-12-07T02:16:46.090Z', partitionNumber=17}]]
2020-12-07T06:20:27,386 INFO [parent-monitor-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Triggering JVM shutdown.
2020-12-07T06:20:27,387 INFO [Thread-125] org.apache.druid.cli.CliPeon - Running shutdown hook
2020-12-07T06:20:27,387 INFO [Thread-125] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]
2020-12-07T06:20:27,388 INFO [Thread-125] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/announcements/druid-dev-8-middle-manager-medium-0.druid-dev-8-middle-manager-medium.druid-dev-8.svc.cluster.local:8100]
2020-12-07T06:20:27,398 INFO [Thread-125] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/segments/druid-dev-8-middle-manager-medium-0.druid-dev-8-middle-manager-medium.druid-dev-8.svc.cluster.local:8100/druid-dev-8-middle-manager-medium-0.druid-dev-8-middle-manager-medium.druid-dev-8.svc.cluster.local:8100_indexer-executor__default_tier_2020-12-07T04:50:06.819Z_6a488817791a4d8498ae15fedafe66dd0]
2020-12-07T06:20:27,400 INFO [Thread-125] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/listeners/lookups/__default/http:druid-dev-8-middle-manager-medium-0.druid-dev-8-middle-manager-medium.druid-dev-8.svc.cluster.local:8100]
2020-12-07T06:20:27,401 INFO [Thread-125] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/internal-discovery/PEON/druid-dev-8-middle-manager-medium-0.druid-dev-8-middle-manager-medium.druid-dev-8.svc.cluster.local:8100]
2020-12-07T06:20:27,403 INFO [Thread-125] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER]
2020-12-07T06:20:27,407 INFO [Thread-125] org.eclipse.jetty.server.AbstractConnector - Stopped 

Here is what happens:
The Historical is downloading and unzipping a segment when it crashes, leaving the segment damaged on disk.
The Historical restarts (lazy load on start is false).
The Historical tries to load that segment again but fails because the segment is damaged.
The Coordinator keeps asking the Historical to load this segment again and again.
The Historical always responds with the failure status held in its LRU cache for this segment and never actually retries the load.
The ingestion task hangs and fails after completionTimeout.
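
The sequence above can be modelled with a small sketch. It is not Druid's actual code: the class `StatusCacheSketch`, the `evictFailures` flag, and the loader callback are hypothetical names used only to illustrate how a status cache that keeps FAILED entries turns every later load request into a cache hit, and how dropping failed entries lets the next request retry. The real fix in Druid may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical model of a Historical's load-request status cache (not Druid code). */
public class StatusCacheSketch {
    enum Status { SUCCESS, FAILED }

    private final Map<String, Status> statuses;
    private final boolean evictFailures;
    private int loadAttempts = 0;

    StatusCacheSketch(final int maxSize, boolean evictFailures) {
        this.evictFailures = evictFailures;
        // accessOrder=true plus removeEldestEntry gives a simple LRU map.
        this.statuses = new LinkedHashMap<String, Status>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Status> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Handles one load request; the loader returns true when the load succeeds. */
    Status handleLoadRequest(String segmentId, Predicate<String> loader) {
        Status cached = statuses.get(segmentId);
        if (cached != null) {
            return cached; // answered from the cache, no load is attempted
        }
        loadAttempts++;
        Status result = loader.test(segmentId) ? Status.SUCCESS : Status.FAILED;
        if (result == Status.SUCCESS || !evictFailures) {
            statuses.put(segmentId, result); // a cached FAILED entry blocks retries
        }
        return result;
    }

    int getLoadAttempts() {
        return loadAttempts;
    }

    public static void main(String[] args) {
        Predicate<String> damagedSegment = id -> false; // every real attempt fails

        StatusCacheSketch sticky = new StatusCacheSketch(100, false);
        StatusCacheSketch retrying = new StatusCacheSketch(100, true);
        for (int i = 0; i < 3; i++) {
            sticky.handleLoadRequest("segment_12", damagedSegment);
            retrying.handleLoadRequest("segment_12", damagedSegment);
        }
        System.out.println("sticky attempts=" + sticky.getLoadAttempts());     // 1
        System.out.println("retrying attempts=" + retrying.getLoadAttempts()); // 3
    }
}
```

With failures kept in the cache, three Coordinator requests lead to a single real load attempt. With failures dropped, each request triggers a fresh attempt, so a load that could succeed after the damaged files are cleaned up would get the chance to do so.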
