Coordinators spinning in balancer #5981

@gianm

When rolling out the 0.12.2 branch to our test clusters, we noticed symptoms suggesting that #5927 can cause coordinators to get stuck in the balancer. Thread dumps show them spending a lot of time in the stack trace below, and one coordinator has now spent hours without finishing a run:

"Coordinator-Exec--0" #120 daemon prio=5 os_prio=0 tid=0x00007f06802d4000 nid=0x20f7 runnable [0x00007f066a7d0000]
   java.lang.Thread.State: RUNNABLE
	at io.druid.server.coordinator.ReservoirSegmentSampler.getRandomBalancerSegmentHolder(ReservoirSegmentSampler.java:46)
	at io.druid.server.coordinator.CostBalancerStrategy.pickSegmentToMove(CostBalancerStrategy.java:224)
	at io.druid.server.coordinator.helper.DruidCoordinatorBalancer.balanceTier(DruidCoordinatorBalancer.java:128)
	at io.druid.server.coordinator.helper.DruidCoordinatorBalancer.lambda$run$0(DruidCoordinatorBalancer.java:84)
	at io.druid.server.coordinator.helper.DruidCoordinatorBalancer$$Lambda$52/955068914.accept(Unknown Source)
	at java.util.HashMap.forEach(HashMap.java:1289)
	at io.druid.server.coordinator.helper.DruidCoordinatorBalancer.run(DruidCoordinatorBalancer.java:83)
	at io.druid.server.coordinator.DruidCoordinator$CoordinatorRunnable.run(DruidCoordinator.java:677)
	at io.druid.server.coordinator.DruidCoordinator$2.call(DruidCoordinator.java:571)
	at io.druid.server.coordinator.DruidCoordinator$2.call(DruidCoordinator.java:564)
	at io.druid.java.util.common.concurrent.ScheduledExecutors$2.run(ScheduledExecutors.java:102)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

So for 0.12.2 we should either revert that patch (#5927) or try to achieve the same thing in some other way.
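
For anyone unfamiliar with the sampler at the top of the trace, here is a minimal sketch of single-item reservoir sampling. It is not Druid's actual code: the class name, the string segment ids, and the tier size are all hypothetical, and it assumes that each pick walks every segment on every server in the tier. Under that assumption the cost of one balancing run grows roughly as (number of picks) x (total segments), which would be consistent with a coordinator spending most of its time in that frame.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ReservoirPickSketch {
    // Counts how many segments were visited, to make the cost of each pick visible.
    static long visited = 0;

    // Single-item reservoir sampling: walk every segment on every server and
    // replace the current choice with probability 1/n for the n-th segment seen.
    static String pickOne(List<List<String>> servers, Random random) {
        String chosen = null;
        int seen = 0;
        for (List<String> segments : servers) {
            for (String segment : segments) {
                seen++;
                visited++;
                if (random.nextInt(seen) == 0) {
                    chosen = segment;
                }
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Hypothetical tier: 10 servers with 1,000 segments each.
        List<List<String>> servers = new ArrayList<>();
        for (int s = 0; s < 10; s++) {
            List<String> segments = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                segments.add("server" + s + "_segment" + i);
            }
            servers.add(segments);
        }
        Random random = new Random(0);
        int picks = 5; // hypothetical number of segments to move in one run
        for (int p = 0; p < picks; p++) {
            pickOne(servers, random);
        }
        // Every pick rescans the whole tier: 5 picks * 10,000 segments.
        System.out.println("segments visited: " + visited); // prints "segments visited: 50000"
    }
}
```

If the real sampler behaves like this sketch, anything that increases the number of picks per run, or that retries picks, multiplies the full-tier scan accordingly.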

/cc @clintropolis
