Balancer can work incorrectly in case of slow historicals or large number of segments to move #10193

@a2l007

Description

When a cluster has thousands of segments to load, historicals may take longer to process all of the requests showing up in their load queues. Consequently, the coordinator timeout druid.coordinator.load.timeout expires, after which the segment request is removed from the CuratorLoadQueuePeon object for that historical and the balancer attempts to balance this segment again.
However, since the ZooKeeper path for this segment still exists for the earlier historical, that historical will eventually load the segment even though the coordinator has decided to place it elsewhere.
This behavior can cause the balancer to make incorrect segment placement decisions, since ServerHolder.getSizeUsed() will now differ from the actual used size on that historical. This is most evident with the diskNormalized balancer strategy, where the usageRatio plays an important role.
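To make the drift concrete, here is a simplified, hypothetical model of the accounting described above. None of these names are real Druid APIs (only ServerHolder.getSizeUsed() is referenced from the issue); the sketch just illustrates how the coordinator's view ends up underestimating the historical by exactly one segment's size.

```java
// Hypothetical sketch of the accounting drift; not Druid code.
public class LoadTimeoutDrift {
    public static void main(String[] args) {
        long segmentSize = 500L;          // size of the segment being loaded
        long actualUsed = 10_000L;        // bytes really used on the historical
        long queuedInPeon = segmentSize;  // the peon still tracks the pending load

        long reportedUsed = actualUsed;   // coordinator's last known server size

        // Load timeout fires: the peon drops the entry...
        queuedInPeon = 0L;
        // ...but the ZK path survives, so the historical loads it anyway.
        actualUsed += segmentSize;

        // ServerHolder.getSizeUsed()-style accounting now underestimates
        // the server by exactly segmentSize.
        long coordinatorView = reportedUsed + queuedInPeon;
        System.out.println("actual=" + actualUsed
                + " coordinatorView=" + coordinatorView);
    }
}
```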

A couple of ways to mitigate this problem would be to increase the druid.coordinator.load.timeout property, or to lower maxSegmentsInNodeLoadingQueue so that the historicals aren't flooded with load queue entries when there are a large number of segments to move.
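For reference, the two knobs live in different places: the timeout is a coordinator runtime property, while maxSegmentsInNodeLoadingQueue is part of the coordinator dynamic configuration (set via the coordinator's dynamic config API, not runtime.properties). A sketch of the first mitigation, with an illustrative value rather than a recommendation:

```properties
# Coordinator runtime.properties: allow more time before a queued
# segment load times out (illustrative value; the default is PT15M)
druid.coordinator.load.timeout=PT30M
```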

Investigating this issue, it looks like CuratorLoadQueuePeon shouldn't be deleting load queue entries in the event of a load timeout, so that the balancer stays aware of the still-pending load.
