Coordinator fix balancer stuck #5987
Merged
jon-wei merged 6 commits into apache:master on Jul 12, 2018
Conversation
Member (Author): test failure appears related
Member (Author): Current failures seem unrelated
jon-wei approved these changes Jul 12, 2018
clintropolis added a commit to implydata/druid-public that referenced this pull request Jul 18, 2018
* this will fix it
* filter destinations to not consider servers already serving segment
* fix it
* cleanup
* fix opposite day in ImmutableDruidServer.equals
* simplify
This appears to happen when the balancer runs before the metadata manager polls the metadata database, which causes the `if` statement inside the `for` loop to never be satisfied. Added a maximum iteration count so the loop gives up and logs about it. I was able to reproduce this in a local debug cluster, which is where I should've caught it in the first place, my bad. Fixes #5981
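To illustrate the shape of that fix, here is a minimal sketch of a bounded retry loop. The names (`BalancerSketch`, `pickSegmentToMove`, `findSegmentToMove`, `maxIterations`) are illustrative stand-ins, not the actual Druid identifiers:

```java
import java.util.List;
import java.util.Random;

public class BalancerSketch
{
  // Hypothetical stand-in for picking a candidate segment; returns null when
  // the metadata manager has not polled yet and there is nothing to pick from.
  static String pickSegmentToMove(List<String> polledSegments, Random rng)
  {
    if (polledSegments.isEmpty()) {
      return null;
    }
    return polledSegments.get(rng.nextInt(polledSegments.size()));
  }

  // Bounded retry: instead of looping until a candidate is found (which can
  // never happen before the first metadata poll), give up after maxIterations
  // and log about it.
  static String findSegmentToMove(List<String> polledSegments, int maxIterations)
  {
    Random rng = new Random(0);
    for (int i = 0; i < maxIterations; i++) {
      String candidate = pickSegmentToMove(polledSegments, rng);
      if (candidate != null) {
        return candidate;
      }
    }
    System.err.println("Unable to find a segment to move after " + maxIterations + " iterations");
    return null;
  }
}
```

With an empty (not-yet-polled) segment list the loop now terminates and logs, rather than spinning forever.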
It also fixes issues with correctly counting 'moved' and 'unmoved' segments, and optimizes the cost calculation by excluding servers that already have a replica of a segment from consideration as a move target. Previously, a server that already had the segment could be selected as the 'best' destination, but the move function would then bail out without doing anything, incorrectly counting the segment as 'moved' without any corresponding log.
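The destination-filtering idea can be sketched as below; the server representation (a map from server name to the set of segment ids it serves) and the name `eligibleDestinations` are assumptions for illustration, not Druid's actual types:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class DestinationFilterSketch
{
  // Servers already holding a replica can never be a valid move target, so
  // exclude them up front, before the cost calculation ranks destinations,
  // instead of letting the move function bail out afterwards.
  static List<String> eligibleDestinations(Map<String, Set<String>> servers, String segmentId)
  {
    return servers.entrySet()
                  .stream()
                  .filter(e -> !e.getValue().contains(segmentId))
                  .map(Map.Entry::getKey)
                  .collect(Collectors.toList());
  }
}
```

Filtering first means the 'best' destination chosen by the cost calculation is always one the move can actually use.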
Finally, `ImmutableDruidServer.equals` was broken, but it doesn't appear to be called anywhere other than in this change, so maybe not a big deal (most things deal with `ServerHolder`, whose `.equals` picks some properties of `ImmutableDruidServer` instead of calling its `.equals` directly).
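For reference, a correct `equals`/`hashCode` pair for a small value class looks like the sketch below. The class and field names are illustrative, not `ImmutableDruidServer`'s actual fields; the comment marks the inverted check that an "opposite day" bug of this kind typically involves:

```java
import java.util.Objects;

// Minimal value-class sketch with a correct equals contract.
final class ServerId
{
  private final String name;
  private final String host;

  ServerId(String name, String host)
  {
    this.name = name;
    this.host = host;
  }

  @Override
  public boolean equals(Object o)
  {
    if (this == o) {
      return true;
    }
    // An inverted condition here (e.g. returning false when classes DO match)
    // makes equals reject equal instances; the correct form is:
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    ServerId that = (ServerId) o;
    return Objects.equals(name, that.name) && Objects.equals(host, that.host);
  }

  @Override
  public int hashCode()
  {
    return Objects.hash(name, host);
  }
}
```

Keeping `hashCode` consistent with `equals` matters too, since these objects may end up as keys in hash-based collections.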