Skip to content

Fixed equal distribution strategy when exist disabled middleManagers.#2472

Merged
drcrallen merged 1 commit intoapache:masterfrom
redBorder:master
Feb 17, 2016
Merged

Fixed equal distribution strategy when exist disabled middleManagers.#2472
drcrallen merged 1 commit intoapache:masterfrom
redBorder:master

Conversation

@andresgomezfrr
Copy link
Copy Markdown
Contributor

Hi all,

This week working on our cluster we detected a little issue. When we have for example 5 middleManagers and all of them have the same currentCapacityUsed and some of them are disabled, maybe the new indexing tasks can't start.

MiddleManager Capacity CurrentCapacityUsed Status
AAA 10 5 Disabled
BBB 10 5 Enabled
CCC 10 5 Enabled
DDD 10 5 Enabled

On this scenario, maybe the current equal distribution return an unique middleManager if the middleManager that is returned is disabled. The new tasks can't start and go to pending status.

To fix the issue I changed the logic inside equal distribution, now sort the middleManager using the currentCapacityUsed but if two middleManagers have the same currentCapacityUsed, them are sorted using the IP address.

@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented Feb 15, 2016

@andresgomezfrr hoe does sorting by ip will fix the issue ?

@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented Feb 15, 2016

thought that this code will filter out the disable host since it is less than the minWorkerVer

@andresgomezfrr
Copy link
Copy Markdown
Contributor Author

Let me try to explain it!

Is true that this code will filter out the disable host using the minWorkerVer. The problem is that when sortedWorkers are sorted using the method compare if two or more worker have the same getCurrCapacityUsed(), the Comparator only return one of them. So on the scenario that I explained sometimes the only worker that the Comparator return is the AAA and it is disabled.

On my pull request, if two middleManager have the same getCurrCapacityUsed, I will sort them using the IP address (I use the middleManager IP address because I suppose that it is unique inside druid cluster), doing that all the workers are returned on sortedWorkers and later, them will filter out using the minWorkerVer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so how about comparation = zkWorker.getWorker().getVersion().compareTo(zkWorker2.getWorker().getVersion());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FillCapacityWorkerSelectStrategy compares workers with capacity(asc) + host base. Two comparators could be merged into one.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both solutions works too. Other posible is compare second comparation using version or hostname. Yeah the solution on FillCapacityWorkerSelectStrategy is similar that i did. I'm going to change IP Address to Host to do two comparator more similar.

@andresgomezfrr
Copy link
Copy Markdown
Contributor Author

I updated my branch to the current druid master branch.

@nishantmonu51
Copy link
Copy Markdown
Member

👍 , nice catch

@nishantmonu51 nishantmonu51 added this to the 0.9.1 milestone Feb 16, 2016
@drcrallen
Copy link
Copy Markdown
Contributor

@andresgomezfrr Can you add a comment in the code about why the host sorting is needed?

The assumption that only one middle manager exists per ip address is one of the legacy assumptions that breaks in a lot of ways (like containerized services) and the "correct" assumption at the moment is that a middle manager is unique in its host/port combination.

Since the legacy assumption of one worker per ip is "ok" for now, can you please clarify why it was added so that future developers can know if it is an essential assumption or simply a work-around for an issue?

@andresgomezfrr
Copy link
Copy Markdown
Contributor Author

Yeah, you are right @drcrallen, the "correct" assumption is host/port combination. I can see that
zkWorker.getWorker().getHost()
is made using host and port (worker creation), so I suppose that this is fine.

I added some comments explaining the reason because is needed the host sorting. What do you think about them?

@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented Feb 16, 2016

@andresgomezfrr as mention @drcrallen using host is not the best option for the long term, why not using the version as i suggested before ?

zkWorker.getWorker().getVersion().compareTo(zkWorker2.getWorker().getVersion());

@andresgomezfrr
Copy link
Copy Markdown
Contributor Author

@drcrallen told that the "correct" assumption at the moment is that a middle manager is unique in its host/port combination. and the problem is that the version can be the same across all middle managers, isn't?

@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented Feb 16, 2016

@andresgomezfrr if one of the woker is disable the version is empty string so by construction it will be different from other worker unless both are disable.

@andresgomezfrr
Copy link
Copy Markdown
Contributor Author

ohh, it's true @b-slim ! so I think that this is other possible 👍 so .. I suppose that we can change FillCapacityWorkerSelectStrategy too What do you think @drcrallen ?

@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented Feb 16, 2016

@andresgomezfrr sure you can use it in both cases !

@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented Feb 16, 2016

@andresgomezfrr thanks for the contrib 👍

@andresgomezfrr
Copy link
Copy Markdown
Contributor Author

you're welcome! 😄

@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented Feb 16, 2016

@andresgomezfrr i am not sure if you have filled the CLA, general question to @fjy where we can check who is on the list of CLA ?

@drcrallen
Copy link
Copy Markdown
Contributor

@b-slim I think only PMC has access to that list right now. For privacy reasons it might stay that way for right now. I'd like to get some sort of github hook in so manual checking isn't needed. But for now bugging one of the PMC members is the best we have. @andresgomezfrr is not on the list so please do fill out the CLA at the link provided.

@andresgomezfrr
Copy link
Copy Markdown
Contributor Author

I have done the CLA. But @drcrallen , I think that my company did the Corporate CLA.

@drcrallen
Copy link
Copy Markdown
Contributor

@andresgomezfrr individual CLA found. Please squash and I think this should be good to go.

@andresgomezfrr
Copy link
Copy Markdown
Contributor Author

Ready!! 😄

@drcrallen
Copy link
Copy Markdown
Contributor

Cool, thanks for the contrib! 👍

drcrallen added a commit that referenced this pull request Feb 17, 2016
Fixed equal distribution strategy when exist disabled middleManagers.
@drcrallen drcrallen merged commit 5a69fe8 into apache:master Feb 17, 2016
@drcrallen drcrallen modified the milestones: 0.9.0, 0.9.1 Feb 18, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants