-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Optimize ThresholdShedder strategy: the low-load Broker cannot be fully utilized #12471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@lordcheng10:Thanks for your contribution. For this PR, do we need to update docs? |
No need to add documentation |
|
/pulsarbot run-failure-checks |
3 similar comments
|
/pulsarbot run-failure-checks |
|
/pulsarbot run-failure-checks |
|
/pulsarbot run-failure-checks |
pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/impl/ThresholdShedder.java
Show resolved
Hide resolved
|
I made some changes. PTAL,thanks! @hangc0276 |
michaeljmarshall
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would make more sense to implement a new LoadSheddingStrategy to achieve your goals instead of updating the ThesholdShedder class. This class is only meant to unload bundles from a broker with above average (plus some threshold) usage.
Based on your PR's description, you'd like the ThresholdShedder algorithm to take under utilized brokers into special consideration. As @hangc0276 pointed out, it is not straightforward to consider these brokers because the only mechanism we have available in this class is to "unload" a bundle from a broker. Your proposed changes require the unloading of bundles from brokers that are not above the average (plus some threshold) usage, which seems like a new strategy to me.
Note that because we're using a simple average, clusters with many brokers might have a few under utilized brokers without any one broker being too much above the average utilization. An alternative strategy could give underutilized brokers more weight.
The calculated avgUsage=(80*100+10)/101=79 at this time, if the threshold is set to the default value of 10, then any broker will satisfy currentUsage <avgUsage + threshold, so that the balancing operation will not be triggered, so some load Low machines cannot be fully utilized.
Based on your example, an alternative solution is to configure a smaller threshold. Decreasing the threshold will lead to better distribution, but the trade off is that it will also lead to more frequent unloading, which has costs too.
If we do decide to update this class's strategy, we will need to document the new details in the Javadoc as well as add tests to ensure that the algorithm works correctly.
| final double currentUsage = brokerAvgResourceUsage.getOrDefault(broker, 0.0); | ||
|
|
||
| if (currentUsage < avgUsage + threshold) { | ||
| if (currentUsage > avgUsage - threshold && currentUsage < avgUsage + threshold) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the new changes (the brokers being iterated over are all over average), there is no longer a need to update the logic for this conditional.
| } | ||
|
|
||
| //4. Calculate the percentage of traffic to be migrated by each broker in overAvgUsageBrokers; | ||
| double percentOfTrafficToOffload = ADDITIONAL_THRESHOLD_PERCENT_MARGIN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we calculate this as a generic average for all brokers? The original calculation of percentOfTrafficToOffload is for a specific broker because it is used to determine which bundles to unload to decrease that broker's load to beneath the average. By using an averaged value from all brokers, we could unload too many bundles from some brokers and too few bundles from other brokers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we calculate this as a generic average for all brokers?
In the example I mentioned above, the load utilization rate of all brokers is lower than the average utilization rate. At this time, according to the original calculation method, there is no way to calculate percentoftraffictooffload.
With regard to this new change, I need to explain the following points:
- First, I define the equilibrium state as:
avgUsage - threshold < currentUsage < avgUsage + threshold;
- The sum of the traffic that can be received by all nodes lower than avgusage is the total traffic that we should unload from nodes higher than avgusage. Therefore, we only need to calculate the sum of the total traffic of all brokers higher than avgusage over brokertrafficsum, Then, sumbelowoavgtraffic / overbrokertrafficsum can get the information from each broker that exceeds avgusage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the load utilization rate of all brokers is lower than the average utilization rate.
Can you clarify this point? The average usage cannot be higher than all of the individual usages.
At this time, according to the original calculation method, there is no way to calculate percentoftraffictooffload.
The current algorithm isn't focused on calculating this value. It only unloads brokers that have utilization above a certain threshold. Moving bundles is not free, so some operators might be fine with a slightly underloaded broker as long as no one broker is too high above the average utilization.
The sum of the traffic that can be received by all nodes lower than avgusage is the total traffic that we should unload from nodes higher than avgusage. Therefore, we only need to calculate the sum of the total traffic of all brokers higher than avgusage over brokertrafficsum, Then, sumbelowoavgtraffic / overbrokertrafficsum can get the information from each broker that exceeds avgusage
I don't think this is correct. Since brokers in the overAvgUsageBrokers collection can have different usages, we need to calculate an amount that is right for each broker itself. Your algorithm might work for your example because all brokers above the average have the same usage.
I think the most productive way to show that your algorithm works as you're saying is to add tests that show how it will react to specific scenarios.
|
/pulsarbot run-failure-checks |
|
In order to facilitate understanding, I have shown the code ideas in the form of diagrams. PTAL,thanks! @hangc0276 @michaeljmarshall |
|
@lordcheng10 - instead of writing a diagram, I think it would be more productive to write tests that show how it will work and will also serve to verify your algorithm works the way your say it will. I am still of the opinion that this is a new algorithm. That does not mean you shouldn't implement it, it just means that updating the |
|
this solution would not work in heterogeneous hardware and not correct to make it default behavior. I have done some work to address this issue and need to complete the PR. |
|
also , don't change the existing implementation because there are two different approaches for load balancer a. utilize brokers fully b. distribute load uniformly across brokers. so, we should not change existing behavior and this behavior should be configurable. |
rdhabalia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the current implementation is by design to fully utilize the broker. so, if needed we need to introduce new policy which can be configurable.
You are right, I will rewrite a balanced strategy class! |
You are right, I will rewrite a balanced strategy class |
|
@lordcheng10 can you please check if this PR addresses your concern #12902 |
|
Removing the |
11f4a81
|
The pr had no activity for 30 days, mark with Stale label. |

Motivation
Consider a large cluster with multiple nodes. If most of the nodes are relatively balanced, but only a few nodes have extremely low load, then according to the current logic, it may not be possible to trigger the balancing action, eg:
broke1 brokerAvgResourceUsage 80
......
broker100 brokerAvgResourceUsage 80
broker101 brokerAvgResourceUsage 10
The calculated avgUsage=(80*100+10)/101=79 at this time, if the threshold is set to the default value of 10, then any broker will satisfy currentUsage <avgUsage + threshold, so that the balancing operation will not be triggered, so some load Low machines cannot be fully utilized.
Modifications
Here I define the equilibrium state as:
avgUsage-threshold< currentUsage <avgUsage + threshold
In this way, some nodes with extremely low load will not appear.