[fix][broker] Avoid infinite bundle unloading #20822
codelipenghui
left a comment
The PR title should be a fix for the infinite bundle unloading.
If I understand correctly, a breaking change will be introduced to the behavior of loadBalancerDistributeBundlesEvenlyEnabled because the current solution just filters out the previous owner, then the bundle can be moved to other brokers. I think the correct solution should be to skip the bundle unloading if the next owner is the same one as the current owner.
IMO, the main point is we don't care about the next owner broker when performing the bundle unloading. But the bundle unloading and the owner selection are based on the same load data. So is it worth checking the next owner at the unloading stage? @Demogorgon314 @heesung-sn
codelipenghui
left a comment
BTW, I noticed that nothing puts data into LoadData.brokerData and LoadData.bundleData except the tests. Is it still useful? Or maybe I missed something.
@codelipenghui Yes, it is still in use. The |
| pulsar.getAdminClient().namespaces() | ||
| .unloadNamespaceBundle(namespaceName, bundleRange, destBroker.get()); |
The destinationBroker parameter is added in this PR: #18663
| try { | ||
| pulsar.getAdminClient().namespaces().unloadNamespaceBundle(namespaceName, bundleRange); | ||
| pulsar.getAdminClient().namespaces() | ||
| .unloadNamespaceBundle(namespaceName, bundleRange, destBroker.get()); |
Just a reminder: the destBroker parameter is only available since 2.11.0, so we cannot cherry-pick this to branch-2.10.
Or maybe we don't need to ensure the bundle actually goes to the dest broker? We just want to avoid the infinite loop as far as possible.
When we have many brokers, it still has a chance to select the same broker if we don't pass the destBroker parameter. I wonder if this is acceptable.
But it will only sometimes go to the same broker, not always. In 2.10, we don't have this API to support a strict owner assignment at the loading stage.
Ok, I see. After this PR is merged, I will push a PR for branch 2.10.
For 2.10 we don't have a good way for that. What will happen if the same broker is still selected after unloading? Will it retry?
@BewareMyPower It will be selected for unloading again, and hopefully it will be skipped the next time.
Ok, I see. After this PR is merged, I will push a PR for branch 2.10.
To be clear, how are you going to push this fix to 2.10? According to @codelipenghui, the destBroker option in the unload admin is only available since 2.11.
…ce/impl/ModularLoadManagerImpl.java Co-authored-by: Yunze Xu <xyzinfernity@163.com>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #20822 +/- ##
============================================
- Coverage 72.97% 72.64% -0.34%
- Complexity 32157 32227 +70
============================================
Files 1868 1856 -12
Lines 139164 139139 -25
Branches 15314 15329 +15
============================================
- Hits 101555 101077 -478
- Misses 29562 29999 +437
- Partials 8047 8063 +16
@Demogorgon314 Could you please help cherry-pick the PR into release branches?
(cherry picked from commit 3f63768)
Though this is another great improvement in ModularLoadManager, it is also a behavior change.
It is great that this logic can block unloading bundles to the same broker. However, could infinite unloading still happen if a bundle bounces between multiple brokers?
@heesung-sn
Yes, it would be good to add a doc to notify the users.
I think we can leave the config as is, since unloading to the same broker does not make sense.
Yes, this can happen. I am not sure whether we have a good way to prevent it.
Cherry-picked to branch-2.11 by #20878
Why not call |
| final ConcurrentOpenHashMap<String, ConcurrentOpenHashSet<String>> namespaceToBundleRange = | ||
| brokerToNamespaceToBundleRange | ||
| .computeIfAbsent(broker, | ||
| k -> ConcurrentOpenHashMap.<String, | ||
| ConcurrentOpenHashSet<String>>newBuilder() | ||
| .build()); | ||
| synchronized (namespaceToBundleRange) { | ||
| namespaceToBundleRange.computeIfAbsent(namespaceName, | ||
| k -> ConcurrentOpenHashSet.<String>newBuilder().build()) | ||
| .add(bundleRange); | ||
| } |
The synchronized block here seems meaningless?
Yes... not sure why a synchronized block was used here before.
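To illustrate why the extra lock looks redundant, here is a minimal sketch of the same nested insert without a synchronized block. It uses java.util.concurrent.ConcurrentHashMap as a stand-in for Pulsar's ConcurrentOpenHashMap and ConcurrentOpenHashSet, on the assumption that those collections make computeIfAbsent and add safe under concurrent access in a comparable way. The class name BundleRangeIndex and its methods are hypothetical and are not part of the PR.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the broker -> namespace -> bundle-range bookkeeping.
public class BundleRangeIndex {
    // JDK concurrent map used in place of Pulsar's ConcurrentOpenHashMap (assumption).
    private final Map<String, Map<String, Set<String>>> brokerToNamespaceToBundleRange =
            new ConcurrentHashMap<>();

    // Each computeIfAbsent is atomic for its key, and the key set is itself
    // concurrent, so no outer synchronized block is needed for this insert.
    public void add(String broker, String namespaceName, String bundleRange) {
        brokerToNamespaceToBundleRange
                .computeIfAbsent(broker, k -> new ConcurrentHashMap<>())
                .computeIfAbsent(namespaceName, k -> ConcurrentHashMap.newKeySet())
                .add(bundleRange);
    }

    // Returns true if the bundle range is recorded for the given broker and namespace.
    public boolean contains(String broker, String namespaceName, String bundleRange) {
        return brokerToNamespaceToBundleRange
                .getOrDefault(broker, Collections.emptyMap())
                .getOrDefault(namespaceName, Collections.emptySet())
                .contains(bundleRange);
    }
}
```

A lock would still matter if several steps had to be observed as one unit (for example, removing a namespace entry when its set becomes empty), which this sketch does not attempt.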
Motivation
The bundle ownership assignment logic doesn't know the previous unloaded broker when unloading happens. It might assign the bundle to the same broker. In this case, it might cause an infinite bundle unloading loop.
To resolve this issue, we can check the destination broker during the doLoadShedding stage. When the destination broker is the same as the current owner broker, we can skip this unload, and if it is different, we set the new owner in this stage.
Modifications
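The check described in the Motivation can be sketched roughly as follows. This is not the code from ModularLoadManagerImpl: the class SheddingSketch, the method destinationForUnload, and the selectOwner function are hypothetical names used only to show the skip-or-reassign decision, and the broker names are made up.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of the decision made for each bundle picked by load shedding.
public class SheddingSketch {

    // Returns the broker the bundle should be unloaded to, or empty if the
    // unload should be skipped because the bundle would come back to its owner.
    public static Optional<String> destinationForUnload(
            String bundle,
            String currentOwner,
            Function<String, Optional<String>> selectOwner) {
        // Ask the placement logic which broker it would pick right now.
        Optional<String> candidate = selectOwner.apply(bundle);
        // No candidate, or the same broker again: unloading would achieve nothing.
        if (!candidate.isPresent() || candidate.get().equals(currentOwner)) {
            return Optional.empty();
        }
        // A different broker: unload and pass it on as the explicit destination.
        return candidate;
    }

    public static void main(String[] args) {
        Function<String, Optional<String>> alwaysB = bundle -> Optional.of("broker-b");
        // Current owner differs from the selection, so the unload goes ahead.
        System.out.println(destinationForUnload("tenant/ns/0x00_0x80", "broker-a", alwaysB));
        // Current owner equals the selection, so the unload is skipped.
        System.out.println(destinationForUnload("tenant/ns/0x00_0x80", "broker-b", alwaysB));
    }
}
```

In the PR itself, the selected broker is then passed to unloadNamespaceBundle(namespaceName, bundleRange, destBroker.get()), so that the ownership assignment does not pick a different broker after the unload.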
Verifying this change
See the ModularLoadManagerImplTest#testLoadShedding unit test.
Documentation
doc
doc-required
doc-not-needed
doc-complete