Fix Huge Number of Watches in ZooKeeper #17482
* Tear down nodeAnnouncer
* Remove useless Logger and ExecutorService
* Init CuratorListener by lambda
* Improve explicit type
* Using CuratorMultiTransaction instead of CuratorTransaction
* Add @GuardedBy("toAnnounce") for toUpdate field
* Improve docs
Hi @kgyrtkirk, I have made changes according to your suggestions. The PR description has been edited accordingly.

Sorry for forgetting about this; I'll reply back today.
```java
@LifecycleStart
@Override
public void start()
```
`NodeAnnouncer` and `PathChildrenAnnouncer` share a lot of common pieces; I wonder if they could be placed into a common abstract class, or whether the old approach should be removed once the cache is proven to work correctly?
I do not mind either choice; I can try to make an abstract class by next week.
Hi @kgyrtkirk, I have taken a look at making an abstract class. It is true that there are a lot of shared methods, but the changes are not trivial. I am considering this instead:
- We give `NodeAnnouncer` some time in use; if it proves to perform better, we can simply delete `PathChildrenAnnouncer`.
- If using NodeCache trades CPU for memory, a follow-up PR can work towards moving `PathChildrenAnnouncer` off the deprecated `PathChildrenCache` and onto `CuratorCache`. We can decide to make an abstract class then, as some of the complexities would go away (for example, `listeners`, which has type `ConcurrentMap<String, PathChildrenCache>`, would change to `ConcurrentMap<String, CuratorCache>`, a type that can be shared with `NodeAnnouncer`). This seems to be the direction Apache Curator is taking with its caches.
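A minimal sketch of the abstract-class idea above, assuming the shared bookkeeping can be written once against a generic cache type `C`, so that only cache creation and teardown differ per strategy. All names here (`AbstractCacheAnnouncer`, `createCache`, `closeCache`, `FakeCache`, `FakeNodeAnnouncer`) are hypothetical and not part of Druid or Curator; `FakeCache` stands in for a Curator cache so the sketch runs without ZooKeeper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical base class: shared bookkeeping lives here, subclasses only
// decide how a cache is created and closed.
abstract class AbstractCacheAnnouncer<C>
{
  // Keyed by watched path; in a real announcer C might be PathChildrenCache,
  // NodeCache, or CuratorCache.
  protected final ConcurrentMap<String, C> listeners = new ConcurrentHashMap<>();

  // Returns the cache for the path, creating it at most once.
  public C watch(String path)
  {
    return listeners.computeIfAbsent(path, this::createCache);
  }

  // Closes every cache and forgets it; identical for all strategies.
  public void stop()
  {
    for (C cache : listeners.values()) {
      closeCache(cache);
    }
    listeners.clear();
  }

  protected abstract C createCache(String path);

  protected abstract void closeCache(C cache);
}

// In-memory stand-in for a Curator cache (assumption, for runnability only).
class FakeCache
{
  final String path;
  boolean closed = false;

  FakeCache(String path)
  {
    this.path = path;
  }
}

// Hypothetical subclass: one cache per announced node, as in the NodeCache approach.
class FakeNodeAnnouncer extends AbstractCacheAnnouncer<FakeCache>
{
  final List<String> created = new ArrayList<>();

  @Override
  protected FakeCache createCache(String path)
  {
    created.add(path);
    return new FakeCache(path);
  }

  @Override
  protected void closeCache(FakeCache cache)
  {
    cache.closed = true;
  }
}
```

Only `createCache` and `closeCache` would differ between a PathChildrenCache-based and a NodeCache-based subclass in this layout, which is the sharing the comment anticipates once both use a common cache type.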
|Property|Description|Default|
|--------|-----------|-------|
|`druid.zk.service.connectionTimeoutMs`|ZooKeeper connection timeout, in milliseconds.|`15000`|
|`druid.zk.service.compress`|Boolean flag for whether or not created Znodes should be compressed.|`true`|
|`druid.zk.service.acl`|Boolean flag for whether or not to enable ACL security for ZooKeeper. If ACL is enabled, zNode creators will have all permissions.|`false`|
|`druid.zk.service.pathChildrenCacheStrategy`|Dictates the underlying caching strategy for service announcements. Set to `true` to let announcers use Apache Curator's PathChildrenCache strategy; otherwise the NodeCache strategy is used. Consider using the NodeCache strategy when you are dealing with a huge number of ZooKeeper watches in your cluster.|`true`|
I wonder why we don't make the NodeCache approach the default, since it should work better, while retaining the old approach in case some issue happens?
cc: @cryptoe
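As an illustration of how the `druid.zk.service.pathChildrenCacheStrategy` flag decides which announcer is used, here is a minimal sketch assuming the flag is read from plain `java.util.Properties`. Druid itself wires this up through Guice bindings in `AnnouncerModule`; the interface and class names below (`SketchAnnouncer`, `PathChildrenSketchAnnouncer`, `NodeSketchAnnouncer`, `AnnouncerSelector`) are hypothetical stand-ins, and the single `strategy()` method is an assumption rather than the real `ServiceAnnouncer` API.

```java
import java.util.Properties;

// Hypothetical, trimmed-down stand-in for the ServiceAnnouncer abstraction.
interface SketchAnnouncer
{
  String strategy();
}

// Stand-in for the PathChildrenCache-based announcer.
class PathChildrenSketchAnnouncer implements SketchAnnouncer
{
  @Override
  public String strategy()
  {
    return "PathChildrenCache";
  }
}

// Stand-in for the NodeCache-based announcer.
class NodeSketchAnnouncer implements SketchAnnouncer
{
  @Override
  public String strategy()
  {
    return "NodeCache";
  }
}

final class AnnouncerSelector
{
  static final String FLAG = "druid.zk.service.pathChildrenCacheStrategy";

  private AnnouncerSelector()
  {
  }

  // Reads the flag, falling back to "true" (the documented default),
  // and returns the matching implementation.
  static SketchAnnouncer select(Properties props)
  {
    boolean usePathChildrenCache = Boolean.parseBoolean(props.getProperty(FLAG, "true"));
    return usePathChildrenCache ? new PathChildrenSketchAnnouncer() : new NodeSketchAnnouncer();
  }
}
```

In this sketch, making NodeCache the default as suggested would only mean changing the fallback value passed to `getProperty`, while users could still opt back into PathChildrenCache explicitly.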
Pull Request Overview
This PR reduces the number of ZooKeeper watches by introducing a new NodeAnnouncer that leverages Curator’s NodeCache instead of the PathChildrenCache, along with refactoring the announcement classes and their dependency injections.
- Replaces Announcer with NodeAnnouncer (and conditionally with PathChildrenAnnouncer based on a config flag).
- Introduces a new ServiceAnnouncer interface, refactors tests and updates configuration accordingly.
- Updates related documentation and dependency injection modules.
Reviewed Changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| server/src/test/java/org/apache/druid/server/coordination/coordination/BatchDataSegmentAnnouncerTest.java | Updated tests to use NodeAnnouncer and reformatted multi-line JOINER calls. |
| server/src/test/java/org/apache/druid/curator/discovery/CuratorDruidNodeAnnouncerAndDiscoveryTest.java | Replaced Announcer instantiation with NodeAnnouncer. |
| server/src/test/java/org/apache/druid/curator/announcement/PathChildrenAnnouncerTest.java | Renamed test cases and updated instantiation to PathChildrenAnnouncer. |
| server/src/test/java/org/apache/druid/curator/announcement/NodeAnnouncerTest.java | Added new tests for NodeAnnouncer feature including update and session kill cases. |
| server/src/test/java/org/apache/druid/client/client/BatchServerInventoryViewTest.java | Updated to use NodeAnnouncer and refined thread pool executor creation. |
| server/src/main/java/org/apache/druid/server/coordination/CuratorDataSegmentServerAnnouncer.java | Changed injection from Announcer to ServiceAnnouncer. |
| server/src/main/java/org/apache/druid/server/coordination/BatchDataSegmentAnnouncer.java | Updated type references and dependency injection for the new ServiceAnnouncer. |
| server/src/main/java/org/apache/druid/guice/AnnouncerModule.java | Provided alternative bindings for single-threaded and direct executor announcers. |
| server/src/main/java/org/apache/druid/curator/discovery/CuratorDruidNodeAnnouncer.java | Updated constructor injection and logging to reflect the new ServiceAnnouncer. |
| server/src/main/java/org/apache/druid/curator/announcement/ServiceAnnouncer.java | Added new interface to abstract announcer behavior. |
| server/src/main/java/org/apache/druid/curator/announcement/PathChildrenAnnouncer.java | Refactored implementation and logging to support the PathChildrenAnnouncer functionality. |
| server/src/main/java/org/apache/druid/curator/announcement/Announceable.java | Moved the Announceable class out of the original Announcer to support reuse. |
| server/src/main/java/org/apache/druid/curator/CuratorConfig.java | Added new configuration property to switch between caching strategies. |
| docs/configuration/index.md & docs/api-reference/tasks-api.md | Updated documentation to explain and reflect the new caching strategy option. |
| Various indexing-service test files | Updated tests to use NodeAnnouncer and adjusted thread configuration for executor creation. |
@GWphua you've a conflict with master; I think we could merge it after that's addressed!

@kgyrtkirk The conflicts have been addressed. Thanks for the heads-up!

thank you @GWphua for improving on this!
@GWphua What version of ZK are you running?

We are using 3.5.9.

Thanks @GWphua. What about the Curator version? Same as OSS? Did you have to make any ZK/Curator-related upgrades or config changes to get things working here?

Maybe the Curator/ZK version I'm using is too old. What issues are you facing, and did you experience problems with both PathChildrenCache and NodeCache?
Yes, we are experiencing issues with Curator 5.0.0 and later, which is what the latest OSS Druid (v34) uses. We have encountered problems with both 5.5.0 and 5.8.0. This appears to be related to CURATOR-549, which was introduced in Curator 5.0.0. Curator attempts to use a new feature that is only available in ZooKeeper server 3.6 or higher. Since we are running ZooKeeper server 3.5.8, Druid fails to connect to ZooKeeper: we see continuous SUSPEND/RECONNECT events, with the ZooKeeper server closing the connection.

Note that this issue only occurs with NodeCache, as that is where Curator attempts to use the new feature. PathChildrenCache does not have this problem because it does not go through the new ZooKeeper code path. We suspect the new feature Curator is trying to use is related to persistent watchers/CuratorCache. This seems related to: https://lists.apache.org/thread/nl7zrzgyfp2b5wxdkrovk0yhqfto9yl7

So, to use NodeCache, I think you would need to be either on Curator <5.0.0 or running ZK server >=3.6. That said, I think Curator 5.x.x also contains some critical fixes.
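The rule of thumb in the comment above (NodeCache is only safe with Curator <5.0.0 or a ZooKeeper server >=3.6) could be encoded as a startup check. This is a minimal sketch under that assumption: `NodeCacheCompatibility` is a hypothetical helper that neither Druid nor Curator provides, and it only handles purely numeric, dot-separated version strings such as `3.5.9` (qualifiers like `-SNAPSHOT` would throw `NumberFormatException`).

```java
// Hypothetical helper encoding the compatibility rule from the discussion:
// NodeCache is considered safe on Curator < 5.0.0 or ZooKeeper server >= 3.6.
final class NodeCacheCompatibility
{
  private NodeCacheCompatibility()
  {
  }

  // Compares dotted numeric versions part by part; missing parts count as 0,
  // so "3.6" equals "3.6.0" and "3.5.9" is lower than "3.6".
  static int compareVersions(String a, String b)
  {
    String[] pa = a.split("\\.");
    String[] pb = b.split("\\.");
    int n = Math.max(pa.length, pb.length);
    for (int i = 0; i < n; i++) {
      int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
      int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
      if (x != y) {
        return Integer.compare(x, y);
      }
    }
    return 0;
  }

  static boolean isNodeCacheSafe(String curatorVersion, String zkServerVersion)
  {
    boolean preCurator5 = compareVersions(curatorVersion, "5.0.0") < 0;
    boolean zkServerAtLeast36 = compareVersions(zkServerVersion, "3.6") >= 0;
    return preCurator5 || zkServerAtLeast36;
  }
}
```

With the versions reported in this thread, Curator 5.5.0 against ZooKeeper 3.5.8 fails the check, matching the SUSPEND/RECONNECT behaviour described, while the same Curator against a 3.6.x server passes.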
@GWphua By the way, related to the huge number of watches: have you tried setting `druid.announcer.skipSegmentAnnouncementOnZk` to `true` and using HTTP for segment discovery (`druid.serverview.type=http`)? HTTP segment discovery has been the default since Druid v25 (see the linked comment in #13592).

Hey @maytasm, we did not configure this setting in our clusters. However, we can take a look at whether it helps with our use case. Thanks 😄
Not too confident about this, but I feel the problem may be caused by upgrading the deprecated NodeCache to CuratorCache. Should Curator remove the deprecated
Fixes #6647
Description
This PR is built upon #6683 and #9172 and aims to reduce the number of ZooKeeper watch counts.
Fixed Huge Number of Watches in ZooKeeper
The current `Announcer.java` leverages Apache Curator's PathChildrenCache. In its present form, the announcement mechanism watches the immediate parent of the specified path. This results in all child nodes under the parent path being monitored by the ZooKeeper ensemble, including sibling nodes and children of the specified path, which produces an unnecessarily large number of ZooKeeper watches.

The new `NodeAnnouncer.java` class is simply `Announcer.java`, but it leverages NodeCache instead to watch a single node during announcement. By eliminating the watches on child nodes, this approach significantly reduces the total number of watches in ZooKeeper. Users can opt in to the new `NodeAnnouncer` by setting the feature flag `druid.zk.service.pathChildrenCacheStrategy=false`.

Tests conducted on the production server also indicate a decrease in watch counts resulting from this change.
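The following toy model (arithmetic, not a measurement) illustrates why watching the parent is expensive. It assumes M processes each announce k nodes directly under one shared parent, that a PathChildrenCache on that parent holds one watch on the parent plus one watch per child, and that a NodeCache holds one watch on the single node it follows. `WatchCountModel` is a hypothetical name used only for this illustration.

```java
// Toy model of ZooKeeper watch counts under the assumptions stated above.
final class WatchCountModel
{
  private WatchCountModel()
  {
  }

  // PathChildrenCache strategy: every process caches the shared parent, paying
  // one watch for the parent plus one per child, and the parent holds the
  // nodes of all processes (siblings included).
  static long pathChildrenCacheWatches(long processes, long nodesPerProcess)
  {
    long children = processes * nodesPerProcess;
    return processes * (1 + children);
  }

  // NodeCache strategy: every process watches only the nodes it announced.
  static long nodeCacheWatches(long processes, long nodesPerProcess)
  {
    return processes * nodesPerProcess;
  }
}
```

Under these assumptions, 100 processes announcing one node each come to 10,100 watches with PathChildrenCache versus 100 with NodeCache: the first grows quadratically with the number of siblings, the second linearly.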
Note:
Using the two different announcer classes simultaneously may result in a `KeeperException.NotEmptyException`. This happens when two nodes share the same parent: since neither announcer has a full picture of the nodes under that parent, the exception is thrown when the following occurs:

1. `PathChildrenAnnouncer` removes all of its tracked child nodes.
2. `PathChildrenAnnouncer` tries to remove the parent node.
3. Since `NodeAnnouncer` is still watching one or more child nodes, the attempt by `PathChildrenAnnouncer` to remove the parent node results in the exception.

Documentation
`NodeAnnouncer`.

Refactoring

- Renamed `Announcer` to `PathChildrenAnnouncer`.
- Moved the `Announceable` class out of `PathChildrenAnnouncer`.
- Added the `ServiceAnnouncer` interface to facilitate dependency injection for different flavours of caching strategies.
- Added `ZKPathsUtils.java` to abstract the retrieval of the ZooKeeper path and ZooKeeper node.

Release note
New: A new opt-in caching strategy is provided that uses a much smaller number of ZooKeeper watches for service announcement.
Key changed/added classes in this PR
- `Announcer.java` -> `PathChildrenAnnouncer.java`
- `ServiceAnnouncer.java`
- `NodeAnnouncer.java`
- `Announceable.java`
- `AnnouncerModule.java`
- `CuratorConfig.java`
- `DirectExecutorAnnouncer` & `SingleThreadedAnnouncer` annotations for Guice.
- `docs/configuration/index.md` & `.spelling` for docs.

This PR has: