Fix huge number of watches in zk #9172

Closed
asdf2014 wants to merge 5 commits into apache:master from asdf2014:fix_huge_zk_watches

@asdf2014
Member

Description

  • Fix huge number of watches in zk

  • Tear down nodeAnnouncer

  • Remove useless Logger and ExecutorService

  • Init CuratorListener by lambda

  • Improve explicit type

  • Using CuratorMultiTransaction instead of CuratorTransaction

  • Add @GuardedBy("toAnnounce") for toUpdate field

  • Improve docs

Related to #6683


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths.
  • added integration tests.
  • been tested in a test Druid cluster.

Member

@leventov left a comment

Where possible, please also apply to Announcer the improvements suggested here for NodeAnnouncer, as well as the improvements that @kaijianding and you have already applied so far in the course of #6683.

import java.util.concurrent.CopyOnWriteArrayList;

/**
* NodeAnnouncer announces single node on Zookeeper and only watches this node,
Member

Please add a mirroring comment in Announcer

Member Author

Done

private final ConcurrentMap<String, NodeCache> listeners = new ConcurrentHashMap<>();
private final ConcurrentMap<String, byte[]> announcedPaths = new ConcurrentHashMap<>();
/**
* Only the one created the parent path can drop the parent path, so should remember these created parents.
Member

This comment sounds confusing, shouldn't one of "parent path" occurrences in it be something else?

Member Author

Done


started = false;

Closer closer = Closer.create();
Member

I think you can add the unannouncements and the pathsCreatedInThisAnnouncer deletion operations to the Closer too, so that they are all attempted in case of problems.

Member Author

Sorry, I don't understand this comment. These delete operations are not closable, can they be registered in Closer?

Member

Like closer.register(() -> unannounce(path))
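
The suggestion above can be sketched roughly as follows. This is a minimal, JDK-only illustration, not the code from this PR: `SimpleCloser` is a hypothetical stand-in for the real `Closer` class, and `AnnouncerStopSketch` with its `unannounce` method is invented for the example. It shows that each registered cleanup action is attempted even when an earlier one fails.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a Closer: runs every registered action on close(),
// even if some of them fail, then rethrows the first failure with the rest suppressed.
final class SimpleCloser implements Closeable
{
  private final Deque<Closeable> stack = new ArrayDeque<>();

  void register(Closeable closeable)
  {
    stack.push(closeable);
  }

  @Override
  public void close() throws IOException
  {
    Throwable first = null;
    // Actions run in reverse registration order (last registered, first closed).
    while (!stack.isEmpty()) {
      try {
        stack.pop().close();
      }
      catch (Throwable t) {
        if (first == null) {
          first = t;
        } else {
          first.addSuppressed(t);
        }
      }
    }
    if (first != null) {
      throw new IOException("Failed to run some cleanup actions", first);
    }
  }
}

// Hypothetical announcer fragment; only the stop() logic is sketched.
final class AnnouncerStopSketch
{
  final List<String> unannounced = new ArrayList<>();

  // Assumed behavior for the demo: fails for paths containing "bad".
  void unannounce(String path) throws IOException
  {
    if (path.contains("bad")) {
      throw new IOException("cannot unannounce " + path);
    }
    unannounced.add(path);
  }

  void stop(List<String> announcedPaths) throws IOException
  {
    try (SimpleCloser closer = new SimpleCloser()) {
      for (String path : announcedPaths) {
        // A non-Closeable operation becomes registrable via a lambda.
        closer.register(() -> unannounce(path));
      }
    }
  }
}
```

Because `Closeable` has a single abstract method, any delete or unannounce operation can be wrapped in a lambda and registered, which answers the question of how non-closable operations fit into a Closer.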

Member Author

Thanks, I will patch this comment 👍

Contributor

I guess this change hasn't been applied yet?

Member Author

Done.

unannounce(announcementPath);
}

if (!pathsCreatedInThisAnnouncer.isEmpty()) {
Member

Please extract this as a method

Member Author

Is it really necessary? 😰

Member

It's optional

Member Author

Okay, let's keep it the original way 😄

}
}
catch (Exception e) {
log.debug(e, "Problem checking if the parent existed, ignoring.");
Member

Please change the comment to ", assuming it doesn't exist."

Member Author

Done


final byte[] billy = StringUtils.toUtf8("billy");
final String testPath = "/somewhere/test2";
final String parent = ZKPaths.getPathAndNode(testPath).getPath();
Member

And here

Member Author

Done

* NodeAnnouncer announces single node on Zookeeper and only watches this node,
* while {@link Announcer} watches all child paths, not only this node
*/
public class NodeAnnouncer
Member

Please add concurrent control flow documentation explaining how and why somebody may call announce() and update() concurrently or before start().

Member Author

Okay, I will improve this doc later.

Member

leventov commented Jan 23, 2020

}
}

private void createPath(String parentPath, boolean removeParentsIfCreated)
Member

Please annotate @GuardedBy("toAnnounce")

Member Author

Done

/**
* Only the one created the parent path can drop the parent path, so should remember these created parents.
*/
private final List<String> pathsCreatedInThisAnnouncer = new CopyOnWriteArrayList<>();
Member

  1. Please annotate @GuardedBy("toAnnounce")
  2. Doesn't need to be CopyOnWriteArrayList, can be a simple ArrayList.

Member Author

Done

try {
curator.create().creatingParentsIfNeeded().forPath(parentPath);
if (removeParentsIfCreated) {
pathsCreatedInThisAnnouncer.add(parentPath);
Member

The path is added to pathsCreatedInThisAnnouncer regardless of whether it was actually created two lines above or already existed?

Member Author

I think so, what's your concern?

Member

Because then the path is not actually paths**Created**InThisAnnouncer

Member

This should be resolved before merge
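
One way to address the concern in this thread is to record a path only when this announcer actually created it. The sketch below is an illustration under stated assumptions, not the fix adopted in the PR: `FakeZk` is a hypothetical in-memory stand-in for ZooKeeper (it is not a Curator API), and `CreatedPathTracker` is an invented name. With Curator itself, a similar effect could presumably be had by treating a `KeeperException.NodeExistsException` from `create()` as "already existed".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-in for ZooKeeper, used only to make the sketch runnable.
final class FakeZk
{
  private final Set<String> nodes = new HashSet<>();

  // Returns true only if this call created the node, false if it already existed.
  boolean createIfAbsent(String path)
  {
    return nodes.add(path);
  }

  boolean exists(String path)
  {
    return nodes.contains(path);
  }

  void delete(String path)
  {
    nodes.remove(path);
  }
}

final class CreatedPathTracker
{
  private final FakeZk zk;
  private final List<String> pathsCreatedInThisAnnouncer = new ArrayList<>();

  CreatedPathTracker(FakeZk zk)
  {
    this.zk = zk;
  }

  void createPath(String parentPath, boolean removeParentsIfCreated)
  {
    boolean createdHere = zk.createIfAbsent(parentPath);
    // Remember the path only if this announcer really created it.
    if (createdHere && removeParentsIfCreated) {
      pathsCreatedInThisAnnouncer.add(parentPath);
    }
  }

  // On stop, delete only the paths this announcer created itself.
  void stop()
  {
    for (String path : pathsCreatedInThisAnnouncer) {
      zk.delete(path);
    }
    pathsCreatedInThisAnnouncer.clear();
  }

  List<String> createdPaths()
  {
    return new ArrayList<>(pathsCreatedInThisAnnouncer);
  }
}
```

With this shape, a parent path that existed before the announcer started is never deleted on stop, which keeps the field name accurate.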

@pzhdfy
Contributor

pzhdfy commented Jan 14, 2020

Is this similar to #6683?

@asdf2014
Member Author

@pzhdfy
Contributor

pzhdfy commented Jan 14, 2020

> @pzhdfy Yep, the original discussion is here: https://lists.apache.org/thread.html/r92fcfa896418b941dd4aa1eed7b60aaf5b7e2ea55137600d844ff4a4%40%3Cdev.druid.apache.org%3E

We use #6683 in our production environment, and it works fine. The watch count decreased by more than 90%.

So I raise this pr to be merged.

Contributor

@jihoonson left a comment

Hi @asdf2014, thank you for taking up this issue! I left a couple of comments. Besides them, I would like to say that it would be better if we shared some common code between NodeAnnouncer and Announcer. But I would regard this comment as a nit and don't mind if the refactoring is done in a follow-up PR.

import java.util.concurrent.ConcurrentMap;

/**
* {@link NodeAnnouncer} announces single node on Zookeeper and only watches this node,
Contributor

typo: announces a single node.

Member Author

Done.

@@ -64,6 +67,7 @@ public class Announcer
private final ExecutorService pathChildrenCacheExecutor;

private final List<Announceable> toAnnounce = new ArrayList<>();
Contributor

Please add @GuardedBy("toAnnounce") for this list too.

Member Author

Done.

* In case a path is added to this collection in {@link #announce} before zk is connected,
* should remember the path and do announce in {@link #start} later.
*/
private final List<Announceable> toAnnounce = new ArrayList<>();
Contributor

Please add @GuardedBy("toAnnounce") for this list too.

Member Author

Done.

private final ConcurrentMap<String, NodeCache> listeners = new ConcurrentHashMap<>();
private final ConcurrentMap<String, byte[]> announcedPaths = new ConcurrentHashMap<>();
/**
* Only the one created the parent path can drop it, so should remember these created parents.
Contributor

How about rephrasing it as: "This list is to remember all paths this node announcer has created. On {@list #stop}, the node announcer is responsible for deleting all paths in this list."

Member Author

Thanks. It sounds better. Also, I use {@link #stop} instead of {@list #stop}.


started = false;

Closer closer = Closer.create();
Contributor

I guess this change hasn't been applied yet?

try {
if (!Arrays.equals(oldBytes, bytes)) {
announcedPaths.put(path, bytes);
updateAnnouncement(path, bytes);
Contributor

I think it would be worth defining the concurrency control more precisely across all variables and methods in this class. I'm not sure why listeners is a ConcurrentHashMap, since it's guarded by toAnnounce. I have a similar question about announcedPaths. It seems to be a ConcurrentHashMap to allow reads without locking toAnnounce. However, unannounce() updates announcedPaths without holding the lock on toAnnounce. Maybe they don't have to be ConcurrentHashMaps. Or maybe there is another way to provide better concurrency control.
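
One of the options raised in this comment, plain maps whose every read and write happens under a single lock, can be sketched as follows. This is only an illustration of that option, not the design chosen for the class: `GuardedAnnouncements` and its method names are hypothetical, and a dedicated `lock` object is assumed here instead of the `toAnnounce` list.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class: every access to the map goes through the same lock,
// so a plain HashMap is sufficient and no access path bypasses the guard.
final class GuardedAnnouncements
{
  private final Object lock = new Object();

  // Guarded by "lock" (this would carry @GuardedBy("lock") if an annotations library is available).
  private final Map<String, byte[]> announcedPaths = new HashMap<>();

  // Returns true if the payload was new or changed, false if it was identical. Assumes bytes is non-null.
  boolean announceOrUpdate(String path, byte[] bytes)
  {
    synchronized (lock) {
      byte[] oldBytes = announcedPaths.get(path);
      if (Arrays.equals(oldBytes, bytes)) {
        return false;
      }
      announcedPaths.put(path, bytes.clone());
      return true;
    }
  }

  // Takes the same lock as announceOrUpdate(), unlike the unguarded update noted in the comment above.
  boolean unannounce(String path)
  {
    synchronized (lock) {
      return announcedPaths.remove(path) != null;
    }
  }

  int size()
  {
    synchronized (lock) {
      return announcedPaths.size();
    }
  }
}
```

The trade-off is that reads also take the lock. If lock-free reads matter, keeping a concurrent map is reasonable, but then the documentation should state which invariants hold without the lock.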

@asdf2014
Member Author

Hi, @jihoonson. Thank you and @leventov for your comments. They are very helpful. I would prefer to raise another one for a larger refactoring, so we can resolve this issue ASAP.

@jihoonson
Contributor

> Hi, @jihoonson. Thank you and @leventov for your comments. They are very helpful. I would prefer to raise another one for a larger refactoring, so we can resolve this issue ASAP.

Ok. Please consider the latest comments from me and @leventov.

@asdf2014
Member Author

@jihoonson Okay, no problem. Thanks also to @leventov for creating the #9244 issue.

@jihoonson
Contributor

@asdf2014 thanks. Let me know when this PR is ready for another review.

@jihoonson
Contributor

Hi @asdf2014, have you had a chance to address the comments from me and @leventov?

@asdf2014
Member Author

@jihoonson Sure, I will try to continue to address these comments. Thanks for reminding.

@jihoonson
Contributor

@asdf2014 thank you!

@asdf2014
Member Author

@jihoonson You are welcome!

@stale

stale Bot commented May 30, 2020

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale Bot added the stale label May 30, 2020
@stale

stale Bot commented Jun 27, 2020

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@stale stale Bot closed this Jun 27, 2020
@jihoonson jihoonson reopened this Jun 27, 2020
@stale

stale Bot commented Jun 27, 2020

This pull request/issue is no longer marked as stale.

@stale stale Bot removed the stale label Jun 27, 2020
@stale

stale Bot commented Aug 29, 2020

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale Bot added the stale label Aug 29, 2020
@stale

stale Bot commented Oct 4, 2020

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@stale stale Bot closed this Oct 4, 2020
@a2l007
Contributor

a2l007 commented Mar 22, 2021

Hi @asdf2014, are you planning to continue working on this PR?
