[WIP] Use Announcer for internal announcements + ClusterInfoResource #2242
guobingkun wants to merge 1 commit into apache:master
Conversation
Hi @guobingkun, is there a proposal related to this? At first glance this seems to be trying to make the cluster have some level of self-awareness. This is a non-trivial thing because there isn't really a solid concept of, or delineation between, "service" and "capacities". To me it would make more sense to have a host:port combination announce itself, maybe with some predefined set of capacities, but then have any state information be accessible through endpoints exposed at that host:port, with the API guaranteed by what type of capability the service has announced. I'm specifically cautious about having the Announcer store state.
@guobingkun I had put together some suggestions here in #2040; it doesn't look like any of those have been taken into account.
Had a chat about this PR between @nishantmonu51, @guobingkun, and myself. Regarding ONLY the conflicts between #2286 and this PR, the key thing #2286 needs is a way to accomplish this: https://github.com/druid-io/druid/pull/2286/files#diff-610c15be877b04e26b818259a34300e4R63 where something can be announced under some "key" and all things which have announced themselves under that "key" can be found by a discovery service of some kind.
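To make the "key" mechanism concrete, here is a minimal sketch of the announce/discover contract described above. All names (`KeyedDiscovery`, `InMemoryKeyedDiscovery`) are hypothetical and not from the Druid codebase; a real implementation would back this with ZooKeeper rather than an in-memory map.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of key-based discovery: anything can announce itself
// under a "key", and everything announced under that key can be looked up.
interface KeyedDiscovery {
    void announce(String key, String hostAndPort);
    Set<String> discover(String key);
}

// In-memory stand-in for what would really be a ZooKeeper-backed service.
class InMemoryKeyedDiscovery implements KeyedDiscovery {
    private final Map<String, Set<String>> announcements = new ConcurrentHashMap<>();

    @Override
    public void announce(String key, String hostAndPort) {
        announcements.computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet()).add(hostAndPort);
    }

    @Override
    public Set<String> discover(String key) {
        return announcements.getOrDefault(key, Collections.emptySet());
    }
}
```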
I've spoken with @guobingkun about this in private a few times and I wanted to make this part of the conversation more open. One of my concerns with this PR is uncertainty about how useful it is to look at a group of nodes defined by a label set in their runtime.properties. For example: it is useful for a human operator to be able to discover the known nodes by whatever service name is defined in the runtime.properties... but those kinds of settings are not terribly helpful to a machine in the cluster. Likewise, knowing which Cli was used to launch a node is also only moderately helpful, since there could be any number of modules which extend the basic functionality. As such, there are two "classes" of service discovery which seem helpful: discovery oriented toward human operators (e.g. by configured service name), and discovery oriented toward machines in the cluster (by the purpose a node serves).

An example of this is an internal use case where we have one group of brokers which handles fast queries with prompt expectations on query time, and another group of brokers which handles very long-running queries with longer query time expectations. They are both run via CliBroker, but have different service names, different sets of properties, and serve very different purposes. In such a case the answer to the question "What are the brokers in my cluster?" becomes unclear, and you have to look at WHY you need to know that information. I caution against implementation questions like "What are the brokers in my cluster?", instead favoring questions like "What nodes in my cluster serve XXX purpose?".

The difference I see between this PR and #2286 is that #2286 attempts to solve the following: given that there exist cluster-wide resources identified by a unique ID, how do you provide a general framework around being able to post / get / delete those resources across the nodes, with the intended use case being something which is managing cluster state doing the post / get / delete requests?

The overlap is that #2286 relies on "service discovery" to answer "What nodes in my cluster serve XXX purpose?" So as long as this PR has a mechanism in the DruidServiceDiscovery to answer that question, a drop-in replacement for the Curator ServiceDiscovery used in #2286 should be trivial.
Can you clarify the difference between hostText, host, type, tier, name, and service?
I think the confusion comes from the fact that we currently use CuratorServiceAnnouncer for both internal and external announcement, and it writes DruidNode into ZooKeeper instead of DruidServerMetadata, even though it should've written DruidServerMetadata, since the latter is intended to contain all the necessary information.
The reason I have to add hostText, port, and service is that this information is in DruidNode but not in DruidServerMetadata (though it should've been). I think in the future we could deprecate host, since it's basically the concatenation of hostText + port.
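A tiny illustration of the field relationship described above: `host` is just `hostText` and `port` joined with a colon, so it could be derived rather than stored. The `NodeAddress` class here is hypothetical, not actual Druid code.

```java
// Hypothetical illustration: the legacy "host" field is derivable from
// hostText + port, which is why it is a candidate for deprecation.
class NodeAddress {
    final String hostText;
    final int port;

    NodeAddress(String hostText, int port) {
        this.hostText = hostText;
        this.port = port;
    }

    // "host" as the concatenation of hostText + ":" + port
    String host() {
        return hostText + ":" + port;
    }
}
```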
@himanshug @cheddar have you guys reviewed this?
@fjy i haven't reviewed this and wasn't really plugged in. but, from the conversation in this PR and similar PRs around this, I believe @drcrallen @nishantmonu51 @xvrl should possibly review this. I can take a look at it from a code perspective but need to get an understanding of the discussions that happened and the conclusions reached.
@drcrallen I introduced the "capability" concept you mentioned, with realtime/historical nodes announcing their capability of serving a segment; please have a look.
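As a sketch of the "capability" idea being discussed: nodes announce a set of capabilities (e.g. a historical node announcing it can serve segments), and discovery can then filter nodes by capability rather than by node type. The `CapabilityRegistry` name and the `"serveSegments"` capability string are illustrative assumptions, not identifiers from this PR.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of capability-based discovery: announce per-node
// capability sets, then look nodes up by the capability they serve.
class CapabilityRegistry {
    private final Map<String, Set<String>> byCapability = new HashMap<>();

    // e.g. a historical node announcing that it can serve segments
    void announce(String hostAndPort, Set<String> capabilities) {
        for (String capability : capabilities) {
            byCapability.computeIfAbsent(capability, c -> new TreeSet<>()).add(hostAndPort);
        }
    }

    // answers "what nodes in my cluster serve XXX purpose?"
    Set<String> nodesWith(String capability) {
        return byCapability.getOrDefault(capability, Collections.emptySet());
    }
}
```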
@guobingkun merge conflicts
@guobingkun is this still WIP? also merge conflicts
@fjy Yes, I will finish this up soon.
This PR fixes #2040.

(1) Makes Druid nodes (Historical, Realtime, Overlord, Coordinator, MiddleManager, Broker) use Announcer for internal announcements. In ZooKeeper you can find all Druid nodes' announcements under `/druid/announcement` (assuming the announcements path base is `/druid`).
(2) Under `/druid/announcement`, each node now announces itself under `/druid/announcement/{type}`.
(3) Introduces a "capability" concept inside Druid. For example, historical and realtime nodes have the capability of serving segments; at startup, they announce this capability under `/druid/announcement/capability`.
(4) Uses DruidServerDiscovery for internal Druid server discovery. This interface allows us not to rely on ZooKeeper to discover/distribute internal Druid server information.
(5) Allows Druid to make announcements on an external ZooKeeper host, which is set by `druid.zk.service.externalHost`.
(6) Adds a ClusterInfoResource to the Coordinator so that you can query for Druid nodes' information through a Coordinator endpoint.
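To visualize the ZooKeeper layout described above, here is a small helper sketching the path construction, assuming the announcements path base is `/druid`. The `AnnouncementPaths` class is a hypothetical illustration, not code from this PR.

```java
// Hypothetical helper showing the ZooKeeper announcement path layout
// described above, assuming an announcements path base of /druid.
class AnnouncementPaths {
    static final String BASE = "/druid/announcement";

    // per-type announcements, e.g. /druid/announcement/historical
    static String typePath(String nodeType) {
        return BASE + "/" + nodeType;
    }

    // where historical/realtime nodes announce segment-serving capability
    static String capabilityPath() {
        return BASE + "/capability";
    }
}
```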
Example:

```json
{
  "router": [],
  "realtime": [],
  "coordinator": [
    { "name": "localhost:8082", "host": "localhost:8082", "maxSize": 0, "type": "coordinator", "tier": "_default_tier", "priority": 0 },
    { "name": "localhost:9082", "host": "localhost:9082", "maxSize": 0, "type": "coordinator", "tier": "_default_tier", "priority": 0 }
  ],
  "historical": [
    { "name": "localhost:8081", "host": "localhost:8081", "maxSize": 50000000000, "type": "historical", "tier": "_default_tier", "priority": 0 },
    { "name": "localhost:9081", "host": "localhost:9081", "maxSize": 50000000000, "type": "historical", "tier": "second_tier", "priority": 0 }
  ],
  "broker": [],
  "overlord": [
    { "name": "localhost:8089", "host": "localhost:8089", "maxSize": 0, "type": "overlord", "tier": "_default_tier", "priority": 0 }
  ]
}
```

Todo: Add UTs
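The core of the response above is a grouping of known servers by node type. A minimal sketch of that grouping step follows; the `ServerInfo` and `ClusterInfo` names are illustrative stand-ins, not the actual classes in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of how a ClusterInfoResource-style endpoint might build
// the response shown above: group discovered servers by their node type.
class ServerInfo {
    final String name;
    final String type;

    ServerInfo(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

class ClusterInfo {
    // groups servers by type, e.g. {"coordinator": [...], "historical": [...]}
    static Map<String, List<ServerInfo>> groupByType(List<ServerInfo> servers) {
        Map<String, List<ServerInfo>> grouped = new TreeMap<>();
        for (ServerInfo server : servers) {
            grouped.computeIfAbsent(server.type, t -> new ArrayList<>()).add(server);
        }
        return grouped;
    }
}
```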