[WIP]Use Announcer for internal announcements + ClusterInforResource #2242

Closed
guobingkun wants to merge 1 commit into apache:master from guobingkun:announcement

Conversation

@guobingkun
Contributor

This PR fixes #2040

(1) Makes Druid nodes (Historical, Realtime, Overlord, Coordinator, MiddleManager, Broker) use Announcer for internal announcements. In ZooKeeper you can find all Druid nodes' announcements under /druid/announcement (assuming the announcements path base is /druid).

Now under /druid/announcement, there will be:

[historical, realtime, middleManager, coordinator, broker, overlord]
(2) All Druid nodes announce themselves directly under /druid/announcement/{type}

(3) Introduces a "capability" concept inside Druid. For example, historical and realtime nodes have the capability of serving segments; at startup, they announce this capability under /druid/announcement/capability
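As a rough illustration of the znode layout described above (this helper is hypothetical, not code from this PR, and the exact capability-path layout is my reading of the description):

```java
// Hypothetical helper sketching the ZooKeeper znode layout described above.
// Not actual code from this PR; the real layout may differ.
class AnnouncementPaths
{
  // e.g. nodePath("/druid", "historical", "localhost:8083")
  //   -> /druid/announcement/historical/localhost:8083
  static String nodePath(String base, String type, String hostAndPort)
  {
    return base + "/announcement/" + type + "/" + hostAndPort;
  }

  // e.g. capabilityPath("/druid", "serveSegments", "localhost:8083")
  //   -> /druid/announcement/capability/serveSegments/localhost:8083
  // ("serveSegments" is an invented capability name for illustration)
  static String capabilityPath(String base, String capability, String hostAndPort)
  {
    return base + "/announcement/capability/" + capability + "/" + hostAndPort;
  }
}
```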

(4) Uses DruidServerDiscovery for internal Druid server discovery. This interface allows us to discover and distribute internal Druid server information without relying on ZooKeeper.
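A sketch of what such a discovery abstraction could look like (the interface and method names here are illustrative, not the PR's actual DruidServerDiscovery API). The in-memory implementation shows the point of the abstraction: a ZooKeeper-backed implementation could be swapped in without callers changing.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative interface only; not the PR's real DruidServerDiscovery.
interface ServerDiscovery
{
  void announce(String type, String hostAndPort);

  Set<String> getServersByType(String type);
}

// Trivial in-memory implementation, e.g. for tests; the same interface
// could be backed by ZooKeeper or any other coordination store.
class InMemoryServerDiscovery implements ServerDiscovery
{
  private final Map<String, Set<String>> servers = new ConcurrentHashMap<>();

  @Override
  public void announce(String type, String hostAndPort)
  {
    servers.computeIfAbsent(type, t -> ConcurrentHashMap.newKeySet()).add(hostAndPort);
  }

  @Override
  public Set<String> getServersByType(String type)
  {
    return servers.getOrDefault(type, Collections.emptySet());
  }
}
```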

(5) Allows Druid to make announcements on an external ZooKeeper host, set via druid.zk.service.externalHost

(6) Adds a ClusterInfoResource to the Coordinator so that you can query for Druid node information through a Coordinator endpoint.

Example:

{
  "router": [],
  "realtime": [],
  "coordinator": [
    {
      "name": "localhost:8082",
      "host": "localhost:8082",
      "maxSize": 0,
      "type": "coordinator",
      "tier": "_default_tier",
      "priority": 0
    },
    {
      "name": "localhost:9082",
      "host": "localhost:9082",
      "maxSize": 0,
      "type": "coordinator",
      "tier": "_default_tier",
      "priority": 0
    }
  ],
  "historical": [
    {
      "name": "localhost:8081",
      "host": "localhost:8081",
      "maxSize": 50000000000,
      "type": "historical",
      "tier": "_default_tier",
      "priority": 0
    },
    {
      "name": "localhost:9081",
      "host": "localhost:9081",
      "maxSize": 50000000000,
      "type": "historical",
      "tier": "second_tier",
      "priority": 0
    }
  ],
  "broker": [],
  "overlord": [
    {
      "name": "localhost:8089",
      "host": "localhost:8089",
      "maxSize": 0,
      "type": "overlord",
      "tier": "_default_tier",
      "priority": 0
    }
  ]
}

Todo: Add UTs

@guobingkun guobingkun force-pushed the announcement branch 2 times, most recently from ba0e773 to d79867a on January 11, 2016 16:58
@drcrallen
Contributor

Hi @guobingkun , is there a proposal related to this?

At first glance this seems to be trying to make the cluster have some level of self-awareness. This is a non-trivial thing because there isn't really a solid concept or delineation between "service" and "capacities".

To me it would make more sense to have a host:port combination announce itself, maybe with some predefined set of capacities, but then have any state information be accessible through endpoints exposed at the host:port, with API guaranteed by what type of capability the service has announced.

I'm specifically cautious about having the Announcer store state.

@xvrl
Member

xvrl commented Jan 11, 2016

@guobingkun I had put together some suggestions in #2040; it doesn't look like any of those have been taken into account.

@guobingkun
Contributor Author

@xvrl Sure, let me check #2040

@guobingkun guobingkun changed the title Use Announcer for internal announcements + ClusterInforResource [WIP]Use Announcer for internal announcements + ClusterInforResource Jan 11, 2016
@guobingkun guobingkun closed this Jan 11, 2016
@guobingkun guobingkun reopened this Jan 11, 2016
@guobingkun guobingkun force-pushed the announcement branch 2 times, most recently from ab6a2af to f214473 on January 19, 2016 17:38
@fjy fjy added the Discuss label Jan 19, 2016
@drcrallen
Contributor

Had a chat about this PR between @nishantmonu51 @guobingkun and myself. Regarding ONLY the conflicts between #2286 and this PR, the key thing #2286 needs is a way to accomplish this: https://github.com/druid-io/druid/pull/2286/files#diff-610c15be877b04e26b818259a34300e4R63, where something can be announced under some "key" and all things that have announced themselves under that key can be found by a discovery service of some kind.

@drcrallen
Contributor

I've spoken with @guobingkun about this in private a few times and I wanted to make this part of the conversation more open:

One of my concerns with this PR is uncertainty about how useful it is to look at a group of nodes defined by a label set in their runtime.properties. For example: it is useful for a human operator to be able to discover the known nodes by whatever service name is defined in their runtime.properties... but those kinds of settings are not terribly helpful to a machine in the cluster.

Likewise, knowing which Cli was used to launch a node is also only moderately helpful, since there could be any number of modules extending the basic functionality.

As such, there are two "classes" of service discovery which seem helpful:

  1. Discovery by a descriptor whose mapping from nodes to descriptors is many-to-one, meaning each node uses exactly ONE descriptor (example: service name)
  2. Discovery by a descriptor that returns the list of nodes it describes (example: "can load deep-storage segments"); any particular node may carry any number of such descriptors.
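A minimal in-memory sketch of these two styles (all names below are illustrative, not APIs from this PR): class 1 asks "who has this service name", class 2 asks "who advertises this capability".

```java
import java.util.*;
import java.util.stream.Collectors;

// Illustrative only: each node carries exactly one serviceName (class 1)
// and any number of capability descriptors (class 2).
class DiscoverySketch
{
  static class Node
  {
    final String hostAndPort;
    final String serviceName;
    final Set<String> capabilities;

    Node(String hostAndPort, String serviceName, Set<String> capabilities)
    {
      this.hostAndPort = hostAndPort;
      this.serviceName = serviceName;
      this.capabilities = capabilities;
    }
  }

  private final List<Node> nodes = new ArrayList<>();

  void register(Node node)
  {
    nodes.add(node);
  }

  // Class 1: nodes -> descriptors is many-to-one; one service name per node.
  List<String> byServiceName(String name)
  {
    return nodes.stream()
                .filter(n -> n.serviceName.equals(name))
                .map(n -> n.hostAndPort)
                .collect(Collectors.toList());
  }

  // Class 2: a descriptor returns every node advertising it; a single node
  // may match any number of such queries.
  List<String> byCapability(String capability)
  {
    return nodes.stream()
                .filter(n -> n.capabilities.contains(capability))
                .map(n -> n.hostAndPort)
                .collect(Collectors.toList());
  }
}
```

For instance, two broker groups both launched via CliBroker could register with service names "fast-broker" and "slow-broker" while both advertising a shared "serves-queries" capability; byServiceName distinguishes them, byCapability finds both.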

An example of this is an internal use case where we have one group of brokers that handles fast queries with prompt expectations on query time, and another group of brokers that handles very long-running queries with longer query-time expectations. Both are run via CliBroker, but they have different service names, different sets of properties, and serve very different purposes. In such a case the answer to the question "What are the brokers in my cluster?" becomes unclear, and you have to look at WHY you need to know that information.

I caution against implementation questions like "What are the brokers in my cluster?" instead favoring questions like "What nodes in my cluster serve XXX purpose?".

The difference I see between this PR and #2286 is that #2286 attempts to solve the following: given that there exist cluster-wide resources identified by a unique ID, how do you provide a general framework for being able to post / get / delete those resources across the nodes, with the intended use case being something that manages cluster state issuing the post / get / delete requests.

The overlap is that #2286 relies on "service discovery" to answer "What nodes in my cluster serve XXX purpose?" So as long as this PR has a mechanism in DruidServerDiscovery to answer that question, a drop-in replacement for the Curator ServiceDiscovery used in #2286 should be trivial.

Contributor

Can you clarify the difference between hostText, host, type, tier, name, and service?

Contributor Author

I think the confusion comes from the fact that we currently use CuratorServiceAnnouncer for both internal and external announcement, which writes DruidNode into ZooKeeper instead of DruidServerMetadata, though it should have written DruidServerMetadata, since the latter is intended to contain all the necessary information.

The reason I have to add hostText, port, and service is that that information is in DruidNode but not in DruidServerMetadata (though it should have been). I think in the future we could deprecate host, since it's basically the concatenation of hostText + port.

Contributor

unused?

Contributor Author

Fixed.

@guobingkun guobingkun force-pushed the announcement branch 3 times, most recently from d84e017 to bcdaff8 on February 12, 2016 21:53
@guobingkun guobingkun changed the title [WIP]Use Announcer for internal announcements + ClusterInforResource Use Announcer for internal announcements + ClusterInforResource Feb 16, 2016
@fjy fjy added this to the 0.9.1 milestone Feb 18, 2016
@fjy
Contributor

fjy commented Feb 18, 2016

@himanshug @cheddar have you guys reviewed this?

@fjy
Contributor

fjy commented Feb 19, 2016

@himanshug @cheddar ?

@fjy
Contributor

fjy commented Feb 23, 2016

@himanshug @cheddar

@himanshug
Contributor

@fjy I haven't reviewed this and wasn't really plugged in. But from the conversation in this PR and similar PRs around it, I believe @drcrallen @nishantmonu51 @xvrl should possibly review this. I can take a look at it from a code perspective, but I need to understand the discussions that happened and the conclusions reached.

@guobingkun
Contributor Author

@drcrallen I introduced the "capability" concept you mentioned, for realtime/historical nodes to announce their capability of serving segments; please have a look.
@xvrl This PR is ready to review.

@fjy
Contributor

fjy commented Feb 27, 2016

@guobingkun merge conflicts

@guobingkun guobingkun changed the title Use Announcer for internal announcements + ClusterInforResource [WIP]Use Announcer for internal announcements + ClusterInforResource Mar 8, 2016
@fjy
Contributor

fjy commented Mar 16, 2016

@guobingkun is this still WIP? also merge conflicts

@guobingkun
Contributor Author

@fjy Yes, I will finish this up soon.

@guobingkun guobingkun closed this Mar 21, 2016

Development

Successfully merging this pull request may close these issues.

Separate internal node discovery from external service discovery

5 participants