Optional long-polling based segment announcement via HTTP instead of Zookeeper by himanshug · Pull Request #3902 · apache/druid

himanshug · 2017-02-02T20:40:39Z

This PR introduces the following configurations.

// Set on coordinators and brokers to use a ServerInventoryView implementation that
// syncs the segment list over HTTP instead of ZooKeeper.
druid.announcer.type=http

// Set on historicals and realtime tasks to stop pushing segment updates to ZooKeeper.
// false by default
druid.announcer.skipSegmentAnnouncementOnZk=true

I'm keeping these undocumented for now. We will run them on our large internal clusters first before asking users to try them.

Changes:
All nodes that serve segments (e.g. historicals and realtime indexing tasks) provide a "/druid-internal/v1/segments" HTTP endpoint that can incrementally provide the list of segments being served by that node. It uses async IO, so Jetty threads are not held while waiting. To power this endpoint, the following method is added to the DataSegmentAnnouncer interface.

  /**
   * Returns Future that lists the segment load/drop requests since given counter.
   */
  public ListenableFuture<SegmentChangeRequestsSnapshot> getSegmentChangesSince(SegmentChangeRequestHistory.Counter counter);
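
For illustration only, here is a minimal sketch of how a counter-based change history could back such a method. It is not the code in this PR: the class name ChangeHistorySketch, the plain long counter, the String change entries, and the use of java.util.concurrent.CompletableFuture (instead of Guava's ListenableFuture) are all assumptions, and request timeouts and bounded history are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical, simplified stand-in for a segment change history; not Druid's actual class.
final class ChangeHistorySketch {
    /** What a caller gets back: the counter to send next time plus the changes it missed. */
    static final class Snapshot {
        final long counter;
        final List<String> changes;

        Snapshot(long counter, List<String> changes) {
            this.counter = counter;
            this.changes = changes;
        }
    }

    /** A pending long-poll: the counter the caller sent and the future it is waiting on. */
    private static final class Waiter {
        final long since;
        final CompletableFuture<Snapshot> future;

        Waiter(long since, CompletableFuture<Snapshot> future) {
            this.since = since;
            this.future = future;
        }
    }

    private final List<String> changes = new ArrayList<>(); // unbounded here for brevity
    private final List<Waiter> waiters = new ArrayList<>();

    /** Records a load/drop request and completes every pending long-poll. */
    synchronized void addChange(String change) {
        changes.add(change);
        for (Waiter w : waiters) {
            w.future.complete(snapshotSince(w.since));
        }
        waiters.clear();
    }

    /** Completes at once if the caller is behind; otherwise stays pending until the next change. */
    synchronized CompletableFuture<Snapshot> getChangesSince(long counter) {
        if (counter < 0 || counter > changes.size()) {
            throw new IllegalArgumentException("counter out of range: " + counter);
        }
        CompletableFuture<Snapshot> future = new CompletableFuture<>();
        if (counter < changes.size()) {
            future.complete(snapshotSince(counter));
        } else {
            waiters.add(new Waiter(counter, future));
        }
        return future;
    }

    private Snapshot snapshotSince(long counter) {
        List<String> missed = new ArrayList<>(changes.subList((int) counter, changes.size()));
        return new Snapshot(changes.size(), missed);
    }
}
```

Because the caller only holds a future, no thread has to block while there is nothing new to report; an HTTP layer can write the response when the future completes.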

On the coordinator/broker side, the HttpServerInventoryView class is introduced. It is equivalent to BatchServerInventoryView but syncs the segment inventory using the HTTP endpoint mentioned above. Server discovery is still done via ZooKeeper. ServerInventoryView is made an interface and its old contents are moved to AbstractCuratorServerInventoryView.

kaijianding · 2017-02-03T15:21:52Z

If I understand correctly, you update segments by calling httpClient.go(url) for each server in the server list, in an endless loop inside a thread pool.
To my knowledge, httpClient.go() uses a connection pool that is backed by ImmediateCreationResourceHolder. This resource holder keeps an objectList whose size is configured by broker.numConnections. httpClient.go() takes the first connection from the head of objectList and, when it finishes, gives it back by adding it to the end of objectList.
This way all connections held by the resource holder stay active (or hot?). Isn't that wasteful if the Druid cluster is not busy?

If the cluster is not busy, can a connection already be disconnected (closed by the historical/realtime node after exceeding the idleTimeOut) when we attempt to use it to get the segment list? In that case a new connection is created (renewed). Will that cause a performance issue if lots of connections need to be renewed? And could we always get a disconnected connection for a server if the server list is large and numConnections is too big?

himanshug · 2017-02-03T19:05:58Z

@kaijianding thanks for taking a look. You are right about the connection usage, and yes, the coordinator/broker would be using one connection from HttpClient per historical/realtime node all the time.
On a non-busy cluster, or in the worst case, the coordinator/broker would have to create one connection per historical/realtime node every 4 minutes (the default config in HttpServerInventoryViewConfig). But they will be created sequentially and not in parallel.
However, it turns out that this problem can be fixed very easily by changing the get/giveBack behavior of ResourcePool.ImmediateCreationResourceHolder to LIFO instead of the current FIFO. It can be done by replacing https://github.com/metamx/http-client/blob/master/src/main/java/com/metamx/http/client/pool/ResourcePool.java#L237 with objectList.addFirst(object). I will do a PR to fix that.
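
To make the FIFO versus LIFO difference concrete, here is a toy sketch. It is not the metamx http-client code: ConnectionPoolSketch, its String "connections", and the usage helper are hypothetical, and the pool only mimics the take-from-head and give-back ordering described in this thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy pool; it only models the ordering of take and giveBack.
final class ConnectionPoolSketch {
    private final Deque<String> idle = new ArrayDeque<>();
    private final boolean lifo;

    ConnectionPoolSketch(boolean lifo, String... connections) {
        this.lifo = lifo;
        for (String c : connections) {
            idle.addLast(c);
        }
    }

    /** Always takes from the head of the idle list. */
    String take() {
        return idle.removeFirst();
    }

    /** FIFO returns to the tail, so every connection gets rotated through; LIFO returns to the head. */
    void giveBack(String connection) {
        if (lifo) {
            idle.addFirst(connection);
        } else {
            idle.addLast(connection);
        }
    }

    /** Runs sequential requests against a three-connection pool and lists the connections used. */
    static String usage(boolean lifo, int requests) {
        ConnectionPoolSketch pool = new ConnectionPoolSketch(lifo, "c1", "c2", "c3");
        StringBuilder used = new StringBuilder();
        for (int i = 0; i < requests; i++) {
            String c = pool.take();
            used.append(c).append(' ');
            pool.giveBack(c);
        }
        return used.toString().trim();
    }
}
```

With four sequential requests, usage(false, 4) yields "c1 c2 c3 c1" while usage(true, 4) yields "c1 c1 c1 c1": under LIFO a lightly loaded client keeps reusing one warm connection, and the others can simply stay idle.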

BTW, I have also made a few updates in SegmentListerResource to make it async so that it does not hold any Jetty threads while waiting.

drcrallen · 2017-02-07T00:59:40Z

Related #2368

himanshug · 2017-02-08T15:00:54Z

Removing the discovery of historical/realtime nodes from ZooKeeper relates to #2312 (as described in #2312 (comment)). I am looking into that, and it will be done in a separate PR.

weijietong · 2017-02-24T09:16:30Z

@himanshug with your approach of polling historical/realtime nodes for segments in a background loop, how do you avoid holding stale segment info on the coordinator/broker side if the loop has not caught up by the time a query arrives?

himanshug · 2017-02-24T16:04:48Z

@weijietong brokers/coordinators would not have stale segment info because they run an infinite (no wait) loop to get the latest segment information from historical/realtime nodes. The historical/realtime nodes "hold" the request until there is new information to provide or the timeout provided in the request is reached.
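
A rough sketch of that hold-until-change-or-timeout behavior and the no-wait client loop follows. It is an illustration under assumptions: LongPollSketch and its methods are hypothetical names, the "server" is an in-process object rather than an HTTP endpoint, and the server side here blocks the calling thread, whereas the PR uses async IO.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-process model of long polling; there is no real HTTP here.
final class LongPollSketch {
    private final List<String> changes = new ArrayList<>();

    /** Server side: records a change and wakes up any held requests. */
    synchronized void announce(String change) {
        changes.add(change);
        notifyAll();
    }

    /** Holds the caller until something newer than counter exists or the timeout elapses. */
    synchronized List<String> poll(int counter, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (changes.size() <= counter) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return new ArrayList<>(); // timed out with nothing new
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return new ArrayList<>();
            }
        }
        return new ArrayList<>(changes.subList(counter, changes.size()));
    }

    /** Client side: re-issues the request immediately until expected changes have been seen. */
    static List<String> syncUntil(LongPollSketch server, int expected, long timeoutMillis) {
        List<String> view = new ArrayList<>();
        while (view.size() < expected && !Thread.currentThread().isInterrupted()) {
            // The size of the local view doubles as the counter sent with the next request.
            view.addAll(server.poll(view.size(), timeoutMillis));
        }
        return view;
    }
}
```

Since the next request is sent as soon as the previous one returns, a change reaches the client about as soon as it is announced, and the timeout only bounds how long an idle request is held.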

pjain1 · 2017-03-10T21:55:17Z

@himanshug a general comment: it would be good to put some comments on important classes, variables and methods. For example, a high-level design comment on HttpServerInventoryView, some comments on SegmentChangeRequestHistory, the usage of hash in Counter, the reason for passing waitingFutures in SettableFuture, etc. This would make it easier to understand the code and help the person changing the code avoid mistakes.

I have gone through the code at a high level and it looks good to me so far.

himanshug · 2017-04-11T17:28:37Z

@pjain1 added some docs for clarification

gianm · 2017-04-11T20:44:08Z

@himanshug, I know I said on the dev sync today that I thought this should remain undocumented, but I changed my mind. I'm still ok with it only being reviewed by people from your organization, but I think it should be documented as experimental. That way other folks could help try it out too.

cheddar · 2017-04-27T22:50:57Z

It feels weird to me that this is taking a CuratorFramework. I think it would be better for this to depend on something closer to what it actually needs (i.e. something that allows for discovering nodes in the cluster). The implementation for that could remain essentially the same as what exists here, but I think it would be nice to not clutter this class with that code.

Refactored server notifications into a separate class.

cheddar · 2017-04-27T22:54:50Z

"segment callbacks" reads to me as if it should be called on every single segment. The actual callback being called here seems like it shouldn't be called quite that often (maybe once?). Is something named weird or am I mistaken about what's happening?

Yes, it is called for individual segments and once at inventory initialization. This naming and behavior is retained from CuratorBasedServerInventoryView (called ServerInventoryView earlier).

cheddar · 2017-04-27T22:55:37Z

Why does this block on the fetch?

Not needed anymore. It was needed in an older version of the code; removed now.

cheddar · 2017-04-27T22:55:59Z

What if holder is null?

added check

cheddar · 2017-04-27T22:56:28Z

Why are we guarding a final?

Added a comment in the code; this is to ensure that the segment state stored in DruidServer and the managed counter stay consistent.

cheddar · 2017-04-27T23:34:01Z

This might be premature optimization, but I worry about how this data structure is being used. One option would be to view the list as a revolving buffer and essentially maintain an index of the oldest "valid" item. You can replace and increment that on add and start your search from there, wrapping back around.

Added a CircularBuffer impl instead of using the list as a circular buffer.
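
For readers unfamiliar with the idea, a minimal fixed-capacity circular buffer could look like the sketch below. It is an assumption-laden illustration, not the CircularBuffer added in this PR: the class name CircularBufferSketch and its method names are hypothetical.

```java
// Hypothetical minimal circular buffer; once full, each add overwrites the oldest element.
final class CircularBufferSketch<E> {
    private final Object[] items;
    private int start; // index of the oldest retained element
    private int size;  // number of retained elements

    CircularBufferSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive: " + capacity);
        }
        items = new Object[capacity];
    }

    void add(E item) {
        // When full, (start + size) wraps around to start, i.e. the oldest slot.
        items[(start + size) % items.length] = item;
        if (size < items.length) {
            size++;
        } else {
            start = (start + 1) % items.length;
        }
    }

    /** Returns the element at the given position, where 0 is the oldest retained element. */
    @SuppressWarnings("unchecked")
    E get(int index) {
        if (index < 0 || index >= size) {
            // Include the actual values so a failure is debuggable.
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return (E) items[(start + index) % items.length];
    }

    int size() {
        return size;
    }
}
```

With capacity 3, adding 1 through 5 leaves the buffer holding 3, 4, 5, with get(0) returning 3. The index arithmetic replaces any shifting or removal of list elements on every add.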

cheddar · 2017-04-27T23:35:14Z

Please include the actual values that came in here, just the message as it exists won't really help much with debugging if it actually gets thrown.

cheddar · 2017-04-27T23:37:36Z

This is an optimization, maybe not so important, but I think you could actually compute the index into the array with some maths and the counters.

Not really, and as you said, not so important.

cheddar · 2017-04-27T23:38:19Z

Why are we removing the isAnnounced check?

Updated BatchDataSegmentAnnouncer.announce(..) to always do the check so that it gets done in other places as well.

cheddar · 2017-04-27T23:39:28Z

If you do what I wrote in a previous comment and separate the lookup management into a different implementation of DataSegmentAnnouncer, then have that concrete type injected here instead of the interface (if you mark it @Nullable Guice should be fine even if it's not bound because the http thing isn't being used)

This now directly uses BatchDataSegmentAnnouncer.

himanshug · 2017-05-09T18:26:16Z

@cheddar I also introduced DataSegmentServerAnnouncer and removed server announcements from the DataSegmentAnnouncer implementations. Now only nodes that actually serve segments announce themselves in the inventory, not arbitrary peon task processes.

himanshug · 2017-05-09T18:32:36Z

@gianm I still think we should keep it undocumented for now and not confuse regular users with a feature that only optimizes things and hasn't been used at scale yet. We will probably try to document it in the next release.

cheddar · 2017-05-12T16:44:53Z

👍


himanshug force-pushed the seg_assignment branch from a391840 to 8e2b1be Compare February 2, 2017 20:59

himanshug force-pushed the seg_assignment branch 2 times, most recently from bd2f8de to 373feff Compare February 3, 2017 17:43

nishantmonu51 self-assigned this Feb 5, 2017

himanshug mentioned this pull request Feb 8, 2017

[Proposal] HTTPAnnouncer + Remove Zookeeper Dependency #2312

Closed

himanshug added this to the 0.10.1 milestone Feb 22, 2017

himanshug force-pushed the seg_assignment branch 2 times, most recently from 3962f92 to 824631a Compare April 11, 2017 15:45

himanshug force-pushed the seg_assignment branch from 824631a to 2295cf4 Compare April 24, 2017 17:25

cheddar requested changes Apr 27, 2017

View reviewed changes

himanshug force-pushed the seg_assignment branch from 9e2b359 to dc1ea15 Compare May 9, 2017 18:13

himanshug force-pushed the seg_assignment branch from dc1ea15 to 6df7f7e Compare May 12, 2017 16:22

cheddar approved these changes May 12, 2017

View reviewed changes

himanshug added 5 commits May 16, 2017 15:55

Optional long-polling based segment announcement via HTTP instead of Zookeeper

917740f

address review comments

72211bc

make endpoint /druid-internal/v1 instead of /druid/internal so that jetty qos filters can be configured easily when needed

15ae9b1

update segment callback initialization to be called only after first segment list fetch has been succeeded from all servers

f2aa520

address review comments

dd0dcfc

himanshug added 3 commits May 16, 2017 15:55

remove size check not required anymore as only segment servers announce themselves and not all peon processes

1b52d1e

annouce segment server on historical only after cached segments are loaded

4d0f7d0

fix checkstyle errors

f2d9126

himanshug force-pushed the seg_assignment branch from 0b9e04a to f2d9126 Compare May 16, 2017 21:09

himanshug merged commit daa8ef8 into apache:master May 17, 2017

himanshug deleted the seg_assignment branch December 29, 2017 17:35

clambertus unassigned nishantmonu51 Jul 6, 2018

seoeun25 added a commit to seoeun25/incubator-druid that referenced this pull request Jan 10, 2020

apache#2443 Add optional segment announcement via HTTP

c473f1c

* Optional segment announcement via HTTP (apache#3902) * BatchServerInventoryView is created twice (apache#3244)

8 participants