Skip to content

Optional segment load/drop management without zookeeper using http#4874

Closed
himanshug wants to merge 9 commits intoapache:masterfrom
himanshug:http_segment_assignment
Closed

Optional segment load/drop management without zookeeper using http#4874
himanshug wants to merge 9 commits intoapache:masterfrom
himanshug:http_segment_assignment

Conversation

@himanshug
Copy link
Copy Markdown
Contributor

@himanshug himanshug commented Sep 28, 2017

TODOs:

  • Test on a cluster with decent load.

This patch introduces following changes.

On Historical Side:

  1. All non-zookeeper related code is extracted out of ZkCoordinator and moved into new class SegmentLoadDropHandler. ZkCoordinator now uses SegmentLoadDropHandler to delegate load/drop requests.
  2. A new method ListenableFuture SegmentLoadDropHandler.processBatch(List<DataSegmentChangeRequest>) is added to support http endpoint for segment load/drop management.
  3. New endpoint POST /druid-internal/v1/segments/changeRequests is added in SegmentListerResource that uses ListenableFuture SegmentLoadDropHandler.processBatch(List<DataSegmentChangeRequest>) to delegate batch of load/drop requests received. Here is a copy-paste of javadoc for this endpoint.
  /**
   * This endpoint is used by HttpLoadQueuePeon to assign segment load/drop requests batch. This endpoint makes the
   * client wait till one of the following events occur. Note that this is implemented using async IO so no jetty
   * threads are held while in wait.
   *
   * (1) Given timeout elapses.
   * (2) Some load/drop request completed.
   *
   * It returns a map of "load/drop request -> SUCCESS/FAILED/PENDING status" for each request in the batch.
   */
  1. druid.segmentCache.numLoadingThreads configuration is revived and must be greater than the "druid.coordinator.loadqueuepeon.http.batchSize described later.

On Coordinator Side:

  1. CuratorLoadQueuePeon is created by copying existing LoadQueuePeon class.
  2. LoadQueuePeon is made abstract class.
  3. HttpLoadQueuePeon is created to do segment load/drop management via http. It uses the new endpoint introduced above to send batch of load/drop requests to Historical node.
  4. Following new configurations are introduced.
    druid.coordinator.loadqueuepeon.type = http or curator, curator by default
    druid.coordinator.loadqueuepeon.http.batchSize = number of load/drop to try and process in parallel, must be less than druid.segmentCache.numLoadingThreads on historical
    druid.coordinator.loadqueuepeon.http.repeatDelay = delay Duration for periodic check for new load/drop request to send to historical. note this could be very large given that same code is executed whenever a new load/drop is requested to HttpLoadQueuePeon without waiting for schedule to kick in, default 1 minute
    druid.coordinator.loadqueuepeon.http.hostTimeout = default 5 minutes, timeout used to be specified on new endpoint introduced for batch load/drop request.

In the long run, I plan to remove CuratorLoadQueuePeon and ZkCoordinator. They are being kept now only to be backward compatible and for HttpLoadQueuePeon to prove itself in production.

Documentation for HttpLoadQueuePeon is intentionally left out.

@himanshug himanshug force-pushed the http_segment_assignment branch 2 times, most recently from 50043d1 to 103d434 Compare October 3, 2017 16:57
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pair is a really bad class to build an API around. It is poorly defined in terms of what the LHS and RHS are. Please remove the JSON annotations here and instead create a concrete class that actually represents what you are sending across the wire.

In general, if you ever see Pair<> on any sort of public function signature or API, it was done incorrectly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not using Pair anymore.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You did not document this, was that intentional?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will let it run on our servers for a while before documenting. hopefully we'll have enough confidence in this before 0.11.1 release so will get documented before that release.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This annotation is an abomination. Just make it public

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constructor requires creator to pass the ExecutorService which is very handy for tests to be able to deterministic. However, I don't really want any other code to use this constructor and hence added that annotation. If you still believe that its not useful then I'll remove it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there no extra objects you could log out with this to provide more information in the log?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, just wanted to mark start/stop in log to check whether it indeed gets started/stopped etc.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this doing a cache of the load/drop requests or a cache of the segments that have already been stored? If the latter, Niketh has a PR that moves the cache to be co-located with the segment storage instead of a separate cache directory. This won't follow that logic, please explore.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated to work in this PR. its a copy paste from ZkCoordinator and this PR is keeping old things exactly same.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is acting like these requests succeeded, is that the correct thing to do?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this behavior is retained from existing behavior in CuratorLoadQueuePeon (copy of older LoadQueuePeon class )

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between this and dropSegment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this behavior is retained from existing behavior in CuratorLoadQueuePeon (copy of older LoadQueuePeon class ). I am not sure about this but it was recently added so kept.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this really be returning the set, which is likely mutable?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this behavior is retained from existing behavior in CuratorLoadQueuePeon (copy of older LoadQueuePeon class ). but changed to return unmodifiable version.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of this type thing? We should either leverage polymorphism or completely eliminate it. Using it sometimes and not other times gets confusing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to use Polymorphism.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't understand this cancellation future stuff....

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code is removed.

@himanshug himanshug changed the title [WIP] Optional segment load/drop management without zookeeper using http Optional segment load/drop management without zookeeper using http Oct 5, 2017
@himanshug himanshug force-pushed the http_segment_assignment branch 2 times, most recently from ec0b74f to 20ca85b Compare October 11, 2017 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants