add configs to enable fast request failure on broker and historical #4540
himanshug merged 15 commits into apache:master
Conversation
drcrallen left a comment:
Good addition, but I have some architectural questions on how this is being applied.
threadPool.setMinThreads(config.getNumThreads());
threadPool.setMaxThreads(config.getNumThreads());
final QueuedThreadPool threadPool;
if (config.getQueueSize() == Integer.MAX_VALUE) {
Can this reference a static constant in the io.druid package tree somewhere?

I think it's fine given that it's already a static constant, and the check is really just whether the queue is unbounded or bounded.
threadPool = new QueuedThreadPool(
    config.getNumThreads(),
    config.getNumThreads(),
    60000,
Does this need to be hard coded?

Yes, it is just the default value used when you do new QueuedThreadPool() in the current code; it isn't/wasn't configurable and I don't think it needs to be. The socket idle timeout, which is set later, is independent of this and is indeed configurable.
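The bounded-versus-unbounded choice being discussed can be sketched without Jetty on the classpath. This is an illustrative standalone version, not the PR's actual code: `chooseQueue` stands in for the ServerConfig-driven branch, returning null to mean "let Jetty use its default unbounded queue" and a bounded `LinkedBlockingQueue` otherwise (which would be passed as the fourth argument of the QueuedThreadPool constructor along with the 60000 ms idle-timeout default).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the queue-selection logic under discussion; Jetty's QueuedThreadPool
// is only referenced in comments so the idea is runnable standalone.
public class QueueChoiceSketch
{
    static final int IDLE_TIMEOUT_MS = 60000; // Jetty's default, intentionally not configurable here

    // null means "unbounded": construct QueuedThreadPool(max, min, IDLE_TIMEOUT_MS)
    // and let Jetty create its own default queue.
    static BlockingQueue<Runnable> chooseQueue(int queueSize)
    {
        if (queueSize == Integer.MAX_VALUE) {
            return null;
        }
        // bounded: QueuedThreadPool(max, min, IDLE_TIMEOUT_MS, new LinkedBlockingQueue<>(queueSize))
        return new LinkedBlockingQueue<>(queueSize);
    }

    public static void main(String[] args)
    {
        System.out.println(chooseQueue(Integer.MAX_VALUE) == null); // true
        System.out.println(chooseQueue(50).remainingCapacity());    // 50
    }
}
```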
@Override
public Filter getFilter()
{
  return new LimitRequestsFilter(maxActiveRequests);
The usage in io.druid.server.initialization.jetty.JettyServerInitUtils#addExtensionFilters seems to indicate this is expected to be memoized, but none of the implementations treat it that way. Is this a minor mis-handling in JettyServerInitUtils?

io.druid.server.initialization.jetty.JettyServerInitUtils#addExtensionFilters gets called only once in the lifetime of a Druid process, when setting up Jetty, so getFilter() also gets called just once. It doesn't matter whether it's memoized or not.
{
  private final int maxActiveRequests;

  private AtomicInteger activeRequestsCount = new AtomicInteger();
  chain.doFilter(request, response);
} else {
  // See https://tools.ietf.org/html/rfc6585 for status code 429 explanation.
  ((HttpServletResponse) response).sendError(429, "Too Many Requests");
Will this actually stop the response? If another filter attempts to send a DIFFERENT response, it is not at all clear which one would win, especially since the Set usage of servlet holders makes the ordering non-deterministic.

The Guice docs say that set iteration order is the same as insertion order (https://google.github.io/guice/api-docs/4.1/javadoc/com/google/inject/multibindings/Multibinder.html). However, it's not guaranteed that this filter will be first in the chain; it will definitely appear before any user filters defined in extensions. This works as long as all filters ahead of it follow the general rule of never blocking/holding the request. That said, as I think about it, to be absolutely sure I am going to add this filter inside QueryJettyServerInitializer directly instead of using the filter-holder mechanism, so its place in the chain is guaranteed. A side effect is that the feature becomes technically available on historicals as well, but that shouldn't change anything, given that by default it wouldn't be included in the filter chain.

Makes sense?
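The counting scheme in the diff above can be sketched independently of the servlet API. The class and method names here are illustrative, not the actual LimitRequestsFilter code: an AtomicInteger is incremented on entry; if it exceeds the limit, the counter is restored and the request is rejected with 429; otherwise the rest of the chain runs inside a try/finally that decrements the counter.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of the concurrency-limiting pattern used by the filter.
public class RequestLimiterSketch
{
    private final int maxActiveRequests;
    private final AtomicInteger activeRequestsCount = new AtomicInteger();

    RequestLimiterSketch(int maxActiveRequests)
    {
        this.maxActiveRequests = maxActiveRequests;
    }

    // Returns the HTTP-ish status the filter would produce: 200 if the work ran,
    // 429 (https://tools.ietf.org/html/rfc6585) if the request was shed.
    int handle(Runnable work)
    {
        if (activeRequestsCount.incrementAndGet() > maxActiveRequests) {
            activeRequestsCount.decrementAndGet();
            return 429; // Too Many Requests
        }
        try {
            work.run(); // stands in for chain.doFilter(request, response)
            return 200;
        }
        finally {
            activeRequestsCount.decrementAndGet();
        }
    }

    public static void main(String[] args)
    {
        RequestLimiterSketch limiter = new RequestLimiterSketch(1);
        int[] inner = new int[1];
        // A second request arrives while the first is still in flight:
        int outer = limiter.handle(() -> inner[0] = limiter.handle(() -> { }));
        System.out.println(outer);    // 200: first request admitted
        System.out.println(inner[0]); // 429: concurrent second request shed
    }
}
```

The decrement in the rejection branch matters: without it, shed requests would permanently inflate the counter and eventually reject everything.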
@himanshug Is there any advantage to having the queueSize variable? We are throttling on activeRequests. We could keep the queue size at its default (unbounded) and throttle on activeRequests as you have done. That way we avoid having yet another config variable to tune.

@niketh bounding the queueSize is an option available for safety only, just in case you don't have enough Jetty threads to handle requests before failing at the filter.

@drcrallen ping?
}
catch (ConfigurationException e) {
  throw new ProvisionException(Iterables.getFirst(e.getErrorMessages(), null).getMessage());
catch (Exception e) {
This will swallow InterruptedExceptions; is it needed to catch all exceptions here? Also, is there a reason the ProvisionException couldn't just take a "caused by"?

At this point server startup would fail, so it doesn't matter; we would do the same for an InterruptedException.

And no, ProvisionException does not take a cause.

What's special about ProvisionException? Losing the actual stack trace of an exception generally sucks (especially if this is going to kill the process start). Can we just wrap it in a RuntimeException instead?

[Himanshu] changed, that works too.
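The agreed-upon fix can be sketched as below (names are illustrative, not the PR's code). Wrapping with the original exception as the cause preserves the stack trace that a message-only exception would discard, which is what whoever debugs the startup failure actually needs.

```java
// Illustrative: preserve the cause instead of throwing a message-only exception.
public class WrapCauseSketch
{
    static RuntimeException wrap(Exception e)
    {
        // before: throw new SomeException(firstErrorMessage)  -- stack trace lost
        // after:  the full cause chain survives for whoever logs the startup failure
        return new RuntimeException(e);
    }

    public static void main(String[] args)
    {
        Exception original = new IllegalStateException("bad config");
        RuntimeException wrapped = wrap(original);
        System.out.println(wrapped.getCause() == original); // true
    }
}
```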
serverConfig.getNumThreads() > 1,
"numThreads must be > 1 to enable Request Limit Filter."
);
log.info("Enabling Request Limit Filter with limit [%d].", serverConfig.getNumThreads() - 1);
If you have x threads to handle requests concurrently, then you can allow only x-1 through the filter: the xth request needs to be rejected, because the (x+1)th request wouldn't reach the filter at all and would instead wait in the Jetty queue.

Ah, my brain had this as processing threads, but it's actually Jetty threads. Gotcha.

Added it as a comment.
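The off-by-one reasoning above can be stated as a small sketch (the plain int parameter stands in for the ServerConfig getter): with x Jetty threads, at most x-1 requests may be admitted, so there is always one thread free to run the filter and send the rejection.

```java
// Sketch of the limit derivation: x Jetty threads => admit at most x - 1 requests.
// The x-th in-flight request must be rejectable by a thread that is actually
// running the filter; request x + 1 would wait in the Jetty queue and never
// reach the filter at all.
public class LimitDerivationSketch
{
    static int requestLimit(int numThreads)
    {
        if (numThreads <= 1) {
            throw new IllegalArgumentException("numThreads must be > 1 to enable Request Limit Filter.");
        }
        return numThreads - 1;
    }

    public static void main(String[] args)
    {
        System.out.println(requestLimit(25)); // 24
    }
}
```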
);
log.info("Enabling Request Limit Filter with limit [%d].", serverConfig.getNumThreads() - 1);
root.addFilter(new FilterHolder(new LimitRequestsFilter(serverConfig.getNumThreads() - 1)),
    "/*", null
Some of the "management" endpoints (like lookups) already have a thread limit. Having the two collide could be confusing.

Not sure about that, but it should be simple to understand: this is a server-level hard limit, while the finer endpoint-specific limits are really for resource appropriation.
For example, do you really need to specify the atomic counter in order to achieve what you want?

@drcrallen
drcrallen left a comment:
Suggest adding some unit tests for the new classes, but otherwise looks good!
cheddar left a comment:
I'm approving, but my request for more clarity in the code comment should, I think, be addressed before merging. I'm approving because I think it is fine to merge, once addressed, without further review.
 */
public class JettyServerModule extends JerseyServletModule
{
  // This is the maximum number of threads used by jetty acceptors and selectors
Given that this is a magic number based on the current implementation of Jetty that we use, please specify the breakdown of the magic number and the version of Jetty it is based on.

Added more details on the computed number, and also made it computed based on the number of ServerConnectors being set up (which we know from the settings, i.e. whether plaintext, TLS, or both connectors are configured).
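A hedged sketch of the kind of computation described: Jetty allocates acceptor and selector threads per ServerConnector, so the thread budget scales with how many connectors (plaintext, TLS, or both) are configured. The per-connector constants below are placeholders, not Druid's or Jetty's actual numbers, which depend on the Jetty version and CPU count.

```java
// Hypothetical sketch: reserve threads for each configured connector's
// acceptors and selectors on top of the request-handling threads.
public class ThreadBudgetSketch
{
    // Placeholder per-connector costs; real values depend on the Jetty version.
    static final int ACCEPTORS_PER_CONNECTOR = 1;
    static final int SELECTORS_PER_CONNECTOR = 2;

    static int maxThreads(int numRequestThreads, boolean plainText, boolean tls)
    {
        int connectors = (plainText ? 1 : 0) + (tls ? 1 : 0);
        return numRequestThreads + connectors * (ACCEPTORS_PER_CONNECTOR + SELECTORS_PER_CONNECTOR);
    }

    public static void main(String[] args)
    {
        System.out.println(maxThreads(40, true, false)); // 43
        System.out.println(maxThreads(40, true, true));  // 46
    }
}
```

The point of computing rather than hard-coding is exactly the review feedback: the number stops being "magic" once it is derived from the configured connectors.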
c1e6045 to 1d16220
@himanshug I'm putting together the 0.12.0 release notes now, can you update those with some info for this PR? #5211

@jon-wei sure, will update

updated 0.12.0 release notes #5211

@himanshug I'm not seeing the update, I hope I didn't clobber your changes D: (I was updating some of the highlights)

@jon-wei np, updated again. save a local copy this time :)

@himanshug great, thanks!
Currently, clients can inadvertently overwhelm a Broker by sending too many requests, which get queued in the unbounded Jetty worker pool queue. Clients typically close the connection after a certain client-side timeout, but the Broker still has to handle all the queued requests, giving the appearance of being unresponsive, while clients continue to retry and remain stuck in this endless loop.
This patch provides configurations for two types of limits:
(1) the ability to limit the Jetty queue size: to be used when the user wants a limited number of requests waiting while all threads are busy handling other requests;
(2) the ability to disable queuing: to be used when the user wants no request to wait, but rather to be rejected immediately.
See the doc updates to broker.md and historical.md for configuration details.
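For illustration, the two modes might be configured roughly as below. The property names follow Druid's documented `druid.server.http.*` server config; verify the exact names and defaults against the broker.md/historical.md updates for your version.

```properties
# (1) Bounded queue: at most 25 requests wait when all HTTP threads are busy;
#     further requests fail fast instead of piling up.
druid.server.http.numThreads=25
druid.server.http.queueSize=25

# (2) No queuing: excess requests are rejected immediately with HTTP 429
#     by the request limit filter instead of waiting in the Jetty queue.
# druid.server.http.enableRequestLimit=true
```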