Add QOS filtering of overlord requests #18033
Conversation
public class ServerConfig
{
  public static final int DEFAULT_GZIP_INFLATE_BUFFER_SIZE = 4096;
  public static final int DEFAULT_NUM_PACKING_THREADS = 30;
What does "packing" signify here?
It's the extra threads we add. Couldn't really find a good word.
threadPool.setDaemon(true);
jettyServerThreadPool = threadPool;
jettyServerThreadPool.setDaemon(true);
The original code was more appropriate: jettyServerThreadPool was assigned only after threadPool was fully baked. Otherwise, the doMonitor method might emit erratic metrics.
The ideal fix would be to make jettyServerThreadPool non-static, but that is not needed in this PR and would require other cleanup too.
Fixed jettyServerThreadPool to get initialized in one go with the relevant threads.
The daemon setting should not affect the monitor emitting metrics.
Yeah, there is a race here. Agreed we can fix it as part of another PR.
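The publish-after-configure ordering being discussed can be sketched as follows. Note that `SimplePool` and the surrounding class are stand-ins for Jetty's `QueuedThreadPool` and Druid's static field, not the actual PR code:

```java
// Safe-publication sketch of the ordering concern discussed above.
public class PoolPublication
{
  // Stand-in for org.eclipse.jetty.util.thread.QueuedThreadPool.
  public static class SimplePool
  {
    private boolean daemon;

    public void setDaemon(boolean daemon)
    {
      this.daemon = daemon;
    }

    public boolean isDaemon()
    {
      return daemon;
    }
  }

  // volatile gives safe publication: once the field is visible to a monitor
  // thread, every write made before the assignment is visible too.
  public static volatile SimplePool jettyServerThreadPool;

  public static void initialize()
  {
    SimplePool threadPool = new SimplePool();
    threadPool.setDaemon(true);          // configure the local object first
    jettyServerThreadPool = threadPool;  // publish the fully baked pool in one go
  }

  public static void main(String[] args)
  {
    initialize();
    System.out.println(jettyServerThreadPool.isDaemon()); // prints true
  }
}
```

A monitor thread that reads the field sees either `null` or a fully configured pool, never a half-initialized one.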
Are the changes to this file really needed in this PR?
I am not sure how it affects the main QoS change.
final QueuedThreadPool queuedThreadPool = (QueuedThreadPool) server.getThreadPool();
final int maxThreads = queuedThreadPool.getMaxThreads();
It might be nice to factor out the logic that computes the max threads somewhere it can be reused here. It seems weird to need a handle to the actual ThreadPool object just to determine the max threads.
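One way to common out that computation is a static helper like the following; the class and method names are hypothetical, not existing Druid code:

```java
// Hypothetical helper that centralizes the "threads available for real work"
// computation, so call sites don't need a handle to the Jetty ThreadPool just
// to derive a count from configuration.
public class QosThreadMath
{
  public static int threadsForOverlordWork(int serverHttpNumThreads, int reservedForHealthCheck)
  {
    // Never report fewer than one worker thread.
    return Math.max(serverHttpNumThreads - reservedForHealthCheck, 1);
  }

  public static void main(String[] args)
  {
    System.out.println(threadsForOverlordWork(40, 2)); // prints 38
    System.out.println(threadsForOverlordWork(2, 2));  // prints 1
  }
}
```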
  }
}
protected static boolean addQOSFiltering(ServletContextHandler root, int threadsForOvelordWork)
Please add a short javadoc.
- protected static boolean addQOSFiltering(ServletContextHandler root, int threadsForOvelordWork)
+ protected static boolean addQosFiltering(ServletContextHandler root, int threadsForOvelordWork)
 * As QOS filtering is enabled on overlord requests, we need to update the QOS filter paths in
 * {@link org.apache.druid.cli.CliOverlord#addQOSFiltering(ServletContextHandler, int)} when a new jersey resource is added.
- * As QOS filtering is enabled on overlord requests, we need to update the QOS filter paths in
- * {@link org.apache.druid.cli.CliOverlord#addQOSFiltering(ServletContextHandler, int)} when a new jersey resource is added.
+ * Since QoS filtering is enabled for Overlord requests, update the QoS filter paths in
+ * {@link #addQosFiltering} whenever a new jersey resource is added here.
  },
  threadsForOvelordWork
);
JettyServerInitUtils.addFilters(root, Collections.singleton(filterHolder));
If this is already bound using JettyBindings.addQosFilter in addOverlordJerseyResources, we wouldn't need to call this at all, since JettyServerInitUtils.addQosFilters is already being called from OverlordJettyServerInitializer.initialize.
JettyBindings.QosFilterHolder filterHolder = new JettyBindings.QosFilterHolder(
    new String[]{
        "/druid-internal/v1/*",
        "/druid/indexer/v1/*"
    },
    threadsForOvelordWork
);
Other modules seem to use JettyBindings.addQosFilter for this purpose. Please see the comment on the addOverlordJerseyResources method.
"QOS filter is disabled for the overlord requests." +
"Set `druid.server.http.numThread` to a value greater than %d to enable QoSFilter.",
ServerConfig.DEFAULT_NUM_PACKING_THREADS
- "QOS filter is disabled for the overlord requests." +
- "Set `druid.server.http.numThread` to a value greater than %d to enable QoSFilter.",
- ServerConfig.DEFAULT_NUM_PACKING_THREADS
+ "QoS filtering is disabled for Overlord requests. " +
+ "Set `druid.server.http.numThreads` to a value greater than [%d] to enable.",
+ ServerConfig.DEFAULT_NUM_PACKING_THREADS
Thanks for the catch.
protected static boolean addQOSFiltering(ServletContextHandler root, int threadsForOvelordWork)
{
  if (threadsForOvelordWork >= ServerConfig.DEFAULT_NUM_PACKING_THREADS) {
    log.info("Enabling QOS filter on overlord requests with limit [%d].", threadsForOvelordWork);
- log.info("Enabling QOS filter on overlord requests with limit [%d].", threadsForOvelordWork);
+ log.info("Enabling QoS filtering for Overlord requests with limit[%d].", threadsForOvelordWork);
 * {@link org.apache.druid.cli.CliOverlord#addQOSFiltering(ServletContextHandler, int)} when a new jersey resource is added.
 */
private static class OverlordJettyServerInitializer implements JettyServerInitializer
private static void addOverlordJerseyResources(Binder binder)
Going by what other modules such as LookupModule are doing, we should bind the QoS filter in this method itself using JettyBindings.addQosFilter.
Since we do not have the exact thread pool handy at binding time, we probably cannot do this.
Okay, found a way to do this. Adjusting.
This pull request has been marked as stale due to 60 days of inactivity.

Will get to this soon.

This pull request has been marked as stale due to 60 days of inactivity.

This pull request/issue has been closed due to lack of activity. If you think that

reopen
final int threadsForOverlordWork = serverHttpNumThreads - THREADS_RESERVED_FOR_HEALTH_CHECK;

if (threadsForOverlordWork >= ServerConfig.DEFAULT_MIN_QOS_THRESHOLD) {
Why is this based on ServerConfig.DEFAULT_MIN_QOS_THRESHOLD (30)? If someone sets druid.server.http.numThreads = 25 then they won't have QoS; I don't understand why that is good.
Secondly, IMO it would be better to apply the QoS to the action API only. That way, if the OL is bogged down with handling actions, it can still respond to the APIs that the web console uses. Applying QoS to all OL APIs, like this PR currently does, would make it able to respond to health checks but not actually function in any other way. It seems to defeat the purpose of a health check.
So, how about having a config like druid.indexer.server.maxConcurrentActions where the default is based on serverHttpNumThreads? Perhaps max(1, serverHttpNumThreads - 4) or max(1, serverHttpNumThreads * 0.8). And allow admins to set it to druid.indexer.server.maxConcurrentActions = 0 which would disable the QoS.
I was optimizing the patch for larger clusters, because that was the use case I had at the time, but thinking more, you are correct: it should happen across the board.
Added a new config: "druid.indexer.server.maxConcurrentActions"
Went with max(1, max(serverHttpNumThreads - 4, serverHttpNumThreads * 0.8)) so that large clusters are not impacted much.
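The default formula described above can be sketched like this; the formula and config name are from this thread, while the class and method names are stand-ins:

```java
// Sketch of the proposed default for druid.indexer.server.maxConcurrentActions:
// max(1, max(n - 4, floor(n * 0.8))). Small pools keep at least one action
// slot, while large pools (n > 20) only reserve the fixed 4 threads.
public class MaxConcurrentActions
{
  public static int defaultMaxConcurrentActions(int serverHttpNumThreads)
  {
    return Math.max(1, Math.max(serverHttpNumThreads - 4, (int) (serverHttpNumThreads * 0.8)));
  }

  public static void main(String[] args)
  {
    System.out.println(defaultMaxConcurrentActions(8));   // 8-4=4 vs floor(6.4)=6 -> prints 6
    System.out.println(defaultMaxConcurrentActions(100)); // 96 vs 80 -> prints 96
    System.out.println(defaultMaxConcurrentActions(2));   // -2 vs 1 -> prints 1
  }
}
```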
final int serverHttpNumThreads = properties.containsKey(CliIndexerServerModule.SERVER_HTTP_NUM_THREADS_PROPERTY)
    ? Integer.parseInt(properties.getProperty(CliIndexerServerModule.SERVER_HTTP_NUM_THREADS_PROPERTY))
Check notice (Code scanning / CodeQL): Missing catch of NumberFormatException
final int maxConcurrentActions;
if (properties.containsKey("druid.indexer.server.maxConcurrentActions")) {
  maxConcurrentActions = Integer.parseInt(properties.getProperty("druid.indexer.server.maxConcurrentActions"));
Check notice (Code scanning / CodeQL): Missing catch of NumberFormatException
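A minimal way to address both CodeQL notices is a shared parsing helper that catches NumberFormatException and fails with a config-specific message; the helper name below is hypothetical:

```java
// Hypothetical helper for parsing integer config properties: an unparsable
// value fails with a message naming the property, instead of an opaque
// NumberFormatException stack trace at startup.
public class SafeParse
{
  public static int parseIntProperty(String key, String value, int defaultValue)
  {
    if (value == null) {
      return defaultValue; // property not set, fall back to the default
    }
    try {
      return Integer.parseInt(value.trim());
    }
    catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "Property [" + key + "] must be an integer, got [" + value + "]", e);
    }
  }

  public static void main(String[] args)
  {
    System.out.println(parseIntProperty("druid.server.http.numThreads", "40", 30)); // prints 40
    System.out.println(parseIntProperty("druid.server.http.numThreads", null, 30)); // prints 30
  }
}
```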
Adds QOS filtering for the overlord so that health check threads are not blocked.