improve query timeout handling and limit max scatter-gather bytes #4229
himanshug merged 2 commits into apache:master
Conversation
Force-pushed from bb7ce47 to 860e5db.
Please also create a runtime.properties property that can be used to enforce this limit even if the thing generating the query doesn't play nicely.
Said property should likely be enforced even if the context has a larger number in it.
[Himanshu]: created a runtime property that is enforced; the query fails if a context param tries to increase the limit.
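For illustration, a minimal sketch of that enforcement logic, assuming a helper with hypothetical names (this is not the patch's actual code):

```java
// Sketch only: a context value may lower the server-side cap but never raise it.
// Method and parameter names are illustrative, not Druid's actual API.
static long effectiveScatterGatherLimit(long serverLimit, Long contextValue)
{
  if (contextValue == null) {
    return serverLimit; // no per-query override
  }
  if (contextValue > serverLimit) {
    // fail the query rather than silently honoring a larger context value
    throw new IllegalArgumentException(
        "maxScatterGatherBytes in context [" + contextValue
        + "] exceeds server-configured limit [" + serverLimit + "]"
    );
  }
  return contextValue;
}
```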
Might as well grab this once and cache it for the life of the response handler.
I think it would be better to create this once higher up rather than checking and re-checking it constantly everywhere.
This reference isn't going to change. Pull it out once and just use it.
I think it would be nicer if both of these were to return booleans and the call sites were to just skip things or whatever if the limits are exceeded.
they contain specific error messages which would need to be repeated at all call sites. also, they are used in a relatively small scope and look ok.
Why not attach total bytes gathered as well?
maxScatterGatherBytes is a really blunt hammer; it would fail queries that can stream through large result sets and don't necessarily need to fail (like non-nested groupBys and scan queries).
What do you think about using backpressure instead? Instead of running out of memory (current behavior) or failing (this patch), the broker could temporarily stop reading from the data node.
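As a conceptual sketch of the backpressure alternative described here (illustrative names, not Druid code), a shared byte budget could block the reader instead of failing the query:

```java
import java.util.concurrent.Semaphore;

// Sketch: readers block (backpressure) when the budget is exhausted,
// instead of throwing the way the limit in this patch does.
class ByteBudget
{
  private final Semaphore permits;

  ByteBudget(int maxBufferedBytes)
  {
    this.permits = new Semaphore(maxBufferedBytes);
  }

  // Called before buffering bytes read from a data node.
  void acquire(int numBytes) throws InterruptedException
  {
    permits.acquire(numBytes); // blocks until the consumer frees capacity
  }

  // Called after the Broker has consumed buffered bytes.
  void release(int numBytes)
  {
    permits.release(numBytes);
  }
}
```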
@gianm even with non-nested groupBys etc., if the broker is not consuming data fast enough then it will all accumulate in the SequenceInputStream created in DirectDruidClient.
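By way of illustration, a minimal sketch of the failing behavior this patch takes instead (class and field names are invented; DirectDruidClient's actual code differs):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: count bytes as they are buffered from data nodes and fail the
// query once the per-query limit is exceeded.
class ScatterGatherLimitingInputStream extends FilterInputStream
{
  private final AtomicLong totalGathered; // shared across all per-node streams of one query
  private final long maxScatterGatherBytes;

  ScatterGatherLimitingInputStream(InputStream in, AtomicLong totalGathered, long maxScatterGatherBytes)
  {
    super(in);
    this.totalGathered = totalGathered;
    this.maxScatterGatherBytes = maxScatterGatherBytes;
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException
  {
    // single-byte read() override omitted for brevity
    int n = super.read(b, off, len);
    if (n > 0 && totalGathered.addAndGet(n) > maxScatterGatherBytes) {
      throw new IOException("Query exceeded maxScatterGatherBytes limit [" + maxScatterGatherBytes + "]");
    }
    return n;
  }
}
```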
Discussed on the dev sync: since this is useful to you as-is, and making the change that you suggested would make the feature less useful to you, it makes sense to go through with it as-is but to make it clear in the docs what the behavior would be and what the expected use case is. It seems like something pretty special-case to me.
@gianm yep, let's keep it as-is. I will update the docs and resolve the review comments.
👍 |
gianm left a comment:
👍 on the design; I did not review the code.
@cheddar, should be good to merge if you reviewed the code, since I assume @himanshug implicitly approves the design of his own patch.
```java
final QueryMetrics<? super Query<T>> queryMetrics = toolChest.makeMetrics(query);
queryMetrics.server(host);
```
```java
long timeoutAt = ((Long) context.get(QUERY_FAIL_TIME)).longValue();
```
This introduces an NPE if the query is made bypassing QueryResource, i.e. from inside the QueryEngine of another query type, via QuerySegmentWalker.
Druid SQL ran into that too; it was fixed via #4305.
I think this is fine since, imo, making queries without going through the resources is something you do 'at your own risk' and isn't an officially supported mode of operation.
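For illustration only, a defensive variant of the quoted line (a sketch, not the actual change in #4305; `defaultTimeoutMillis` is a hypothetical fallback):

```java
// Sketch: avoid the NPE when QUERY_FAIL_TIME was never populated, e.g. when
// a query bypasses QueryResource. defaultTimeoutMillis is illustrative.
Long failTime = (Long) context.get(QUERY_FAIL_TIME);
long timeoutAt = (failTime != null)
    ? failTime
    : System.currentTimeMillis() + defaultTimeoutMillis;
```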
QueryResource is not an official API either, per #4433. Making a query as an HTTP request to the same JVM runtime is ridiculous.
What I'm saying is that I would consider the whole concept of queries making other queries as not officially supported. If you want to do that in an extension then go for it but there is no official API for it.
Rationale for me is:
- Queries making other queries is going outside the normal framework. The normal framework is a query comes in to the broker, then fans out to historicals/other data nodes, then fans back in and the broker does a merge and returns results. There's not a standard place for "make another subquery" to fit in.
- Queries making other queries may lead to metrics not being completely collected, or security rules not being properly applied (some of which are checked in the resources), or requests not being properly logged, or exceptions not being properly alerted on. This needs to be thought through.
- Users are still free to write extensions where queries make other queries, but due to the above two points, I don't think that should be considered a "public api" at this time.
|`druid.server.http.numThreads`|Number of threads for HTTP requests.|10|
|`druid.server.http.maxIdleTime`|The Jetty max idle time for a connection.|PT5m|
|`druid.server.http.defaultQueryTimeout`|Query timeout in millis, beyond which unfinished queries will be cancelled.|300000|
|`druid.server.http.maxScatterGatherBytes`|Maximum number of bytes gathered from data nodes such as historicals and realtime processes to execute a query. This is an advanced configuration that protects against OOMs when the Broker is under heavy load and not consuming the gathered in-memory data fast enough. The limit can be further reduced at query time using `maxScatterGatherBytes` in the query context. Note that a large limit is not necessarily bad if the Broker is never under heavy concurrent load, in which case gathered data is processed quickly, freeing the memory used.|Long.MAX_VALUE|
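For illustration, a hypothetical Broker `runtime.properties` excerpt using these settings (values are examples, not recommendations):

```properties
druid.server.http.numThreads=25
druid.server.http.maxIdleTime=PT5m
druid.server.http.defaultQueryTimeout=300000
# Hard cap on bytes gathered from data nodes per query (1 GiB here)
druid.server.http.maxScatterGatherBytes=1073741824
```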
I wonder why this property is prefixed by druid.server instead of druid.broker. Is it planned to be applied to other node types?
i don't think it makes sense for any other node type. And now that you mention it, it probably would have made more sense to call it `druid.broker.http.maxScatterGatherBytes`. We can possibly change it in the future, or remove this config if the backpressure story improves.
@himanshug hi, what is the most appropriate setting for maxScatterGatherBytes?
@hellobabygogo this is a blunt tool; we set it to 1G considering the types of queries we run and the expected response sizes. This config exists to prevent the broker from getting into an OOM situation, so "how big" really depends on the max JVM heap, concurrent queries, etc. #6313 might be more appropriate in general.
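A rough, purely illustrative sizing sketch along those lines (all numbers are assumptions, not guidance):

```java
// Sketch: derive a candidate limit from heap headroom and expected concurrency.
long heapBytes = 8L << 30;                // assume an 8 GiB Broker heap
long headroom = heapBytes / 2;            // assume ~half is available for gathered data
int concurrentHeavyQueries = 4;           // assumed worst-case concurrency
long maxScatterGatherBytes = headroom / concurrentHeavyQueries; // ≈ 1 GiB
```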
Towards #1415.

Also introduced the `maxScatterGatherBytes` query context parameter. At the Broker, this limits the total number of bytes gathered from downstream nodes such as historicals and realtime indexers. If the Broker is under heavy load and not consuming data coming from downstream nodes fast enough, the data gets stored in memory and may lead to OOMs. This setting provides a workaround to limit per-query max memory utilization for storing data received from downstream nodes.
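For example, a per-query override could be attached via the query context; a minimal sketch, assuming Druid's `Query#withOverriddenContext` (the context key is from this patch; the values are illustrative):

```java
import com.google.common.collect.ImmutableMap;
import java.util.Map;

// Sketch: lower the limit for a single query via the context parameter.
Map<String, Object> context = ImmutableMap.of("maxScatterGatherBytes", 512L * 1024 * 1024); // 512 MiB
Query<?> limited = query.withOverriddenContext(context);
```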