
improve query timeout handling and limit max scatter-gather bytes#4229

Merged
himanshug merged 2 commits into apache:master from himanshug:query_timeout
May 16, 2017

Conversation

@himanshug
Contributor

@himanshug himanshug commented Apr 28, 2017

towards #1415

also introduced the maxScatterGatherBytes query context parameter. At the Broker, this limits the total number of bytes gathered from downstream nodes such as historicals and realtime indexers. If the Broker is under heavy load and is not consuming data from downstream nodes fast enough, that data accumulates in memory and may lead to OOMs. This setting provides a workaround to limit the per-query maximum memory used for storing data received from downstream nodes.
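The accounting described above can be sketched as follows. This is a hypothetical illustration, not Druid's actual classes: every chunk received from a downstream node bumps a shared per-query counter, and the query fails once the configured maximum is exceeded.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of per-query scatter-gather byte accounting.
// Class and method names are illustrative, not the patch's actual code.
class ScatterGatherByteLimiter {
  private final AtomicLong bytesGathered = new AtomicLong(0);
  private final long maxScatterGatherBytes;

  ScatterGatherByteLimiter(long maxScatterGatherBytes) {
    this.maxScatterGatherBytes = maxScatterGatherBytes;
  }

  /** Called for each response chunk received from a downstream node. */
  void addAndCheck(long chunkSizeBytes) {
    long total = bytesGathered.addAndGet(chunkSizeBytes);
    if (total > maxScatterGatherBytes) {
      throw new IllegalStateException(
          "Query exceeded maxScatterGatherBytes limit [" + maxScatterGatherBytes
          + "], gathered [" + total + "] bytes");
    }
  }

  long bytesGathered() {
    return bytesGathered.get();
  }
}
```

An AtomicLong is used because chunks from multiple downstream nodes arrive on different threads, so the counter must be updated atomically.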

@himanshug himanshug added this to the 0.10.1 milestone Apr 28, 2017
@himanshug himanshug changed the title improve query timeout handling at broker [WIP]improve query timeout handling at broker May 5, 2017
@himanshug himanshug changed the title [WIP]improve query timeout handling at broker improve query timeout handling and limit max scatter-gather bytes May 5, 2017
Contributor

@cheddar cheddar left a comment

Please also create a runtime.properties property that can be used to enforce this limit even if the thing generating the query doesn't play nicely.

Said property should likely be enforced even if the context has a larger number in it.

[Himanshu]: created a runtime property that is enforced; the query fails if the context parameter tries to increase the limit.
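The enforcement rule agreed on here could be sketched like this (hypothetical names, not the actual implementation): the context may lower the limit, but any attempt to raise it above the runtime.properties value fails the query up front.

```java
// Hypothetical sketch of resolving the effective limit from the server-wide
// runtime property and the per-query context parameter. The context can only
// tighten the limit, never loosen it.
class LimitResolver {
  static long resolveMaxScatterGatherBytes(long serverLimit, Long contextLimit) {
    if (contextLimit == null) {
      return serverLimit;
    }
    if (contextLimit > serverLimit) {
      throw new IllegalArgumentException(
          "maxScatterGatherBytes in query context [" + contextLimit
          + "] exceeds the server-configured limit [" + serverLimit + "]");
    }
    return contextLimit;
  }
}
```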

Contributor

Might as well grab this once and cache it for the life of the response handler.

Contributor Author

updated

Contributor

I think it would be better to create this once higher up rather than checked and double checked constantly with everything.

Contributor Author

updated

Contributor

This reference isn't going to change. Pull it out once and just use it.

Contributor Author

done

Contributor

I think it would be nicer if both of these were to return booleans and the call sites were to just skip things or whatever if the limits are exceeded.

Contributor Author

They contain specific error messages that would need to be repeated at all call sites. Also, they are used in a relatively small scope and look OK as-is.

Contributor

Why not attach total bytes gathered as well?

Contributor Author

done

Contributor

@gianm gianm left a comment

maxScatterGatherBytes is a really blunt hammer; it would fail queries that can stream through large result sets and don't necessarily need to fail (like non-nested groupBys and scan queries).

What do you think about using backpressure instead? So instead of running out of memory (current behavior) or failing (this patch), temporarily stop reading from the data node.

@himanshug
Contributor Author

@gianm even with non-nested groupBys etc., if the Broker is not consuming data fast enough, it all gets accumulated in the SequenceInputStream created in DirectDruidClient.
However, I think I should also "decrement" total_bytes as data is consumed from the SequenceInputStream, so that if the Broker is consuming fast enough the total_bytes number does not grow. total_bytes then becomes the number of bytes buffered at that moment rather than the total bytes consumed from data nodes. How would you feel about that instead of blocking?
The problem with blocking is that if multiple "bad" queries are sent, they would end up causing OOMs.
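The alternative accounting floated here, counting bytes currently buffered rather than total bytes gathered, could be sketched as below. Note this variant was ultimately not adopted; the merged patch keeps a monotonically increasing total. Names are illustrative.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the alternative discussed above: decrement the
// counter as the Broker drains buffered data, so the limit applies to bytes
// currently held in memory instead of total bytes ever gathered. This is
// NOT what the patch merged; names are illustrative.
class BufferedBytesAccountant {
  private final AtomicLong buffered = new AtomicLong(0);
  private final long maxBufferedBytes;

  BufferedBytesAccountant(long maxBufferedBytes) {
    this.maxBufferedBytes = maxBufferedBytes;
  }

  /** Called when a chunk arrives from a data node. */
  void onReceive(long bytes) {
    if (buffered.addAndGet(bytes) > maxBufferedBytes) {
      throw new IllegalStateException("buffered-bytes limit exceeded");
    }
  }

  /** Called as the Broker consumes bytes from the buffered stream. */
  void onConsume(long bytes) {
    buffered.addAndGet(-bytes);
  }

  long buffered() {
    return buffered.get();
  }
}
```

Under this scheme a fast-consuming Broker never trips the limit, which is why it would have made the setting less useful as a hard per-query cap.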

@gianm
Contributor

gianm commented May 9, 2017

Discussed on the dev sync: since this is useful to you as-is, and making the change you suggested would make the feature less useful to you, it makes sense to go through with it as-is, but to make clear in the docs what the behavior is and what the expected use case is. It seems like something pretty special-case to me.

@himanshug
Contributor Author

@gianm yep, let's keep it as-is.
Timeout handling is still very general and applies to everyone; maxScatterGatherBytes, maybe not so much.

I will update the docs and resolve the review comments.

@cheddar
Contributor

cheddar commented May 12, 2017

👍

Contributor

@gianm gianm left a comment

👍 on the design; I did not review the code.

@gianm
Contributor

gianm commented May 12, 2017

@cheddar, should be good to merge if you reviewed the code, since I assume @himanshug implicitly approves the design of his own patch.

final QueryMetrics<? super Query<T>> queryMetrics = toolChest.makeMetrics(query);
queryMetrics.server(host);

long timeoutAt = ((Long) context.get(QUERY_FAIL_TIME)).longValue();
Member

It introduces an NPE if the query is made bypassing QueryResource, i.e. from inside the QueryEngine of another query type, via QuerySegmentWalker.
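The failure mode being described: `context.get(QUERY_FAIL_TIME)` returns null when the query never passed through QueryResource, and unboxing that null Long throws a NullPointerException. A defensive variant could be sketched as below (the method name and fallback are illustrative, not the actual fix, which went through #4305 for Druid SQL):

```java
import java.util.Map;

class TimeoutLookup {
  // Hypothetical null-safe variant of the snippet above: if the fail-time
  // key was never set (the query bypassed QueryResource), fall back to a
  // caller-supplied default instead of hitting an unboxing NPE.
  static long timeoutAt(Map<String, Object> context, long fallbackMillis) {
    Long failTime = (Long) context.get("queryFailTime");
    return failTime == null ? fallbackMillis : failTime.longValue();
  }
}
```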

Contributor

Druid SQL ran into that too and was fixed via #4305.

I think this is fine since, imo, making queries without going through the resources is something you do 'at your own risk' and isn't an officially supported mode of operation.

Member

QueryResource is not an official API either (see #4433). Making a query as an HTTP request to the same JVM runtime is ridiculous.

Contributor

What I'm saying is that I would consider the whole concept of queries making other queries as not officially supported. If you want to do that in an extension then go for it but there is no official API for it.

Contributor

Rationale for me is:

  • Queries making other queries is going outside the normal framework. The normal framework is a query comes in to the broker, then fans out to historicals/other data nodes, then fans back in and the broker does a merge and returns results. There's not a standard place for "make another subquery" to fit in.
  • Queries making other queries may lead to metrics not being completely collected, or security rules not being properly applied (some of which are checked in the resources), or requests not being properly logged, or exceptions not being properly alerted on. This needs to be thought through.
  • Users are still free to write extensions where queries make other queries, but due to the above two points, I don't think that should be considered a "public api" at this time.

|`druid.server.http.numThreads`|Number of threads for HTTP requests.|10|
|`druid.server.http.maxIdleTime`|The Jetty max idle time for a connection.|PT5m|
|`druid.server.http.defaultQueryTimeout`|Query timeout in millis, beyond which unfinished queries will be cancelled|300000|
|`druid.server.http.maxScatterGatherBytes`|Maximum number of bytes gathered from data nodes, such as historicals and realtime processes, to execute a query. This is an advanced configuration that protects against OOMs when the Broker is under heavy load and is not processing the gathered data fast enough. The limit can be further reduced at query time using `maxScatterGatherBytes` in the query context. Note that a large limit is not necessarily bad if the Broker is never under heavy concurrent load; in that case the gathered data is processed quickly, freeing up the memory used.|Long.MAX_VALUE|
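For illustration, a hypothetical Broker runtime.properties fragment using the settings in this table (the values are examples, not recommendations):

```properties
# Fail unfinished queries after 5 minutes (milliseconds)
druid.server.http.defaultQueryTimeout=300000
# Cap per-query bytes gathered from data nodes at 1 GiB
druid.server.http.maxScatterGatherBytes=1073741824
```

A query context can then lower, but not raise, the byte limit, e.g. `"context": {"maxScatterGatherBytes": 104857600}`.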
Contributor

I wonder why this property is prefixed by druid.server instead of druid.broker. Is it planned to be applied to other node types?

Contributor Author

I don't think it makes sense for any other node type. And now that you mention it, it probably made more sense to call it druid.broker.http.maxScatterGatherBytes. We can possibly change it in the future, or remove this config if the backpressure story improves.

Contributor

Thanks. It sounds good.

@himanshug himanshug deleted the query_timeout branch December 29, 2017 17:34
@hellobabygogo

@himanshug hi, what is the most appropriate setting for maxScatterGatherBytes?

@himanshug
Contributor Author

@hellobabygogo this is a blunt tool; we set it to 1G considering the type of queries we have and the size of the expected responses. This config exists to prevent the Broker from getting into an OOM situation, so "how big" really depends on the max JVM heap, concurrent queries, etc. #6313 might be more appropriate in general.
