
improve query timeout handling and limit max scatter-gather bytes#4229

Merged
himanshug merged 2 commits into apache:master from himanshug:query_timeout
May 16, 2017

Conversation

@himanshug
Contributor

@himanshug himanshug commented Apr 28, 2017

towards #1415

also introduced the maxScatterGatherBytes query context parameter. At the Broker, this limits the total number of bytes gathered from downstream nodes such as historicals and realtime indexers. If the Broker is under heavy load and is not consuming data from downstream nodes fast enough, that data accumulates in memory and may lead to OOMs. This setting provides a workaround to limit the per-query maximum memory used for storing data received from downstream nodes.
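The accounting described above can be sketched as follows. This is a hypothetical illustration, not Druid's actual classes: every chunk received from a downstream node bumps a shared per-query counter, and the query fails once the configured maximum is exceeded.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of per-query scatter-gather byte accounting.
// Class and method names are illustrative, not the patch's actual code.
class ScatterGatherByteLimiter {
  private final AtomicLong bytesGathered = new AtomicLong(0);
  private final long maxScatterGatherBytes;

  ScatterGatherByteLimiter(long maxScatterGatherBytes) {
    this.maxScatterGatherBytes = maxScatterGatherBytes;
  }

  /** Called for each response chunk received from a downstream node. */
  void addAndCheck(long chunkSizeBytes) {
    long total = bytesGathered.addAndGet(chunkSizeBytes);
    if (total > maxScatterGatherBytes) {
      throw new IllegalStateException(
          "Query exceeded maxScatterGatherBytes limit [" + maxScatterGatherBytes
          + "], gathered [" + total + "] bytes");
    }
  }

  long bytesGathered() {
    return bytesGathered.get();
  }
}
```

An AtomicLong is used because chunks from multiple downstream nodes arrive on different threads, so the counter must be updated atomically.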

@himanshug himanshug added this to the 0.10.1 milestone Apr 28, 2017
@himanshug himanshug changed the title improve query timeout handling at broker [WIP]improve query timeout handling at broker May 5, 2017
@himanshug himanshug changed the title [WIP]improve query timeout handling at broker improve query timeout handling and limit max scatter-gather bytes May 5, 2017
Contributor

@cheddar cheddar left a comment

Please also create a runtime.properties property that can be used to enforce this limit even if the thing generating the query doesn't play nicely.

Said property should likely be enforced even if the context has a larger number in it.

[Himanshu]: created a runtime property that is enforced; the query fails if the context parameter tries to increase the limit.
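The enforcement rule agreed on here could be sketched like this (hypothetical names, not the actual implementation): the context may lower the limit, but any attempt to raise it above the runtime.properties value fails the query up front.

```java
// Hypothetical sketch of resolving the effective limit from the server-wide
// runtime property and the per-query context parameter. The context can only
// tighten the limit, never loosen it.
class LimitResolver {
  static long resolveMaxScatterGatherBytes(long serverLimit, Long contextLimit) {
    if (contextLimit == null) {
      return serverLimit;
    }
    if (contextLimit > serverLimit) {
      throw new IllegalArgumentException(
          "maxScatterGatherBytes in query context [" + contextLimit
          + "] exceeds the server-configured limit [" + serverLimit + "]");
    }
    return contextLimit;
  }
}
```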

Contributor

Might as well grab this once and cache it for the life of the response handler.

Contributor Author

updated

Contributor

I think it would be better to create this once higher up rather than checked and double checked constantly with everything.

Contributor Author

updated

Contributor

This reference isn't going to change. Pull it out once and just use it.

Contributor Author

done

Contributor

I think it would be nicer if both of these were to return booleans and the call sites were to just skip things or whatever if the limits are exceeded.

Contributor Author

They contain specific error messages that would need to be repeated at all call sites. Also, they are used in a relatively small scope and look OK as-is.

Contributor

Why not attach total bytes gathered as well?

Contributor Author

done

Contributor

@gianm gianm left a comment

maxScatterGatherBytes is a really blunt hammer; it would fail queries that can stream through large result sets and don't necessarily need to fail (like non-nested groupBys and scan queries).

What do you think about using backpressure instead? So instead of running out of memory (current behavior) or failing (this patch), temporarily stop reading from the data node.

@himanshug
Contributor Author

@gianm even with non-nested groupBys etc., if the Broker is not consuming data fast enough, it all gets accumulated in the SequenceInputStream created in DirectDruidClient.
However, I think I should also "decrement" total_bytes as data is consumed from the SequenceInputStream, so that if the Broker is consuming fast enough the total_bytes number does not grow. total_bytes then becomes the number of bytes buffered at that moment rather than the total bytes consumed from data nodes. How would you feel about that instead of blocking?
The problem with blocking is that if multiple "bad" queries are sent, they would end up causing OOMs.
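The alternative accounting floated here, counting bytes currently buffered rather than total bytes gathered, could be sketched as below. Note this variant was ultimately not adopted; the merged patch keeps a monotonically increasing total. Names are illustrative.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the alternative discussed above: decrement the
// counter as the Broker drains buffered data, so the limit applies to bytes
// currently held in memory instead of total bytes ever gathered. This is
// NOT what the patch merged; names are illustrative.
class BufferedBytesAccountant {
  private final AtomicLong buffered = new AtomicLong(0);
  private final long maxBufferedBytes;

  BufferedBytesAccountant(long maxBufferedBytes) {
    this.maxBufferedBytes = maxBufferedBytes;
  }

  /** Called when a chunk arrives from a data node. */
  void onReceive(long bytes) {
    if (buffered.addAndGet(bytes) > maxBufferedBytes) {
      throw new IllegalStateException("buffered-bytes limit exceeded");
    }
  }

  /** Called as the Broker consumes bytes from the buffered stream. */
  void onConsume(long bytes) {
    buffered.addAndGet(-bytes);
  }

  long buffered() {
    return buffered.get();
  }
}
```

Under this scheme a fast-consuming Broker never trips the limit, which is why it would have made the setting less useful as a hard per-query cap.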

@gianm
Contributor

gianm commented May 9, 2017

Discussed on the dev sync: since this is useful to you as-is, and making the change you suggested would make the feature less useful to you, it makes sense to go through with it as-is, but to make clear in the docs what the behavior is and what the expected use case is. It seems like something pretty special-case to me.

@himanshug
Contributor Author

@gianm yep, let's keep it as-is.
Timeout handling is still very general and applies to everyone; maxScatterGatherBytes, maybe not so much.

I will update the docs and resolve the review comments.

@cheddar
Contributor

cheddar commented May 12, 2017

👍

Contributor

@gianm gianm left a comment

👍 on the design; I did not review the code.

@gianm
Contributor

gianm commented May 12, 2017

@cheddar, should be good to merge if you reviewed the code, since I assume @himanshug implicitly approves the design of his own patch.

final QueryMetrics<? super Query<T>> queryMetrics = toolChest.makeMetrics(query);
queryMetrics.server(host);

long timeoutAt = ((Long) context.get(QUERY_FAIL_TIME)).longValue();
Member

It introduces an NPE if the query is made bypassing QueryResource, i.e. from inside the QueryEngine of another query type, via QuerySegmentWalker.
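The failure mode being described: `context.get(QUERY_FAIL_TIME)` returns null when the query never passed through QueryResource, and unboxing that null Long throws a NullPointerException. A defensive variant could be sketched as below (the method name and fallback are illustrative, not the actual fix, which went through #4305 for Druid SQL):

```java
import java.util.Map;

class TimeoutLookup {
  // Hypothetical null-safe variant of the snippet above: if the fail-time
  // key was never set (the query bypassed QueryResource), fall back to a
  // caller-supplied default instead of hitting an unboxing NPE.
  static long timeoutAt(Map<String, Object> context, long fallbackMillis) {
    Long failTime = (Long) context.get("queryFailTime");
    return failTime == null ? fallbackMillis : failTime.longValue();
  }
}
```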

Contributor

Druid SQL ran into that too and was fixed via #4305.

I think this is fine since, imo, making queries without going through the resources is something you do 'at your own risk' and isn't an officially supported mode of operation.

Member

QueryResource is not an official API either (see #4433). Making a query as an HTTP request to the same JVM runtime is ridiculous.

Contributor

What I'm saying is that I would consider the whole concept of queries making other queries as not officially supported. If you want to do that in an extension then go for it but there is no official API for it.

Contributor

Rationale for me is:

  • Queries making other queries is going outside the normal framework. The normal framework is a query comes in to the broker, then fans out to historicals/other data nodes, then fans back in and the broker does a merge and returns results. There's not a standard place for "make another subquery" to fit in.
  • Queries making other queries may lead to metrics not being completely collected, or security rules not being properly applied (some of which are checked in the resources), or requests not being properly logged, or exceptions not being properly alerted on. This needs to be thought through.
  • Users are still free to write extensions where queries make other queries, but due to the above two points, I don't think that should be considered a "public api" at this time.

|`druid.server.http.numThreads`|Number of threads for HTTP requests.|10|
|`druid.server.http.maxIdleTime`|The Jetty max idle time for a connection.|PT5m|
|`druid.server.http.defaultQueryTimeout`|Query timeout in millis, beyond which unfinished queries will be cancelled|300000|
|`druid.server.http.maxScatterGatherBytes`|Maximum number of bytes gathered from data nodes, such as historicals and realtime processes, to execute a query. This is an advanced configuration that protects against OOMs when the Broker is under heavy load and is not processing the gathered data fast enough. The limit can be further reduced at query time using `maxScatterGatherBytes` in the query context. Note that a large limit is not necessarily bad if the Broker is never under heavy concurrent load; in that case the gathered data is processed quickly, freeing up the memory used.|Long.MAX_VALUE|
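For illustration, a hypothetical Broker runtime.properties fragment using the settings in this table (the values are examples, not recommendations):

```properties
# Fail unfinished queries after 5 minutes (milliseconds)
druid.server.http.defaultQueryTimeout=300000
# Cap per-query bytes gathered from data nodes at 1 GiB
druid.server.http.maxScatterGatherBytes=1073741824
```

A query context can then lower, but not raise, the byte limit, e.g. `"context": {"maxScatterGatherBytes": 104857600}`.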
Contributor

I wonder why this property is prefixed by druid.server instead of druid.broker. Is it planned to be applied to other node types?

Contributor Author

I don't think it makes sense for any other node type. And now that you mention it, it probably made more sense to call it druid.broker.http.maxScatterGatherBytes. We can possibly change it in the future, or remove this config if the backpressure story improves.

Contributor

Thanks. It sounds good.

@himanshug himanshug deleted the query_timeout branch December 29, 2017 17:34
@hellobabygogo

@himanshug hi, what is the most appropriate setting for maxScatterGatherBytes?

@himanshug
Contributor Author

@hellobabygogo this is a blunt tool; we set it to 1G considering the type of queries we have and the size of the expected responses. This config exists to prevent the Broker from getting into an OOM situation, so "how big" really depends on the max JVM heap, concurrent queries, etc. #6313 might be more appropriate in general.
