Add limit to query result buffering queue #4949

stevenchen3 wants to merge 3 commits into apache:master
Conversation
@stevenchen3 thanks for the contribution! Is this fixing #4933? (It sounds like it is, from your description.)
Could you please fill out the project CLA at http://druid.io/community/cla.html?
```java
return maxBufferSizeBytes - totalBufferedBytes.get();
}

private void checkBufferCapacity(long bytes, long timeoutMs)
```
It looks like this method will block until buffer space becomes available. We should make sure it's not running in any threads where that would be problematic.
```java
}
queue.put(new ChannelBufferInputStream(response.getContent()));

checkBufferCapacity(bytes, queryBufferingTimeout);
```
I think this is going to be a busy-wait-loop in a netty thread (IIRC that is what handleResponse gets called in) so this could cause starvation of the netty thread pool.
Am I reading this right? If so, is there some better way we can do backpressure with netty?
I'm not a netty expert so I'm not sure what the best way is to go here. Ideally we want netty to stop reading from the socket and stop delivering content to us, which would propagate to the historical nodes, which would eventually block up their http threads. But I don't know how to make that happen with netty.
http://normanmaurer.me/presentations/2014-facebook-eng-netty/slides.html#33.0 seems maybe relevant.
@gianm True. It might cause starvation, which is why I introduced queryBufferingTimeout. This is a workaround to fix the OOM issue with the broker. I'm not a netty expert either, so I'm not sure what the best way to do backpressure with netty would be.
Are you interested in exploring a way to do this without blocking the netty threads? That would really be an ideal solution I think.
It seems netty also supports watermark buffers, which could possibly be used to achieve backpressure.
At least the javadocs here seem to indicate this:
http://netty.io/4.1/api/io/netty/channel/WriteBufferWaterMark.html
However, this might require upgrading the netty version used by Druid.
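To make the watermark idea concrete, here is a read-side analogue sketched in plain Java. `WatermarkGate` is a hypothetical class, not netty's API (netty's actual `WriteBufferWaterMark` governs the outbound write buffer): reading pauses once buffered bytes cross a high watermark and resumes only after draining below a low watermark, so the gate does not flap on every chunk.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical watermark-style backpressure gate (illustration only, not netty code).
public class WatermarkGate
{
  private final long lowWatermark;
  private final long highWatermark;
  private final AtomicLong bufferedBytes = new AtomicLong();
  private volatile boolean paused = false;

  public WatermarkGate(long lowWatermark, long highWatermark)
  {
    this.lowWatermark = lowWatermark;
    this.highWatermark = highWatermark;
  }

  /** Called when bytes arrive from the socket; returns false if reading should pause. */
  public boolean onBytesBuffered(long bytes)
  {
    if (bufferedBytes.addAndGet(bytes) >= highWatermark) {
      // With netty 4 this is roughly where channel.config().setAutoRead(false) would go.
      paused = true;
    }
    return !paused;
  }

  /** Called when the consumer drains bytes; returns true once reading may resume. */
  public boolean onBytesConsumed(long bytes)
  {
    if (bufferedBytes.addAndGet(-bytes) <= lowWatermark && paused) {
      // ...and setAutoRead(true) here.
      paused = false;
    }
    return !paused;
  }

  public static void main(String[] args)
  {
    WatermarkGate gate = new WatermarkGate(4, 10);
    System.out.println(gate.onBytesBuffered(5));  // below the high watermark
    System.out.println(gate.onBytesBuffered(6));  // crosses 10, pauses
    System.out.println(gate.onBytesConsumed(7));  // drains to 4, resumes
  }
}
```

The two-threshold design is the point: a single limit would toggle reading on and off around every chunk, while the low/high gap lets the consumer drain a meaningful amount before reads restart.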
@gianm I would love to explore what I can do without blocking netty threads.
```java
checkQueryTimeout();
checkTotalBytesLimit(response.getContent().readableBytes());

final long bytes = response.getContent().readableBytes();
```
Nit: You can rename it to totalBytes or something similar.
```java
private void checkBufferCapacity(long bytes, long timeoutMs)
{
  final long startTimeMs = System.currentTimeMillis();
  boolean isTimeout = false;
```
Nit: Instead you can do something like (dropping the unused isTimeout flag):

```java
private void checkBufferCapacity(long bytes, long timeoutMs)
{
  final long startTimeMs = System.currentTimeMillis();
  while (maxBufferSizeBytes < Long.MAX_VALUE && getBufferCapacity() < bytes) {
    if (System.currentTimeMillis() - startTimeMs > timeoutMs) {
      String msg = StringUtils.format(
          "Query[%s] url[%s] max buffer limit reached: waiting for free buffer timeout",
          query.getId(),
          url
      );
      setupResponseReadFailure(msg, null);
      throw new RE(msg);
    }
  }
  totalBufferedBytes.addAndGet(bytes);
}
```
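The loop above spins until capacity frees up, which is what worries the netty-thread discussion earlier. A variant that parks the waiting thread instead could keep the byte budget in a `java.util.concurrent.Semaphore`. This is a sketch with hypothetical names (`ByteBudget`, `reserve`, `release`), not the code in this patch:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical bounded byte budget: reserve() blocks (with a timeout) instead
// of busy-waiting, and release() returns capacity once buffered data is consumed.
public class ByteBudget
{
  private final Semaphore permits; // one permit == one byte of buffer capacity

  public ByteBudget(int maxBufferSizeBytes)
  {
    this.permits = new Semaphore(maxBufferSizeBytes);
  }

  /** Returns true if {@code bytes} of capacity were reserved before the timeout. */
  public boolean reserve(int bytes, long timeoutMs) throws InterruptedException
  {
    return permits.tryAcquire(bytes, timeoutMs, TimeUnit.MILLISECONDS);
  }

  /** Called when buffered bytes have been handed off to the consumer. */
  public void release(int bytes)
  {
    permits.release(bytes);
  }

  public int available()
  {
    return permits.availablePermits();
  }

  public static void main(String[] args) throws InterruptedException
  {
    ByteBudget budget = new ByteBudget(1024);
    System.out.println(budget.reserve(512, 10));   // fits in an empty budget
    System.out.println(budget.reserve(1024, 10));  // only 512 left, times out
    budget.release(512);
    System.out.println(budget.available());
  }
}
```

Note this only removes the spinning; a thread blocked in `tryAcquire` still cannot serve other work, so it would not by itself solve the netty-thread starvation concern.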
```java
}

public static <T> Query<T> withMaxScatterGatherBytes(Query<T> query, long maxScatterGatherBytesLimit)
private static <T> Query<T> withCustomizedArgument(Query<T> query, String key, long value)
```
Consider renaming the method to withCustomizedLimit and the argument value to maxLimit.
It looks like #6014 is attempting to solve the same problem.

I ended up hitting a LOT of problems when I tried to get #6014 into production, which I wrote up in that PR.

I think this will run into the same problem that @drcrallen ran into with #6014, due to blocking HttpClient worker threads. I am going to close this since I think we need a different approach, but @stevenchen3, I hope we can continue the discussion in #6014 or #4933.
The current implementation of `DirectDruidClient` maintains an unlimited queue to buffer data returned by query nodes. When querying a huge amount of data from Druid via the broker, and the user's client cannot consume results as fast as Druid servers produce them, this causes an `OutOfMemoryError` on the broker (see #4865 for more details).

In this PR, we propose adding a configurable, per-query limit on the total bytes buffered in this queue (`maxBufferSizeBytes`, default `Long.MAX_VALUE`). Unlike `maxScatterGatherBytes`, a query that hits the limit waits up to `queryBufferingTimeout` (default 5 minutes) for free buffer space rather than failing immediately. It still respects the behavior of `maxScatterGatherBytes`.
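The buffering scheme described above can be sketched with a simplified, self-contained class. All names here are hypothetical (the real change lives in `DirectDruidClient`), and this sketch blocks on a monitor with `wait`/`notifyAll` for clarity rather than polling as the patch does:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Simplified sketch of a bounded result-buffering queue (illustration only):
// chunks queue up for the consumer, and a producer that cannot reserve
// capacity within timeoutMs fails instead of buffering without bound.
public class BoundedResultBuffer
{
  private final LinkedBlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
  private final AtomicLong totalBufferedBytes = new AtomicLong();
  private final long maxBufferSizeBytes;

  public BoundedResultBuffer(long maxBufferSizeBytes)
  {
    this.maxBufferSizeBytes = maxBufferSizeBytes;
  }

  /** Producer side: buffer a chunk, waiting up to timeoutMs for free capacity. */
  public void offerChunk(byte[] chunk, long timeoutMs) throws InterruptedException
  {
    final long deadline = System.currentTimeMillis() + timeoutMs;
    synchronized (this) {
      while (totalBufferedBytes.get() + chunk.length > maxBufferSizeBytes) {
        long remainingMs = deadline - System.currentTimeMillis();
        if (remainingMs <= 0) {
          throw new IllegalStateException("max buffer limit reached: timed out waiting for free buffer");
        }
        wait(remainingMs);
      }
      totalBufferedBytes.addAndGet(chunk.length);
    }
    queue.put(chunk);
  }

  /** Consumer side: take the next chunk and return its bytes to the budget. */
  public byte[] takeChunk() throws InterruptedException
  {
    byte[] chunk = queue.take();
    synchronized (this) {
      totalBufferedBytes.addAndGet(-chunk.length);
      notifyAll();
    }
    return chunk;
  }

  public long buffered()
  {
    return totalBufferedBytes.get();
  }

  public static void main(String[] args) throws InterruptedException
  {
    BoundedResultBuffer buffer = new BoundedResultBuffer(10);
    buffer.offerChunk(new byte[6], 50);
    System.out.println("buffered=" + buffer.buffered());
    System.out.println("took " + buffer.takeChunk().length + " bytes");
  }
}
```

With `maxBufferSizeBytes = Long.MAX_VALUE` the capacity check never trips, matching the PR's default of preserving the old unlimited behavior.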