Make timeout behavior consistent to document #4134
|
The Travis failure seems unrelated to this patch. All tests passed on my local machine. |
| timeoutAt - (System.currentTimeMillis() - start) | ||
| ); | ||
| if (hasTimeout) { | ||
| responseContext.put( |
Thanks for letting me know. This behavior looks weird, so I'd like to fix it in this PR if it doesn't require too large a change. Since I don't know exactly how the responseContext is supposed to be used, let me confirm what the correct solution is. Is it OK to just move CTX_TIMEOUT_AT to QueryContext instead of ResponseContext?
Maybe it's not that easy to fix this, because CTX_TIMEOUT_AT and CTX_COUNT are used in some non-obvious way in ScanQueryEngine.
OK, then I'll look at it after this PR.
| final Number queryTimeout = query.getContextValue(QueryContextKeys.TIMEOUT, null); | ||
| final long timeoutAt = (queryTimeout == null || queryTimeout.longValue() == 0L) | ||
| ? JodaUtils.MAX_INSTANT : System.currentTimeMillis() + queryTimeout.longValue(); | ||
| final long timeoutAt = System.currentTimeMillis() + QueryContexts.getTimeout(query); |
Here timeoutAt is set to System.currentTimeMillis() if timeout is not specified, is that what you wanted to do?
timeoutAt is used only if timeout is specified (https://github.com/jihoonson/druid/blob/8b64ce6ec9e083632bce7bf9fcc365b1d857a41b/extensions-contrib/scan-query/src/main/java/io/druid/query/scan/ScanQueryEngine.java#L161-L182).
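To make the concern in this exchange concrete, here is a minimal sketch of the two ways a deadline can be computed when 0 means "no timeout". The class and method names (DeadlineSketch, timeoutAt, isExpired) are hypothetical and are not taken from ScanQueryEngine; the sketch only assumes that a timeout of 0 means no timeout, as discussed above.

```java
// Hypothetical helper, not Druid code: illustrates why "now + 0" is only safe
// when every reader of the deadline first checks that a timeout was specified.
final class DeadlineSketch
{
  // Assumption: 0 encodes "no timeout", as in the discussion above.
  static final long NO_TIMEOUT = 0L;

  // Maps "no timeout" to a deadline that is never reached.
  static long timeoutAt(long nowMs, long timeoutMs)
  {
    return timeoutMs == NO_TIMEOUT ? Long.MAX_VALUE : nowMs + timeoutMs;
  }

  static boolean isExpired(long nowMs, long timeoutAtMs)
  {
    return nowMs >= timeoutAtMs;
  }

  public static void main(String[] args)
  {
    long now = System.currentTimeMillis();
    // With the sentinel mapping, an unspecified timeout never expires.
    System.out.println(isExpired(now, timeoutAt(now, NO_TIMEOUT)));
    // With a plain "now + timeout", an unspecified timeout expires immediately
    // unless the caller guards the check with a hasTimeout flag.
    System.out.println(isExpired(now, now + NO_TIMEOUT));
  }
}
```

Either approach can work; the plain sum simply moves the responsibility for the no-timeout case to every place that reads the deadline.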
| Number timeout = query.getContextValue(QueryContextKeys.TIMEOUT); | ||
| if (timeout == null) { | ||
| final long timeout = QueryContexts.getTimeout(query); | ||
| if (timeout == 0) { |
QueryContexts.DEFAULT_TIMEOUT? Or add a method QueryContext.isNoTimeout(timeout)
| return new MergeIterable<>( | ||
| ordering.nullsFirst(), | ||
| timeout == null ? | ||
| timeout == 0 ? |
| final Number timeout = query.getContextValue(QueryContextKeys.TIMEOUT, (Number) null); | ||
| if (timeout == null) { | ||
| final long timeout = QueryContexts.getTimeout(query); | ||
| if (timeout == 0) { |
| public Sequence<T> run(final Query<T> query, Map<String, Object> responseContext) | ||
| { | ||
| if (BaseQuery.getContextBySegment(query, false)) { | ||
| if (QueryContexts.isBySegment(query, false)) { |
isBySegment is always called with false as the default argument, maybe remove the parameter
| throw new TimeoutException(); | ||
| } | ||
| } else { | ||
| mergeBufferHolder = mergeBufferPool.take(-1); |
In the spirit of the JDK's blocking APIs, I'd prefer to have a method that takes a timeout and a TimeUnit, plus a method without parameters for indefinite blocking, rather than encoding this with a negative argument.
| final long timeout = QueryContexts.getTimeout(query); | ||
| final ResourceHolder<List<ByteBuffer>> mergeBufferHolders = mergeBufferPool.takeBatch( | ||
| requiredMergeBufferNum, timeout.longValue() | ||
| requiredMergeBufferNum, timeout |
takeBatch() javadoc says that "negative means no timeout", here timeout is QueryContext.DEFAULT_TIMEOUT, that is 0.
| final Number timeout = query.getContextValue(QueryContextKeys.TIMEOUT, (Number) null); | ||
| return timeout == null ? future.get() : future.get(timeout.longValue(), TimeUnit.MILLISECONDS); | ||
| final long timeout = QueryContexts.getTimeout(query); | ||
| return timeout == 0 ? future.get() : future.get(timeout, TimeUnit.MILLISECONDS); |
| ) | ||
| ); | ||
|
|
||
| if (QueryContexts.getTimeout(query) < 0) { |
This check could be moved into QueryContexts.getTimeout()
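As a rough illustration of this suggestion, the sketch below validates the timeout inside the accessor so callers never see a negative value. It works on a plain context map rather than a Druid Query; the class name TimeoutContextSketch, the "timeout" key, and the NO_TIMEOUT constant are assumptions made for the example, not the actual QueryContexts code.

```java
import java.util.Map;

// Hypothetical stand-in for a QueryContexts-style helper, operating on a raw map.
final class TimeoutContextSketch
{
  // Assumed context key and no-timeout sentinel for this sketch.
  static final String TIMEOUT_KEY = "timeout";
  static final long NO_TIMEOUT = 0L;

  // Reads the timeout in milliseconds and rejects negative values in one place.
  static long getTimeout(Map<String, Object> context)
  {
    final Object value = context.get(TIMEOUT_KEY);
    final long timeout = value == null ? NO_TIMEOUT : ((Number) value).longValue();
    if (timeout < 0) {
      throw new IllegalArgumentException("Timeout must be a non-negative value, but was " + timeout);
    }
    return timeout;
  }

  // Callers can branch on this instead of comparing against 0 themselves.
  static boolean hasTimeout(Map<String, Object> context)
  {
    return getTimeout(context) != NO_TIMEOUT;
  }
}
```

With the check centralized, call sites reduce to hasTimeout(context) followed by getTimeout(context), and a negative value fails at the first read.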
| * | ||
| * @return a resource, or null if the timeout was reached | ||
| */ | ||
| public ReferenceCountingResourceHolder<T> take(final long timeout) |
If you don't want to add a time unit parameter (which would be my choice, even if the method is always called with the same time unit argument, for readability on the caller side), please rename the parameter to "timeoutMs".
| } | ||
| } | ||
| ); | ||
| return wrapObject(timeout > 0 ? pollObject(timeout) : pollObject()); |
Since there is a dedicated no-timeout method, I suggest checking in this method that the timeout is positive and throwing an exception otherwise.
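A minimal sketch of the API shape being proposed here: one method that blocks indefinitely and one that requires a strictly positive timeout. PoolSketch is a hypothetical class built on java.util.concurrent.BlockingQueue; it is not the pool implementation from this patch, and the exception types are an assumption.

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical pool used only to illustrate the two-method design.
final class PoolSketch<T>
{
  private final BlockingQueue<T> objects;

  PoolSketch(Collection<T> initial)
  {
    // Capacity must be at least 1 and at least the number of initial objects.
    this.objects = new ArrayBlockingQueue<>(Math.max(1, initial.size()), false, initial);
  }

  // Blocks until an object is available; no sentinel argument is needed.
  T take()
  {
    try {
      return objects.take();
    }
    catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for an object", e);
    }
  }

  // Waits at most timeoutMs; returns null if the timeout was reached.
  T take(long timeoutMs)
  {
    if (timeoutMs <= 0) {
      throw new IllegalArgumentException("timeoutMs must be positive, but was " + timeoutMs);
    }
    try {
      return objects.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }
    catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for an object", e);
    }
  }

  void giveBack(T object)
  {
    objects.add(object);
  }
}
```

Rejecting zero and negative values makes the choice between "wait forever" and "wait this long" explicit at the call site instead of hiding it in the argument.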
| queryWatcher.registerQuery(query, future); | ||
| final long timeout = QueryContexts.getTimeout(query); | ||
| if (timeout == 0) { | ||
| if (QueryContexts.isNoTimeout(timeout)) { |
I think this is clearer here:
if (QueryContext.hasTimeout(query)) {
  future.get(QueryContext.getTimeout(query), TimeUnit.MILLISECONDS);
} else {
  future.get();
}
This is also true for other places where isNoTimeout() is used, except the one in ChainedExecutionQueryRunner.
Ah, sorry, I missed this comment. Yeah, it looks good. I changed it everywhere, including in ChainedExecutionQueryRunner, where it also looks simple and good.
|
@leventov, thanks. I've addressed your comments. |
| * @return a resource, or null if the timeout was reached | ||
| */ | ||
| public ReferenceCountingResourceHolder<T> take(final long timeout) | ||
| public ReferenceCountingResourceHolder<T> take(final long timeoutMs) |
It still has a non-obvious, undocumented corner case: if timeoutMs=0, then it waits indefinitely, rather than not waiting at all.
I don't understand this comment. If timeoutMs = 0, then take() immediately returns an object or null without waiting at all. Actually, negative timeoutMs values have the same meaning.
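The behavior described in this reply matches how the JDK's own timed poll works, which the sketch below demonstrates with java.util.concurrent.ArrayBlockingQueue. Whether take(timeoutMs) in this patch delegates to such a poll is an assumption; ZeroTimeoutPollSketch and pollOnce are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Demonstrates that a timed poll with a zero or negative timeout does not wait.
final class ZeroTimeoutPollSketch
{
  static String pollOnce(BlockingQueue<String> queue, long timeoutMs)
  {
    try {
      return queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }
    catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }

  public static void main(String[] args)
  {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
    // Empty queue, zero timeout: returns null right away instead of blocking.
    System.out.println(pollOnce(queue, 0));
    queue.add("buffer");
    // An element is available, so even a negative timeout returns it.
    System.out.println(pollOnce(queue, -5));
  }
}
```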
|
@leventov I first thought that it could be used in the future, so I kept it. But yeah, it's simple and can be added again if necessary. I've removed it now. |
|
Would anyone please review this patch? |
|
Taking a look. |
gianm
left a comment
LGTM other than comment about how the default query timeout config is dealt with.
| queryId = UUID.randomUUID().toString(); | ||
| query = query.withId(queryId); | ||
| } | ||
| if (query.getContextValue(QueryContextKeys.TIMEOUT) == null) { |
Instead of changing the default behavior, how about keeping the default behavior the same, but fixing the docs and making it more configurable? So I'm suggesting:
- Add a druid.server.http.defaultQueryTimeout property and document that in configuration/broker.md and configuration/historical.md. Use that to set the default timeout here, and default that property to PT5M.
- Update query-context.md to reflect that the default isn't 0, it's the value of druid.server.http.defaultQueryTimeout.
Sounds good. I changed the default query timeout to be configurable.
Sorry @jihoonson, I wasn't clear. For backwards compatibility reasons, the query context parameter should stay in milliseconds. Since druid.server.http.defaultQueryTimeout is new, it could be either milliseconds or a period.
Right. I changed both of them to millis.
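The outcome of this exchange can be sketched as follows: a server-wide default, in milliseconds, is applied only when the query context does not already carry a timeout. The sketch uses a plain map instead of a Druid Query; DefaultTimeoutSketch, the "timeout" key, and the 300000 ms value (the millisecond equivalent of the PT5M suggested above) are assumptions for illustration, not the actual withDefaultTimeout implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of applying a configurable default timeout.
final class DefaultTimeoutSketch
{
  // Assumed context key for this sketch.
  static final String TIMEOUT_KEY = "timeout";

  // Returns a copy of the context where the timeout is filled in only if absent.
  static Map<String, Object> withDefaultTimeout(Map<String, Object> context, long defaultTimeoutMs)
  {
    final Map<String, Object> copy = new HashMap<>(context);
    copy.putIfAbsent(TIMEOUT_KEY, defaultTimeoutMs);
    return copy;
  }

  public static void main(String[] args)
  {
    // Assumed default of 5 minutes, expressed in milliseconds.
    final long defaultTimeoutMs = 300_000L;

    Map<String, Object> explicit = new HashMap<>();
    explicit.put(TIMEOUT_KEY, 1_000L);

    // A query without a timeout picks up the server default.
    System.out.println(withDefaultTimeout(new HashMap<>(), defaultTimeoutMs));
    // A query with its own timeout keeps it.
    System.out.println(withDefaultTimeout(explicit, defaultTimeoutMs));
  }
}
```

Keeping both the context parameter and the default in the same unit avoids a conversion step at the point where the default is applied.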
|
|
||
| Query<T> withDataSource(DataSource dataSource); | ||
|
|
||
| Query<T> withDefaultTimeout(long defaultTimeout); |
Yes, you're right. I missed this and we need to fix it.
Reported here.