Alternative groupBy strategy #2998
Conversation
|Property|Description|Default|
|---|---|---|
|`druid.processing.buffer.sizeBytes`|This specifies a buffer size for the storage of intermediate results. The computation engine in both the Historical and Realtime nodes will use a scratch buffer of this size to do all of their intermediate computations off-heap. Larger values allow for more aggregations in a single pass over the data, while smaller values can require more passes depending on the query that is being executed.|1073741824 (1GB)|
|`druid.processing.buffer.poolCacheMaxCount`|The processing buffer pool caches buffers for later use; this is the maximum count the cache will grow to. Note that the pool can create more buffers than it can cache if necessary.|Integer.MAX_VALUE|
|`druid.processing.formatString`|Realtime and Historical nodes use this format string to name their processing threads.|processing-%s|
|`druid.processing.numMergeBuffers`|The number of direct memory buffers available for merging query results. The buffers are sized by `druid.processing.buffer.sizeBytes`. Not all queries need these buffers. By default, no queries use these buffers, so the default pool size is zero.|0|
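For illustration, a hypothetical `runtime.properties` fragment combining these settings (the values are made-up examples, not recommendations):

```properties
# Scratch buffer per processing thread (here 512MB)
druid.processing.buffer.sizeBytes=536870912
# Number of processing threads
druid.processing.numThreads=7
# Direct buffers reserved for merging query results, each sized by
# druid.processing.buffer.sizeBytes; zero by default, so merge-buffer
# queries need this raised explicitly
druid.processing.numMergeBuffers=2
```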
what are some reasonable values to set this?
However many concurrent groupBy queries you want to be able to run with the strategy from this PR, I guess it depends on your expected query load and how much memory you have.
can we add some documentation about how to use the new groupBys then? I feel people won't figure out what parameters to set without being told to set them
Maybe in the groupBy query doc page?
@gianm this is failing UT

fixed

Some more benchmarks, this time on a c4.8xlarge with a 31-thread processing pool, varying the number of segments to see how well the concurrent merging scales. It looks like neither the old nor the new approach scales perfectly, but the new one does scale better. (Benchmark charts: oldFaithful, epinephelinae.)

@gianm benchmarks look great! this is really promising
Is this pretty much a copy of the StupidPool code?
This part is similar, yeah. One difference is that this one adds back to the pool on finalize whereas stupid doesn't.
Can you put that in a comment here?
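To illustrate the difference being discussed, here is a simplified, hypothetical sketch (not the actual Druid implementation) of a pool whose holders hand their object back on `close()`, with a `finalize()` safety net so an unclosed holder still returns its object when garbage-collected:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified, hypothetical pool: holders return their object on close(),
// and finalize() acts as a safety net so an unclosed holder still gives
// its object back when it is garbage-collected (unlike StupidPool).
class SimplePool<T> {
  private final BlockingQueue<T> objects;

  SimplePool(int capacity) {
    this.objects = new ArrayBlockingQueue<>(capacity);
  }

  void offer(T obj) {
    objects.offer(obj); // silently dropped if the cache is already full
  }

  Holder<T> take() {
    T obj = objects.poll();
    if (obj == null) {
      throw new IllegalStateException("pool exhausted");
    }
    return new Holder<>(this, obj);
  }

  int available() {
    return objects.size();
  }

  static class Holder<T> implements AutoCloseable {
    private final SimplePool<T> pool;
    private final T obj;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    Holder(SimplePool<T> pool, T obj) {
      this.pool = pool;
      this.obj = obj;
    }

    T get() {
      return obj;
    }

    @Override
    public void close() {
      // give the object back exactly once
      if (closed.compareAndSet(false, true)) {
        pool.offer(obj);
      }
    }

    @Override
    protected void finalize() {
      close(); // safety net for holders that were never closed
    }
  }
}
```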
Force-pushed 1cf5cdc to 3058de2
this error msg is really confusing
will rewrite to "keyBuffer.remaining[%s] != keySize[%s], buffer was the wrong size?!"
we could say: `keySerde.keySize()[%s] is not same as keySerde.toByteBuffer(key).remaining()[%s]`
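The suggested message could be wired into a precondition like this sketch (`keyBuffer` and `keySize` are assumed names from the surrounding patch; this is illustrative, not the actual code):

```java
import java.nio.ByteBuffer;

// Illustrative precondition using the message suggested above; keySize and
// keyBuffer are assumed names from the surrounding patch, not verbatim code.
class KeyCheck {
  static void checkKeySize(ByteBuffer keyBuffer, int keySize) {
    if (keyBuffer.remaining() != keySize) {
      throw new IllegalStateException(
          String.format(
              "keySerde.keySize()[%s] is not same as keySerde.toByteBuffer(key).remaining()[%s]",
              keySize,
              keyBuffer.remaining()
          )
      );
    }
  }
}
```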
Force-pushed 0a96bdb to 064aac2
@fjy added docs

👍
why not simply `mergeBufferHolder = mergeBufferPool.take(timeout != null && timeout.longValue() > 0 ? timeout.longValue() : -1);` without getting current time?
good point, switched this, but kept `timeoutAt` as it is needed for `waitForFutureCompletion`
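The suggestion amounts to passing the timeout straight through instead of reading the current time first. A small sketch of that pattern against a standard `BlockingQueue` (a hypothetical helper, not Druid's actual pool API):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical helper showing the suggested pattern: hand the timeout
// directly to the pool instead of computing elapsed time up front.
// A non-positive timeout means "wait forever".
class TimedTake {
  static <T> T take(BlockingQueue<T> pool, long timeoutMs) {
    try {
      if (timeoutMs <= 0) {
        return pool.take(); // block until an object is available
      }
      return pool.poll(timeoutMs, TimeUnit.MILLISECONDS); // null on timeout
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    }
  }
}
```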
minor comment #2998 (comment) but 👍 overall
IncrementalIndex would have returned this as an empty string for nulls; are you sure we need to convert empty strings back to null here?
The tests fail without `emptyToNull` here. Furthermore, I believe treating empties as nulls is consistent with other areas.
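For reference, the conversion being discussed behaves like Guava's `Strings.emptyToNull`; a minimal local sketch:

```java
// Local sketch of the emptyToNull conversion (Guava's Strings.emptyToNull
// behaves the same way): empty strings and nulls are treated as equivalent.
class DimValues {
  static String emptyToNull(String s) {
    return (s == null || s.isEmpty()) ? null : s;
  }
}
```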
This patch introduces a GroupByStrategy concept and two strategies: "v1" is the current groupBy strategy and "v2" is a new one. It also introduces a merge buffers concept in DruidProcessingModule, to try to better manage memory used for merging. Both of these are described in more detail in apache#2987.

There are two goals of this patch:

1. Make it possible for historical/realtime nodes to return larger groupBy result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by columns, avoiding materialization.

This patch does not do anything to help with memory management on the broker when there are order-by columns or when there are nested queries. That could potentially be done in a future patch.
@nishantmonu51 updated PR with changes, thanks for reviewing

👍
```java
return ByteBuffer.allocateDirect(computationBufferSize);
```
I'm just wondering why using allocateDirect here?
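For context on the question: `allocateDirect` reserves memory outside the JVM heap, so large scratch buffers add no GC pressure and can be handed to native I/O without an extra copy, while heap buffers are backed by a plain `byte[]`. A minimal sketch of the difference:

```java
import java.nio.ByteBuffer;

// Heap vs. direct buffers: allocateDirect reserves memory outside the JVM
// heap, so large scratch buffers add no GC pressure and can be passed to
// native I/O without copying; heap buffers are backed by a plain byte[].
class BufferDemo {
  public static void main(String[] args) {
    ByteBuffer heap = ByteBuffer.allocate(1024);
    ByteBuffer direct = ByteBuffer.allocateDirect(1024);

    System.out.println(heap.isDirect());    // false
    System.out.println(direct.isDirect());  // true
    System.out.println(heap.hasArray());    // true: accessible byte[] backing
    System.out.println(direct.hasArray());  // false: off-heap memory
  }
}
```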
Benchmarks (charts): master, epinephelinae