Fine grained buffer management for groupby#3863
Conversation
|
@jihoonson lots of merge conflicts due to some other PRs getting merged |
|
@fjy, thanks. I fixed conflicts. |
|
I've reviewed the core changes in Can you add a description to the PR that this changes the order of nested group by buffer acquisition such that the inner queries acquire buffers first as they are being processed, and also limits the number of merge buffers needed for nested queries to 2, regardless of the nesting depth? #3806 added a doc comment in |
|
@jon-wei, thanks for your review. I updated the PR description. |
There was a problem hiding this comment.
Sequence changes look okay to me
Re: the deadlock in #3819, couldn't that still happen with this PR if a query timeout is not set? e.g., two nested group by queries running at the same time, with 2 merge buffers available, they both get a buffer for the inner query, but then neither can get the second buffer needed?
|
|
||
| @Override | ||
| public <OutType> Yielder<OutType> toYielder( | ||
| Supplier<OutType> initValue, YieldingAccumulator<OutType, T> accumulator |
There was a problem hiding this comment.
suggest renaming initValue in the methods with supplier parameters to initValueSupplier to make it easier to distinguish the methods
|
Also, can you point me to the change related to the setting a blocking pool timeout? I wasn't able to find that while reviewing |
|
@jon-wei, thanks. I addressed your comments. |
|
thanks, 👍 |
gianm
left a comment
There was a problem hiding this comment.
@jihoonson, re: the comment on Sequence.accumulate, do you think it makes sense to do this without modifying the Sequence interface?
Please also add a groupBy query test verifying that no more than 2 buffers are needed for a deeply nested groupBy. I recognize that there's a SQL test that hits this, but the SQL tests shouldn't be relied on for testing query engine specific behaviors.
| * | ||
| * @return accumulated value. | ||
| */ | ||
| <OutType> OutType accumulate(Supplier<OutType> initValSupplier, Accumulator<OutType, T> accumulator); |
There was a problem hiding this comment.
I don't understand the point of having a lazy initValue option to accumulate. Accumulate is supposed to do all of its work immediately, so why would the initValue need to be lazy?
There was a problem hiding this comment.
Oh, I see, it's so things like BaseSequence can defer creation of initValue until accumulation actually starts (after the iterator is made).
In that case, instead of modifying the Sequence interface, what do you think about moving buffer-taking from out of iterator-making and into the Accumulator in GroupByRowProcessor?
There was a problem hiding this comment.
@gianm yes, I also considered that way. To do so, I think it is inevitable to check the grouper's initialization state in the accumulate() method. I would like to avoid checking it for every accumulate() call. Another way may be to add an initialization method to Accumulator, but I don't think this is a good way.
If there are some reasonable reasons, I can consider moving buffer-taking to Accumulator again. Would you share if you have?
There was a problem hiding this comment.
The main reasons I'm asking these questions are:
- the contracts of the new methods confused me: it took me a bit to understand why there was both a lazy-init and non-lazy-init version of accumulate, given that both are expected to init the value and then fully exhaust the sequence before returning.
- increasing size of the Sequence interface makes usage and implementation more complex, so I want to make sure to consider other options first.
I agree about trying to avoid an init for Accumulator.
I bet the overhead of checking for initialization in each call to accumulate() is going to be unmeasurably low. There's a lot of stuff going on per row (reading values, hashing, table lookup, aggregation).
What do you think is best?
There was a problem hiding this comment.
I understand what you're concerned with. I agree on it may confuse users. How about adding afterMake() method to BaseSequence.IteratorMaker?
There was a problem hiding this comment.
That sounds like a good approach, along with some javadoc that the purpose of separating afterMake() from make() is that it allows resource allocation to be deferred until iteration actually begins, which matters if Sequences are nested inside each other.
There was a problem hiding this comment.
Hmm, after thinking about it more maybe I changed my mind… I think afterMake() doesn't make much sense for this since the resource allocation is tied to accumulation, not iteration. So it should really be associated with the Accumulator and not the IteratorMaker.
I'd be fine with either of these:
-
Stick with your original idea of accumulate taking a Supplier of the initValue, and update the javadocs to clarify that the purpose is to defer initialization of initValue until accumulation actually begins. Actually this seems like behavior we'd always want (why not?) and so I'd also consider making this the only API rather than having both a supplier and nonsupplier one. Callers that don't care about deferring initialization can wrap the thing in
Suppliers.ofInstance. -
Just keep things simple and have the GroupBy Accumulator check for inited-ness on each call to
accumulate(), I think the perf overhead here won't be bad.
Sorry for the back and forth, just wrapping my head around what is the best and clearest way of doing this.
There was a problem hiding this comment.
Right. afterMake() is not a good option.
I think making an iterator in BaseSequence.accumulate() causes this problem. Sequence.accumulate() method is supposed to start to do only its accumulation work immediately, but it actually accompanies a side-effect, initialization and cleanup of an iterator. It looks for convenience of initializing and cleaning up the iterator, but I'm not sure this is a good design choice. Maybe WrappingSequence added recently (in #3693) looks more proper for this purpose, but I don't have any idea how to move iterator initialization and cleanup out of BaseSequence without a big change.
I feel that my idea is little tricky to work around the problem of tightly coupled iterator initialization and accumulation. Let's do it simply for now.
|
|
||
| /** | ||
| * Return an Yielder for accumulated sequence. | ||
| * The {@code initValSupplier} provides an way for lazy evaluation of the initial value. |
There was a problem hiding this comment.
If this method guarantees lazy evaluation of the initial value, then this javadoc should be stronger and say that.
| * | ||
| * @see Yielder | ||
| */ | ||
| <OutType> Yielder<OutType> toYielder(Supplier<OutType> initValSupplier, YieldingAccumulator<OutType, T> accumulator); |
There was a problem hiding this comment.
The behavior of this method is different enough from toYielder that maybe a new name is warranted, like toYielderLazy.
| { | ||
| @Override | ||
| public <OutType> Yielder<OutType> toYielder(OutType initValue, YieldingAccumulator<OutType, T> accumulator) | ||
| public <OutType> Yielder<OutType> toYielder(OutType initValSupplier, YieldingAccumulator<OutType, T> accumulator) |
There was a problem hiding this comment.
Rename gone wild? This isn't a Supplier.
| This merge buffer is immediately released once they are not used anymore during the query processing, | ||
| but two or more concurrent nested groupBys can potentially lead to deadlocks since the merge buffers are limited in number | ||
| and are acquired one-by-one instead of a complete set. At this time we recommend that you avoid too many concurrent | ||
| execution of groupBys with the v2 strategy. |
There was a problem hiding this comment.
At this time we recommend that you avoid too many concurrent execution of groupBys with the v2 strategy.
This is stronger than what was there before… probably too strong. It's fine to execute as many concurrent non-deeply-nested groupBys as you want. Even double-nested groupBys are fine (groupBy -> groupBy -> table). Could you please adjust the wording? I don't want to scare people too much, just an appropriate amount.
There was a problem hiding this comment.
Thanks. I'll improve the doc.
| that you avoid deeply-nested groupBys with the v2 strategy. Doubly-nested groupBys (groupBy -> groupBy -> table) are | ||
| safe and do not suffer from this issue. If you like, you can forbid deeper nesting by setting | ||
| `druid.sql.planner.maxQueryCount = 2`. | ||
| For executing nested groupBys with the v2 groupBy strategy, you need to set `druid.processing.numMergeBuffers` to at least 2. |
There was a problem hiding this comment.
Similar comment to groupbyquery.md
|
@gianm thanks for your review. I'll add tests to check the number of merge buffers used for group-by queries. |
- Revert Sequence - Add isInitialized() to Grouper - Initialize the grouper in RowBasedGrouperHelper.Accumulator - Simple refactoring RowBasedGrouperHelper.Accumulator - Add tests for checking the number of used merge buffers - Improve docs
…grained-buffer-management-for-groupby
|
@gianm thanks. I addressed your comments. Additionally, I did a simple refactoring for RowBasedGrouperHelper. |
gianm
left a comment
There was a problem hiding this comment.
@jihoonson, code looks good.
Were the RowBasedGrouperHelper refactorings meant to improve performance or readability? Did you do benchmarks of groupBy before and after the refactorings to confirm performance is equal or better?
| Additionally, the "v2" strategy uses merging buffers for merging. It is currently the only query implementation that | ||
| does so. By default, Druid is configured without any merging buffer pool, so to use the "v2" strategy you must also | ||
| set `druid.processing.numMergeBuffers` to some non-zero number. | ||
| set `druid.processing.numMergeBuffers` to some non-zero number. Furthermore, if you want to execute deeply nested gropuBys, |
|
@gianm just for readability. I simply run GroupByBenchmark and got the same result. |
gianm
left a comment
There was a problem hiding this comment.
thx @jihoonson. 👍 after travis
In this PR, I newly added two methods to
Sequencefor lazy evaluation of initial value of accumulation.After this patch, the buffer acquisition order for nested group-by query execution is changed from outer -> inner to inner -> outer. In addition, once the intermediate result of inner queries aren't needed anymore, the buffer holding it is released immediately. As a result, nested group-by queries always need at most 2 buffers.
#3693 will cause some conflicts, and I believe that some changes of #3693 should also be applied to this patch. So, please review it first.
This change is