new OffHeapIncrementalIndex that does only aggregations off-heap #2325
drcrallen merged 4 commits into apache:master
|
@himanshug have you thought at all about extending this PR to enable on-disk computations for groupBys? |
|
@fjy This is a self-sufficient step that has immediate gains without introducing any major query latency. I'm thinking about on-disk, but it is a challenge to do that without adding significant query latency... currently, I'm in the stage of brainstorming on that in the background. That said, current PR might enable larger groupBys than before. |
|
@himanshug @fjy, does this need to be a blocker for 0.9.0? |
|
@himanshug Have you guys ever tried enabling swap and using more direct byte buffers than available memory? |
can this just move to a protected class instead of duplicating it?
|
@drcrallen thanks for review, I will probably rework this a bit after #2085 is merged. |
|
also, I do not want to block 0.9.0 RC because of this so setting the milestone for this to be 0.9.1 |
Force-pushed from 1101519 to 660f590.
|
@drcrallen all review comments addressed. |
|
👍 |
That's not quite true. If I have a max size of 10 and I request 100 buffers, I'll still get 100 buffers, but I'll only return 10 for later use. Then if I request 100 I'll have 10 queued and 90 new ones. And when they are done they will return 10 and drop 90 to GC.
I'm at a loss right now on the proper way to document that behavior. Maybe just in the stupidPool javadoc?
yes, you are right, this is just the size of the cache. do you want to suggest better messaging?
how about, "processing buffer pool caches the buffers for later use, this is the maximum count cache will grow to. however pool can create more buffers than it can cache."
ok updated doc to be more explicit.
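For anyone reading this thread later, the caching behavior described above can be sketched roughly as follows. This is not the actual StupidPool code: `BoundedCachePool`, `take`, and `giveBack` are hypothetical names made up for illustration, and heap buffers are used only to keep the example cheap to run. The point is that the configured size bounds how many buffers are kept for reuse, not how many can be handed out.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT Druid's StupidPool.
final class BoundedCachePool {
    private final ArrayDeque<ByteBuffer> cache = new ArrayDeque<>();
    private final int maxCacheSize; // upper bound on cached buffers, not on created ones
    private final int bufferSize;
    private int createdCount = 0;

    BoundedCachePool(int maxCacheSize, int bufferSize) {
        this.maxCacheSize = maxCacheSize;
        this.bufferSize = bufferSize;
    }

    // Reuse a cached buffer if one exists; otherwise always create a new one.
    synchronized ByteBuffer take() {
        ByteBuffer cached = cache.poll();
        if (cached != null) {
            cached.clear();
            return cached;
        }
        createdCount++;
        // Assumption: a real processing pool would likely use direct buffers here.
        return ByteBuffer.allocate(bufferSize);
    }

    // Keep the buffer only while the cache has room; otherwise drop it for GC.
    synchronized void giveBack(ByteBuffer buffer) {
        if (cache.size() < maxCacheSize) {
            cache.add(buffer);
        }
    }

    synchronized int cachedCount() {
        return cache.size();
    }

    synchronized int createdCount() {
        return createdCount;
    }

    public static void main(String[] args) {
        BoundedCachePool pool = new BoundedCachePool(10, 16);
        List<ByteBuffer> taken = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            taken.add(pool.take());
        }
        for (ByteBuffer b : taken) {
            pool.giveBack(b);
        }
        System.out.println("created=" + pool.createdCount() + " cached=" + pool.cachedCount());
    }
}
```

With a cache size of 10, requesting 100 buffers creates 100; returning them keeps 10 and drops 90, and a second round of 100 requests reuses the 10 cached buffers and creates 90 more.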
FYI there's io.druid.segment.CloserRule for just this kind of use
that looks nicer... changed to use CloserRule.
Force-pushed from 660f590 to 9fe1b28.
|
@drcrallen addressed comments. |
|
Cool, 👍 with 4 commits (after Travis). @himanshug, would you be willing to update this thread with performance numbers comparing on-heap and off-heap after your internal testing? (I'm assuming that would come after this is merged.) |
|
@drcrallen thanks, will update with test results. That's why I haven't made off-heap the default, only opt-in. |
new OffHeapIncrementalIndex that does only aggregations off-heap
Our groupBy queries contain many metric columns of type thetaSketch, which are large and put significant GC pressure on the process. This PR enables doing metric aggregation for groupBy queries completely off-heap.
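To illustrate the general idea (not the code in this PR), here is a minimal sketch of keeping only the aggregation state off-heap. Everything in it is an assumption made for illustration: the class name `OffHeapLongSumSketch`, the fixed 8-byte slot per group, and the use of a simple long sum in place of a real sketch aggregator. The group keys stay in an ordinary on-heap map, while the running values live in a direct `ByteBuffer` that the garbage collector does not have to scan.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT the OffHeapIncrementalIndex from this PR.
final class OffHeapLongSumSketch {
    private static final int SLOT_BYTES = Long.BYTES;

    private final ByteBuffer buffer;                               // aggregation state, off-heap
    private final Map<String, Integer> offsets = new HashMap<>();  // group key -> slot offset, on-heap
    private int nextOffset = 0;

    OffHeapLongSumSketch(int maxGroups) {
        this.buffer = ByteBuffer.allocateDirect(maxGroups * SLOT_BYTES);
    }

    // Adds value to the group's running sum; returns false if the buffer is full.
    boolean add(String groupKey, long value) {
        Integer offset = offsets.get(groupKey);
        if (offset == null) {
            if (nextOffset + SLOT_BYTES > buffer.capacity()) {
                return false; // no room left for a new group
            }
            offset = nextOffset;
            nextOffset += SLOT_BYTES;
            buffer.putLong(offset, 0L); // initialize the slot explicitly
            offsets.put(groupKey, offset);
        }
        // Read-modify-write at an absolute position; no per-row objects are allocated.
        buffer.putLong(offset, buffer.getLong(offset) + value);
        return true;
    }

    long get(String groupKey) {
        Integer offset = offsets.get(groupKey);
        if (offset == null) {
            throw new IllegalArgumentException("unknown group: " + groupKey);
        }
        return buffer.getLong(offset);
    }

    public static void main(String[] args) {
        OffHeapLongSumSketch sums = new OffHeapLongSumSketch(2);
        sums.add("a", 1L);
        sums.add("a", 2L);
        sums.add("b", 5L);
        System.out.println("a=" + sums.get("a") + " b=" + sums.get("b"));
    }
}
```

A fixed-size slot works for a long sum; larger aggregators such as sketches would need a bigger per-group slot, which is where moving that state off-heap matters most for GC.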
changes introduced:
fixes #2297
I understand that this PR would have conflicts with some other open PRs. I will rebase and fix merge conflicts as and when other PRs get merged.