update OffheapIncrementalIndex to put facts off-heap #2847
himanshug wants to merge 1 commit into apache:master
@himanshug, what are the performance implications of this? The last time we tried MapDB, we had a lot of problems with serde, and it decreased ingestion rates substantially. Also, please remember to submit proposals for these types of changes.
@fjy OffheapIncrementalIndex is not yet available for indexing; it is only optionally used in GroupBy queries, which try to optimize things by not sorting results in IncrementalIndex when possible (see #2571). With that in mind, here are some numbers based on https://github.com/druid-io/druid/blob/master/benchmarks/src/main/java/io/druid/benchmark/IncrementalIndexAddRowsBenchmark.java for three configurations:

- OffheapIncrementalIndex with sortFacts = TRUE
- OffheapIncrementalIndex with sortFacts = FALSE
- OnheapIncrementalIndex with sortFacts = TRUE

Note that in groupBy query processing, sortFacts is TRUE only once, at the very end, before results are returned to the end user, and with sortFacts = FALSE the off-heap performance is not that bad. This off-heap implementation also differs from the previous one: it does not serde any strings and keeps the dimension dictionary on-heap, whereas the old implementation tried to put everything off-heap.
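To make the layout described above concrete, here is a minimal Java sketch under stated assumptions; it is not Druid's OffheapIncrementalIndex. The class name OffheapFactsSketch, the single shared dictionary, and the one-long-sum-per-row aggregator are hypothetical simplifications. Dimension values are encoded to int ids through an on-heap map (so no strings are serialized off-heap), aggregated values live in a direct ByteBuffer, and the facts map is a TreeMap only when sortFacts is true, otherwise a HashMap.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Druid's OffheapIncrementalIndex: dimension dictionary on-heap,
// aggregated values ("facts") off-heap, facts sorted only when sortFacts is true.
public class OffheapFactsSketch {
  // Simplification: each row has exactly one long "sum" aggregator.
  private static final int BYTES_PER_ROW = Long.BYTES;

  // On-heap dictionary (one shared map for brevity): dimension value -> int id.
  private final Map<String, Integer> dictionary = new HashMap<>();
  // Off-heap storage for aggregator values; no strings are ever written here.
  private final ByteBuffer facts;
  // Encoded dimension ids -> byte offset of the row's aggregator in `facts`.
  private final Map<List<Integer>, Integer> rowOffsets;
  private int nextOffset = 0;

  public OffheapFactsSketch(int maxOffHeapBytes, boolean sortFacts) {
    this.facts = ByteBuffer.allocateDirect(maxOffHeapBytes);
    // Orders keys by encoded ids (first-seen order of values), not by string value.
    Comparator<List<Integer>> byIds = (a, b) -> {
      for (int i = 0; i < Math.min(a.size(), b.size()); i++) {
        int c = Integer.compare(a.get(i), b.get(i));
        if (c != 0) {
          return c;
        }
      }
      return Integer.compare(a.size(), b.size());
    };
    // A sorted map pays an ordering cost on every insert; a hash map skips it.
    if (sortFacts) {
      this.rowOffsets = new TreeMap<List<Integer>, Integer>(byIds);
    } else {
      this.rowOffsets = new HashMap<List<Integer>, Integer>();
    }
  }

  // Encodes values to ids; assigns new ids only if assignNew, else returns null for unseen values.
  private List<Integer> encode(List<String> dims, boolean assignNew) {
    List<Integer> key = new ArrayList<>(dims.size());
    for (String d : dims) {
      Integer id = dictionary.get(d);
      if (id == null) {
        if (!assignNew) {
          return null;
        }
        id = dictionary.size();
        dictionary.put(d, id);
      }
      key.add(id);
    }
    return key;
  }

  /** Adds value to the row for dims; returns false if the off-heap buffer is full. */
  public boolean add(List<String> dims, long value) {
    List<Integer> key = encode(dims, true);
    Integer offset = rowOffsets.get(key);
    if (offset == null) {
      if (nextOffset + BYTES_PER_ROW > facts.capacity()) {
        return false;
      }
      offset = nextOffset;
      nextOffset += BYTES_PER_ROW;
      facts.putLong(offset, 0L);
      rowOffsets.put(key, offset);
    }
    facts.putLong(offset, facts.getLong(offset) + value);
    return true;
  }

  public long get(List<String> dims) {
    List<Integer> key = encode(dims, false);
    Integer offset = key == null ? null : rowOffsets.get(key);
    return offset == null ? 0L : facts.getLong(offset);
  }

  public int rowCount() {
    return rowOffsets.size();
  }

  public static void main(String[] args) {
    // 16 bytes of off-heap space = room for two rows in this sketch.
    OffheapFactsSketch index = new OffheapFactsSketch(2 * BYTES_PER_ROW, true);
    index.add(List.of("us", "web"), 3);
    index.add(List.of("us", "web"), 4); // same row: aggregated in place to 7
    index.add(List.of("eu", "app"), 5);
    boolean added = index.add(List.of("ap", "web"), 1); // third distinct row does not fit
    System.out.println(index.rowCount() + " rows, third add accepted: " + added);
  }
}
```

Running main prints `2 rows, third add accepted: false`. The HashMap branch presumably mirrors why sortFacts = FALSE is cheaper: rows are aggregated without maintaining an ordering on every insert, and sorting can be deferred to the single final pass.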
I wonder if we should always default to maxOffHeapSize and ignore maxResults here?
Copy/paste error, fixed. It is only supposed to rely on the max off-heap limit, not on maxRows.
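The fix bounds the index by off-heap bytes (maxOffHeapSize) rather than by a row count (maxRows/maxResults). A minimal sketch of such a byte-budget admission check follows; OffheapLimitSketch and tryReserveRow are hypothetical names, not Druid APIs. The point of the design is that a row-count limit cannot bound memory when row sizes differ, while a byte budget bounds it directly.

```java
// Hypothetical sketch: admit rows against an off-heap byte budget instead of a row count.
public class OffheapLimitSketch {
  private final long maxOffHeapSize; // budget in bytes (name borrowed from the review comment)
  private long usedBytes = 0;
  private int rows = 0;

  public OffheapLimitSketch(long maxOffHeapSize) {
    this.maxOffHeapSize = maxOffHeapSize;
  }

  /** Reserves rowBytes for a new row; returns false once the byte budget would be exceeded. */
  public boolean tryReserveRow(int rowBytes) {
    if (rowBytes <= 0) {
      throw new IllegalArgumentException("rowBytes must be positive");
    }
    if (usedBytes + rowBytes > maxOffHeapSize) {
      return false;
    }
    usedBytes += rowBytes;
    rows++;
    return true;
  }

  public long usedBytes() {
    return usedBytes;
  }

  public int rows() {
    return rows;
  }

  public static void main(String[] args) {
    OffheapLimitSketch limit = new OffheapLimitSketch(64);
    // Three 24-byte rows would need 72 bytes; only two fit in a 64-byte budget.
    for (int i = 0; i < 3; i++) {
      System.out.println("row " + i + " accepted: " + limit.tryReserveRow(24));
    }
  }
}
```

Running main prints that rows 0 and 1 are accepted and row 2 is rejected, since 72 bytes would exceed the 64-byte budget.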
All my comments are addressed. 👍
I guess it's not needed after the new groupBy query strategy.
and also let the user define the max limit in terms of off-heap size instead of the max number of rows