update OffheapIncrementalIndex to put facts off-heap #2847

Closed
himanshug wants to merge 1 commit into apache:master from himanshug:off_heap_facts

Conversation

@himanshug
Contributor

Also lets the user define the max limit in terms of off-heap size instead of a max number of rows.

@fjy
Contributor

fjy commented Apr 18, 2016

@himanshug what are the performance implications of this? Last time we tried MapDB we had a lot of problems with serde, and it decreased ingestion rates substantially.

Also, please remember to submit proposals for these types of changes.

@himanshug himanshug force-pushed the off_heap_facts branch 2 times, most recently from de9e133 to 3a670a9 Compare April 18, 2016 04:37
@himanshug
Contributor Author

@fjy OffheapIncrementalIndex is not available for indexing [yet]; it is only optionally used in GroupBy queries, which try to optimize things by not sorting results in IncrementalIndex when possible (see #2571). With that, here are some numbers based on https://github.com/druid-io/druid/blob/master/benchmarks/src/main/java/io/druid/benchmark/IncrementalIndexAddRowsBenchmark.java.

OffheapIncrementalIndex with sortFacts = TRUE

```
Benchmark                                            Mode  Cnt   Score    Error  Units
OffheapIncrementalIndexRowsBenchmark.normalFloats    avgt   20  45.928 ±  1.221  us/op
OffheapIncrementalIndexRowsBenchmark.normalLongs     avgt   20  46.847 ±  2.234  us/op
OffheapIncrementalIndexRowsBenchmark.normalStrings   avgt   20  51.218 ±  1.358  us/op
```

OffheapIncrementalIndex with sortFacts = FALSE

```
Benchmark                                            Mode  Cnt   Score    Error  Units
OffheapIncrementalIndexRowsBenchmark.normalFloats    avgt    3  15.450 ±  9.408  us/op
OffheapIncrementalIndexRowsBenchmark.normalLongs     avgt    3  13.112 ±  1.487  us/op
OffheapIncrementalIndexRowsBenchmark.normalStrings   avgt    3  22.428 ± 24.327  us/op
```

OnheapIncrementalIndex with sortFacts = TRUE

```
Benchmark                                            Mode  Cnt   Score    Error  Units
IncrementalIndexAddRowsBenchmark.normalFloats        avgt  200  15.176 ±  0.625  us/op
IncrementalIndexAddRowsBenchmark.normalLongs         avgt  200  14.611 ±  0.570  us/op
IncrementalIndexAddRowsBenchmark.normalStrings       avgt  200  21.284 ±  0.394  us/op
```

Note that in groupBy query processing, sortFacts is TRUE only once, at the very end before returning results to the end user, and with sortFacts = FALSE the off-heap performance is not that bad.
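The cost difference between the two modes comes down to when sorting happens. A hypothetical sketch (not Druid code, class and method names are made up) of the tradeoff: keeping facts sorted on every insert costs O(log n) per row, while appending unsorted and sorting once before returning results is O(1) per row plus a single O(n log n) sort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative sketch of the sortFacts tradeoff described above.
public class SortFactsTradeoff {
    // sortFacts = TRUE analogue: every insert pays the sorted-map cost.
    public static List<Long> sortedOnInsert(long[] keys) {
        TreeMap<Long, Long> facts = new TreeMap<>();
        for (long k : keys) {
            facts.put(k, k); // O(log n) per row to keep order
        }
        return new ArrayList<>(facts.keySet());
    }

    // sortFacts = FALSE analogue: cheap unsorted appends, one sort at the end.
    public static List<Long> sortOnceAtEnd(long[] keys) {
        List<Long> facts = new ArrayList<>();
        for (long k : keys) {
            facts.add(k); // O(1) amortized per row
        }
        facts.sort(null); // single O(n log n) sort before returning results
        return facts;
    }
}
```

Both produce the same sorted output; only the per-insert cost differs, which matches the gap between the sortFacts = TRUE and FALSE benchmark numbers above.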

This off-heap impl is different from the previous one: it does not serde any strings and keeps the dimension dictionary on-heap, whereas the old impl tried to put everything off-heap.
Note that the on-heap version cannot be created without very large heaps, but the off-heap version (even if 3x slower with sortFacts = TRUE) allows larger groupBy queries, with the ability to put an upper bound on off-heap size.
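A minimal sketch of the design described above (hypothetical class and method names, not the actual Druid implementation): the dimension dictionary stays on-heap as a plain map, encoded fact rows go into a direct (off-heap) ByteBuffer so no strings are serialized per row, and the capacity check gives a size-based limit rather than a max-row limit.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: on-heap dictionary, off-heap fact storage.
public class OffheapFactsSketch {
    private final Map<String, Integer> dimDict = new HashMap<>(); // stays on-heap
    private final ByteBuffer facts; // direct buffer, allocated off-heap
    private int rowCount = 0;

    public OffheapFactsSketch(int maxOffHeapBytes) {
        this.facts = ByteBuffer.allocateDirect(maxOffHeapBytes);
    }

    // Encode a dimension value once; rows store only the int id,
    // so strings are never serialized into the off-heap buffer.
    private int encode(String dimValue) {
        return dimDict.computeIfAbsent(dimValue, v -> dimDict.size());
    }

    // Returns false when the off-heap budget is exhausted, mirroring
    // an upper bound on off-heap size instead of a max number of rows.
    public boolean addRow(String dimValue, long metric) {
        if (facts.remaining() < Integer.BYTES + Long.BYTES) {
            return false;
        }
        facts.putInt(encode(dimValue));
        facts.putLong(metric);
        rowCount++;
        return true;
    }

    public int getRowCount() {
        return rowCount;
    }
}
```

Each row here costs a fixed 12 bytes off-heap (one int id plus one long metric), so the buffer size directly bounds memory use regardless of how many rows arrive.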

@nishantmonu51 nishantmonu51 self-assigned this Apr 19, 2016
Member


I wonder if we should default to maxOffHeapSize always and ignore maxResults here?

Contributor Author


Copy/paste error, fixed; it is only supposed to rely on the max off-heap limit and not on maxRows.

…r define max limit in terms of size instead of number of rows
@himanshug
Contributor Author

@fjy @nishantmonu51 ?

@nishantmonu51
Member

all my comments are addressed, 👍

@himanshug
Contributor Author

I guess it's not needed after the new groupBy query strategy.

@himanshug himanshug closed this Aug 5, 2016
@himanshug himanshug deleted the off_heap_facts branch January 3, 2017 16:24