Skip to content

avoid sort while doing groupBy merging when possible#2571

Merged
fjy merged 2 commits intoapache:masterfrom
himanshug:gp_by_avoid_sort
Mar 14, 2016
Merged

avoid sort while doing groupBy merging when possible#2571
fjy merged 2 commits intoapache:masterfrom
himanshug:gp_by_avoid_sort

Conversation

@himanshug
Copy link
Copy Markdown
Contributor

while doing #2566 , I realised, most of the time when we are using IncrementalIndex for groupBy merging, it is not necessary to maintain sorted map.
Since unsorted hashmaps are more efficient in both speed and memory usage, this PR changes groupBy processing to avoid sorting while merging whenever possible.

Note that this has no user impact in terms of functionality as non-sorted index would be used only during intermediate merging.

also, a next step to #2325 . I am trying to push facts map off-heap as well and turns out it is again more efficient to maintain off-heap hash map than sorted maps.

@himanshug himanshug added this to the 0.9.1 milestone Mar 1, 2016
@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented Mar 1, 2016

@himanshug any benchmarks ?

@himanshug
Copy link
Copy Markdown
Contributor Author

@b-slim it can only improve performance for various cases. improvement will largely depend upon the size of intermediate groupBy result sets. More the number of dimensions and rows to be put in IncrementalIndex better the performace would be.

various benchmarks for sorted vs un-sorted maps can be seen at http://www.mapdb.org/benchmarks.html . they have direct relevance here. performance difference in "Random Updates" should be noted.

@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented Mar 1, 2016

@himanshug thanks, just to clarify, my question was for the sake of learning. In fact i thought that by contract queryRunner returns an order-timed sequence as indicated here. And based on that thought that mergeRunners can use that ordered property to do the merge.

@himanshug
Copy link
Copy Markdown
Contributor Author

@b-slim gpBy merging happens by storing whole result sets in IncrementalIndex and which does not depend upon ordered input.

@himanshug himanshug force-pushed the gp_by_avoid_sort branch 2 times, most recently from 4488aad to 86b2f72 Compare March 1, 2016 20:53
@jon-wei
Copy link
Copy Markdown
Contributor

jon-wei commented Mar 10, 2016

@himanshug Can you add some comments to GroupByQueryQueryToolChest that calls attention to the optimization being made?

👍 aside from that

@himanshug
Copy link
Copy Markdown
Contributor Author

@jon-wei added comment.

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Mar 14, 2016

👍

fjy added a commit that referenced this pull request Mar 14, 2016
avoid sort while doing groupBy merging when possible
@fjy fjy merged commit 06813b5 into apache:master Mar 14, 2016
@navis
Copy link
Copy Markdown
Contributor

navis commented Mar 16, 2016

@fjy I think #2142 is related to this. Could you check it whenever time allows?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants