avoid sort while doing groupBy merging when possible#2571
@himanshug any benchmarks?
@b-slim it can only improve performance in various cases. The size of the improvement will largely depend on the size of the intermediate groupBy result sets: the more dimensions and rows that have to be put into the IncrementalIndex, the bigger the performance gain. Various benchmarks for sorted vs. unsorted maps can be seen at http://www.mapdb.org/benchmarks.html and they have direct relevance here. The performance difference in "Random Updates" should be noted.
@himanshug thanks. Just to clarify, my question was for the sake of learning. In fact I thought that by contract
@b-slim groupBy merging works by storing the whole result sets in an IncrementalIndex, and that does not depend on ordered input.
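To illustrate why the merge does not need ordered input, here is a minimal sketch. It is not Druid's IncrementalIndex API: `GroupByMergeSketch`, `Row`, and `merge` are hypothetical names, and a plain `HashMap` with a summed count stands in for the index and its aggregators. Since the aggregation is commutative, merging the same partial rows in any order should give the same result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for groupBy merging; not Druid's IncrementalIndex API.
class GroupByMergeSketch {
    // A partial result row: the grouping key (dimension values) and a partial count.
    static final class Row {
        final List<String> dims;
        final long count;

        Row(List<String> dims, long count) {
            this.dims = dims;
            this.count = count;
        }
    }

    // Merge partial rows by grouping key into an unsorted map.
    // Summation is commutative, so the arrival order of rows does not matter.
    static Map<List<String>, Long> merge(List<Row> rows) {
        Map<List<String>, Long> facts = new HashMap<>();
        for (Row r : rows) {
            facts.merge(r.dims, r.count, Long::sum);
        }
        return facts;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>(Arrays.asList(
            new Row(Arrays.asList("US", "web"), 3),
            new Row(Arrays.asList("IN", "app"), 5),
            new Row(Arrays.asList("US", "web"), 4)));

        Map<List<String>, Long> inOrder = merge(rows);
        Collections.reverse(rows);
        Map<List<String>, Long> reversed = merge(rows);

        // Same aggregates regardless of input order.
        System.out.println(inOrder.equals(reversed)); // prints true
    }
}
```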
4488aad to 86b2f72
@himanshug Can you add some comments to GroupByQueryQueryToolChest that call attention to the optimization being made? 👍 aside from that.
…d in groupBy merging to improve performance
86b2f72 to dc0214b
@jon-wei added a comment.
👍
… during insert. Leans on the logic from apache#2571 with respect to deciding when to sort and when not to sort.
While doing #2566, I realised that most of the time when we use IncrementalIndex for groupBy merging, it is not necessary to maintain a sorted map.
Since unsorted hash maps are more efficient in both speed and memory usage, this PR changes groupBy processing to avoid sorting while merging whenever possible.
Note that this has no user impact in terms of functionality, as the non-sorted index would be used only during intermediate merging.
Also, this is a next step to #2325. I am trying to push the facts map off-heap as well, and it turns out that it is again more efficient to maintain an off-heap hash map than a sorted map.
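As a rough sketch of the idea (not the actual change in this PR), the choice between a sorted and an unsorted facts map can be made once, when the index is created. The names `FactsMapSketch`, `newFactsMap`, `add`, and the `sortFacts` flag are assumptions made for illustration, and `String` keys with `Long` counts stand in for the real row keys and aggregators.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical illustration only; names and types are not taken from the Druid code base.
class FactsMapSketch {
    // Pick the backing map once. A skip list keeps keys ordered at insert cost;
    // a hash map skips that work when callers never rely on key order.
    static Map<String, Long> newFactsMap(boolean sortFacts) {
        if (sortFacts) {
            return new ConcurrentSkipListMap<>();
        }
        return new ConcurrentHashMap<>();
    }

    // Aggregate a value into the facts map; identical for both map types.
    static void add(Map<String, Long> facts, String key, long value) {
        facts.merge(key, value, Long::sum);
    }

    public static void main(String[] args) {
        Map<String, Long> sorted = newFactsMap(true);
        Map<String, Long> unsorted = newFactsMap(false);
        for (String key : new String[] {"b", "a", "b", "c"}) {
            add(sorted, key, 1L);
            add(unsorted, key, 1L);
        }
        // Both maps hold the same aggregates; only the sorted one guarantees key order.
        System.out.println(sorted.equals(unsorted));
        System.out.println(sorted.keySet());
    }
}
```

In this sketch the aggregation path is the same for both maps, which matches the description above: sorting only matters if something later iterates the facts in key order.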