Skip to content

GroupBy: When possible, sort results on read rather than continuously during insert.#2871

Closed
gianm wants to merge 1 commit intoapache:masterfrom
gianm:sort-after-indexing
Closed

GroupBy: When possible, sort results on read rather than continuously during insert.#2871
gianm wants to merge 1 commit intoapache:masterfrom
gianm:sort-after-indexing

Conversation

@gianm
Copy link
Copy Markdown
Contributor

@gianm gianm commented Apr 21, 2016

Leans on the logic from #2571 with respect to deciding when to sort and when not to sort.

This does involve somewhat more memory use (need to copy the facts to a list and then sort the list). But as the list is just references to already-existing Map.Entry objects, the overhead shouldn't be too bad.

Before:

Result "mergeQueryableIndex":
  7899182.389 ±(99.9%) 490250.490 us/op [Average]
  (min, avg, max) = (7346007.865, 7899182.389, 9206588.959), stdev = 564573.193
  CI (99.9%): [7408931.899, 8389432.879] (assumes normal distribution)


# Run complete. Total time: 00:05:35

Benchmark                             Mode  Cnt        Score        Error  Units
GroupByBenchmark.mergeQueryableIndex  avgt   20  7899182.389 ± 490250.490  us/op

After:

Result "mergeQueryableIndex":
  5952538.010 ±(99.9%) 365008.858 us/op [Average]
  (min, avg, max) = (5645155.827, 5952538.010, 7256515.626), stdev = 420344.744
  CI (99.9%): [5587529.153, 6317546.868] (assumes normal distribution)


# Run complete. Total time: 00:04:25

Benchmark                             Mode  Cnt        Score        Error  Units
GroupByBenchmark.mergeQueryableIndex  avgt   20  5952538.010 ± 365008.858  us/op

Depends on #2870

… during insert.

Leans on the logic from apache#2571 with respect to deciding when to sort and when not to sort.
@gianm gianm added this to the 0.9.1 milestone Apr 21, 2016
final Comparator<TimeAndDims> comparator = descending
? Ordering.from(dimsComparator()).reverse()
: dimsComparator();
List<Map.Entry<TimeAndDims, Integer>> factsList = Lists.newArrayList(getFacts().entrySet());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may be ok for on-heap index but will unexpectedly put everything on-heap even for any off-heap implementation of IncrementalIndex (for example #2847 )
one option would be to update QueryToolChest to not exercise this path when off-heap merging was requested.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that for the current offheap impl this is fine (even the offheap one still has an onheap facts map) but things are different if the facts map is offheap.

I am ok waiting on this PR until #2847 is settled and then figuring out what is best.

@fjy fjy modified the milestones: 0.9.2, 0.9.1 May 3, 2016
@gianm
Copy link
Copy Markdown
Contributor Author

gianm commented May 12, 2016

closing for now.

@gianm gianm closed this May 12, 2016
@gianm gianm removed this from the 0.9.2 milestone Aug 5, 2016
@gianm gianm deleted the sort-after-indexing branch September 23, 2022 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants