groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge. #3740

Merged
fjy merged 2 commits into apache:master from gianm:groupby-improvements
Dec 7, 2016

@gianm gianm commented Dec 5, 2016

~30% improvement on query benchmarks that include serde time (queryMultiQueryableIndexWithSerde with queryGranularity = all: 676,988 us/op on master vs. 486,083 us/op on this branch).

Specifically:

  • Remove timestamp from RowBasedKey when not needed.
  • Set timestamp to null in MapBasedRows that are not part of the final merge.
  • Add two new benchmarks, queryMultiQueryableIndexWithSerde (simulates serde between historical and broker) and queryMultiQueryableIndexWithSpilling (includes spilling to disk). Both show the improvement here, which comes mostly from having less serde work to do.
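
To make the first bullet concrete, here is a minimal sketch of a grouping key that carries a timestamp only when the query granularity needs one. This is not Druid's actual RowBasedKey: the class name `GroupingKeySketch`, the methods `serializeKey` and `needsTimestamp`, and the byte layout are all hypothetical, chosen only to show why dropping the timestamp leaves less data to serialize.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Druid's RowBasedKey or its wire format.
final class GroupingKeySketch {

    // Assumption for this sketch: with granularity "all" every row lands in a
    // single time bucket, so the timestamp cannot distinguish one group from another.
    static boolean needsTimestamp(String queryGranularity) {
        return !"all".equals(queryGranularity);
    }

    // Serializes a key as [timestamp?][dimension count][length-prefixed UTF-8 dimensions].
    // The 8-byte timestamp is written only when includeTimestamp is true.
    static byte[] serializeKey(boolean includeTimestamp, long timestampMillis, String[] dims) {
        byte[][] encoded = new byte[dims.length][];
        int size = (includeTimestamp ? Long.BYTES : 0) + Integer.BYTES;
        for (int i = 0; i < dims.length; i++) {
            encoded[i] = dims[i].getBytes(StandardCharsets.UTF_8);
            size += Integer.BYTES + encoded[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        if (includeTimestamp) {
            buf.putLong(timestampMillis);
        }
        buf.putInt(dims.length);
        for (byte[] e : encoded) {
            buf.putInt(e.length);
            buf.put(e);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        String[] dims = {"US", "mobile"};
        byte[] dayKey = serializeKey(needsTimestamp("day"), 1480896000000L, dims);
        byte[] allKey = serializeKey(needsTimestamp("all"), 1480896000000L, dims);
        // The only difference is the omitted long, so this prints 8.
        System.out.println(dayKey.length - allKey.length);
    }
}
```

In this sketch the saving is a fixed 8 bytes per key. The real change also nulls out timestamps in intermediate MapBasedRows, which this example does not model.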
groupby-improvements

Benchmark                                              (defaultStrategy)  (initialBuckets)  (numProcessingThreads)  (numSegments)  (queryGranularity)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt        Score       Error  Units
GroupByBenchmark.queryMultiQueryableIndex                             v2                -1                       2              4                 all            100000           basic.A  avgt   30   373040.126 ±  6687.778  us/op
GroupByBenchmark.queryMultiQueryableIndex                             v2                -1                       2              4                 day            100000           basic.A  avgt   30   704732.206 ±  9292.572  us/op
GroupByBenchmark.queryMultiQueryableIndexWithSerde                    v2                -1                       2              4                 all            100000           basic.A  avgt   30   486083.016 ±  4252.756  us/op
GroupByBenchmark.queryMultiQueryableIndexWithSerde                    v2                -1                       2              4                 day            100000           basic.A  avgt   30  1028039.357 ± 11060.639  us/op
GroupByBenchmark.queryMultiQueryableIndexWithSpilling                 v2                -1                       2              4                 all            100000           basic.A  avgt   30   444659.485 ±  5380.572  us/op
GroupByBenchmark.queryMultiQueryableIndexWithSpilling                 v2                -1                       2              4                 day            100000           basic.A  avgt   30   532730.064 ±  6565.590  us/op
GroupByBenchmark.querySingleIncrementalIndex                          v2                -1                       2              4                 all            100000           basic.A  avgt   30    75440.164 ±  1382.679  us/op
GroupByBenchmark.querySingleIncrementalIndex                          v2                -1                       2              4                 day            100000           basic.A  avgt   30    76651.784 ±  1288.932  us/op
GroupByBenchmark.querySingleQueryableIndex                            v2                -1                       2              4                 all            100000           basic.A  avgt   30    37673.145 ±   689.344  us/op
GroupByBenchmark.querySingleQueryableIndex                            v2                -1                       2              4                 day            100000           basic.A  avgt   30    40981.706 ±  1276.147  us/op

master

Benchmark                                              (defaultStrategy)  (initialBuckets)  (numProcessingThreads)  (numSegments)  (queryGranularity)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt        Score       Error  Units
GroupByBenchmark.queryMultiQueryableIndex                             v2                -1                       2              4                 all            100000           basic.A  avgt   30   366566.054 ±  5324.826  us/op
GroupByBenchmark.queryMultiQueryableIndex                             v2                -1                       2              4                 day            100000           basic.A  avgt   30   704091.741 ±  7608.095  us/op
GroupByBenchmark.queryMultiQueryableIndexWithSerde                    v2                -1                       2              4                 all            100000           basic.A  avgt   30   676987.839 ± 10675.159  us/op
GroupByBenchmark.queryMultiQueryableIndexWithSerde                    v2                -1                       2              4                 day            100000           basic.A  avgt   30  1015616.319 ± 12242.756  us/op
GroupByBenchmark.queryMultiQueryableIndexWithSpilling                 v2                -1                       2              4                 all            100000           basic.A  avgt   30   487088.000 ±  6308.555  us/op
GroupByBenchmark.queryMultiQueryableIndexWithSpilling                 v2                -1                       2              4                 day            100000           basic.A  avgt   30   536788.568 ± 10157.553  us/op
GroupByBenchmark.querySingleIncrementalIndex                          v2                -1                       2              4                 all            100000           basic.A  avgt   30    75694.447 ±  1499.463  us/op
GroupByBenchmark.querySingleIncrementalIndex                          v2                -1                       2              4                 day            100000           basic.A  avgt   30    76950.858 ±  1223.638  us/op
GroupByBenchmark.querySingleQueryableIndex                            v2                -1                       2              4                 all            100000           basic.A  avgt   30    37716.549 ±   938.930  us/op
GroupByBenchmark.querySingleQueryableIndex                            v2                -1                       2              4                 day            100000           basic.A  avgt   30    38152.708 ±   548.103  us/op
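
As a quick check on the headline number, the relative improvement can be recomputed from the two tables above. The sketch below uses only the reported average scores for queryGranularity = all. The class and method names are made up for illustration, and the calculation ignores the reported error margins.

```java
import java.util.Locale;

// Hypothetical helper for reading the JMH tables; not part of the PR.
final class BenchmarkDelta {

    // Relative reduction in average time per operation, in percent.
    // A positive result means the branch is faster than master.
    static double improvementPercent(double masterUsPerOp, double branchUsPerOp) {
        return (masterUsPerOp - branchUsPerOp) / masterUsPerOp * 100.0;
    }

    public static void main(String[] args) {
        // Scores copied from the tables above (us/op, queryGranularity = all).
        double serde = improvementPercent(676987.839, 486083.016);
        double spilling = improvementPercent(487088.000, 444659.485);
        System.out.printf(Locale.ROOT, "WithSerde:    %.1f%%%n", serde);
        System.out.printf(Locale.ROOT, "WithSpilling: %.1f%%%n", spilling);
    }
}
```

This gives roughly 28.2% for queryMultiQueryableIndexWithSerde and roughly 8.7% for queryMultiQueryableIndexWithSpilling, which is consistent with the "~30%" figure quoted for the serde benchmark.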

@gianm gianm added this to the 0.9.3 milestone Dec 5, 2016
@gianm gianm assigned fjy and jon-wei Dec 5, 2016

fjy commented Dec 5, 2016

👍


fjy commented Dec 6, 2016

@gianm there are some conflicts

gianm added 2 commits December 6, 2016 15:17
…t for the final merge.

Specifically:

- Remove timestamp from RowBasedKey when not needed
- Set timestamp to null in MapBasedRows that are not part of the final merge.
@gianm gianm force-pushed the groupby-improvements branch from d7a4816 to f5175c2 Compare December 6, 2016 23:37

gianm commented Dec 6, 2016

@fjy @jon-wei updated


jon-wei commented Dec 6, 2016

👍 after travis

@fjy fjy merged commit b1bac9f into apache:master Dec 7, 2016
dgolitsyn pushed a commit to metamx/druid that referenced this pull request Feb 14, 2017
…t for the final merge. (apache#3740)

* GroupByBenchmark: Add serde, spilling, all-gran benchmarks.

Also use more iterations.

* groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge.

Specifically:

- Remove timestamp from RowBasedKey when not needed
- Set timestamp to null in MapBasedRows that are not part of the final merge.
@gianm gianm deleted the groupby-improvements branch March 1, 2017 03:48