Optimization for expressions that hit a single long column. #6599

Merged
gianm merged 5 commits into apache:master from gianm:expr-long-cache
Nov 13, 2018

gianm commented Nov 11, 2018

This patch adds an LRU cache for expression selectors on top of
a single long column.

Previously, a single-long-input optimization applied only to the time
column. The two optimizations are now combined. The patch also adds
type-specific value caching to ExprEval, which allowed simplifying
the SingleLongInputCachingExpressionColumnValueSelector code.
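
As a rough illustration of what type-specific value caching could look like, here is a minimal sketch. `LongEvalSketch` and its methods are hypothetical names made up for this example; this is not Druid's actual `ExprEval` API, only the general idea of converting a long once and reusing the result.

```java
// Hypothetical stand-in for an evaluated long-typed expression result.
// This is NOT Druid's ExprEval; all names here are assumptions.
final class LongEvalSketch
{
  private final long value;
  private String cachedString;   // null until asString() is first called
  private int stringConversions; // kept only so the caching is observable

  LongEvalSketch(final long value)
  {
    this.value = value;
  }

  long asLong()
  {
    return value;
  }

  String asString()
  {
    // Convert at most once; later calls return the same String instance.
    if (cachedString == null) {
      cachedString = Long.toString(value);
      stringConversions++;
    }
    return cachedString;
  }

  int stringConversions()
  {
    return stringConversions;
  }

  public static void main(String[] args)
  {
    final LongEvalSketch eval = new LongEvalSketch(42L);
    eval.asString();
    eval.asString();
    System.out.println(eval.asString() + " converted " + eval.stringConversions() + " time(s)");
  }
}
```

When the same evaluated value is read several times as a string (for example in a concat followed by a comparison), only the first read pays for the conversion.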

Benchmarks (the ones we'd expect to see improvement on are arithmeticOnLong and stringConcatAndCompareOnLong):

master:

Benchmark                                                  (rowsPerSegment)  Mode  Cnt   Score   Error  Units
ExpressionSelectorBenchmark.arithmeticOnLong                        1000000  avgt   30   35.975 ± 1.445  ms/op
ExpressionSelectorBenchmark.stringConcatAndCompareOnLong            1000000  avgt   30  157.388 ± 6.704  ms/op
ExpressionSelectorBenchmark.strlenUsingExpressionAsLong             1000000  avgt   30   15.079 ± 0.074  ms/op
ExpressionSelectorBenchmark.strlenUsingExpressionAsString           1000000  avgt   30   13.146 ± 0.334  ms/op
ExpressionSelectorBenchmark.strlenUsingExtractionFn                 1000000  avgt   30    4.679 ± 0.252  ms/op
ExpressionSelectorBenchmark.timeFloorUsingCursor                    1000000  avgt   30   13.438 ± 0.162  ms/op
ExpressionSelectorBenchmark.timeFloorUsingExpression                1000000  avgt   30   12.797 ± 0.110  ms/op
ExpressionSelectorBenchmark.timeFloorUsingExtractionFn              1000000  avgt   30   11.328 ± 0.221  ms/op

patch:

Benchmark                                                  (rowsPerSegment)  Mode  Cnt   Score   Error  Units
ExpressionSelectorBenchmark.arithmeticOnLong                        1000000  avgt   30   13.807 ± 0.390  ms/op
ExpressionSelectorBenchmark.stringConcatAndCompareOnLong            1000000  avgt   30   13.743 ± 0.213  ms/op
ExpressionSelectorBenchmark.strlenUsingExpressionAsLong             1000000  avgt   30   15.246 ± 0.054  ms/op
ExpressionSelectorBenchmark.strlenUsingExpressionAsString           1000000  avgt   30   12.483 ± 0.495  ms/op
ExpressionSelectorBenchmark.strlenUsingExtractionFn                 1000000  avgt   30    4.666 ± 0.241  ms/op
ExpressionSelectorBenchmark.timeFloorUsingCursor                    1000000  avgt   30   14.617 ± 0.317  ms/op
ExpressionSelectorBenchmark.timeFloorUsingExpression                1000000  avgt   30   13.782 ± 0.497  ms/op
ExpressionSelectorBenchmark.timeFloorUsingExtractionFn              1000000  avgt   30   11.414 ± 0.147  ms/op
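
For the two benchmarks expected to improve, the scores above work out to speedups of about 2.6x and 11.5x. A small sketch of that arithmetic follows; the class and method names are made up for illustration, and the reported error margins are ignored.

```java
// Hypothetical helper; the numbers are copied from the two tables above.
final class BenchmarkSpeedup
{
  // Both scores are average time in ms/op, so master / patch > 1 means the patch is faster.
  static double speedup(final double masterMsPerOp, final double patchMsPerOp)
  {
    return masterMsPerOp / patchMsPerOp;
  }

  public static void main(String[] args)
  {
    System.out.printf("arithmeticOnLong: %.2fx%n", speedup(35.975, 13.807));
    System.out.printf("stringConcatAndCompareOnLong: %.2fx%n", speedup(157.388, 13.743));
  }
}
```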

public class LruEvalCache
{
  private final Long2ObjectLinkedOpenHashMap<ExprEval> m = new Long2ObjectLinkedOpenHashMap<>(CACHE_SIZE);

Maybe don't size it from the beginning and let it grow

Sounds like a good idea. I changed it.
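
For readers following this thread, here is a minimal sketch of a long-keyed LRU cache that is not presized to its maximum and instead grows on demand. The snippet under review uses fastutil's `Long2ObjectLinkedOpenHashMap`; this sketch swaps in the JDK's `LinkedHashMap` so it is self-contained. The class name, method names, and sizes are assumptions for illustration, not the code that was merged.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical sketch; not the LruEvalCache from this patch.
final class LruLongCacheSketch<V>
{
  private final LinkedHashMap<Long, V> map;

  LruLongCacheSketch(final int maxSize)
  {
    // 16 and 0.75f are the JDK defaults, so the table is not presized to maxSize
    // and only grows as entries arrive. accessOrder = true keeps entries ordered
    // from least to most recently used.
    this.map = new LinkedHashMap<Long, V>(16, 0.75f, true)
    {
      @Override
      protected boolean removeEldestEntry(final Map.Entry<Long, V> eldest)
      {
        // Called after each insertion; evicts the least recently used entry when over the cap.
        return size() > maxSize;
      }
    };
  }

  // Returns the cached value for key, computing and storing it on a miss.
  // A null result from fn is not cached and would be recomputed on the next call.
  V compute(final long key, final LongFunction<V> fn)
  {
    V value = map.get(key); // get() counts as an access and refreshes recency
    if (value == null) {
      value = fn.apply(key);
      map.put(key, value);
    }
    return value;
  }

  boolean contains(final long key)
  {
    return map.containsKey(key); // containsKey() does not affect access order
  }

  int size()
  {
    return map.size();
  }

  public static void main(String[] args)
  {
    final LruLongCacheSketch<String> cache = new LruLongCacheSketch<>(2);
    cache.compute(1L, k -> "one");
    cache.compute(2L, k -> "two");
    cache.compute(1L, k -> "unused"); // hit: refreshes key 1
    cache.compute(3L, k -> "three");  // miss: evicts key 2, the least recently used
    System.out.println(cache.contains(1L) + " " + cache.contains(2L) + " " + cache.contains(3L));
  }
}
```

The cap bounds how many entries the cache holds, while leaving out the initial capacity avoids allocating a full-size table for selectors that only ever see a few distinct values.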

nishantmonu51 left a comment

LGTM, 👍

clintropolis left a comment

🤘

gianm commented Nov 13, 2018

Thanks for the reviews.

gianm added this to the 0.13.1 milestone Nov 13, 2018
gianm merged commit 52f6bdc into apache:master Nov 13, 2018
gianm deleted the expr-long-cache branch November 13, 2018 17:36
gianm added a commit to implydata/druid-public that referenced this pull request Nov 16, 2018

* Optimization for expressions that hit a single long column.

There was previously a single-long-input optimization that applied only
to the time column. These have been combined together. Also adds
type-specific value caching to ExprEval, which allowed simplifying
the SingleLongInputCachingExpressionColumnValueSelector code.

* Add more benchmarks.

* Don't use LRU cache for __time.

* Simplify a bit.

* Let the cache grow.