Skip to content

re-use expression vector evaluation results for the same offset in expression vector selectors#10614

Merged
clintropolis merged 9 commits intoapache:masterfrom
clintropolis:vector-expr-selector-dont-recompute-for-same-vector
Jan 13, 2021
Merged

re-use expression vector evaluation results for the same offset in expression vector selectors#10614
clintropolis merged 9 commits intoapache:masterfrom
clintropolis:vector-expr-selector-dont-recompute-for-same-vector

Conversation

@clintropolis
Copy link
Copy Markdown
Member

@clintropolis clintropolis commented Nov 28, 2020

Description

This PR improves ExpressionVectorValueSelector and ExpressionVectorObjectSelector to re-use evaluation results for the same selector. While I haven't measured this at all, it makes a relatively big difference when druid.generic.useDefaultValueForNull=false just because the previous behavior meant that the expression was evaluated twice for numeric primitive expressions: once to get the value vector and again to get the null vector. I was aware of this issue but didn't get to it in the first round of additions I did for the vectorized expression stuffs and sort of forgot about it until now 😅 .

It works by expanding Expr.VectorInputBinding to further mimic ReadableVectorOffset:

    /**
     * Returns an integer that uniquely identifies the current position of the underlying vector offset, if this
     * binding is backed by a segment. This is useful for caching: it is safe to assume nothing has changed in the
     * offset so long as the id remains the same. See also: ReadableVectorOffset (in druid-processing)
     */
    int getCurrentVectorId();

which allows slightly modifying the backing of the ExpressionVectorInputBinding implementation to be based on ReadableVectorOffset which has a getId method, instead of VectorSizeInspector which only offers current and max vector size.

The size inspector provided by VectorColumnSelectorFactory.getVectorSizeInspector was already a ReadableVectorOffset, but to avoid exposing the entire offset interface which isn't going to be useful for most callers, I have added a new interface, ReadableVectorInspector, and pushed getId to it, reworked the method into getReadableVectorInspector.

With an identifier in place, the selectors can cache evaluation results and re-use them as long as the underlying offset does not change.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Key changed/added classes in this PR
  • ExpressionVectorValueSelector
  • ExpressionVectorObjectSelector
  • ExpressionVectorInputBinding
  • VectorColumnSelectorFactory
  • ReadableVectorInspector

Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code change LGTM, but I'm wondering if we can add some unit tests that verify the returned result is from cache. Maybe it could be easier to write such tests using mocks.

@jihoonson
Copy link
Copy Markdown
Contributor

LGTM. Thanks @clintropolis.

@clintropolis
Copy link
Copy Markdown
Member Author

thanks for review @jihoonson

@clintropolis clintropolis merged commit 9362dc7 into apache:master Jan 13, 2021
@clintropolis clintropolis deleted the vector-expr-selector-dont-recompute-for-same-vector branch January 13, 2021 20:45
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jan 22, 2021
…pression vector selectors (apache#10614)

* cache expression selector results by associating vector expression bindings to underlying vector offset

* better coverage, fix floats

* style

* stupid bot

* stupid me

* more test

* intellij threw me under the bus when it generated those junit methods

* narrow interface instead of passing around offset
@jihoonson jihoonson added this to the 0.21.0 milestone Jul 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants