Skip to content

support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions#10499

Merged
clintropolis merged 6 commits intoapache:masterfrom
clintropolis:vector-nil-expr
Oct 27, 2020
Merged

support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions#10499
clintropolis merged 6 commits intoapache:masterfrom
clintropolis:vector-nil-expr

Conversation

@clintropolis
Copy link
Copy Markdown
Member

@clintropolis clintropolis commented Oct 9, 2020

Description

This PR adds support for vectorizing expressions in cases where inputs are missing, using either null or default values as inputs, depending on the value of druid.generic.useDefaultValueForNull.

This PR also makes non-vectorized expression type handling a bit more consistent across different types of expressions. Major changes here include operator expressions will now try to preserve the type when one of the arguments is null instead of always producing double values, and math functions now follow logic similar to the operators.

Tagging PR as release notes/incompatible because the changes cause some expressions to output slightly different results (typically longs instead of doubles). Examples:
longColumn + nonExistentColumn -> longColumn + 0L instead of (double) longColumn + 0.0

and math functions will produce output from non-existent inputs in default mode instead of always producing zeros:
max(longColumn, nonExistentColumn) -> max(longColumn, 0L)


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

* non-vectorized, per-row type detection. In this mode, null values are {@link ExprType#STRING} typed, despite
* potentially coming from an underlying numeric column. This method is not well suited for array handling
*/
public static ExprType autoDetect(ExprEval result, ExprEval other)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do result and other mean?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah those are not very good variable names, eval and otherEval probably would've been better

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a part to the javadoc about when the input types would not be trustable (is it because of the string nulls from numeric columns, or are there other cases)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a part to the javadoc about when the input types would not be trustable (is it because of the string nulls from numeric columns, or are there other cases)?

I have this blurb:

In this mode, null values are {@link ExprType#STRING} typed, despite potentially coming from an underlying numeric column

but I have added the missing column case too

}

// non-vectorized expressions
if (type == ExprType.STRING) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the javadoc for this method with the mixed type case and the reasoning behind preferring the non-string type?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this check isn't actually necessary anymore, it was from an intermediary state this branch was in prior to adding the autoDetect function when I was instead trying to use this in the eval of operator expressions.

* non-vectorized, per-row type detection. In this mode, null values are {@link ExprType#STRING} typed, despite
* potentially coming from an underlying numeric column. This method is not well suited for array handling
*/
public static ExprType autoDetect(ExprEval result, ExprEval other)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a part to the javadoc about when the input types would not be trustable (is it because of the string nulls from numeric columns, or are there other cases)?

// only set output type
if (ExpressionPlan.none(traits, ExpressionPlan.Trait.UNKNOWN_INPUTS, ExpressionPlan.Trait.NEEDS_APPLIED)) {
// only set output type if we are pretty confident about input types
final boolean shoulComputeOutput = ExpressionPlan.none(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shoulComputeOutput -> shouldComputeOutput

@clintropolis clintropolis merged commit d0821de into apache:master Oct 27, 2020
@clintropolis clintropolis deleted the vector-nil-expr branch October 27, 2020 02:55
@jihoonson jihoonson added this to the 0.21.0 milestone Jan 4, 2021
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jan 22, 2021
…nsistent type handling for non-vectorized expressions (apache#10499)

* support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions

* inspector

* changes

* more test

* clean
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants