support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions #10499
 * non-vectorized, per-row type detection. In this mode, null values are {@link ExprType#STRING} typed, despite
 * potentially coming from an underlying numeric column. This method is not well suited for array handling
 */
public static ExprType autoDetect(ExprEval result, ExprEval other)
What do result and other mean?
ah those are not very good variable names, eval and otherEval probably would've been better
Can you add a part to the javadoc about when the input types would not be trustable (is it because of the string nulls from numeric columns, or are there other cases)?
> Can you add a part to the javadoc about when the input types would not be trustable (is it because of the string nulls from numeric columns, or are there other cases)?
I have this blurb:
> In this mode, null values are {@link ExprType#STRING} typed, despite potentially coming from an underlying numeric column
but I have added the missing column case too
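To make the "untrustable input type" case in this thread concrete, here is a minimal sketch of per-row type detection in which a null value reports STRING even though it may come from a numeric or missing column. `SketchType`, `SketchEval`, and `autoDetectSketch` are hypothetical names invented for illustration; this is not the `autoDetect` implementation from the PR, and the handling of mixed non-null string and numeric inputs is an arbitrary choice made only to keep the sketch complete.

```java
public class AutoDetectSketch {
    // Hypothetical stand-in for an expression type; not Druid's ExprType.
    enum SketchType { STRING, LONG, DOUBLE }

    // Hypothetical stand-in for an evaluated value: a type tag plus a possibly-null value.
    static final class SketchEval {
        final SketchType type;
        final Object value;

        SketchEval(SketchType type, Object value) {
            this.type = type;
            this.value = value;
        }

        // A STRING-typed null may really be a null from a numeric or missing column,
        // so its reported type cannot be trusted.
        boolean isUntrustedNull() {
            return value == null && type == SketchType.STRING;
        }
    }

    // Picks an output type for a binary operation, ignoring the type of untrusted nulls.
    static SketchType autoDetectSketch(SketchEval eval, SketchEval otherEval) {
        if (eval.isUntrustedNull() && otherEval.isUntrustedNull()) {
            return SketchType.STRING; // nothing better is known
        }
        if (eval.isUntrustedNull()) {
            return otherEval.type;    // trust the side that has a usable type
        }
        if (otherEval.isUntrustedNull()) {
            return eval.type;
        }
        if (eval.type == otherEval.type) {
            return eval.type;
        }
        // Assumption for this sketch only: mixed string/number stays STRING.
        if (eval.type == SketchType.STRING || otherEval.type == SketchType.STRING) {
            return SketchType.STRING;
        }
        return SketchType.DOUBLE;     // LONG mixed with DOUBLE widens to DOUBLE
    }

    public static void main(String[] args) {
        SketchEval longValue = new SketchEval(SketchType.LONG, 1L);
        SketchEval nullValue = new SketchEval(SketchType.STRING, null);
        System.out.println(autoDetectSketch(longValue, nullValue));
        System.out.println(autoDetectSketch(nullValue, nullValue));
    }
}
```

In this sketch a long paired with a STRING-typed null resolves to LONG rather than falling back to a string or double result, which is the behavior the javadoc question is asking to have documented.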
}

// non-vectorized expressions
if (type == ExprType.STRING) {
Can you update the javadoc for this method with the mixed type case and the reasoning behind preferring the non-string type?
Oh, this check isn't actually necessary anymore, it was from an intermediary state this branch was in prior to adding the autoDetect function when I was instead trying to use this in the eval of operator expressions.
// only set output type
if (ExpressionPlan.none(traits, ExpressionPlan.Trait.UNKNOWN_INPUTS, ExpressionPlan.Trait.NEEDS_APPLIED)) {
// only set output type if we are pretty confident about input types
final boolean shoulComputeOutput = ExpressionPlan.none(
shoulComputeOutput -> shouldComputeOutput
…nsistent type handling for non-vectorized expressions (apache#10499)
* support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions
* inspector
* changes
* more test
* clean
Description
This PR adds support for vectorizing expressions in cases where inputs are missing, using either null or default values as inputs, depending on the value of druid.generic.useDefaultValueForNull.
This PR also makes non-vectorized expression type handling more consistent across different types of expressions. The major changes: operator expressions now try to preserve the type when one of the arguments is null instead of always producing double values, and math functions now follow logic similar to the operators.
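As a rough illustration of the description above, the sketch below models a missing input as either a default value or null, controlled by a boolean that stands in for druid.generic.useDefaultValueForNull, and keeps the arithmetic in longs. `missingLongInput`, `addLongs`, and `maxLongs` are hypothetical helpers, not Druid APIs, and the null-propagating behavior when the flag is false is an assumption made for this sketch rather than something the PR text states.

```java
public class MissingInputSketch {
    // Hypothetical: the value bound to a long-typed input whose column does not exist.
    // The flag stands in for druid.generic.useDefaultValueForNull.
    static Long missingLongInput(boolean useDefaultValueForNull) {
        return useDefaultValueForNull ? Long.valueOf(0L) : null;
    }

    // Long addition that stays in longs; a null operand yields null (sketch assumption).
    static Long addLongs(Long left, Long right) {
        if (left == null || right == null) {
            return null;
        }
        return left + right;
    }

    // Long max that stays in longs; a null operand yields null (sketch assumption).
    static Long maxLongs(Long left, Long right) {
        if (left == null || right == null) {
            return null;
        }
        return Math.max(left, right);
    }

    public static void main(String[] args) {
        long[] longColumn = {3L, -7L, 42L};
        for (boolean useDefault : new boolean[] {true, false}) {
            Long missing = missingLongInput(useDefault);
            for (long value : longColumn) {
                System.out.println(
                    "useDefault=" + useDefault
                        + " value=" + value
                        + " sum=" + addLongs(value, missing)
                        + " max=" + maxLongs(value, missing)
                );
            }
        }
    }
}
```

With the flag set to true, the missing input behaves like 0L, so the sum is the original long and the max is never negative; nothing is widened to double along the way.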
Tagging PR as release notes/incompatible because the changes cause some expressions to output slightly different results (typically longs instead of doubles). Examples:
longColumn + nonExistentColumn -> longColumn + 0L instead of (double) longColumn + 0.0
and math functions will produce output from non-existent inputs in default mode instead of always producing zeros:
max(longColumn, nonExistentColumn) -> max(longColumn, 0L)
This PR has: