move numeric null value coercion out of expression processing engine#13809
move numeric null value coercion out of expression processing engine#13809clintropolis merged 11 commits intoapache:masterfrom
Conversation
… naming with Function and ApplyFunction; javadocs for some stuff
| // null values can (but not always) appear as string typed | ||
| // so type isn't necessarily string unless value is non-null | ||
| if (NullHandling.sqlCompatible() || (Types.is(leftVal.type(), ExprType.STRING) && leftVal.value() != null)) { |
There was a problem hiding this comment.
I feel like I'm not understanding this logic, but it seems like previously you were doing null checks if it was a String. Now, you will only do null checks if it is a String and the left-hand side is not null? This seems to be trying to coerce for boolean operations, if the left side is numerical but also not null, it seems like we should have semantics that are like if == 0 then false, else true. But, I think that this will just ignore it if it's not String?
There was a problem hiding this comment.
I initially made this change before i added valueOrDefault, because in a situation where expressions are evaluated without the type information to tell us that a null is a long (such as ingestion time transforms), an expression like null && 1 with the left side is a null value it will be evaluated as a STRING typed expression, and follow the string/sql compatible rules, in default value mode meaning value() spits out null. if the expression was 1 && null however, it would be evaluated with long rules, and spit out 0 in default value mode.
None of this really matters now though, since callers should be using valueOrDefault() after the changes in this patch, instead of value(), which produces equivalent results, so I will switch it back and switch the tests around null constants to use that.
| public final int asInt() | ||
| { | ||
| if (value == null) { | ||
| assert NullHandling.replaceWithDefault(); |
There was a problem hiding this comment.
Do we really need the assertions?
There was a problem hiding this comment.
no they can be probably removed, i added during testing and for consistency with some of the other ExprEval, but i think i will just remove them all instead.
| } | ||
|
|
||
| @Override | ||
| public boolean isNumericNull() |
There was a problem hiding this comment.
I know this is a comment on unchanged code, but wondering how you feel about renaming this to just isNull? I've never quite understand why the Numeric part was important.
There was a problem hiding this comment.
heh, so here it sort of is important for the way things work (right now) with non-vectorized expressions because the contract of the method is sort of like that of the ColumnValueSelector isNull method - that if it returns false then the asLong and such primitive returning methods will return a valid value.
StringExprEval for example implements these methods as best effort conversions, so StringExprEval("100") returns false for isNumericNull and asLong returns the value 100L in both default value and sql compat modes. StringExprEval("x") returns true for isNumericNull if in sql compatible mode, but the value() is not in fact null, so callers should know that the value of this is null in sql compatible mode instead of calling asLong to get 0. In default value mode, isNumericNull returns false and so no one cares if the string was a real number or not, if it wasn't it was zero.
Things that do not want to use these number methods can just check the result of value() directly.
At least for the way things currently work, this is actually probably a more accurate name than isNull, and arguably the ColumnValueSelector method should probably be renamed to match this one 😛 . The only things that check isNull on ColumnValueSelector are things that want to use the primitive numeric getters, object selectors often do not implement it since callers are currently expected to call getObject and check that it is null per the current javadoc contract in BaseNullableColumnValueSelector.
I would be in favor of changing this contract into making a correct implementation of isNull a requirement for all ColumnValueSelectors, but I totally don't want to do that in this PR, and it would require I think splitting up anything that does type coercion which might produce nulls into separate selectors (and expressions to more liberally explicitly cast values to different types instead of use these implicit conversions which some of them have)
| private DoubleExprEval(@Nullable Number value) | ||
| { | ||
| super(value == null ? NullHandling.defaultDoubleValue() : (Double) value.doubleValue()); | ||
| super(value == null ? null : value.doubleValue()); |
There was a problem hiding this comment.
Does the .doubleValue() add anything here? It looks like it's storing a reference to a Number anyway? I guess it's a forced cast?
There was a problem hiding this comment.
not really, can be simplified
There was a problem hiding this comment.
ah i was wrong, at least a lot of tests were counting on forced casts, i see lots of
Expected :java.lang.Long<1>
Actual :java.lang.Integer<1>
after removing it, will leave for now
| switch (castTo.getElementType().getType()) { | ||
| case DOUBLE: | ||
| return ExprEval.ofDoubleArray(value == null ? null : new Object[] {computeDouble()}); | ||
| return ExprEval.ofDoubleArray(value == null ? null : new Object[] {number != null ? number.doubleValue() : null}); |
There was a problem hiding this comment.
nit: invert the ternary to be == null?
| * Creates an {@link ExprVectorProcessor} for the 'isnull' function, that produces a "boolean" (long) typed output | ||
| * vector with values set to 1 if the input value was null or 0 if it was not null. |
There was a problem hiding this comment.
am I correct in understanding that this creates a long[]?
There was a problem hiding this comment.
yes, will clarify javadocs
| } | ||
|
|
||
| /** | ||
| * Creates an {@link ExprVectorProcessor} for the 'isnotnull' function, that produces a "boolean" (long) typed output |
There was a problem hiding this comment.
Same here, a long[]?
| * true/null, null/true, null/null -> null | ||
| * false/null, null/false -> false (0) |
There was a problem hiding this comment.
Why does null null out a true, but not null out a false!?!?!?!?!? Is that some SQL-ism to handling nulls?
There was a problem hiding this comment.
yeah, SQL compat stuff. null is basically a stand-in for "unknown" for logical operations,true && ??? could be true or false depending on the value of ??? so the result is ???. But false && ??? is definitely false no matter the value of ???.
There was a problem hiding this comment.
Ahhh, I can see the logic there... it's providing the best answer based on what it can know from the boolean operator. It would be a lot easier to reason about if it consistently nulled things out, but if this is how SQL is expected to work, so be it.
Description
This PR removes numeric null value coercion from the expression processing engine in favor of doing it on the consumer side. This appears to have minimal impact, since
isNumericNullwill still return false in default value mode, and numeric operators check this and use the numeric primitive methods ofExprEvalinstead ofvalue(). This walks back some of the changes in #13781. To help push this responsibility onto callers, a new method has been addedExprEval.valueOrDefault()as a convenience/alternative tovalue(), which will perform the coercion for callers if they wish, which many of them do.Since vector expression processors are only used at query time, I've modified the 'null vectors' (
boolean[]), I've modified them to consistently only set to true ifNullHandling.sqlCompatible()is true. Alternatively, we could have also instead checkedNullHandling.sqlCompatible()in the places we are using these null vectors, but doing it this way seemed more consistent with other vectorized selectors.Backstory, it would be nice to always write null values into segments, shifting
druid.generic.useDefaultValueForNullto a query time only setting on whether or not to use the null value bitmap index when scanning/aggregating columns. However, I have not made this change yet in this PR.Unrelated, I have renamed
Expr.buildVectorizedtoExpr.asVectorProcessorto be consistent withFunction.asVectorProcessorandApplyFunction.asVectorProcessor, and also added some javadocs toVectorProcessors.Release note
Null values input to and created by the Druid native expression processing engine no longer coerce
nullto the type appropriate 'default' value ifdruid.generic.useDefaultValueForNull=true. This should not impact existing behavior since this has been shifted onto the consumer and internally operators will still use default values in this mode, but calling this change to attention in case any subtle behavior changes around the handling ofnullvalues. Direct callers can get default values by using the newvalueOrDefault()method ofExprEval, instead ofvalue().This PR has: