various fixes and improvements to vectorization fallback #17098
clintropolis merged 4 commits into apache:master
changes:
* add `ApplyFunction` support to vectorization fallback, allowing many of the remaining expressions to be vectorized
* add `CastToObjectVectorProcessor` so that vector engine can correctly cast any type
* add support for array and complex vector constants
* reduce number of cases which can block vectorization in expression planner to be unknown inputs (such as unknown multi-valuedness)
* fix array constructor expression, apply map expression to make actual evaluated type match the output type inference
* fix bug in `array_contains` where something like `array_contains([null], 'hello')` would return true if the array was a numeric array, since the non-null string value would cast to a null numeric
* fix `isNull`/`isNotNull` to correctly handle any type of input argument
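To make the `array_contains` bug in the list above concrete, here is a minimal stand-alone sketch. It is not Druid code: `ArrayContainsSketch`, `castToLong`, `containsBuggy`, and `containsFixed` are hypothetical names, and the lenient cast only models the behavior the description mentions (a non-numeric string casting to a null numeric).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the bug; none of these names are Druid APIs.
final class ArrayContainsSketch
{
  // Lenient numeric cast: values that cannot be parsed as a long become null.
  static Long castToLong(Object value)
  {
    if (value == null) {
      return null;
    }
    if (value instanceof Number) {
      return ((Number) value).longValue();
    }
    try {
      return Long.parseLong(value.toString());
    }
    catch (NumberFormatException e) {
      return null;
    }
  }

  // Buggy variant: 'hello' casts to null, and null then matches a null element.
  static boolean containsBuggy(List<Long> array, Object needle)
  {
    return array.contains(castToLong(needle));
  }

  // One possible fix (an assumption, not necessarily what the PR does):
  // a non-null needle that casts to null cannot match anything.
  static boolean containsFixed(List<Long> array, Object needle)
  {
    final Long cast = castToLong(needle);
    if (needle != null && cast == null) {
      return false;
    }
    return array.contains(cast);
  }

  public static void main(String[] args)
  {
    final List<Long> array = Arrays.asList((Long) null);
    System.out.println(containsBuggy(array, "hello"));
    System.out.println(containsFixed(array, "hello"));
  }
}
```

The buggy variant reports a match for a numeric array holding only a null, while the guarded variant does not.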
final ExprEvalVector<?> delegateOutput = delegate.evalVector(bindings);
final Object[] toCast = delegateOutput.getObjectVector();
for (int i = 0; i < bindings.getCurrentVectorSize(); i++) {
  ExprEval<?> cast = ExprEval.ofType(delegateType, toCast[i]).castTo(outputType);
IIRC from work on #16994, ExprEval.ofType(inType, o).castTo(outType) and ExprEval.ofType(outType, o) have subtly different behavior and I ended up preferring the latter. I think part of the reason was that the former could throw exceptions in cases where the cast was invalid, and the latter was more "forgiving" (like, returning null instead of throwing an exception). It happened that forgiving is what I wanted in that particular patch.
Here, what's the right thing?
this is the vectorized cast operator for objects, so it should totally use castTo to be consistent with non-vectorized cast
  final Object[] objects = result.getObjectVector();
  final Object[] output = new String[objects.length];
- for (int i = 0; i < objects.length; i++) {
+ for (int i = 0; i < bindings.getCurrentVectorSize(); i++) {
I don't think so, previously it would just cast the whole vector size instead of only what was needed
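The point about casting only what is needed can be sketched as follows. This is an illustrative stand-alone example, not the Druid processor: `BoundedVectorCastSketch`, `castVector`, and the `scalarCast` parameter are hypothetical, and the sketch assumes the input buffer is allocated at the maximum vector size while only the first `currentVectorSize` entries are valid.

```java
import java.util.function.Function;

// Hypothetical sketch; not the Druid implementation.
final class BoundedVectorCastSketch
{
  // Applies the scalar cast to the first currentVectorSize entries only.
  // Entries past that bound may hold stale values and are never touched.
  static Object[] castVector(Object[] input, int currentVectorSize, Function<Object, Object> scalarCast)
  {
    final Object[] output = new Object[input.length];
    for (int i = 0; i < currentVectorSize; i++) {
      output[i] = scalarCast.apply(input[i]);
    }
    return output;
  }

  public static void main(String[] args)
  {
    final Object[] input = {1L, 2L, "stale", "stale"};
    final Object[] output = castVector(input, 2, String::valueOf);
    System.out.println(java.util.Arrays.toString(output));
  }
}
```

Bounding the loop by the current vector size avoids both wasted work and casting leftover values from an earlier, larger batch.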
- final ExprEvalObjectVector eval = new ExprEvalObjectVector(strings, ExpressionType.STRING);
+ final Object[] objects = new Object[maxVectorSize];
+ if (type.isNumeric()) {
+   return constant((Long) null, maxVectorSize);
why does this use null instead of looking at what constant is?
oops, idk why this is here, i think this is from a bad merge between some stashes with experiments i had going on. It should just be removed, and wasn't failing any tests because numeric types use their dedicated constant functions
   * @see org.apache.druid.math.expr.ConstantExpr
   */
- public static <T> ExprVectorProcessor<T> constant(@Nullable String constant, int maxVectorSize)
+ public static <T> ExprVectorProcessor<T> constant(@Nullable Object constant, int maxVectorSize, ExpressionType type)
why did this change to Object? what sorts of non-String objects might be passed in? could you please update the javadoc?
oh, i needed a way to make vector constants for basically anything not a numeric primitive, so this method was updated to make all of the object constants, will update javadoc
- final Object[] strings = new Object[maxVectorSize];
- Arrays.fill(strings, constant);
- final ExprEvalObjectVector eval = new ExprEvalObjectVector(strings, ExpressionType.STRING);
+ final Object[] objects = new Object[maxVectorSize];
we are creating this unnecessarily if type is numeric.
numeric types shouldn't ever be calling this, added a defensive check
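A rough sketch of an object constant vector with a defensive check might look like the following. It is only an illustration under stated assumptions: `ObjectConstantSketch` and `constantVector` are hypothetical names, the boolean `isNumeric` stands in for the `type.isNumeric()` check from the diff, and throwing an exception is one plausible form of defensive check, not necessarily the one added in the PR.

```java
import java.util.Arrays;

// Hypothetical sketch; not the Druid implementation.
final class ObjectConstantSketch
{
  // Builds a constant vector for non-numeric values (strings, arrays, complex objects).
  static Object[] constantVector(Object constant, int maxVectorSize, boolean isNumeric)
  {
    // Defensive check (assumed form): numeric constants have dedicated functions,
    // so reject them before allocating anything.
    if (isNumeric) {
      throw new IllegalArgumentException("numeric constants should use the dedicated numeric constant functions");
    }
    final Object[] objects = new Object[maxVectorSize];
    Arrays.fill(objects, constant);
    return objects;
  }

  public static void main(String[] args)
  {
    System.out.println(Arrays.toString(constantVector("hello", 3, false)));
  }
}
```

Placing the check before the allocation also addresses the concern about creating the array unnecessarily for numeric types.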
- final ExprEvalObjectVector eval = new ExprEvalObjectVector(strings, ExpressionType.STRING);
+ final Object[] objects = new Object[maxVectorSize];
+ if (type.isNumeric()) {
+   return constant((Long) null, maxVectorSize);
why null though?
* Additional expr type alignment.

  PR #16366 originally added fallback vectorization, a mechanism for making all expressions vectorizable. Later, #17098 fixed some issues that arose and #17248 disabled fallback vectorization in the out-of-box configuration.

  This patch fixes various remaining issues with inconsistent type handling between the vectorized and nonvectorized expr implementations. It does not yet re-enable fallback vectorization out of the box, due to remaining inconsistencies with conditional exprs like "case_searched", "case_simple", and "if".

  1) Aligns the behavior of missing columns and literal nulls so they are always treated as null longs. This was already the case for vectorized identifiers, but non-vectorized identifiers and literal nulls were still represented as strings.
  2) Replaces all occurrences of "ExprEval.of(null)" with either an explicit type, or a call to "ExprEval.ofMissing()". ofMissing is a new function for situations where an eval represents a null value of unknown type. It is equivalent to "ExprEval.ofLong(null)", but is a separate function for clarity at the call site.
  3) Update "cast" to return the target type even for null values.
  4) Update "greatest", "least", and "array" so they eval to types that match what is reported by "getOutputType".
  5) Update "scalb" to coerce input strings as numbers, to better allow for type evolution and missing columns.
  6) Update "reverse" to coerce inputs to strings, to better allow for type evolution and missing columns.

* Restore fallback in testArrayFns.
* Fix issues.
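Items 1 to 3 of the commit message above can be illustrated with a small typed-null model. This is a hedged sketch, not Druid's `ExprEval`: `TypedNullSketch`, its `Type` enum, its `Eval` class, and `toLong` are hypothetical, and only the two behaviors stated in the message are modeled (a missing value is a null long, and a cast reports the target type even when the value is null).

```java
// Hypothetical model of typed nulls; not Druid's ExprEval.
final class TypedNullSketch
{
  enum Type { LONG, STRING }

  static final class Eval
  {
    final Type type;
    final Object value;

    Eval(Type type, Object value)
    {
      this.type = type;
      this.value = value;
    }
  }

  // A null of unknown type is represented as a null long, per the commit message.
  static Eval ofMissing()
  {
    return new Eval(Type.LONG, null);
  }

  // The result always carries the target type, even when the value is null.
  static Eval cast(Eval in, Type target)
  {
    if (in.value == null) {
      return new Eval(target, null);
    }
    if (target == Type.STRING) {
      return new Eval(Type.STRING, String.valueOf(in.value));
    }
    return new Eval(Type.LONG, toLong(in.value));
  }

  // Lenient conversion used only by this sketch: unparseable strings become null.
  static Long toLong(Object value)
  {
    if (value instanceof Number) {
      return ((Number) value).longValue();
    }
    try {
      return Long.parseLong(value.toString());
    }
    catch (NumberFormatException e) {
      return null;
    }
  }

  public static void main(String[] args)
  {
    final Eval asString = cast(ofMissing(), Type.STRING);
    System.out.println(asString.type + " " + asString.value);
  }
}
```

Keeping the type attached to a null is what lets the vectorized and non-vectorized paths agree on the output type when a column is missing.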