Fallback vectorization for FunctionExpr and BaseMacroFunctionExpr. #16366
gianm merged 11 commits into apache:master on Jun 6, 2024
This patch adds FallbackVectorProcessor, a processor that adapts non-vectorizable operations into vectorizable ones. It is used in FunctionExpr and BaseMacroFunctionExpr. In addition:

- Identifiers are updated to offer getObjectVector for ARRAY and COMPLEX in addition to STRING. ExprEvalObjectVector is updated to offer ARRAY and COMPLEX as well.
- In SQL tests, cannotVectorize now fails tests if an exception is not thrown. This makes it easier to identify tests that can now vectorize.
- Fix a null-matcher bug in StringObjectVectorValueMatcher.
clintropolis approved these changes on Jun 5, 2024
  for (int i = 0; i < mask.getSelectionSize(); i++) {
    final int rowNum = mask.getSelection()[i];
-   if ((value == null && includeUnknown) || Objects.equals(value, vector[rowNum])) {
+   if ((vector[rowNum] == null && includeUnknown) || Objects.equals(value, vector[rowNum])) {
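The two `if` conditions above differ only in which side is null-checked. As a rough illustration of why that matters, here is a minimal standalone sketch. It assumes that `includeUnknown` is meant to also match rows whose own value is null; the class and method names (`NullMatcherSketch`, `matchOld`, `matchNew`) are made up for this example and are not part of Druid.

```java
import java.util.Arrays;
import java.util.Objects;

/**
 * Illustrative sketch only; this is not the actual StringObjectVectorValueMatcher.
 * It runs the old and new conditions over a small object vector and returns
 * the row numbers that each one matches.
 */
final class NullMatcherSketch
{
  /** Old condition: null-checks the match value instead of the row value. */
  static int[] matchOld(String value, String[] vector, int[] selection, int selectionSize, boolean includeUnknown)
  {
    final int[] matched = new int[selectionSize];
    int n = 0;
    for (int i = 0; i < selectionSize; i++) {
      final int rowNum = selection[i];
      if ((value == null && includeUnknown) || Objects.equals(value, vector[rowNum])) {
        matched[n++] = rowNum;
      }
    }
    return Arrays.copyOf(matched, n);
  }

  /** New condition: null-checks the row value, so null rows count as "unknown". */
  static int[] matchNew(String value, String[] vector, int[] selection, int selectionSize, boolean includeUnknown)
  {
    final int[] matched = new int[selectionSize];
    int n = 0;
    for (int i = 0; i < selectionSize; i++) {
      final int rowNum = selection[i];
      if ((vector[rowNum] == null && includeUnknown) || Objects.equals(value, vector[rowNum])) {
        matched[n++] = rowNum;
      }
    }
    return Arrays.copyOf(matched, n);
  }

  public static void main(String[] args)
  {
    final String[] vector = {"a", null, "b"};
    final int[] selection = {0, 1, 2};
    // Old condition misses the null row when matching "a" with includeUnknown = true.
    System.out.println(Arrays.toString(matchOld("a", vector, selection, 3, true))); // [0]
    // New condition also returns the null row.
    System.out.println(Arrays.toString(matchNew("a", vector, selection, 3, true))); // [0, 1]
  }
}
```

With a null match value the old condition goes wrong in the other direction: it matches every selected row, including non-null ones, whenever `includeUnknown` is set.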
   * because bindings should only be used by identifiers, and this fallback processor is never used to
   * implement identifiers.
   */
  private static class UnusedBinding implements Expr.ObjectBinding
Member
nit: this seems pretty similar to the thing made by InputBindings.validateConstant, which is used during SQL planning to reduce constant expressions; maybe we should consolidate.
I guess that's not totally possible though, since the other one attaches the expression using the constant bindings for nicer validation messages. That's not a thing here, so this one can be static and doesn't have that need, since it should only be used for identifiers.
Contributor
Author
Yeah, they are very similar, but it probably makes sense to keep them different for error-message reasons.
gianm added a commit to gianm/druid that referenced this pull request on Sep 9, 2025
PR apache#16366 originally added fallback vectorization, a mechanism for making all expressions vectorizable. Later, apache#17098 fixed some issues that arose and apache#17248 disabled fallback vectorization in the out-of-box configuration.

This patch fixes various remaining issues with inconsistent type handling between the vectorized and nonvectorized expr implementations. It does not yet re-enable fallback vectorization out of the box, due to remaining inconsistencies with conditional exprs like "case_searched", "case_simple", and "if".

1) Aligns the behavior of missing columns and literal nulls so they are always treated as null longs. This was already the case for vectorized identifiers, but non-vectorized identifiers and literal nulls were still represented as strings.
2) Replaces all occurrences of "ExprEval.of(null)" with either an explicit type, or a call to "ExprEval.ofMissing()". ofMissing is a new function for situations where an eval represents a null value of unknown type. It is equivalent to "ExprEval.ofLong(null)", but is a separate function for clarity at the call site.
3) Update "cast" to return the target type even for null values.
4) Update "greatest", "least", and "array" so they eval to types that match what is reported by "getOutputType".
5) Update "scalb" to coerce input strings as numbers, to better allow for type evolution and missing columns.
6) Update "reverse" to coerce inputs to strings, to better allow for type evolution and missing columns.
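To make items 2) and 3) more concrete, here is a small self-contained model of a "typed null". It is only a sketch under stated assumptions: `TypedNullSketch`, `Eval`, and `Type` are hypothetical stand-ins, not Druid's `ExprEval`; only the idea that `ofMissing()` is equivalent to `ofLong(null)`, and that `cast` keeps the target type for null values, comes from the commit message.

```java
/**
 * Hypothetical model of typed nulls; names and structure are illustrative
 * and do not mirror Druid's real ExprEval class.
 */
final class TypedNullSketch
{
  enum Type { LONG, DOUBLE, STRING }

  /** A value paired with its type; the value may be null. */
  static final class Eval
  {
    final Type type;
    final Object value;

    Eval(Type type, Object value)
    {
      this.type = type;
      this.value = value;
    }
  }

  static Eval ofLong(Long v)
  {
    return new Eval(Type.LONG, v);
  }

  /** Null of unknown type; modeled as a null long, named separately for clarity at call sites. */
  static Eval ofMissing()
  {
    return ofLong(null);
  }

  /** Casts to the target type; a null input still reports the target type. */
  static Eval cast(Eval in, Type target)
  {
    if (in.value == null) {
      return new Eval(target, null);
    }
    switch (target) {
      case LONG:
        return new Eval(Type.LONG, in.value instanceof Number
            ? Long.valueOf(((Number) in.value).longValue())
            : Long.valueOf(in.value.toString()));
      case DOUBLE:
        return new Eval(Type.DOUBLE, in.value instanceof Number
            ? Double.valueOf(((Number) in.value).doubleValue())
            : Double.valueOf(in.value.toString()));
      case STRING:
        return new Eval(Type.STRING, String.valueOf(in.value));
      default:
        throw new IllegalArgumentException("Unknown type: " + target);
    }
  }

  public static void main(String[] args)
  {
    final Eval missing = ofMissing();
    System.out.println(missing.type);                    // LONG
    System.out.println(cast(missing, Type.STRING).type); // STRING
  }
}
```

In this model, vectorized and non-vectorized code paths that both start from `ofMissing()` agree on the type of a missing column, which is the kind of consistency the commit message is after.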
gianm added a commit that referenced this pull request on Sep 18, 2025
* Additional expr type alignment.

  PR #16366 originally added fallback vectorization, a mechanism for making all expressions vectorizable. Later, #17098 fixed some issues that arose and #17248 disabled fallback vectorization in the out-of-box configuration.

  This patch fixes various remaining issues with inconsistent type handling between the vectorized and nonvectorized expr implementations. It does not yet re-enable fallback vectorization out of the box, due to remaining inconsistencies with conditional exprs like "case_searched", "case_simple", and "if".

  1) Aligns the behavior of missing columns and literal nulls so they are always treated as null longs. This was already the case for vectorized identifiers, but non-vectorized identifiers and literal nulls were still represented as strings.
  2) Replaces all occurrences of "ExprEval.of(null)" with either an explicit type, or a call to "ExprEval.ofMissing()". ofMissing is a new function for situations where an eval represents a null value of unknown type. It is equivalent to "ExprEval.ofLong(null)", but is a separate function for clarity at the call site.
  3) Update "cast" to return the target type even for null values.
  4) Update "greatest", "least", and "array" so they eval to types that match what is reported by "getOutputType".
  5) Update "scalb" to coerce input strings as numbers, to better allow for type evolution and missing columns.
  6) Update "reverse" to coerce inputs to strings, to better allow for type evolution and missing columns.

* Restore fallback in testArrayFns.
* Fix issues.
This patch adds `FallbackVectorProcessor`, a processor that adapts non-vectorizable operations into vectorizable ones. It is used in `FunctionExpr` and `BaseMacroFunctionExpr`. As a result, all such expressions can now participate in vectorized queries.

In addition:

- Identifiers are updated to offer getObjectVector for ARRAY and COMPLEX in addition to STRING. ExprEvalObjectVector is updated to offer ARRAY and COMPLEX as well. Identifiers already did `return true` from `canVectorize`, so this enables them to live up to their claims.
- In SQL tests, cannotVectorize now fails tests if an exception is not thrown. This makes it easier to identify tests that can now vectorize.
- Fixes a null-matcher bug in StringObjectVectorValueMatcher that was uncovered by certain newly-vectorizable test cases.
Benchmarks follow for `SqlExpressionBenchmark` queries 26 and 27. These two queries are:

In these cases fallback vectorization is not as compelling as proper vectorization, but it's better than unvectorized execution. The relative benefit is greater for query 27, likely because fallback vectorization for `CONCAT` enables the `long1 * double4` to vectorize as well. In general I would expect the benefit to be greater for more complex queries, due to this effect.
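For readers unfamiliar with the idea, the general shape of fallback vectorization can be sketched as follows: a function that only knows how to evaluate one row at a time is wrapped in an adapter that loops over the rows of the input vectors. This is a simplified illustration, not the actual `FallbackVectorProcessor`; the names `FallbackVectorizationSketch`, `VectorOp`, `fallback`, and `concat` are assumptions made up for this example.

```java
import java.util.Arrays;
import java.util.function.Function;

/**
 * Hypothetical sketch of fallback vectorization: adapting a row-at-a-time
 * function into a vector-at-a-time operation. Not Druid's real classes.
 */
final class FallbackVectorizationSketch
{
  /** A vector-at-a-time operation: inputs[arg][row] in, one output per row. */
  interface VectorOp
  {
    Object[] evalVector(Object[][] inputs, int size);
  }

  /** Wraps a per-row function so it can be applied to whole vectors. */
  static VectorOp fallback(final Function<Object[], Object> rowFn, final int numArgs)
  {
    return (inputs, size) -> {
      final Object[] out = new Object[size];
      final Object[] rowArgs = new Object[numArgs];
      for (int row = 0; row < size; row++) {
        // Gather this row's arguments from each input vector.
        for (int arg = 0; arg < numArgs; arg++) {
          rowArgs[arg] = inputs[arg][row];
        }
        // Evaluate the non-vectorized function once per row.
        out[row] = rowFn.apply(rowArgs);
      }
      return out;
    };
  }

  /** Example row-at-a-time function: concatenation that returns null if any argument is null. */
  static Object concat(final Object[] args)
  {
    final StringBuilder sb = new StringBuilder();
    for (Object arg : args) {
      if (arg == null) {
        return null;
      }
      sb.append(arg);
    }
    return sb.toString();
  }

  public static void main(String[] args)
  {
    final VectorOp op = fallback(FallbackVectorizationSketch::concat, 2);
    final Object[][] inputs = {{"a", "b", null}, {1L, 2L, 3L}};
    System.out.println(Arrays.toString(op.evalVector(inputs, 3))); // [a1, b2, null]
  }
}
```

The per-row call is still unvectorized work, which fits the observation that fallback is less compelling than proper vectorization. The gain in this sketch's terms is that the inputs to the adapter can themselves be produced by properly vectorized operations.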