Additional expr type alignment. #18503
Merged
gianm merged 3 commits into apache:master on Sep 18, 2025
PR apache#16366 originally added fallback vectorization, a mechanism for making all expressions vectorizable. Later, apache#17098 fixed some issues that arose, and apache#17248 disabled fallback vectorization in the out-of-box configuration.

This patch fixes various remaining issues with inconsistent type handling between the vectorized and non-vectorized expr implementations. It does not yet re-enable fallback vectorization out of the box, because inconsistencies remain with conditional exprs like "case_searched", "case_simple", and "if".

1) Aligns the behavior of missing columns and literal nulls so they are always treated as null longs. This was already the case for vectorized identifiers, but non-vectorized identifiers and literal nulls were still represented as strings.
2) Replaces all occurrences of "ExprEval.of(null)" with either an explicit type or a call to "ExprEval.ofMissing()". ofMissing is a new function for situations where an eval represents a null value of unknown type. It is equivalent to "ExprEval.ofLong(null)", but is a separate function for clarity at the call site.
3) Updates "cast" to return the target type even for null values.
4) Updates "greatest", "least", and "array" so they eval to types that match what is reported by "getOutputType".
5) Updates "scalb" to coerce input strings to numbers, to better allow for type evolution and missing columns.
6) Updates "reverse" to coerce inputs to strings, to better allow for type evolution and missing columns.
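Items 1 through 3 can be illustrated with a small toy model. This is only a sketch under stated assumptions: `MissingValueSketch`, `MiniType`, and `MiniEval` are hypothetical stand-ins written for this example and are not Druid's `ExprEval` classes, and the string parsing rules here are simplified guesses rather than Druid's actual coercion behavior.

```java
// Toy model, not Druid code: every name in this file is hypothetical.
final class MissingValueSketch {
  enum MiniType { LONG, DOUBLE, STRING }

  static final class MiniEval {
    final MiniType type;
    final Object value; // null means "no value"

    private MiniEval(MiniType type, Object value) {
      this.type = type;
      this.value = value;
    }

    static MiniEval ofLong(Long value) {
      return new MiniEval(MiniType.LONG, value);
    }

    static MiniEval ofString(String value) {
      return new MiniEval(MiniType.STRING, value);
    }

    // A null of unknown type: same result as ofLong(null), but the name states the intent.
    static MiniEval ofMissing() {
      return ofLong(null);
    }

    // Casting keeps the target type even when the value is null.
    MiniEval castTo(MiniType target) {
      if (value == null) {
        return new MiniEval(target, null);
      }
      switch (target) {
        case LONG:
          return new MiniEval(target, toLong(value));
        case DOUBLE:
          return new MiniEval(target, toDouble(value));
        default:
          return new MiniEval(target, String.valueOf(value));
      }
    }

    private static Long toLong(Object v) {
      if (v instanceof Number) {
        return ((Number) v).longValue();
      }
      try {
        return Long.parseLong(v.toString().trim());
      } catch (NumberFormatException e) {
        return null; // unparseable strings become null in this sketch
      }
    }

    private static Double toDouble(Object v) {
      if (v instanceof Number) {
        return ((Number) v).doubleValue();
      }
      try {
        return Double.parseDouble(v.toString().trim());
      } catch (NumberFormatException e) {
        return null; // unparseable strings become null in this sketch
      }
    }
  }

  public static void main(String[] args) {
    MiniEval missing = MiniEval.ofMissing();
    System.out.println(missing.type + " " + missing.value);

    // The cast result reports the target type even though the value is still null.
    MiniEval cast = missing.castTo(MiniType.DOUBLE);
    System.out.println(cast.type + " " + cast.value);
  }
}
```

The point of the separate `ofMissing()` factory in this sketch is readability only: a call site that writes `ofMissing()` says "the type is unknown", while `ofLong(null)` would suggest the value is known to be a long.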
      return ExprEval.ofDouble(isNull ? null : results.getDoubleVector()[rowNum]);
    } else {
-     return ExprEval.ofType(type, results.getObjectVector()[rowNum]);
+     return ExprEval.bestEffortOf(results.getObjectVector()[rowNum]);
Member
what was the problem with using ofType that is solved by best effort + cast?
Contributor
Author
I don't remember exactly which test failed, but the problem was that ofType doesn't coerce, it mostly trusts that the value you pass in is actually that type. But, there were some functions that lie about their output type (i.e. they return an expr that doesn't match the type from getOutputType), and to plug those in properly we need to coerce to the declared output type. I fixed a few in this patch but I'm not sure I got them all.
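The distinction drawn in this reply, between trusting a declared type and coercing to it, can be sketched with a toy model. Everything here is an assumption made for illustration: `CoercionSketch`, `Type`, and `Eval` are hypothetical names, the three-type model is far smaller than Druid's type system, and the method bodies only mimic the behavior described in the reply, not the real `ExprEval` implementation.

```java
// Toy model, not Druid code: every name in this file is hypothetical.
final class CoercionSketch {
  enum Type { LONG, DOUBLE, STRING }

  static final class Eval {
    final Type type;
    final Object value;

    private Eval(Type type, Object value) {
      this.type = type;
      this.value = value;
    }

    // Trusting factory: records the declared type without checking the value.
    static Eval ofType(Type declared, Object value) {
      return new Eval(declared, value);
    }

    // Inferring factory: derives the type from the runtime class of the value.
    static Eval bestEffortOf(Object value) {
      if (value == null) {
        return new Eval(Type.LONG, null);
      }
      if (value instanceof Long || value instanceof Integer) {
        return new Eval(Type.LONG, ((Number) value).longValue());
      }
      if (value instanceof Number) {
        return new Eval(Type.DOUBLE, ((Number) value).doubleValue());
      }
      return new Eval(Type.STRING, value.toString());
    }

    // Coerces the value to the target type; nulls keep the target type.
    Eval castTo(Type target) {
      if (value == null || type == target) {
        return new Eval(target, value);
      }
      switch (target) {
        case LONG:
          return new Eval(target, parseLongOrNull(value));
        case DOUBLE:
          return new Eval(target, parseDoubleOrNull(value));
        default:
          return new Eval(target, String.valueOf(value));
      }
    }

    // Downstream code that believes the declared type.
    Long asLong() {
      return (Long) value;
    }

    private static Long parseLongOrNull(Object v) {
      if (v instanceof Number) {
        return ((Number) v).longValue();
      }
      try {
        return Long.parseLong(v.toString().trim());
      } catch (NumberFormatException e) {
        return null;
      }
    }

    private static Double parseDoubleOrNull(Object v) {
      if (v instanceof Number) {
        return ((Number) v).doubleValue();
      }
      try {
        return Double.parseDouble(v.toString().trim());
      } catch (NumberFormatException e) {
        return null;
      }
    }
  }

  public static void main(String[] args) {
    // A function that declares LONG output but actually hands back a string.
    Object raw = "42";

    Eval trusted = Eval.ofType(Type.LONG, raw);
    try {
      trusted.asLong();
    } catch (ClassCastException e) {
      System.out.println("trusted path failed: value is not a Long");
    }

    Eval coerced = Eval.bestEffortOf(raw).castTo(Type.LONG);
    System.out.println("coerced path: " + coerced.asLong());
  }
}
```

In this sketch the trusting path defers the failure to whoever reads the value, while the infer-then-cast path repairs the mismatch at the boundary, which is the trade-off the reply describes for functions whose results do not match their declared output type.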
clintropolis approved these changes on Sep 18, 2025