Additional expr type alignment. #18503

Merged
gianm merged 3 commits into apache:master from gianm:align-expr-typings
Sep 18, 2025

@gianm gianm commented Sep 9, 2025

PR #16366 originally added fallback vectorization, a mechanism for making all expressions vectorizable. Later, #17098 fixed some issues that arose and #17248 disabled fallback vectorization in the out-of-box configuration.

This patch fixes various remaining issues with inconsistent type handling between the vectorized and nonvectorized expr implementations. It does not yet re-enable fallback vectorization out of the box, due to remaining inconsistencies with conditional exprs like "case_searched", "case_simple", and "if".

  1. Aligns the behavior of missing columns and literal nulls so they are
    always treated as null longs. This was already the case for vectorized
    identifiers, but non-vectorized identifiers and literal nulls were still
    represented as strings.

  2. Replaces all occurrences of "ExprEval.of(null)" with either an explicit
    type, or a call to "ExprEval.ofMissing()". ofMissing is a new function
    for situations where an eval represents a null value of unknown type.
    It is equivalent to "ExprEval.ofLong(null)", but is a separate function
    for clarity at the call site.

  3. Updates "cast" to return the target type even for null values.

  4. Updates "greatest", "least", and "array" so they evaluate to types that
    match what is reported by "getOutputType".

  5. Updates "scalb" to coerce input strings as numbers, to better allow
    for type evolution and missing columns.

  6. Updates "reverse" to coerce inputs to strings, to better allow for
    type evolution and missing columns.
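
To illustrate items 2 and 3, here is a minimal toy sketch in plain Java. It does not use Druid's classes: `TypedNullSketch`, `Type`, and `Eval` are hypothetical stand-ins invented for this example, and only the method names `ofLong`, `ofMissing`, and the idea of a cast that keeps its target type come from the description above. It shows a null of unknown type being represented as a null long, and a cast of that null still reporting the target type.

```java
public final class TypedNullSketch {
    /** Hypothetical stand-in for an expression type; not Druid's own type class. */
    public enum Type { LONG, DOUBLE, STRING }

    /** Hypothetical stand-in for an eval result: a declared type plus a nullable value. */
    public static final class Eval {
        public final Type type;
        public final Object value;

        private Eval(Type type, Object value) {
            this.type = type;
            this.value = value;
        }

        public static Eval ofLong(Long value) {
            return new Eval(Type.LONG, value);
        }

        /** Null of unknown type. Same result as ofLong(null), named for clarity at the call site. */
        public static Eval ofMissing() {
            return ofLong(null);
        }

        /** Casts to the target type; a null value still reports the target type. */
        public Eval castTo(Type target) {
            if (value == null) {
                return new Eval(target, null);
            }
            switch (target) {
                case LONG:
                    return new Eval(Type.LONG, toLong(value));
                case DOUBLE:
                    return new Eval(Type.DOUBLE, toDouble(value));
                default:
                    return new Eval(Type.STRING, value.toString());
            }
        }

        // Numbers are narrowed or widened; strings are parsed, and unparseable strings become null.
        private static Long toLong(Object v) {
            if (v instanceof Number) {
                return ((Number) v).longValue();
            }
            try {
                return Long.parseLong(v.toString());
            } catch (NumberFormatException e) {
                return null;
            }
        }

        private static Double toDouble(Object v) {
            if (v instanceof Number) {
                return ((Number) v).doubleValue();
            }
            try {
                return Double.parseDouble(v.toString());
            } catch (NumberFormatException e) {
                return null;
            }
        }
    }

    public static void main(String[] args) {
        Eval missing = Eval.ofMissing();
        System.out.println(missing.type + " " + missing.value); // LONG null

        Eval cast = missing.castTo(Type.STRING);
        System.out.println(cast.type + " " + cast.value); // STRING null
    }
}
```

In this toy model the type label travels with the value even when the value is null, which is the property the vectorized and non-vectorized paths need to agree on.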

Comment thread processing/src/main/java/org/apache/druid/math/expr/Function.java Dismissed
Comment thread processing/src/main/java/org/apache/druid/math/expr/Function.java Fixed
  return ExprEval.ofDouble(isNull ? null : results.getDoubleVector()[rowNum]);
  } else {
- return ExprEval.ofType(type, results.getObjectVector()[rowNum]);
+ return ExprEval.bestEffortOf(results.getObjectVector()[rowNum]);
Member

What was the problem with using ofType that is solved by best effort + cast?

Contributor Author

I don't remember exactly which test failed, but the problem was that ofType doesn't coerce, it mostly trusts that the value you pass in is actually that type. But, there were some functions that lie about their output type (i.e. they return an expr that doesn't match the type from getOutputType), and to plug those in properly we need to coerce to the declared output type. I fixed a few in this patch but I'm not sure I got them all.
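
The difference described above can be sketched with a toy model in plain Java. This is not Druid's implementation: `CoercionSketch`, `Type`, `Eval`, and `isConsistent` are hypothetical names made up for illustration, and the behavior of `ofType`, `bestEffortOf`, and `castTo` here is only an assumption modeled on the explanation in this comment (trusting the declared type versus inferring a type and then coercing).

```java
public final class CoercionSketch {
    /** Hypothetical stand-in for an expression type. */
    public enum Type { LONG, DOUBLE, STRING }

    /** Hypothetical stand-in for an eval result. */
    public static final class Eval {
        public final Type type;
        public final Object value;

        private Eval(Type type, Object value) {
            this.type = type;
            this.value = value;
        }

        /** Trusts the caller: no check that the value actually matches the type. */
        public static Eval ofType(Type type, Object value) {
            return new Eval(type, value);
        }

        /** Infers the type from the runtime class of the value; null becomes a null long. */
        public static Eval bestEffortOf(Object value) {
            if (value == null) {
                return new Eval(Type.LONG, null);
            }
            if (value instanceof Long || value instanceof Integer) {
                return new Eval(Type.LONG, ((Number) value).longValue());
            }
            if (value instanceof Number) {
                return new Eval(Type.DOUBLE, ((Number) value).doubleValue());
            }
            return new Eval(Type.STRING, value.toString());
        }

        /** Coerces the value to the target type; unparseable strings become null. */
        public Eval castTo(Type target) {
            if (value == null) {
                return new Eval(target, null);
            }
            try {
                switch (target) {
                    case LONG:
                        return new Eval(Type.LONG, value instanceof Number
                            ? Long.valueOf(((Number) value).longValue())
                            : Long.valueOf(Long.parseLong(value.toString())));
                    case DOUBLE:
                        return new Eval(Type.DOUBLE, value instanceof Number
                            ? Double.valueOf(((Number) value).doubleValue())
                            : Double.valueOf(Double.parseDouble(value.toString())));
                    default:
                        return new Eval(Type.STRING, value.toString());
                }
            } catch (NumberFormatException e) {
                return new Eval(target, null);
            }
        }

        /** True when the stored value's runtime class matches the declared type. */
        public boolean isConsistent() {
            if (value == null) {
                return true;
            }
            switch (type) {
                case LONG:
                    return value instanceof Long;
                case DOUBLE:
                    return value instanceof Double;
                default:
                    return value instanceof String;
            }
        }
    }

    public static void main(String[] args) {
        // A function that declares LONG output but hands back a string.
        Object raw = "42";

        Eval trusted = Eval.ofType(Type.LONG, raw);
        System.out.println(trusted.isConsistent()); // false

        Eval coerced = Eval.bestEffortOf(raw).castTo(Type.LONG);
        System.out.println(coerced.isConsistent() + " " + coerced.value); // true 42
    }
}
```

In the toy model, the trusting constructor lets a string sit under a LONG label, while inferring the type and then casting yields a value that really is of the declared output type.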

@gianm gianm merged commit bf1b2fe into apache:master Sep 18, 2025
112 of 113 checks passed
@gianm gianm deleted the align-expr-typings branch September 18, 2025 23:20
@cecemei cecemei added this to the 35.0.0 milestone Oct 21, 2025