SCALAR_IN_ARRAY: Optimization and behavioral follow-ups.#16311
SCALAR_IN_ARRAY: Optimization and behavioral follow-ups.#16311gianm merged 3 commits intoapache:masterfrom
Conversation
1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`. 2) Rename the class to more closely match the function name. 3) Add a specialization for constant arrays, where we build a `HashSet`. 4) Use `castForEqualityComparison` to properly handle cross-type comparisons. Additional tests verify comparisons between LONG and DOUBLE are now handled properly.
| return ExprEval.ofLongBoolean(Arrays.asList(array).contains(scalarExpr.value())); | ||
|
|
||
| if (scalarEval.value() == null) { | ||
| return Arrays.asList(array).contains(null) ? ExprEval.ofLongBoolean(true) : ExprEval.ofLong(null); |
There was a problem hiding this comment.
I wonder if the array contains null - then doesn't all places when false would have been returned should return null? for example:
c IN (2,null)
c = 2 OR c = null
c = 2 OR null
this last rewrite essentially mean that false is no more
....other way around is to think about = as IS NOT DISTINCT FROM ; however in that case the return value will never be null as <=> is a 2-valued
this is not a blocking comment; I was just wondering...
There was a problem hiding this comment.
The idea is that if the array contains null then the comparison acts like IS NOT DISTINCT FROM (always returns true or false), whereas if the array does not contain null, then the comparison acts like = (returns null if the lhs is null). It's the same way the native in and inType filters behave, so this is designed to align with those.
A future patch would convert from SQL IN to this SCALAR_IN_ARRAY function. We'll need to make sure to handle NULL appropriately in that patch-- it would not be ok to rewrite c IN (2, NULL) to SCALAR_IN_ARRAY(c, ARRAY[2, NULL]).
| ExprEval doApply(final ExprEval arrayExpr, final ExprEval scalarEval) | ||
| { | ||
| if (matchType == null) { | ||
| return ExprEval.ofLong(null); |
There was a problem hiding this comment.
note: matchType is known at asSingleThreaded call time; a strategy pattern could create a class returning only null unconditionally
it may make it a bit more readable and matchType may not be null - other benefits are possibly negligable...
There was a problem hiding this comment.
I added a specialization for WithNullArray, so now the WithConstantArray specialization requires a non-null constant array.
While doing this I also realized I had implemented WithConstantArray inefficiently: by overriding doApply rather than apply, the constant array expr would still have eval() called on it. I changed it to override apply instead.
* Four changes to scalar_in_array as follow-ups to apache#16306: 1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`. 2) Rename the class to more closely match the function name. 3) Add a specialization for constant arrays, where we build a `HashSet`. 4) Use `castForEqualityComparison` to properly handle cross-type comparisons. Additional tests verify comparisons between LONG and DOUBLE are now handled properly. * Fix spelling. * Adjustments from review.
…6398) * Four changes to scalar_in_array as follow-ups to #16306: 1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`. 2) Rename the class to more closely match the function name. 3) Add a specialization for constant arrays, where we build a `HashSet`. 4) Use `castForEqualityComparison` to properly handle cross-type comparisons. Additional tests verify comparisons between LONG and DOUBLE are now handled properly. * Fix spelling. * Adjustments from review. Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Four changes to scalar_in_array as follow-ups to #16306:
Align behavior for
nullscalars to the behavior of the nativeinandinTypefilters: returntrueif the array itself contains null, else returnnull.Rename the class to more closely match the function name.
Add a specialization for constant arrays, where we build a
HashSet.Use
castForEqualityComparisonto properly handle cross-type comparisons.Additional tests verify comparisons between LONG and DOUBLE are now
handled properly.