Is your feature request related to a problem or challenge?
When array_has_any is passed a scalar for either of its argument, we can use a much faster algorithm: rather than doing O(N*M) comparisons for each row of the columnar arg, we can build a hash table on the scalar array and probe it instead.
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
#18181 discusses a user-reported query where array_has_any is slow. In that scenario, array_has_any is called on a table column and an uncorrelated subquery that returns a 1245-element array. This results in calling array_has_any with a pair of columnar arguments (i.e., we don't take advantage of the fact that the subquery argument is effectively fixed and pass a scalar instead). Optimizing that query involves two steps:
- Optimize
array_has_any for a scalar arg, which is this ticket. This has value as a standalone optimization.
- Query optimization improvement to handle this general class of queries better; I'll do some more digging here and file another ticket shortly.
Is your feature request related to a problem or challenge?
When
array_has_anyis passed a scalar for either of its argument, we can use a much faster algorithm: rather than doing O(N*M) comparisons for each row of the columnar arg, we can build a hash table on the scalar array and probe it instead.Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
#18181 discusses a user-reported query where
array_has_anyis slow. In that scenario,array_has_anyis called on a table column and an uncorrelated subquery that returns a 1245-element array. This results in callingarray_has_anywith a pair of columnar arguments (i.e., we don't take advantage of the fact that the subquery argument is effectively fixed and pass a scalar instead). Optimizing that query involves two steps:array_has_anyfor a scalar arg, which is this ticket. This has value as a standalone optimization.