new SCALAR_IN_ARRAY function analogous to DRUID_IN by sreemanamala · Pull Request #16306 · apache/druid

sreemanamala · 2024-04-18T06:40:14Z

Description

Adds a new function, SCALAR_IN_ARRAY(expr, arr), that checks whether the scalar expression expr is present in the array expression arr.
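
To make the intended semantics concrete, here is a minimal sketch in plain Java. It is not the code from this PR: the class and method names are hypothetical, and it uses ordinary `Objects.equals` equality rather than Druid's expression types, so null handling and cross-type comparison are not modeled.

```java
import java.util.Objects;

// Hypothetical illustration only; not Druid's API.
final class ScalarInArrayExample
{
  // Returns true if `scalar` equals any element of `array`, scanning linearly.
  static boolean scalarInArray(Object scalar, Object[] array)
  {
    for (Object element : array) {
      if (Objects.equals(scalar, element)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args)
  {
    System.out.println(scalarInArray("b", new Object[]{"a", "b", "c"})); // true
    System.out.println(scalarInArray(4L, new Object[]{1L, 2L, 3L}));     // false
  }
}
```

The docs row proposed in this PR describes the native function as returning 1 or 0; the sketch uses a Java `boolean` for readability.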

Release note

Key changed/added classes in this PR

ScalarInOperatorConversion.java
Function.java

kgyrtkirk · 2024-04-18T07:16:43Z

-| array_to_string(arr,str) | joins all elements of arr by the delimiter specified by str |
-| string_to_array(str1,str2) | splits str1 into an array on the delimiter specified by str2, which is a regular expression |
+| function                     | description                                                                                                                                                                                                   |
+|------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|


I believe .md files are processed and rendered during page generation. Is it really necessary to add whitespace changes to unrelated lines just to make a nice ASCII table?
Even the previous version did not care at all whether it was a nice ASCII table; it used | --- | --- |

kgyrtkirk · 2024-04-18T07:19:02Z

    }
  }

+  class ArrayScalarInFunction extends ArrayScalarFunction


I wonder if it would be possible, instead of adding a new native function, to delegate all this work to the new type-aware native in filter that @clintropolis has added?

EDIT: now I see that the new typed in filter may not be available in all circumstances, as it's not supported in defaultValue mode.

gianm · 2024-04-18T06:54:03Z

-| array_slice(arr,start,end) | return the subarray of arr from the 0 based index start(inclusive) to end(exclusive), or `null`, if start is less than 0, greater than length of arr or less than end|
-| array_to_string(arr,str) | joins all elements of arr by the delimiter specified by str |
-| string_to_array(str1,str2) | splits str1 into an array on the delimiter specified by str2, which is a regular expression |
+| function                     | description                                                                                                                                                                                                   |


Please don't format these markdown tables like this; it makes diffs hard to read because of the whitespace-only changes. Please turn off the feature in your IDE; if you're using IntelliJ, I think the code style file in the repo should do it.

gianm · 2024-04-18T06:54:36Z

+| array_ordinal(arr,long)      | returns the array element at the 1 based index supplied, or null for an out of range index                                                                                                                    |
+| array_contains(arr,expr)     | returns 1 if the array contains the element specified by expr, or contains all elements specified by expr if expr is an array, else 0                                                                         |
+| array_overlap(arr1,arr2)     | returns 1 if arr1 and arr2 have any elements in common, else 0                                                                                                                                                |
+| scalar_in(expr,arr)          | returns 1 if the array contains the scalar specified by expr, else 0.                                                                                                                                         |                                                                                                                               |


scalar_in_array may be a better name. Wondering what you / others think?

Should also add a reference to the function in the SQL docs.

gianm · 2024-04-18T08:04:18Z

+    }
+
+    @Override
+    ExprEval doApply(ExprEval arrayExpr, ExprEval scalarExpr)


You don't need to do this now, but as a follow up you could add an asSingleThreaded impl that creates a Set to avoid the Arrays.asList(array).contains, similar to the technique used in OverlapConstantArray.
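
The suggestion above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the `asSingleThreaded` implementation or `OverlapConstantArray` itself: the class and method names are hypothetical, and the only idea carried over is building a `Set` once when the array is constant instead of scanning it on every call.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration only; not Druid's API.
final class ConstantArrayMembership
{
  // Baseline: wraps the array and scans it linearly on every call.
  static boolean containsLinear(Object[] array, Object scalar)
  {
    return Arrays.asList(array).contains(scalar);
  }

  // Specialization for a constant array: build the HashSet once up front,
  // then answer each membership check with a hash lookup.
  static Predicate<Object> forConstantArray(Object[] array)
  {
    final Set<Object> lookup = new HashSet<>(Arrays.asList(array));
    return lookup::contains;
  }

  public static void main(String[] args)
  {
    Object[] constants = {1L, 2L, 3L};
    Predicate<Object> inArray = forConstantArray(constants);
    System.out.println(inArray.test(2L)); // true
    System.out.println(inArray.test(5L)); // false
  }
}
```

The set-based version pays the construction cost once, which only helps when the same array is checked against many scalar values.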

Three changes to scalar_in_array as follow-ups to apache#16306: 1) Rename the class to more closely match the function name. 2) Add a specialization for constant arrays, where we build a HashSet. 3) Use castForEqualityComparison to properly handle cross-type comparisons. Additional tests verify comparisons between LONG and DOUBLE are now handled properly.

Four changes to scalar_in_array as follow-ups to #16306:

1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`.
2) Rename the class to more closely match the function name.
3) Add a specialization for constant arrays, where we build a `HashSet`.
4) Use `castForEqualityComparison` to properly handle cross-type comparisons. Additional tests verify comparisons between LONG and DOUBLE are now handled properly.

* Fix spelling.
* Adjustments from review.

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
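
The `null`-scalar rule in the follow-up commit message can be sketched as a three-valued result in plain Java. This is a hedged illustration, not the follow-up's code: the class and method names are hypothetical, `castForEqualityComparison` is not reproduced, and the result for a non-null scalar (plain membership) is an assumption, since the commit message only specifies the `null` case.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not Druid's API.
final class NullAwareMembership
{
  // Returns Boolean.TRUE, Boolean.FALSE, or null (meaning "unknown").
  static Boolean scalarInArray(Object scalar, Set<Object> arrayElements)
  {
    if (scalar == null) {
      // Rule from the commit message: true if the array contains null, else null.
      return arrayElements.contains(null) ? Boolean.TRUE : null;
    }
    // Assumption: a non-null scalar is treated as plain set membership.
    return arrayElements.contains(scalar);
  }

  public static void main(String[] args)
  {
    Set<Object> withNull = new HashSet<>(Arrays.asList("a", null));
    Set<Object> withoutNull = new HashSet<>(Arrays.asList("a", "b"));

    System.out.println(scalarInArray(null, withNull));    // true
    System.out.println(scalarInArray(null, withoutNull)); // null
    System.out.println(scalarInArray("a", withoutNull));  // true

    // Why cross-type handling matters: boxed Long and Double never compare
    // equal in Java, so 1 and 1.0 would not match without a cast step.
    System.out.println(Long.valueOf(1L).equals(Double.valueOf(1.0))); // false
  }
}
```

The last line shows the kind of LONG versus DOUBLE mismatch that a cast before comparison is meant to avoid; how Druid performs that cast is not shown here.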

scalar_in function

f9465a9

github-actions Bot added the Area - Querying label Apr 18, 2024

api doc

7cc6249

github-actions Bot added the Area - Documentation label Apr 18, 2024

kgyrtkirk reviewed Apr 18, 2024

gianm reviewed Apr 18, 2024

refactor

194ab89

sreemanamala changed the title ~~new SCALAR_IN function analogous to DRUID_IN~~ new SCALAR_IN_ARRAY function analogous to DRUID_IN Apr 18, 2024

gianm mentioned this pull request Apr 18, 2024

Array overlap to allow numeric operand #15964

Closed

10 tasks

gianm approved these changes Apr 19, 2024

gianm merged commit ad5701e into apache:master Apr 19, 2024

sreemanamala deleted the sql-in branch April 19, 2024 04:18

gianm mentioned this pull request Apr 19, 2024

SCALAR_IN_ARRAY: Optimization and behavioral follow-ups. #16311

Merged

adarshsanjeev added this to the 30.0.0 milestone May 6, 2024

adarshsanjeev mentioned this pull request May 28, 2024

[DRAFT] 30.0.0 release notes #16505

Closed

gianm merged 3 commits into apache:master from sreemanamala:sql-in

4 participants
