Skip to content

new SCALAR_IN_ARRAY function analogous to DRUID_IN#16306

Merged
gianm merged 3 commits intoapache:masterfrom
sreemanamala:sql-in
Apr 19, 2024
Merged

new SCALAR_IN_ARRAY function analogous to DRUID_IN#16306
gianm merged 3 commits intoapache:masterfrom
sreemanamala:sql-in

Conversation

@sreemanamala
Copy link
Copy Markdown
Contributor

@sreemanamala sreemanamala commented Apr 18, 2024

Description

creates a new function SCALAR_IN_ARRAY(expr, arr) to check if the scalar expr expr is present in the array expr arr or not

Release note


Key changed/added classes in this PR
  • ScalarInOperatorConversion.java
  • Function.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Comment thread docs/querying/math-expr.md Outdated
| array_to_string(arr,str) | joins all elements of arr by the delimiter specified by str |
| string_to_array(str1,str2) | splits str1 into an array on the delimiter specified by str2, which is a regular expression |
| function | description |
|------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe .md files are processed and rendered during page generation - is it really needed to add whitespace change to to unrelated lines - to make it a nice ascii table?
even the preious version was not caring at all if its a nice ascii or not by using | --- | --- |

}
}

class ArrayScalarInFunction extends ArrayScalarFunction
Copy link
Copy Markdown
Member

@kgyrtkirk kgyrtkirk Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be possible to instead of adding a new native function - delegate all this work to the new type-aware native in filter @clintropolis have added?

EDIT: now I see that the new typed in filter may not be available in all circumstances - as its not supported in defaultValue mode

Comment thread docs/querying/math-expr.md Outdated
| array_slice(arr,start,end) | return the subarray of arr from the 0 based index start(inclusive) to end(exclusive), or `null`, if start is less than 0, greater than length of arr or less than end|
| array_to_string(arr,str) | joins all elements of arr by the delimiter specified by str |
| string_to_array(str1,str2) | splits str1 into an array on the delimiter specified by str2, which is a regular expression |
| function | description |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't format these markdown tables like this- it makes diffs hard to read because of the whitespace-only changes. Please turn off the feature in your IDE; if you're using IntelliJ, I think the code style file in the repo should do it.

Comment thread docs/querying/math-expr.md Outdated
| array_ordinal(arr,long) | returns the array element at the 1 based index supplied, or null for an out of range index |
| array_contains(arr,expr) | returns 1 if the array contains the element specified by expr, or contains all elements specified by expr if expr is an array, else 0 |
| array_overlap(arr1,arr2) | returns 1 if arr1 and arr2 have any elements in common, else 0 |
| scalar_in(expr,arr) | returns 1 if the array contains the scalar specified by expr, else 0. | |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scalar_in_array may be a better name. Wondering what you / others think?

Should also add a reference to the function in the SQL docs.

}

@Override
ExprEval doApply(ExprEval arrayExpr, ExprEval scalarExpr)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to do this now, but as a follow up you could add an asSingleThreaded impl that creates a Set to avoid the Arrays.asList(array).contains, similar to the technique used in OverlapConstantArray.

@sreemanamala sreemanamala changed the title new SCALAR_IN function analogous to DRUID_IN new SCALAR_IN_ARRAY function analogous to DRUID_IN Apr 18, 2024
@gianm gianm mentioned this pull request Apr 18, 2024
10 tasks
@gianm gianm merged commit ad5701e into apache:master Apr 19, 2024
@sreemanamala sreemanamala deleted the sql-in branch April 19, 2024 04:18
gianm added a commit to gianm/druid that referenced this pull request Apr 19, 2024
Three changes to scalar_in_array as follow-ups to apache#16306:

1) Rename the class to more closely match the function name.

2) Add a specialization for constant arrays, where we build a HashSet.

3) Use castForEqualityComparison to properly handle cross-type comparisons.
   Additional tests verify comparisons between LONG and DOUBLE are now
   handled properly.
gianm added a commit to gianm/druid that referenced this pull request Apr 20, 2024
1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`.

2) Rename the class to more closely match the function name.

3) Add a specialization for constant arrays, where we build a `HashSet`.

4) Use `castForEqualityComparison` to properly handle cross-type comparisons.
   Additional tests verify comparisons between LONG and DOUBLE are now
   handled properly.
gianm added a commit that referenced this pull request Apr 26, 2024
* Four changes to scalar_in_array as follow-ups to #16306:

1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`.

2) Rename the class to more closely match the function name.

3) Add a specialization for constant arrays, where we build a `HashSet`.

4) Use `castForEqualityComparison` to properly handle cross-type comparisons.
   Additional tests verify comparisons between LONG and DOUBLE are now
   handled properly.

* Fix spelling.

* Adjustments from review.
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
adarshsanjeev pushed a commit to adarshsanjeev/druid that referenced this pull request May 6, 2024
* Four changes to scalar_in_array as follow-ups to apache#16306:

1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`.

2) Rename the class to more closely match the function name.

3) Add a specialization for constant arrays, where we build a `HashSet`.

4) Use `castForEqualityComparison` to properly handle cross-type comparisons.
   Additional tests verify comparisons between LONG and DOUBLE are now
   handled properly.

* Fix spelling.

* Adjustments from review.
cryptoe pushed a commit that referenced this pull request May 8, 2024
…6398)

* Four changes to scalar_in_array as follow-ups to #16306:

1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`.

2) Rename the class to more closely match the function name.

3) Add a specialization for constant arrays, where we build a `HashSet`.

4) Use `castForEqualityComparison` to properly handle cross-type comparisons.
   Additional tests verify comparisons between LONG and DOUBLE are now
   handled properly.

* Fix spelling.

* Adjustments from review.

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants