[SPARK-17619][SQL] To add support for pattern matching in ArrayContains expression #15294
…ns Expression.
## What changes were proposed in this pull request?
This change adds support for pattern matching in the arrayContains expression for string arrays.
For example:
a. arrayContains(Seq("\\d\\d\\s-\\s\\d\\d", null, "", "pattern"), "12 - 20") returns true
b. arrayContains(Seq("\\d\\d\\s-\\s\\d\\d", "", "pattern"), "132 - 20") returns false
c. arrayContains(Seq("\\d\\d\\s-\\s\\d\\d", null, "", "pattern"), "132 - 20") returns null
This change is completely backward compatible.
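The three results in examples a to c follow SQL-style three-valued logic. Below is a minimal sketch in plain Scala (no Spark dependency) that reproduces them. It assumes each string element is treated as a Java regex that must match the whole value, which is what example b implies. `PatternContainsSketch` and `containsPattern` are hypothetical names used for illustration, not code from the patch.

```scala
object PatternContainsSketch {
  // Hypothetical helper modelling the described semantics:
  //   Some(true)  : at least one non-null element, read as a regex, fully matches the value
  //   None        : nothing matched, but a null element (or null value) leaves the answer unknown
  //   Some(false) : nothing matched and the array holds no nulls
  def containsPattern(patterns: Seq[String], value: String): Option[Boolean] = {
    if (value == null) None
    else {
      val nonNull = patterns.filter(_ != null)
      // String.matches requires the regex to cover the entire input (assumption, see lead-in).
      if (nonNull.exists(p => value.matches(p))) Some(true)
      else if (nonNull.size < patterns.size) None
      else Some(false)
    }
  }
}
```

Under this sketch, an element that is not a valid regex would throw a `PatternSyntaxException`; how the real expression should handle that case is not stated in the description.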
## How was this patch tested?
Added test cases for the pattern-matching use case to the following suites:
a. CollectionFunctionsSuite.scala
b. DataFrameFunctionsSuite.scala
c. ExpressionToSQLSuite.scala
JIRA entry for details: https://issues.apache.org/jira/browse/SPARK-17619
Can one of the admins verify this patch?
@ExpressionDescription(
  usage = "_FUNC_(array, value) - Returns TRUE if the array contains the value.",
  extended = " > SELECT _FUNC_(array(1, 2, 3), 2);\n true")
  usage = """_FUNC_(array, value) - Returns TRUE if the array contains the value or
The problem is that this changes the behavior of this method and even makes it a little surprising. Before, ..array("Mr.X"), "Mrox".. didn't match but now it does. Accepting strings and regexes in the same place is inherently ambiguous. I don't know if we'd change the meaning of an existing function like this.
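To make the ambiguity concrete, here is a small sketch in plain Scala comparing literal equality with regex interpretation for the "Mr.X" / "Mrox" case. The object and value names are hypothetical, and `Pattern.quote` is shown only as one way a caller could force a literal reading; it is not something the patch is known to do.

```scala
import java.util.regex.Pattern

object LiteralVsRegex {
  val element = "Mr.X"
  val probe = "Mrox"

  // Plain equality: the two strings differ, so there is no match.
  val equalityHit: Boolean = element == probe

  // Regex interpretation: '.' matches any single character, so "Mrox" matches.
  val regexHit: Boolean = probe.matches(element)

  // Quoting the element turns it back into a literal pattern.
  val quotedHit: Boolean = probe.matches(Pattern.quote(element))
}
```

The same array element therefore gives different answers depending on whether it is read as a string or as a pattern, which is the behavior change being questioned.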
So, in that case we can add one more expression, something like ArrayContainsWithPatternMatch? What are your thoughts on this?
…ayContains Expression." This reverts commit 69f6309.
…ns Expression.
## What changes were proposed in this pull request?
This change adds a new expression, ArrayContainsWithPatternMatch, which performs pattern matching for string types and works the same way as ArrayContains for all other data types.
For example:
a. ArrayContainsWithPatternMatch(Seq("\\d\\d\\s-\\s\\d\\d", null, "", "pattern"), "12 - 20") returns true
b. ArrayContainsWithPatternMatch(Seq("\\d\\d\\s-\\s\\d\\d", "", "pattern"), "132 - 20") returns false
c. ArrayContainsWithPatternMatch(Seq("\\d\\d\\s-\\s\\d\\d", null, "", "pattern"), "132 - 20") returns null
This change is completely backward compatible.
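The split between string and non-string behavior can be sketched as a type dispatch. The plain-Scala example below is only an illustration under assumptions: the names `ContainsWithPatternSketch`, `elementMatches`, and `containsWithPattern` are hypothetical, full-match regex semantics are assumed, and null handling is deliberately left out for brevity (a null element simply never matches here).

```scala
object ContainsWithPatternSketch {
  // Strings are compared by regex match; every other type falls back to equality.
  def elementMatches(element: Any, value: Any): Boolean = (element, value) match {
    case (pattern: String, v: String) => v.matches(pattern)
    case _                            => element == value
  }

  // True if any element matches the value under the rule above.
  def containsWithPattern(array: Seq[Any], value: Any): Boolean =
    array.exists(e => elementMatches(e, value))
}
```

Keeping this logic in a separate expression leaves the existing equality-only behavior of ArrayContains untouched for callers who do not want regex interpretation.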
## How was this patch tested?
Added test cases for the pattern-matching use case to the following suites:
a. CollectionFunctionsSuite.scala
b. DataFrameFunctionsSuite.scala
c. ExpressionToSQLSuite.scala
JIRA entry for details: https://issues.apache.org/jira/browse/SPARK-17619
…tch.
…kagargnitk/spark into array_contains_with_pattern_match
I reverted my previous changes to ArrayContains and added a new expression, ArrayContainsWithPatternMatch.
## What changes were proposed in this pull request?
This PR proposes to close some stale PRs and ones suggested to be closed by committer(s) or obviously inappropriate PRs (e.g. branch to branch).
Closes apache#13458
Closes apache#15278
Closes apache#15294
Closes apache#15339
Closes apache#15283
## How was this patch tested?
N/A
Author: hyukjinkwon <gurwls223@gmail.com>
Closes apache#15356 from HyukjinKwon/closing-prs.