@priyankagargnitk priyankagargnitk commented Sep 29, 2016

What changes were proposed in this pull request?

This change adds support for pattern matching to the arrayContains expression for string arrays.
For example:
a. arrayContains(Seq("\d\d\s-\s\d\d", null, "", "pattern"), "12 - 20") returns true
b. arrayContains(Seq("\d\d\s-\s\d\d", "", "pattern"), "132 - 20") returns false
c. arrayContains(Seq("\d\d\s-\s\d\d", null, "", "pattern"), "132 - 20") returns null
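
The three results above are consistent with full-string regex matching combined with SQL-style three-valued logic. Here is a minimal Scala sketch of those semantics under that assumption. The object `ArrayContainsSketch` and the helper `containsPattern` are hypothetical names used for illustration, not code from this patch, and `Option[Boolean]` stands in for a nullable SQL boolean (`None` meaning null).

```scala
import java.util.regex.Pattern

object ArrayContainsSketch {
  // Hypothetical helper, not the patch's implementation.
  // Some(true):  some non-null element, read as a regex, matches the whole value.
  // None:        nothing matched, but a null element (or a null input) was seen.
  // Some(false): nothing matched and no null was involved.
  def containsPattern(patterns: Seq[String], value: String): Option[Boolean] =
    if (patterns == null || value == null) None
    else if (patterns.exists(p => p != null && Pattern.matches(p, value))) Some(true)
    else if (patterns.contains(null)) None
    else Some(false)
}
```

Full-string matching, rather than substring search, is inferred from example b: "132 - 20" contains the substring "32 - 20", which the pattern would find if partial matches counted.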

This change is completely backward compatible.

How was this patch tested?

Added test cases for the pattern-matching use case to the following suites:
a. CollectionFunctionsSuite.scala
b. DataFrameFunctionsSuite.scala
c. ExpressionToSQLSuite.scala

JIRA entry for details: https://issues.apache.org/jira/browse/SPARK-17619

…ns Expression.

@AmplabJenkins

Can one of the admins verify this patch?

@ExpressionDescription(
usage = "_FUNC_(array, value) - Returns TRUE if the array contains the value.",
extended = " > SELECT _FUNC_(array(1, 2, 3), 2);\n true")
usage = """_FUNC_(array, value) - Returns TRUE if the array contains the value or
Member

The problem is that this changes the behavior of this method and even makes it a little surprising. Before, ..array("Mr.X"), "Mrox".. didn't match but now it does. Accepting strings and regexes in the same place is inherently ambiguous. I don't know if we'd change the meaning of an existing function like this.
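
The ambiguity can be reproduced with plain `java.util.regex`, independent of Spark. This is a minimal sketch: the object name `RegexAmbiguity` and its fields are hypothetical, and the probe value is written as "MroX" with a capital X because `java.util.regex` matching is case-sensitive by default.

```scala
import java.util.regex.Pattern

object RegexAmbiguity {
  val element = "Mr.X" // array element, possibly intended as a literal name
  val value   = "MroX" // value being looked up

  // Literal comparison, as the existing function behaves: no match.
  val literalEqual: Boolean = element == value // false

  // Treating the element as a regex: '.' matches any character, so this matches.
  val regexMatch: Boolean = Pattern.matches(element, value) // true

  // Pattern.quote escapes metacharacters and restores literal behavior.
  val quotedMatch: Boolean = Pattern.matches(Pattern.quote(element), value)   // false
  val quotedSelf: Boolean  = Pattern.matches(Pattern.quote(element), element) // true
}
```

A caller cannot signal which interpretation is wanted when both share one function, which is the argument for a separately named expression.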

Author

@priyankagargnitk priyankagargnitk Sep 29, 2016

So, in that case we can add one more expression, something like ArrayContainsWithPatternMatch? What are your thoughts on this?

…ns Expression.

## What changes were proposed in this pull request?
This change adds a new expression, ArrayContainsWithPatternMatch, which performs pattern matching for string types and behaves the same as ArrayContains for all other data types.

For example:
        a. ArrayContainsWithPatternMatch(Seq("\\d\\d\\s-\\s\\d\\d", null, "", "pattern"), "12 - 20") returns true
        b. ArrayContainsWithPatternMatch(Seq("\\d\\d\\s-\\s\\d\\d", "", "pattern"), "132 - 20") returns false
        c. ArrayContainsWithPatternMatch(Seq("\\d\\d\\s-\\s\\d\\d", null, "", "pattern"), "132 - 20") returns null
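
The split between string and non-string element types might look roughly like the sketch below. It is an illustration only: `PatternMatchSketch` and `containsWithPatternMatch` are hypothetical names, `Seq[Any]` stands in for a typed Spark array, `None` stands in for SQL null, and full-string matching is assumed from the examples above.

```scala
import java.util.regex.Pattern

object PatternMatchSketch {
  // Hypothetical helper, not the patch's implementation.
  // String elements are treated as regexes; everything else uses equality.
  def containsWithPatternMatch(array: Seq[Any], value: Any): Option[Boolean] =
    if (array == null || value == null) None
    else {
      val hit = array.exists {
        case null => false // nulls never match; they only affect the final result
        case p: String =>
          value match {
            case s: String => Pattern.matches(p, s) // regex, whole-string match
            case _         => false
          }
        case other => other == value // plain equality for non-string types
      }
      if (hit) Some(true)
      else if (array.contains(null)) None // unknown: a null might have matched
      else Some(false)
    }
}
```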

This change is completely backward compatible.

## How was this patch tested?
Added test cases for the pattern-matching use case to the following suites:
         a. CollectionFunctionsSuite.scala
         b. DataFrameFunctionsSuite.scala
         c. ExpressionToSQLSuite.scala

JIRA entry for details: https://issues.apache.org/jira/browse/SPARK-17619
…tch.

…kagargnitk/spark into array_contains_with_pattern_match
@priyankagargnitk
Author

I reverted my previous changes to ArrayContains and added a new expression, ArrayContainsWithPatternMatch, instead.

@asfgit asfgit closed this in 5e9f32d Oct 6, 2016
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
## What changes were proposed in this pull request?

This PR proposes to close some stale PRs and ones suggested to be closed by committer(s) or obviously inappropriate PRs (e.g. branch to branch).

Closes apache#13458
Closes apache#15278
Closes apache#15294
Closes apache#15339
Closes apache#15283

## How was this patch tested?

N/A

Author: hyukjinkwon <gurwls223@gmail.com>

Closes apache#15356 from HyukjinKwon/closing-prs.