[SPARK-2097][SQL] UDF Support#1063

Closed
marmbrus wants to merge 18 commits into apache:master from marmbrus:udfs

@marmbrus
Contributor

This patch adds the ability to register lambda functions written in Python, Java or Scala as UDFs for use in SQL or HiveQL.

Scala:
```scala
registerFunction("strLenScala", (_: String).length)
sql("SELECT strLenScala('test')")
```
Python:
```python
sqlCtx.registerFunction("strLenPython", lambda x: len(x), IntegerType())
sqlCtx.sql("SELECT strLenPython('test')")
```
Java:
```java
sqlContext.registerFunction("stringLengthJava", new UDF1<String, Integer>() {
  @Override
  public Integer call(String str) throws Exception {
    return str.length();
  }
}, DataType.IntegerType);

sqlContext.sql("SELECT stringLengthJava('test')");
```
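As a rough illustration of the registration model in these examples (a SQL-visible name bound to a function, plus an explicit result type in the Python and Java variants), here is a minimal pure-Python sketch. `UDFRegistry`, `register_function` and `call` are hypothetical names, Python classes such as `int` stand in for Spark SQL data types, and this is not how Spark implements the feature:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class RegisteredUDF:
    """A user function plus the return type it was registered with."""
    name: str
    func: Callable[..., Any]
    return_type: type  # hypothetical stand-in for a Spark SQL DataType such as IntegerType()


class UDFRegistry:
    """Hypothetical name -> function table; not Spark's implementation."""

    def __init__(self) -> None:
        self._functions: Dict[str, RegisteredUDF] = {}

    def register_function(self, name: str, func: Callable[..., Any], return_type: type) -> None:
        # Re-registering a name replaces the previous definition.
        self._functions[name] = RegisteredUDF(name, func, return_type)

    def call(self, name: str, *args: Any) -> Any:
        udf = self._functions.get(name)
        if udf is None:
            raise KeyError(f"Undefined function: {name}")
        result = udf.func(*args)
        # This sketch checks the declared type at call time.
        if result is not None and not isinstance(result, udf.return_type):
            raise TypeError(
                f"{name} returned {type(result).__name__}, "
                f"expected {udf.return_type.__name__}"
            )
        return result


registry = UDFRegistry()
registry.register_function("strLenPython", lambda x: len(x), int)
print(registry.call("strLenPython", "test"))  # prints 4
```

Checking the declared type at call time is a simplification; a SQL engine mainly needs the result type earlier, when it resolves the schema of the query that uses the function.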

@SparkQA

SparkQA commented Jul 10, 2014

QA tests have started for PR 1063. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16509/consoleFull

@SparkQA

SparkQA commented Jul 10, 2014

QA results for PR 1063:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait OverrideFunctionRegistry extends FunctionRegistry {
class HiveContext(sc: SparkContext) extends SQLContext(sc) with UdfRegistration{
protected[sql] trait UdfRegistration {
case class EvaluatePython(udf: PythonUDF, child: LogicalPlan) extends logical.UnaryNode {
case class BatchPythonEvaluation(udf: PythonUDF, output: Seq[Attribute], child: SparkPlan)

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16509/consoleFull
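The reports list two plan nodes for Python UDFs, `EvaluatePython` (logical) and `BatchPythonEvaluation` (physical). Judging by the signature `BatchPythonEvaluation(udf, output, child)`, the physical node appends the UDF's result to each child row, and batching presumably amortizes the cost of shipping rows to a Python process. A rough pure-Python sketch of that idea, with a hypothetical `worker` callable standing in for the Python process and an arbitrary batch size (this is not Spark's code):

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Row = Tuple[Any, ...]


def batch_python_evaluation(
    rows: Iterable[Row],
    worker: Callable[[List[Any]], List[Any]],
    arg_index: int,
    batch_size: int = 2,
) -> Iterator[Row]:
    """Evaluate a UDF over rows in batches and append its result as a new column.

    `worker` stands in for the round trip to a Python process: it receives a list of
    argument values and returns one result per value, in order.
    """
    it = iter(rows)
    while True:
        batch: List[Row] = list(islice(it, batch_size))
        if not batch:
            return
        results = worker([row[arg_index] for row in batch])
        if len(results) != len(batch):
            raise ValueError("worker must return exactly one result per input row")
        # Output columns = child columns followed by the UDF's result column.
        for row, result in zip(batch, results):
            yield row + (result,)


calls: List[int] = []


def fake_worker(args: List[Any]) -> List[Any]:
    calls.append(len(args))        # record batch sizes to show the number of round trips
    return [len(a) for a in args]  # plays the role of strLenPython


rows = [(1, "a"), (2, "bb"), (3, "ccc")]
out = list(batch_python_evaluation(rows, fake_worker, arg_index=1))
print(out)    # [(1, 'a', 1), (2, 'bb', 2), (3, 'ccc', 3)]
print(calls)  # [2, 1]
```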

@SparkQA

SparkQA commented Jul 10, 2014

QA tests have started for PR 1063. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16510/consoleFull

@SparkQA

SparkQA commented Jul 10, 2014

QA results for PR 1063:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait OverrideFunctionRegistry extends FunctionRegistry {
class HiveContext(sc: SparkContext) extends SQLContext(sc) with UdfRegistration{
protected[sql] trait UdfRegistration {
case class EvaluatePython(udf: PythonUDF, child: LogicalPlan) extends logical.UnaryNode {
case class BatchPythonEvaluation(udf: PythonUDF, output: Seq[Attribute], child: SparkPlan)

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16510/consoleFull

@SparkQA

SparkQA commented Jul 10, 2014

QA tests have started for PR 1063. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16512/consoleFull

@SparkQA

SparkQA commented Jul 10, 2014

QA results for PR 1063:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait OverrideFunctionRegistry extends FunctionRegistry {
class HiveContext(sc: SparkContext) extends SQLContext(sc) with UdfRegistration{
protected[sql] trait UdfRegistration {
case class EvaluatePython(udf: PythonUDF, child: LogicalPlan) extends logical.UnaryNode {
case class BatchPythonEvaluation(udf: PythonUDF, output: Seq[Attribute], child: SparkPlan)

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16512/consoleFull

@SparkQA

SparkQA commented Jul 23, 2014

QA tests have started for PR 1063. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17064/consoleFull

@SparkQA

SparkQA commented Jul 23, 2014

QA results for PR 1063:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait OverrideFunctionRegistry extends FunctionRegistry {
class HiveContext(sc: SparkContext) extends SQLContext(sc) with UdfRegistration{
protected[sql] trait UdfRegistration {
case class EvaluatePython(udf: PythonUDF, child: LogicalPlan) extends logical.UnaryNode {
case class BatchPythonEvaluation(udf: PythonUDF, output: Seq[Attribute], child: SparkPlan)

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17064/consoleFull

@SparkQA

SparkQA commented Jul 24, 2014

QA tests have started for PR 1063. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17084/consoleFull

@SparkQA

SparkQA commented Jul 24, 2014

QA results for PR 1063:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait OverrideFunctionRegistry extends FunctionRegistry {
class HiveContext(sc: SparkContext) extends SQLContext(sc) with UdfRegistration{
protected[sql] trait UdfRegistration {
case class EvaluatePython(udf: PythonUDF, child: LogicalPlan) extends logical.UnaryNode {
case class BatchPythonEvaluation(udf: PythonUDF, output: Seq[Attribute], child: SparkPlan)

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17084/consoleFull

@SparkQA

SparkQA commented Jul 28, 2014

QA tests have started for PR 1063. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17267/consoleFull

@SparkQA

SparkQA commented Jul 28, 2014

QA results for PR 1063:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait OverrideFunctionRegistry extends FunctionRegistry {
class SimpleFunctionRegistry extends FunctionRegistry {
protected[sql] trait UdfRegistration {
class JavaSQLContext(val sqlContext: SQLContext) extends FunctionRegistration {
case class EvaluatePython(udf: PythonUDF, child: LogicalPlan) extends logical.UnaryNode {
case class BatchPythonEvaluation(udf: PythonUDF, output: Seq[Attribute], child: SparkPlan)

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17267/consoleFull

marmbrus added 2 commits July 28, 2014 10:59
Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/api/java/JavaSQLContext.scala
	sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
	sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala
	sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
@SparkQA

SparkQA commented Jul 31, 2014

QA tests have started for PR 1063. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17544/consoleFull

@SparkQA

SparkQA commented Jul 31, 2014

QA results for PR 1063:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait OverrideFunctionRegistry extends FunctionRegistry {
class SimpleFunctionRegistry extends FunctionRegistry {
protected[sql] trait UdfRegistration {
class JavaSQLContext(val sqlContext: SQLContext) extends FunctionRegistration {
case class EvaluatePython(udf: PythonUDF, child: LogicalPlan) extends logical.UnaryNode {
case class BatchPythonEvaluation(udf: PythonUDF, output: Seq[Attribute], child: SparkPlan)

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17544/consoleFull

@SparkQA

SparkQA commented Aug 1, 2014

QA tests have started for PR 1063. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17638/consoleFull

@SparkQA

SparkQA commented Aug 1, 2014

QA results for PR 1063:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait OverrideFunctionRegistry extends FunctionRegistry {
class SimpleFunctionRegistry extends FunctionRegistry {
protected[sql] trait UdfRegistration {
class JavaSQLContext(val sqlContext: SQLContext) extends FunctionRegistration {
case class EvaluatePython(udf: PythonUDF, child: LogicalPlan) extends logical.UnaryNode {
case class BatchPythonEvaluation(udf: PythonUDF, output: Seq[Attribute], child: SparkPlan)

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17638/consoleFull

Contributor

Add a comment to say that a Hive UDF can be overridden?
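This review comment concerns shadowing. The QA reports list a trait `OverrideFunctionRegistry`, whose name suggests that functions registered through this API are consulted before the underlying (for example Hive) registry, so a UDF can replace a Hive function of the same name. A minimal sketch of that lookup order, assuming those semantics; `CatalogRegistry` and `OverrideRegistry` are hypothetical names and this is not Spark's implementation:

```python
from typing import Any, Callable, Dict, Optional

Func = Callable[..., Any]


class CatalogRegistry:
    """Stand-in for an existing function catalog, such as Hive's built-ins."""

    def __init__(self, functions: Dict[str, Func]) -> None:
        self._functions = dict(functions)

    def lookup(self, name: str) -> Optional[Func]:
        return self._functions.get(name)


class OverrideRegistry:
    """User-registered functions shadow the underlying catalog; misses fall through."""

    def __init__(self, underlying: CatalogRegistry) -> None:
        self._underlying = underlying
        self._overrides: Dict[str, Func] = {}

    def register(self, name: str, func: Func) -> None:
        self._overrides[name] = func

    def lookup(self, name: str) -> Optional[Func]:
        # Check user-registered functions first, then defer to the catalog.
        if name in self._overrides:
            return self._overrides[name]
        return self._underlying.lookup(name)


hive_like = CatalogRegistry({"upper": str.upper, "length": len})
registry = OverrideRegistry(hive_like)
registry.register("upper", lambda s: s.upper() + "!")  # shadows the catalog's "upper"

print(registry.lookup("upper")("spark"))   # SPARK!
print(registry.lookup("length")("spark"))  # 5, found in the underlying catalog
print(registry.lookup("missing"))          # None
```

Composition is used here instead of inheritance so the fallback path is explicit; a Scala trait with `super` calls could express the same ordering.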

@SparkQA

SparkQA commented Aug 2, 2014

QA tests have started for PR 1063. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17763/consoleFull

@SparkQA

SparkQA commented Aug 2, 2014

QA results for PR 1063:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait OverrideFunctionRegistry extends FunctionRegistry {
class SimpleFunctionRegistry extends FunctionRegistry {
protected[sql] trait UdfRegistration {
class JavaSQLContext(val sqlContext: SQLContext) extends FunctionRegistration {
case class EvaluatePython(udf: PythonUDF, child: LogicalPlan) extends logical.UnaryNode {
case class BatchPythonEvaluation(udf: PythonUDF, output: Seq[Attribute], child: SparkPlan)

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17763/consoleFull

@SparkQA

SparkQA commented Aug 2, 2014

QA tests have started for PR 1063. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17777/consoleFull

Conflicts:
	sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
	sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala
@SparkQA

SparkQA commented Aug 2, 2014

QA tests have started for PR 1063. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17779/consoleFull

@marmbrus marmbrus changed the title [WIP][SPARK-2097][SQL] UDF Support [SPARK-2097][SQL] UDF Support Aug 2, 2014
@SparkQA

SparkQA commented Aug 2, 2014

QA tests have started for PR 1063. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17781/consoleFull

@SparkQA

SparkQA commented Aug 2, 2014

QA results for PR 1063:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait OverrideFunctionRegistry extends FunctionRegistry {
class SimpleFunctionRegistry extends FunctionRegistry {
protected[sql] trait UdfRegistration {
class JavaSQLContext(val sqlContext: SQLContext) extends FunctionRegistration {
case class EvaluatePython(udf: PythonUDF, child: LogicalPlan) extends logical.UnaryNode {
case class BatchPythonEvaluation(udf: PythonUDF, output: Seq[Attribute], child: SparkPlan)

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17777/consoleFull

@SparkQA

SparkQA commented Aug 2, 2014

QA results for PR 1063:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait OverrideFunctionRegistry extends FunctionRegistry {
class SimpleFunctionRegistry extends FunctionRegistry {
protected[sql] trait UDFRegistration {
class JavaSQLContext(val sqlContext: SQLContext) extends UDFRegistration {
case class EvaluatePython(udf: PythonUDF, child: LogicalPlan) extends logical.UnaryNode {
case class BatchPythonEvaluation(udf: PythonUDF, output: Seq[Attribute], child: SparkPlan)

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17779/consoleFull

@SparkQA

SparkQA commented Aug 2, 2014

QA results for PR 1063:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait OverrideFunctionRegistry extends FunctionRegistry {
class SimpleFunctionRegistry extends FunctionRegistry {
protected[sql] trait UDFRegistration {
class JavaSQLContext(val sqlContext: SQLContext) extends UDFRegistration {
case class EvaluatePython(udf: PythonUDF, child: LogicalPlan) extends logical.UnaryNode {
case class BatchPythonEvaluation(udf: PythonUDF, output: Seq[Attribute], child: SparkPlan)

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17781/consoleFull

@marmbrus
Contributor Author

marmbrus commented Aug 2, 2014

Thanks for looking this over! I've merged to master and 1.1.

asfgit pushed a commit that referenced this pull request Aug 3, 2014
This patch adds the ability to register lambda functions written in Python, Java or Scala as UDFs for use in SQL or HiveQL.

Scala:
```scala
registerFunction("strLenScala", (_: String).length)
sql("SELECT strLenScala('test')")
```
Python:
```python
sqlCtx.registerFunction("strLenPython", lambda x: len(x), IntegerType())
sqlCtx.sql("SELECT strLenPython('test')")
```
Java:
```java
sqlContext.registerFunction("stringLengthJava", new UDF1<String, Integer>() {
  @Override
  public Integer call(String str) throws Exception {
    return str.length();
  }
}, DataType.IntegerType);

sqlContext.sql("SELECT stringLengthJava('test')");
```

Author: Michael Armbrust <michael@databricks.com>

Closes #1063 from marmbrus/udfs and squashes the following commits:

9eda0fe [Michael Armbrust] newline
747c05e [Michael Armbrust] Add some scala UDF tests.
d92727d [Michael Armbrust] Merge remote-tracking branch 'apache/master' into udfs
005d684 [Michael Armbrust] Fix naming and formatting.
d14dac8 [Michael Armbrust] Fix last line of autogened java files.
8135c48 [Michael Armbrust] Move UDF unit tests to pyspark.
40b0ffd [Michael Armbrust] Merge remote-tracking branch 'apache/master' into udfs
6a36890 [Michael Armbrust] Switch logging so that SQLContext can be serializable.
7a83101 [Michael Armbrust] Drop toString
795fd15 [Michael Armbrust] Try to avoid capturing SQLContext.
e54fb45 [Michael Armbrust] Docs and tests.
437cbe3 [Michael Armbrust] Update use of dataTypes, fix some python tests, address review comments.
01517d6 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into udfs
8e6c932 [Michael Armbrust] WIP
3f96a52 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into udfs
6237c8d [Michael Armbrust] WIP
2766f0b [Michael Armbrust] Move udfs support to SQL from hive. Add support for Java UDFs.
0f7d50c [Michael Armbrust] Draft of native Spark SQL UDFs for Scala and Python.

(cherry picked from commit 158ad0b)
Signed-off-by: Michael Armbrust <michael@databricks.com>
@asfgit asfgit closed this in 158ad0b Aug 3, 2014
@marmbrus marmbrus deleted the udfs branch August 27, 2014 20:57
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
@DanielMe

Excuse my naive question, but it seems that this does not use the regular Hive UDF API, right? (As when I would run hiveContext.sql("CREATE TEMPORARY FUNCTION [...]").) Is there any particular reason for that? I noticed that UDFs created using this mechanism don't show up in the SHOW FUNCTIONS list. Would it be difficult to achieve that?

Also, the Hive API allows adding description strings to a UDF (which obviously only makes sense if you can use DESCRIBE FUNCTION). It would be nice if something similar existed for UDFs defined through the Spark interface.

@marmbrus
Contributor Author

The biggest reason for the divergence is that this API is much lighter weight: you can define functions in a single line, inline with the rest of your program. We can certainly consider adding more support for function-listing metadata in the future, but you are the first to ask for this.
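Function-listing metadata of the kind discussed here could, in principle, be stored next to each registered function. A hypothetical pure-Python sketch of a registry that keeps an optional description per function and supports listing and describing; `DescribedRegistry`, `show_functions` and `describe` are illustrative names, and nothing like this is part of the patch:

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class DescribedRegistry:
    """Hypothetical registry that stores an optional description per function."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Callable[..., Any], Optional[str]]] = {}

    def register(self, name: str, func: Callable[..., Any],
                 description: Optional[str] = None) -> None:
        self._entries[name] = (func, description)

    def show_functions(self) -> List[str]:
        # Sorted names of every registered function, in the spirit of SHOW FUNCTIONS.
        return sorted(self._entries)

    def describe(self, name: str) -> str:
        # Fall back to a placeholder when no description was supplied.
        _, description = self._entries[name]
        return f"{name}: {description or 'no description'}"


registry = DescribedRegistry()
registry.register("strLenScala", len, "strLenScala(str) - returns the length of str")
registry.register("noDocs", lambda x: x)

print(registry.show_functions())         # ['noDocs', 'strLenScala']
print(registry.describe("strLenScala"))  # strLenScala: strLenScala(str) - returns the length of str
print(registry.describe("noDocs"))       # noDocs: no description
```

Making the description optional keeps the one-line registration style intact while still allowing documentation when it is wanted.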

@DanielMe

Okay, thanks for the clarification. I had naively assumed that the functionality you added was just a layer on top of the Hive API, so it was a bit confusing that SHOW FUNCTIONS did not list the UDFs. For my use case I can easily work around that limitation, so it's not a big deal.

wangyum added a commit that referenced this pull request May 26, 2023
* [SPARK-31168][BUILD] Upgrade Scala to 2.12.14

### What changes were proposed in this pull request?

This PR is the 4th attempt to upgrade Scala 2.12.x, to check whether the upgrade is now feasible.
- #27929 (Upgrade Scala to 2.12.11, wangyum )
- #30940 (Upgrade Scala to 2.12.12, viirya )
- #31223 (Upgrade Scala to 2.12.13, dongjoon-hyun )

Note that Scala 2.12.14 includes the following fix that matters to the Apache Spark community:
- Fix cyclic error in runtime reflection (protobuf), a regression that prevented Spark from upgrading to 2.12.13

REQUIREMENTS:
- [x] `silencer` library is released via ghik/silencer#66
- [x] `genjavadoc` library is released via lightbend/genjavadoc#282

### Why are the changes needed?

Apache Spark was stuck on 2.12.10 due to regressions in Scala 2.12.11/2.12.12/2.12.13. This upgrade brings in all the bug fixes from those releases.
- https://github.com/scala/scala/releases/tag/v2.12.14
- https://github.com/scala/scala/releases/tag/v2.12.13
- https://github.com/scala/scala/releases/tag/v2.12.12
- https://github.com/scala/scala/releases/tag/v2.12.11

### Does this PR introduce _any_ user-facing change?

Yes, but this is a bug-fix release.

### How was this patch tested?

Pass the CIs.

Closes #32697 from dongjoon-hyun/SPARK-31168.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

(cherry picked from commit 6c4b60f)

* [SPARK-36759][BUILD] Upgrade Scala to 2.12.15

### What changes were proposed in this pull request?

This PR aims to upgrade Scala to 2.12.15 to support Java 17/18 better.

### Why are the changes needed?

Scala 2.12.15 improves compatibility with JDK 17 and 18:

https://github.com/scala/scala/releases/tag/v2.12.15

- Avoids IllegalArgumentException in JDK 17+ for lambda deserialization
- Upgrades to ASM 9.2, for JDK 18 support in optimizer

### Does this PR introduce _any_ user-facing change?

Yes, this is a Scala version change.

### How was this patch tested?

Pass the CIs

Closes #33999 from dongjoon-hyun/SPARK-36759.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

(cherry picked from commit 16f1f71)

* [SPARK-36759][BUILD][FOLLOWUP] Update version in scala-2.12 profile and doc

### What changes were proposed in this pull request?

This is a follow-up to fix leftovers from switching the Scala version.

### Why are the changes needed?

The version references should be consistent.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This is not covered by unit tests, so it has to be checked manually. No `2.12.14` version strings remain; the matches below are unrelated data:
```
$ git grep 2.12.14
R/pkg/tests/fulltests/test_sparkSQL.R:               c(as.Date("2012-12-14"), as.Date("2013-12-15"), as.Date("2014-12-16")))
data/mllib/ridge-data/lpsa.data:3.5307626,0.987291634724086 -0.36279314978779 -0.922212414640967 0.232904453212813 -0.522940888712441 1.79270085261407 0.342627053981254 1.26288870310799
sql/hive/src/test/resources/data/files/over10k:-3|454|65705|4294967468|62.12|14.32|true|mike white|2013-03-01 09:11:58.703087|40.18|joggying
```
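The remaining `git grep` hits are false positives: the pattern is treated as a regular expression, so each `.` in `2.12.14` matches any character (for example `2-12-14` inside `2012-12-14`). Passing `-F` (`--fixed-strings`) to `git grep`, or escaping the dots, avoids this. The Python sketch below reproduces the effect with the `re` module, using three lines from the output plus one hypothetical build line that really does contain the version:

```python
import re

# Three lines copied (shortened) from the `git grep 2.12.14` output above, plus one
# hypothetical build line that genuinely mentions the version.
lines = [
    'c(as.Date("2012-12-14"), as.Date("2013-12-15"), as.Date("2014-12-16")))',
    "3.5307626,0.987291634724086 -0.36279314978779 -0.922212414640967",
    "-3|454|65705|4294967468|62.12|14.32|true|mike white",
    "<scala.version>2.12.14</scala.version>",  # hypothetical
]

pattern = "2.12.14"
# Unescaped: each "." matches any character, like a default `git grep` pattern.
regex_hits = [line for line in lines if re.search(pattern, line)]
# Escaped: the dots are literal, like `git grep -F`.
literal_hits = [line for line in lines if re.search(re.escape(pattern), line)]

print(len(regex_hits))  # 4
print(literal_hits)     # ['<scala.version>2.12.14</scala.version>']
```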

Closes #34020 from dongjoon-hyun/SPARK-36759-2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

(cherry picked from commit adbea25)

* [SPARK-39414][BUILD] Upgrade Scala to 2.12.16

### What changes were proposed in this pull request?
This PR aims to upgrade Scala to 2.12.16

### Why are the changes needed?
This version brings some bug fixes and starts to support Java 19.

scala/scala@v2.12.15...v2.12.16

- [Upgrade to asm 9.3, for JDK19 support](scala/scala#10000)
- [Fix codegen for MH.invoke etc under JDK 17 -release](scala/scala#9930)
- [Deprecation related SecurityManager on JDK 17 ](scala/scala#9775)

### Does this PR introduce _any_ user-facing change?
Yes, this is a Scala version change.

### How was this patch tested?
Pass Github Actions

Closes #36807 from LuciferYang/SPARK-39414.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

(cherry picked from commit ed875a8)

* fix

* fix

* fix

* fix
[error] [warn] /home/jenkins/workspace/spark-sql-catalyst-3.0/core/src/main/scala/org/apache/spark/scheduler/SpillableTaskResultGetter.scala:36: non-variable type argument org.apache.spark.scheduler.DirectTaskResult[_] in type pattern scala.runtime.NonLocalReturnControl[org.apache.spark.scheduler.DirectTaskResult[_]] is unchecked since it is eliminated by erasure
[error] [warn] private[spark] class SpillableTaskResultGetter(sparkEnv: SparkEnv, scheduler: TaskSchedulerImpl)
[error] [warn]

* fix
[error] [warn] /home/jenkins/workspace/spark-sql-catalyst-3.0/mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala:287: match may not be exhaustive.
[error] It would fail on the following input: ~(~(_, (x: String forSome x not in "^")), _)
[error] [warn]   private val pow: Parser[Term] = term ~ "^" ~ "^[1-9]\\d*".r ^^ {
[error] [warn]
[error] [warn] /home/jenkins/workspace/spark-sql-catalyst-3.0/mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala:301: match may not be exhaustive.
[error] It would fail on the following input: ~(~(_, (x: String forSome x not in "~")), _)
[error] [warn]     (label ~ "~" ~ expr) ^^ { case r ~ "~" ~ t => ParsedRFormula(r, t.asTerms.terms) }
[error] [warn]

* fix

Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: yangjie01 <yangjie01@baidu.com>