Conversation
QA tests have started for PR 1063. This patch merges cleanly.

QA results for PR 1063:
Conflicts:
- sql/core/src/main/scala/org/apache/spark/sql/api/java/JavaSQLContext.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
- sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
Add a comment to say that a Hive UDF can be overridden?
Conflicts:
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
- sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala
Thanks for looking this over! I've merged to master and 1.1.
This patch adds the ability to register lambda functions written in Python, Java or Scala as UDFs for use in SQL or HiveQL.
Scala:
```scala
registerFunction("strLenScala", (_: String).length)
sql("SELECT strLenScala('test')")
```
Python:
```python
sqlCtx.registerFunction("strLenPython", lambda x: len(x), IntegerType())
sqlCtx.sql("SELECT strLenPython('test')")
```
Java:
```java
sqlContext.registerFunction("stringLengthJava", new UDF1<String, Integer>() {
  @Override
  public Integer call(String str) throws Exception {
    return str.length();
  }
}, DataType.IntegerType);
sqlContext.sql("SELECT stringLengthJava('test')");
```
Author: Michael Armbrust <michael@databricks.com>
Closes #1063 from marmbrus/udfs and squashes the following commits:
9eda0fe [Michael Armbrust] newline
747c05e [Michael Armbrust] Add some scala UDF tests.
d92727d [Michael Armbrust] Merge remote-tracking branch 'apache/master' into udfs
005d684 [Michael Armbrust] Fix naming and formatting.
d14dac8 [Michael Armbrust] Fix last line of autogened java files.
8135c48 [Michael Armbrust] Move UDF unit tests to pyspark.
40b0ffd [Michael Armbrust] Merge remote-tracking branch 'apache/master' into udfs
6a36890 [Michael Armbrust] Switch logging so that SQLContext can be serializable.
7a83101 [Michael Armbrust] Drop toString
795fd15 [Michael Armbrust] Try to avoid capturing SQLContext.
e54fb45 [Michael Armbrust] Docs and tests.
437cbe3 [Michael Armbrust] Update use of dataTypes, fix some python tests, address review comments.
01517d6 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into udfs
8e6c932 [Michael Armbrust] WIP
3f96a52 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into udfs
6237c8d [Michael Armbrust] WIP
2766f0b [Michael Armbrust] Move udfs support to SQL from hive. Add support for Java UDFs.
0f7d50c [Michael Armbrust] Draft of native Spark SQL UDFs for Scala and Python.
(cherry picked from commit 158ad0b)
Signed-off-by: Michael Armbrust <michael@databricks.com>
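Conceptually, `registerFunction` maintains a name-to-function mapping that the query analyzer consults when it resolves a function call. A minimal sketch of that registry pattern in Python (toy code with hypothetical names, not Spark's actual implementation):

```python
# Toy sketch of a name -> function registry, illustrating the pattern behind
# registerFunction. Class and method names here are hypothetical.

class FunctionRegistry:
    def __init__(self):
        self._funcs = {}

    def register(self, name, func):
        # Later registrations replace earlier ones under the same name, which
        # is how a built-in function could be shadowed by a user-defined one.
        self._funcs[name.lower()] = func

    def lookup(self, name):
        try:
            return self._funcs[name.lower()]
        except KeyError:
            raise ValueError(f"undefined function: {name}")

registry = FunctionRegistry()
registry.register("strLenPython", lambda x: len(x))

# Lookup is case-insensitive in this sketch, mirroring SQL identifiers.
result = registry.lookup("strlenpython")("test")
```

The real analyzer resolves the function name during query planning and wires the callable into the physical plan; the sketch only shows the registration and lookup step.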
Excuse my naive question, but it seems that this does not use the regular Hive UDF API, right? Also, the Hive API allows adding description strings to a UDF (which obviously only makes sense if you can list them).
|
The biggest reason for the divergence is that this API is much lighter weight: you can define functions in a single line, inline with the rest of your program. We can certainly consider adding more support for function-listing metadata in the future, but you are the first to ask for this.
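The trade-off being discussed can be illustrated with a toy registry that optionally keeps a description string per function (hypothetical code, not Spark's or Hive's API): the lightweight style registers an inline lambda in one line, while Hive-style metadata adds a description that a listing command could surface.

```python
# Toy illustration of lightweight vs metadata-carrying UDF registration.
# All names here are hypothetical.

registry = {}

def register(name, func, description=None):
    # Store the callable alongside optional human-readable metadata.
    registry[name] = {"func": func, "doc": description}

def describe(name):
    # What a DESCRIBE-style listing could return for a registered function.
    entry = registry[name]
    return entry["doc"] or f"{name}: no description"

# One-line, inline registration -- the lightweight style this PR favors:
register("strLen", lambda s: len(s))

# The same function registered with Hive-style descriptive metadata:
register("strLenDoc", lambda s: len(s),
         description="strLenDoc(str) - returns the length of str")
```

Both entries are callable the same way; only the second carries metadata for a hypothetical listing feature.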
|
Okay, thanks for the clarification. Initially, I had naively assumed that the functionality you added was just a layer above the Hive API, hence it was a bit confusing.
* [SPARK-31168][BUILD] Upgrade Scala to 2.12.14

  **What changes were proposed in this pull request?** This PR is the 4th try to upgrade Scala 2.12.x in order to see the feasibility.
  - #27929 (Upgrade Scala to 2.12.11, wangyum)
  - #30940 (Upgrade Scala to 2.12.12, viirya)
  - #31223 (Upgrade Scala to 2.12.13, dongjoon-hyun)

  Note that Scala 2.12.14 has the following fix for the Apache Spark community:
  - Fix cyclic error in runtime reflection (protobuf), a regression that prevented Spark from upgrading to 2.12.13

  REQUIREMENTS:
  - [x] `silencer` library is released via ghik/silencer#66
  - [x] `genjavadoc` library is released via lightbend/genjavadoc#282

  **Why are the changes needed?** Apache Spark was stuck on 2.12.10 due to the regressions in Scala 2.12.11/2.12.12/2.12.13. This upgrade brings all the bug fixes:
  - https://github.com/scala/scala/releases/tag/v2.12.14
  - https://github.com/scala/scala/releases/tag/v2.12.13
  - https://github.com/scala/scala/releases/tag/v2.12.12
  - https://github.com/scala/scala/releases/tag/v2.12.11

  **Does this PR introduce any user-facing change?** Yes, but this is a bug-fixed version.

  **How was this patch tested?** Pass the CIs.

  Closes #32697 from dongjoon-hyun/SPARK-31168. Authored-by: Dongjoon Hyun <dhyun@apple.com>. Signed-off-by: Dongjoon Hyun <dhyun@apple.com>. (cherry picked from commit 6c4b60f)

* [SPARK-36759][BUILD] Upgrade Scala to 2.12.15

  This PR aims to upgrade Scala to 2.12.15 to support Java 17/18 better. Scala 2.12.15 improves compatibility with JDK 17 and 18 (https://github.com/scala/scala/releases/tag/v2.12.15):
  - Avoids IllegalArgumentException in JDK 17+ for lambda deserialization
  - Upgrades to ASM 9.2, for JDK 18 support in the optimizer

  This is a user-facing Scala version change. Tested by passing the CIs.

  Closes #33999 from dongjoon-hyun/SPARK-36759. Authored-by: Dongjoon Hyun <dongjoon@apache.org>. Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>. (cherry picked from commit 16f1f71)

* [SPARK-36759][BUILD][FOLLOWUP] Update version in scala-2.12 profile and doc

  A follow-up to fix the leftovers from switching the Scala version; the profile and the doc should be consistent. No user-facing change. Not covered by unit tests; verified manually that no `2.12.14` version reference remains (the remaining `git grep` matches are unrelated data):

  ```
  $ git grep 2.12.14
  R/pkg/tests/fulltests/test_sparkSQL.R: c(as.Date("2012-12-14"), as.Date("2013-12-15"), as.Date("2014-12-16")))
  data/mllib/ridge-data/lpsa.data:3.5307626,0.987291634724086 -0.36279314978779 -0.922212414640967 0.232904453212813 -0.522940888712441 1.79270085261407 0.342627053981254 1.26288870310799
  sql/hive/src/test/resources/data/files/over10k:-3|454|65705|4294967468|62.12|14.32|true|mike white|2013-03-01 09:11:58.703087|40.18|joggying
  ```

  Closes #34020 from dongjoon-hyun/SPARK-36759-2. Authored-by: Dongjoon Hyun <dongjoon@apache.org>. Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>. (cherry picked from commit adbea25)

* [SPARK-39414][BUILD] Upgrade Scala to 2.12.16

  This version brings some bug fixes and starts to support Java 19 (scala/scala@v2.12.15...v2.12.16):
  - Upgrade to asm 9.3, for JDK19 support (scala/scala#10000)
  - Fix codegen for MH.invoke etc under JDK 17 -release (scala/scala#9930)
  - Deprecation related SecurityManager on JDK 17 (scala/scala#9775)

  This is a user-facing Scala version change. Tested by passing GitHub Actions.

  Closes #36807 from LuciferYang/SPARK-39414. Authored-by: yangjie01 <yangjie01@baidu.com>. Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>. (cherry picked from commit ed875a8)

* Follow-up fixes for compiler warnings surfaced by the upgrade, e.g.:

  ```
  [warn] /home/jenkins/workspace/spark-sql-catalyst-3.0/core/src/main/scala/org/apache/spark/scheduler/SpillableTaskResultGetter.scala:36: non-variable type argument org.apache.spark.scheduler.DirectTaskResult[_] in type pattern scala.runtime.NonLocalReturnControl[org.apache.spark.scheduler.DirectTaskResult[_]] is unchecked since it is eliminated by erasure
  [warn] private[spark] class SpillableTaskResultGetter(sparkEnv: SparkEnv, scheduler: TaskSchedulerImpl)

  [warn] /home/jenkins/workspace/spark-sql-catalyst-3.0/mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala:287: match may not be exhaustive.
  It would fail on the following input: ~(~(_, (x: String forSome x not in "^")), _)
  [warn] private val pow: Parser[Term] = term ~ "^" ~ "^[1-9]\\d*".r ^^ {

  [warn] /home/jenkins/workspace/spark-sql-catalyst-3.0/mllib/src/main/scala/org/apache/spark/ml/feature/RFormulaParser.scala:301: match may not be exhaustive.
  It would fail on the following input: ~(~(_, (x: String forSome x not in "~")), _)
  [warn] (label ~ "~" ~ expr) ^^ { case r ~ "~" ~ t => ParsedRFormula(r, t.asTerms.terms) }
  ```

  Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
  Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
  Co-authored-by: yangjie01 <yangjie01@baidu.com>