[SPARK-44131][SQL][PYTHON][CONNECT][FOLLOWUP] Support qualified function name for call_function #41932
Conversation
The CI failure looks unrelated.
Can we add a test in HiveUDFSuite to make sure we can invoke a persistent Hive function?
OK
Will this also support Spark Connect?

Considering the end users, I think we should keep the behavior of
Shall we add a private method that takes a `Seq[String]`, so that we can call it if a method does not want to support qualified names?
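(For illustration, the suggested shape might look like the sketch below; the parser lookup and the `UnresolvedFunction` constructor are assumptions about Spark's internals, not the actual source.)

```scala
// Sketch of the suggested refactoring: the public API parses the
// possibly-qualified name, while a private overload takes the pre-split
// name parts, so callers that do not want qualified names can bypass parsing.
def call_function(funcName: String, cols: Column*): Column = {
  val parser = SparkSession.active.sessionState.sqlParser
  call_function(parser.parseMultipartIdentifier(funcName), cols)
}

private def call_function(nameParts: Seq[String], cols: Seq[Column]): Column =
  Column(UnresolvedFunction(nameParts, cols.map(_.expr), isDistinct = false))
```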
This is not a persistent function. Can you check the other tests in this file? We need to use CREATE FUNCTION to create persistent functions.
Got it.
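(For reference, a test along the requested lines might look like the following sketch; the function name `custom_sum`, the sample data, and the test helpers follow the suite's conventions but are illustrative. `test.org.apache.spark.sql.MyDoubleSum` is the existing test UDAF that also appears in the failure log later in this thread.)

```scala
// Sketch of a HiveUDFSuite-style test: create a persistent function with
// CREATE FUNCTION, then invoke it through call_function. The `false` in the
// tuple marks the function as non-temporary for cleanup.
test("call_function supports persistent functions") {
  withUserDefinedFunction("custom_sum" -> false) {
    sql("CREATE FUNCTION custom_sum AS 'test.org.apache.spark.sql.MyDoubleSum'")
    val df = Seq(1.0, 2.0, 3.0).toDF("value")
    checkAnswer(
      df.select(call_function("custom_sum", $"value")),
      df.selectExpr("custom_sum(value)"))
  }
}
```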
Force-pushed 820fb59 to 3fcf654.
Can we also test calling the function with the qualified name `spark_catalog.default.custom_func`?
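(For example, something like the sketch below, assuming the persistent function created in the earlier sketch:)

```scala
// Sketch: the same persistent function invoked via its fully qualified name.
checkAnswer(
  df.select(call_function("spark_catalog.default.custom_sum", $"value")),
  df.selectExpr("custom_sum(value)"))
```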
@beliefer The branch cut is soon; shall we also support it in Spark Connect? Otherwise, the behaviors will be different.

It's better to support it too.
Force-pushed 58aad28 to 4513675.
Force-pushed 4513675 to 80323e9.
@LuciferYang Would you mind helping to check this part? I am not familiar with it.
ok
Run the following commands:

```
build/sbt clean
build/sbt "connect-client-jvm/test" -Phive
```
One test failed:
```
[info] - call_function *** FAILED *** (150 milliseconds)
[info]   org.apache.spark.SparkException: [CANNOT_LOAD_FUNCTION_CLASS] Cannot load class test.org.apache.spark.sql.MyDoubleSum when registering the function `spark_catalog`.`default`.`custom_sum`, please make sure it is on the classpath.
[info]   at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toSparkThrowable(GrpcExceptionConverter.scala:53)
[info]   at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:30)
[info]   at org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:38)
[info]   at org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:80)
[info]   at org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:133)
[info]   at org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:150)
[info]   at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2813)
[info]   at org.apache.spark.sql.Dataset.withResult(Dataset.scala:3252)
[info]   at org.apache.spark.sql.Dataset.collect(Dataset.scala:2812)
[info]   at org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$139(ClientE2ETestSuite.scala:1175)
[info]   at org.apache.spark.sql.connect.client.util.RemoteSparkSession.$anonfun$test$1(RemoteSparkSession.scala:246)
```
@beliefer We should add `(LocalProject("sql") / Test / Keys.package).value` to spark/project/SparkBuild.scala, lines 875 to 878 in 228b5db:

```scala
buildTestDeps := {
  (LocalProject("assembly") / Compile / Keys.`package`).value
  (LocalProject("catalyst") / Test / Keys.`package`).value
},
```

Then the sql test jar will be built and packaged before testing.
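With that addition, the block would read as follows (a sketch of the suggested change):

```scala
buildTestDeps := {
  (LocalProject("assembly") / Compile / Keys.`package`).value
  (LocalProject("catalyst") / Test / Keys.`package`).value
  // Newly added: package the sql module's test jar as well, so the test
  // UDF classes are available to the Connect E2E tests.
  (LocalProject("sql") / Test / Keys.`package`).value
},
```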
For Maven, let me do more checking.
@LuciferYang Thank you for your reminder. I will add it.
Maven testing of ClientE2ETestSuite and ReplE2ESuite is OK, but another 68 Maven tests in the connect-client-jvm module failed; I will track them with a new ticket. @zhengruifeng
To clarify, in my local testing:
- master branch: all tests passed.
- with this PR: 68 tests failed.
I think this is truly unrelated to this PR, and I think the way `--jars` is being used in the code is incorrect. When submitting the args as

```
--jars spark-catalyst-xx.jar
--jars spark-connect-client-jvm-xx.jar
--jars spark-sql-xx.jar
```

the final effective arg will be `--jars spark-sql-xx.jar`. If we enable debug logging, we will find that only the Added JAR logs for spark-sql_2.12-3.5.0-SNAPSHOT-tests.jar and spark-connect_2.12-3.5.0-SNAPSHOT.jar are present:

```
23/07/19 14:00:34 INFO SparkContext: Added JAR file:///Users/yangjie01/SourceCode/git/spark-mine-12/sql/core/target/spark-sql_2.12-3.5.0-SNAPSHOT-tests.jar at spark://localhost:56841/jars/spark-sql_2.12-3.5.0-SNAPSHOT-tests.jar with timestamp 1689746434318
23/07/19 14:00:34 INFO SparkContext: Added JAR file:/Users/yangjie01/SourceCode/git/spark-mine-12/connector/connect/server/target/spark-connect_2.12-3.5.0-SNAPSHOT.jar at spark://localhost:56841/jars/spark-connect_2.12-3.5.0-SNAPSHOT.jar with timestamp 1689746434318
```

and the configuration item `spark.jars` also only includes these two jars:

```
Array((spark.app.name,org.apache.spark.sql.connect.SimpleSparkConnectService), (spark.jars,file:///Users/yangjie01/SourceCode/git/spark-mine-12/sql/core/target/spark-sql_2.12-3.5.0-SNAPSHOT-tests.jar,file:/Users/yangjie01/SourceCode/git/spark-mine-12/connector/connect/server/target/spark-connect_2.12-3.5.0-SNAPSHOT.jar), ...
```

We should correct the usage to a single comma-separated flag, `--jars spark-catalyst-xx.jar,spark-connect-client-jvm-xx.jar,spark-sql-xx.jar`; then the Maven tests should pass.
I think we can merge this PR first and then fix this issue separately. But @beliefer, if you prefer, you can also address it in this one :)
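(A minimal sketch of the fix described above, assuming the jar paths are collected in a sequence; the variable names are illustrative, not the actual SparkConnectServerUtils code:)

```scala
// Build one --jars flag with a comma-separated value instead of repeating
// --jars per jar; with repeated flags, only the last occurrence takes effect.
val jars = Seq(catalystTestJar, connectClientJar, sqlTestJar)
val jarArgs = Seq("--jars", jars.mkString(","))
```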
@LuciferYang Thank you for the investigation. I will take it.
Both Maven and sbt are OK now, thanks @beliefer
Thank you for the double check. @LuciferYang
```diff
- * function name that can be qualified using the SQL syntax
+ * function name that follows the SQL identifier syntax (can be quoted, can be qualified)
```
Call a SQL function. It supports any function
| case "call_function" if fun.getArgumentsCount > 1 => | |
| case "call_function" if fun.getArgumentsCount >= 1 => |
We should support no-arg functions as well: the target function name is itself the first argument here, so a no-arg call still has exactly one argument.
Shall we add a new proto message for it? Currently it may conflict with calling a temp function named `call_function`.
Yeah. Good suggestion.
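(A rough sketch of what the client side could do with such a dedicated message; the message and builder names (`CallFunction`, `setFunctionName`, `addAllArguments`) are assumptions based on this discussion, not confirmed API:)

```scala
import scala.collection.JavaConverters._

// Sketch: build a dedicated CallFunction proto carrying the unparsed name,
// instead of encoding the target name as the first argument of a generic
// unresolved function named "call_function".
def call_function(funcName: String, cols: Column*): Column = Column { builder =>
  builder.getCallFunctionBuilder
    .setFunctionName(funcName)
    .addAllArguments(cols.map(_.expr).asJava)
}
```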
Force-pushed 37ec1f6 to c553691.
Force-pushed f220e63 to 1891af5.
I think it's OK not to test persistent functions in Spark Connect, as it seems hard to include the jar containing the UDF. The client-side implementation is quite simple: it constructs a small proto message, and the server side turns it into UnresolvedFunction. Making sure it works for built-in functions is good enough.
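(As a sketch of the server-side step described here; the method and accessor names are illustrative:)

```scala
import scala.collection.JavaConverters._

// Sketch: the server parses the unparsed function name into its parts and
// builds an UnresolvedFunction, which the analyzer then resolves against
// built-in, temporary, or persistent functions.
private def transformCallFunction(fun: proto.CallFunction): Expression = {
  val nameParts = session.sessionState.sqlParser
    .parseMultipartIdentifier(fun.getFunctionName)
  UnresolvedFunction(
    nameParts,
    fun.getArgumentsList.asScala.map(transformExpression).toSeq,
    isDistinct = false)
}
```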
I have fixed the issue. Please wait for the CI.
Is it really worth it? @zhengruifeng
I have communicated with @zhengruifeng and he agreed with your opinion. Let's remove the test case for Connect.
+1, I think we don't need to include this change in this PR
```diff
- // (Required) Name of the SQL function.
+ // (Required) Unparsed name of the SQL function.
```
python/pyspark/sql/functions.py
Can we update the doc in all places?
Let's make sure the docs are consistent in all places.
Force-pushed 411fdb7 to 7456dc5.
The CI failure is unrelated to this PR.
…rameters for jars

### What changes were proposed in this pull request?
#41932 tried to add a test case for Connect; we then found the Maven build failure caused by the bug discussed at #41932 (comment). After some communication, cloud-fan and zhengruifeng suggested ignoring the test case for Connect, so I submitted this PR to fix the bug.

### Why are the changes needed?
Fix the bug that `SparkConnectServerUtils` generated incorrect parameters for jars.

### Does this PR introduce _any_ user-facing change?
'No'. Just updates the inner implementation.

### How was this patch tested?
N/A

Closes #42121 from beliefer/SPARK-44519.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit 4644344)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
The failure is unrelated, merging to master/3.5, thanks!
…ion name for call_function

### What changes were proposed in this pull request?
#41687 added `call_function` and deprecated `call_udf` for the Scala API. Sometimes the function name can be qualified; we should let users use it to invoke persistent functions as well.

### Why are the changes needed?
Support qualified function name for `call_function`.

### Does this PR introduce _any_ user-facing change?
'No'. New feature.

### How was this patch tested?
New test cases.

Closes #41932 from beliefer/SPARK-44131_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d97a4e2)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan @zhengruifeng @LuciferYang Thank you all!
What changes were proposed in this pull request?

#41687 added `call_function` and deprecated `call_udf` for the Scala API. Sometimes the function name can be qualified; we should let users use it to invoke persistent functions as well.

Why are the changes needed?

Support qualified function name for `call_function`.

Does this PR introduce any user-facing change?

'No'. New feature.

How was this patch tested?

New test cases.
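(For illustration, after this change `call_function` accepts a qualified name; this snippet assumes a persistent function `custom_func` exists in the default database, and `df` and column `a` are placeholders:)

```scala
import org.apache.spark.sql.functions.{call_function, col}

// A bare function name, as before:
df.select(call_function("custom_func", col("a")))
// A qualified name, newly supported, resolving to the persistent function:
df.select(call_function("spark_catalog.default.custom_func", col("a")))
```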