Conversation

@beliefer
Contributor

What changes were proposed in this pull request?

#41687 added call_function and deprecated call_udf for the Scala API.

Sometimes the function name can be qualified; we should let users pass a qualified name so call_function can invoke persistent functions as well.
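For example (illustrative only; custom_sum is a placeholder for a persistent function already created in the catalog, and the DataFrame is just dummy data):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{call_function, col}

val spark = SparkSession.builder().getOrCreate()
val df = spark.range(5).toDF("value")

// Both the short name and the fully qualified name should resolve to the
// same persistent function.
df.select(call_function("custom_sum", col("value")))
df.select(call_function("spark_catalog.default.custom_sum", col("value")))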

Why are the changes needed?

Support qualified function name for call_function.

Does this PR introduce any user-facing change?

'No'.
New feature.

How was this patch tested?

New test cases.

@github-actions github-actions bot added the SQL label Jul 11, 2023
@beliefer
Contributor Author

beliefer commented Jul 11, 2023

The CI failure looks unrelated.

Contributor

can we add a test in HiveUDFSuite to make sure we can invoke a persistent Hive function?

Contributor Author

OK

@zhengruifeng
Contributor

will this also support Spark Connect?

@zhengruifeng
Contributor

call_udf and callUDF directly invoke call_function; are we also going to make them support qualified names?

@beliefer
Contributor Author

call_udf and callUDF directly invoke call_function; are we also going to make them support qualified names?

Considering the end users, I think we should keep the behavior of call_udf unchanged.

Contributor

shall we add a private method that takes a Seq[String], so that methods that do not want to support qualified names can call it?
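For illustration, something along these lines (a minimal sketch, not the actual functions.scala code; it assumes CatalystSqlParser for splitting the name and UnresolvedFunction for resolution):

import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.analysis.UnresolvedFunction
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

object CallFunctionSketch {
  // Private overload: the Seq[String] is treated as already-split name parts.
  private def callFunction(nameParts: Seq[String], args: Column*): Column =
    new Column(UnresolvedFunction(nameParts, args.map(_.expr), isDistinct = false))

  // Public variant parses the possibly qualified name first.
  def call_function(funcName: String, args: Column*): Column =
    callFunction(CatalystSqlParser.parseMultipartIdentifier(funcName), args: _*)

  // call_udf keeps passing a single-element Seq, so its behavior is unchanged.
  def call_udf(udfName: String, cols: Column*): Column =
    callFunction(Seq(udfName), cols: _*)
}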

Contributor

this is not a persistent function. Can you check the other tests in this file? We need to use CREATE FUNCTION to create persistent functions.
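For instance, roughly this shape (a sketch only, not the actual test; custom_avg and the Hive UDAF class are just stand-ins): spark.udf.register only creates a temporary function, whereas CREATE FUNCTION registers a persistent one that call_function can then resolve.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.call_function

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Persistent function, registered in the catalog via SQL.
spark.sql(
  "CREATE FUNCTION custom_avg AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFAverage'")

val df = spark.range(10).toDF("value")
df.select(call_function("custom_avg", df("value"))).show()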

Contributor Author

Got it.

@beliefer beliefer force-pushed the SPARK-44131_followup branch from 820fb59 to 3fcf654 Compare July 13, 2023 04:41
Contributor

can we also test calling the function with the fully qualified name, e.g. spark_catalog.default.custom_func?

@zhengruifeng
Contributor

@beliefer the branch cut is soon; shall we also support it in Spark Connect? Otherwise the behaviors will be different.

@beliefer
Contributor Author

@beliefer the branch cut is soon; shall we also support it in Spark Connect? Otherwise the behaviors will be different.

It's better to support it too.

Contributor

@LuciferYang would you mind helping to check this part? I am not familiar with it.

Contributor

ok

Contributor

@LuciferYang Jul 18, 2023

run the following commands:

build/sbt clean
build/sbt "connect-client-jvm/test" -Phive

there is 1 failed test:

[info] - call_function *** FAILED *** (150 milliseconds)
[info]   org.apache.spark.SparkException: [CANNOT_LOAD_FUNCTION_CLASS] Cannot load class test.org.apache.spark.sql.MyDoubleSum when registering the function `spark_catalog`.`default`.`custom_sum`, please make sure it is on the classpath.
[info]   at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toSparkThrowable(GrpcExceptionConverter.scala:53)
[info]   at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:30)
[info]   at org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:38)
[info]   at org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:80)
[info]   at org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:133)
[info]   at org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:150)
[info]   at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2813)
[info]   at org.apache.spark.sql.Dataset.withResult(Dataset.scala:3252)
[info]   at org.apache.spark.sql.Dataset.collect(Dataset.scala:2812)
[info]   at org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$139(ClientE2ETestSuite.scala:1175)
[info]   at org.apache.spark.sql.connect.client.util.RemoteSparkSession.$anonfun$test$1(RemoteSparkSession.scala:246)

@beliefer we should add (LocalProject("sql") / Test / Keys.`package`).value to

buildTestDeps := {
  (LocalProject("assembly") / Compile / Keys.`package`).value
  (LocalProject("catalyst") / Test / Keys.`package`).value
},

then the sql test jar will be built and packaged before testing.
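With that addition, the setting would look roughly like this (a sketch of the resulting build definition):

buildTestDeps := {
  (LocalProject("assembly") / Compile / Keys.`package`).value
  (LocalProject("catalyst") / Test / Keys.`package`).value
  (LocalProject("sql") / Test / Keys.`package`).value
},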

For Maven, let me do more checking.

Contributor Author

@LuciferYang Thank you for your reminder. I will add it.

Contributor

The Maven tests for ClientE2ETestSuite and ReplE2ESuite are OK, but another 68 Maven tests in the connect-client-jvm module failed; I will track this with a new ticket. @zhengruifeng

Contributor

To clarify, in my local testing:

  • master branch: all tests passed.
  • with this PR: 68 tests failed.

Contributor

@LuciferYang Jul 19, 2023

I think this is truly unrelated to this PR, and the way --jars is being used in the code is incorrect now.

When submitting the args as

--jars spark-catalyst-xx.jar
--jars spark-connect-client-jvm-xx.jar
--jars spark-sql-xx.jar

the final effective arg will be --jars spark-sql-xx.jar. If we enable debug logging, we will find that only the Added JAR logs for spark-sql_2.12-3.5.0-SNAPSHOT-tests.jar and spark-connect_2.12-3.5.0-SNAPSHOT.jar are present.

23/07/19 14:00:34 INFO SparkContext: Added JAR file:///Users/yangjie01/SourceCode/git/spark-mine-12/sql/core/target/spark-sql_2.12-3.5.0-SNAPSHOT-tests.jar at spark://localhost:56841/jars/spark-sql_2.12-3.5.0-SNAPSHOT-tests.jar with timestamp 1689746434318
23/07/19 14:00:34 INFO SparkContext: Added JAR file:/Users/yangjie01/SourceCode/git/spark-mine-12/connector/connect/server/target/spark-connect_2.12-3.5.0-SNAPSHOT.jar at spark://localhost:56841/jars/spark-connect_2.12-3.5.0-SNAPSHOT.jar with timestamp 1689746434318

and the configuration item “spark.jars” also only includes these two jars.

Array((spark.app.name,org.apache.spark.sql.connect.SimpleSparkConnectService), (spark.jars,file:///Users/yangjie01/SourceCode/git/spark-mine-12/sql/core/target/spark-sql_2.12-3.5.0-SNAPSHOT-tests.jar,file:/Users/yangjie01/SourceCode/git/spark-mine-12/connector/connect/server/target/spark-connect_2.12-3.5.0-SNAPSHOT.jar), ...

We should correct the usage of --jars to --jars spark-catalyst-xx.jar,spark-connect-client-jvm-xx.jar,spark-sql-xx.jar; then the Maven tests should pass.
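A minimal sketch of the fix (variable names are hypothetical, not the actual SparkConnectServerUtils code): collect the jar paths and emit a single comma-separated --jars argument, since repeated --jars flags override each other.

val testJars = Seq(
  "spark-catalyst-xx-tests.jar",
  "spark-connect-client-jvm-xx.jar",
  "spark-sql-xx-tests.jar")

// One flag with comma-separated values, instead of repeating --jars per jar.
val jarArgs = Seq("--jars", testJars.mkString(","))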

I think we can merge this PR first and then fix this issue separately. But @beliefer, if you prefer, you can also address it in this one :)

Contributor Author

@LuciferYang Thank you for the investigation. I will take care of it.

Contributor

Both Maven and sbt are OK now, thanks @beliefer

Contributor Author

Thank you for double-checking. @LuciferYang

Contributor

Suggested change:
- * function name that can be qualified using the SQL syntax
+ * function name that follows the SQL identifier syntax (can be quoted, can be qualified)

Contributor

Call a SQL function. It supports any function

Contributor

Suggested change:
- case "call_function" if fun.getArgumentsCount > 1 =>
+ case "call_function" if fun.getArgumentsCount >= 1 =>

Contributor

We should support no-arg functions as well (with >= 1, the single argument is just the function name).

Contributor

shall we add a new proto message for it? Currently it may conflict with calling a temporary function named call_function.

Contributor Author

Yeah. Good suggestion.

@beliefer beliefer changed the title [SPARK-44131][SQL][FOLLOWUP] Support qualified function name for call_function [SPARK-44131][SQL][PYTHON][CONNECT][FOLLOWUP] Support qualified function name for call_function Jul 18, 2023
@beliefer beliefer force-pushed the SPARK-44131_followup branch from 37ec1f6 to c553691 Compare July 18, 2023 07:35
@github-actions github-actions bot added the BUILD label Jul 18, 2023
@beliefer beliefer force-pushed the SPARK-44131_followup branch from f220e63 to 1891af5 Compare July 18, 2023 08:17
Contributor

@cloud-fan Jul 19, 2023

I think it's OK to not test persistent functions in Spark Connect, as it seems hard to include the jar containing the UDF. The client-side implementation is quite simple: it constructs a small proto message, and the server side turns it into UnresolvedFunction. Making sure it works for built-in functions is good enough.

Contributor Author

I have fixed the issue. Please wait for the CI.

Contributor

Is it really worth it? @zhengruifeng

Contributor Author

I have communicated with @zhengruifeng and he agrees with your opinion. Let's remove the test case for Connect.

Contributor

+1, I think we don't need to include this change in this PR

Contributor

Suggested change:
- // (Required) Name of the SQL function.
+ // (Required) Unparsed name of the SQL function.

Contributor

can we update the doc in all places?

Contributor

let's make sure the docs are consistent in all places.

@beliefer beliefer force-pushed the SPARK-44131_followup branch from 411fdb7 to 7456dc5 Compare July 19, 2023 09:51
@github-actions github-actions bot removed the CORE label Jul 19, 2023
@beliefer
Contributor Author

The CI failure is unrelated to this PR.

@github-actions github-actions bot removed the BUILD label Jul 24, 2023
LuciferYang pushed a commit that referenced this pull request Jul 24, 2023
…rameters for jars

### What changes were proposed in this pull request?
#41932 tried to add a test case for Connect, and then we found the Maven build failure caused by the bug discussed at #41932 (comment)

After some communication, cloud-fan and zhengruifeng suggested ignoring the test case for Connect.
So I committed this PR to fix the bug.

### Why are the changes needed?
Fix the bug that `SparkConnectServerUtils` generated incorrect parameters for jars.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update the inner implementation.

### How was this patch tested?
N/A

Closes #42121 from beliefer/SPARK-44519.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
LuciferYang pushed a commit that referenced this pull request Jul 24, 2023
…rameters for jars

### What changes were proposed in this pull request?
#41932 tried to add a test case for Connect, and then we found the Maven build failure caused by the bug discussed at #41932 (comment)

After some communication, cloud-fan and zhengruifeng suggested ignoring the test case for Connect.
So I committed this PR to fix the bug.

### Why are the changes needed?
Fix the bug that `SparkConnectServerUtils` generated incorrect parameters for jars.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update the inner implementation.

### How was this patch tested?
N/A

Closes #42121 from beliefer/SPARK-44519.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit 4644344)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
@cloud-fan
Contributor

the failure is unrelated, merging to master/3.5, thanks!

@cloud-fan cloud-fan closed this in d97a4e2 Jul 25, 2023
cloud-fan pushed a commit that referenced this pull request Jul 25, 2023
…ion name for call_function

### What changes were proposed in this pull request?
#41687 added `call_function` and deprecated `call_udf` for the Scala API.

Sometimes the function name can be qualified; we should let users use a qualified name to invoke persistent functions as well.

### Why are the changes needed?
Support qualified function name for `call_function`.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New test cases.

Closes #41932 from beliefer/SPARK-44131_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d97a4e2)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@beliefer
Contributor Author

@cloud-fan @zhengruifeng @LuciferYang Thank you all!
