[SPARK-35378][SQL] Eagerly execute commands in QueryExecution instead of caller sides #32513
Conversation
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #138416 has finished for PR 32513 at commit
ping @cloud-fan @wangyum @maropu @viirya
// We can't clone `logical` here, which will reset the `_analyzed` flag.
sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker)
sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker) match {
  case c: Command => c
Why do we leave out the root node Command? Can we remove the command execution logic in Dataset.logicalPlan and execute all the commands here?
I want to keep handling Command here and preserve its current behavior.
OK. Let's unify the behavior and eagerly execute the commands.
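The behavior being settled in this thread — run a command as soon as the plan is analyzed, and hand later phases a materialized result — can be sketched in plain Scala. All types below (`ShowTables`, `LocalRelation`, etc.) are toy stand-ins for illustration, not Spark's actual classes:

```scala
// Toy stand-ins for the real plan nodes; none of these are Spark classes.
sealed trait LogicalPlan
case class Project(table: String) extends LogicalPlan            // an ordinary lazy query
trait Command extends LogicalPlan { def run(): Seq[String] }     // side-effecting command
case class ShowTables() extends Command { def run() = Seq("t1", "t2") }
case class LocalRelation(rows: Seq[String]) extends LogicalPlan  // materialized result

// Analogue of QueryExecution eagerly executing commands after analysis:
// a command runs immediately and is replaced by its result; anything else
// stays lazy and untouched.
def commandExecuted(analyzed: LogicalPlan): LogicalPlan = analyzed match {
  case c: Command => LocalRelation(c.run())
  case other      => other
}

assert(commandExecuted(ShowTables()) == LocalRelation(Seq("t1", "t2")))
assert(commandExecuted(Project("t")) == Project("t"))
```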
Kubernetes integration test unable to build dist. exiting with code: 1
Test build #138543 has finished for PR 32513 at commit
sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker)
sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker) transform {
  // SPARK-35378: Eagerly execute LeafRunnableCommand so that query command with CTE
  case r: LeafRunnableCommand =>
We should run all Commands, not just LeafRunnableCommand.
OK
Test build #138624 has finished for PR 32513 at commit
Kubernetes integration test unable to build dist. exiting with code: 1
Test build #138626 has finished for PR 32513 at commit
Kubernetes integration test unable to build dist. exiting with code: 1
Kubernetes integration test unable to build dist. exiting with code: 1
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
case c: Command =>
  val subQueryExecution = sparkSession.sessionState.executePlan(c)
  LocalRelation(c.output,
    subQueryExecution.executedPlan.executeCollect(), false, Some(subQueryExecution.id))
How about we create new query plan nodes, CommandResult and CommandResultExec?

case class CommandResult(qe: QueryExecution) extends LeafNode {
  def innerChildren = Seq(qe.analyzedPlan)
  def output = qe.logicalPlan.output
}

case class CommandResultExec(qe: QueryExecution) extends LeafExecNode {
  def innerChildren = Seq(qe.executedPlan)
  def output = qe.logicalPlan.output
}

Then both the UI and EXPLAIN can have pretty output.
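To make the suggestion concrete, here is a self-contained sketch (toy types, not Spark's real TreeNode API) of why keeping the original plan as an "inner child" pays off: a generic tree printer can descend into it, so EXPLAIN and the UI show the executed command underneath the CommandResult wrapper:

```scala
// Minimal tree node with "inner children": plans attached for display
// purposes that are not part of the operator's regular children.
trait TreeNode {
  def nodeName: String
  def innerChildren: Seq[TreeNode] = Nil
  // Tiny EXPLAIN-style printer that also descends into innerChildren.
  def treeString(indent: Int = 0): String = {
    val self = ("  " * indent) + nodeName
    (self +: innerChildren.map(_.treeString(indent + 1))).mkString("\n")
  }
}

case class ShowTablesCommand() extends TreeNode {
  def nodeName = "ShowTablesCommand"
}

// The wrapper holds the already-computed rows, yet still renders the
// command plan it came from.
case class CommandResult(commandPlan: TreeNode, rows: Seq[String]) extends TreeNode {
  def nodeName = "CommandResult"
  override def innerChildren = Seq(commandPlan)
}

val result = CommandResult(ShowTablesCommand(), rows = Seq("t1", "t2"))
println(result.treeString())
// CommandResult
//   ShowTablesCommand
```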
Test build #138628 has finished for PR 32513 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Test build #139462 has finished for PR 32513 at commit
 * callback functions.
 */
private def runCommand(name: String)(command: LogicalPlan): Unit = {
private def runCommand()(command: LogicalPlan): Unit = {
nit: shall this just be private def runCommand(command: LogicalPlan)?
Kubernetes integration test starting
Kubernetes integration test status success
Test build #139494 has finished for PR 32513 at commit
thanks, merging to master!
@cloud-fan Thanks for your hard work! @viirya @yaooqinn Thanks for the review too.
 * limitations under the License.
 */

package org.apache.spark.sql.expressions
This is a public package which makes CommandResult a public API. This is unexpected. We should move this class to org.apache.spark.sql.catalyst.plans.logical. @beliefer can you help to make this change?
OK
…ataFrameWriterV2

### What changes were proposed in this pull request?

This is a followup of #32513. It's hard to keep the command execution name for `DataFrameWriter`, as the command logical plan is a bit messy (DS v1, file source and Hive have different command logical plans) and sometimes it's hard to distinguish "insert" and "save". However, `DataFrameWriterV2` only produces v2 commands, which are pretty clean. It's easy to keep the command execution name for them.

### Why are the changes needed?

Fewer breaking changes.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

N/A

Closes #32919 from cloud-fan/follow.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…ical

### What changes were proposed in this pull request?

#32513 added the case class `CommandResult` in package `org.apache.spark.sql.expression`. It is not suitable, so this PR moves `CommandResult` from `org.apache.spark.sql.expression` to `org.apache.spark.sql.catalyst.plans.logical`.

### Why are the changes needed?

Make `CommandResult` live in a suitable package.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No need.

Closes #32942 from beliefer/SPARK-35378-followup.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

#32513 added the case class `CommandResult` so that we can eagerly execute commands locally. But we forgot to update `isLocal` of `Dataset`.

### Why are the changes needed?

`Dataset.isLocal` should consider `CommandResult`.

### Does this PR introduce _any_ user-facing change?

Yes. If the SQL plan is `CommandResult`, `Dataset.isLocal` returns true.

### How was this patch tested?

No test.

Closes #32963 from beliefer/SPARK-35378-followup2.

Authored-by: gengjiaan <gengjiaan@360.cn>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…ultExec.executeCollect()

### What changes were proposed in this pull request?

This PR is a follow-up for #32513 and fixes an issue introduced by that patch. `CommandResultExec` is supposed to return `UnsafeRow` records in all of the `executeXYZ` methods, but `executeCollect` was left out, which causes issues like this one:

```
Error in SQL statement: ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericInternalRow cannot be cast to org.apache.spark.sql.catalyst.expressions.UnsafeRow
```

We need to return `unsafeRows` instead of `rows` in `executeCollect`, similar to the other methods in the class.

### Why are the changes needed?

Fixes a bug in `CommandResultExec`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I added a unit test to check the return type of all commands.

Closes #36632 from sadikovi/fix-command-exec.

Authored-by: Ivan Sadikov <ivan.sadikov@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit a0decfc)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?

Currently, Spark eagerly executes commands on the caller side of QueryExecution, which is a bit hacky, as QueryExecution is not aware of it, and this leads to confusion. For example, if you run sql("show tables").collect(), you will see two queries with identical query plans in the web UI. The first query is triggered at Dataset.logicalPlan, which eagerly executes the command. The second query is triggered at Dataset.collect, which is the normal query execution. From the web UI, it's hard to tell that these two queries are caused by eager command execution.

This PR proposes to move the eager command execution into QueryExecution, and turn the command plan into CommandResult to indicate that the command has been executed already. Now sql("show tables").collect() still triggers two queries, but the query plans are no longer identical.

In addition to the UI improvements, this PR has another benefit: the caller side doesn't need to eagerly execute commands anymore, as QueryExecution takes care of it.

Why are the changes needed?

Explained above.

Does this PR introduce any user-facing change?

No

How was this patch tested?

existing tests
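As a closing illustration of the design described above, the following plain-Scala sketch (toy types, not Spark's API) shows the key property of the new behavior: the command body runs exactly once, eagerly, and any subsequent collect() just reads the rows cached in the CommandResult wrapper:

```scala
// Count how many times the command body actually executes.
var runs = 0
def runShowTables(): Seq[String] = { runs += 1; Seq("t1", "t2") }

// Stand-in for the CommandResult logical plan: it carries the rows that
// were produced when the command was executed eagerly.
case class CommandResult(rows: Seq[String])

// Eager execution happens once, when the plan is built...
val plan = CommandResult(runShowTables())

// ...and collect() only reads the pre-computed rows.
def collect(p: CommandResult): Seq[String] = p.rows

assert(collect(plan) == Seq("t1", "t2"))
assert(collect(plan) == Seq("t1", "t2"))
assert(runs == 1) // the command itself ran exactly once
```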