@beliefer (Contributor) commented Jan 19, 2021

What changes were proposed in this pull request?

The current implementation of some DDL commands does not unify the output and does not pass the output attributes properly to the physical command.
For example, the `ShowTables` logical plan outputs the attribute `namespace`, but `ShowTablesCommand` outputs the attribute `database`.

Following the query plan, this PR passes the output attributes from `ShowTables` to `ShowTablesCommand` and from `ShowTableExtended` to `ShowTablesCommand`.

Take `show tables` and `show table extended like 'tbl'` as examples.
The output before this PR:
show tables

database tableName isTemporary
default tbl false

If catalog is v2 session catalog, the output before this PR:

namespace tableName
default tbl

show table extended like 'tbl'

database tableName isTemporary information
default tbl false Database: default...

The output after this PR:
show tables

namespace tableName isTemporary
default tbl false

show table extended like 'tbl'

namespace tableName isTemporary information
default tbl false Database: default...

Why are the changes needed?

This PR has two benefits:
First, it unifies the output schema of `SHOW TABLES`.
Second, passing the output attributes keeps the expression IDs unchanged, which avoids bugs when more operators are applied on top of the command's output DataFrame.
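
To illustrate the expression-ID point, here is a minimal sketch (not from the PR itself; it assumes a running `spark-shell`, i.e. a `SparkSession` named `spark`, and the unified post-PR column names):

```scala
// Sketch only: assumes a live SparkSession `spark` (e.g. in spark-shell).
import org.apache.spark.sql.functions.col

val tables = spark.sql("SHOW TABLES")
// Because the physical command reuses the logical plan's output attributes,
// the expression ID resolved for `namespace` here matches the one the
// command actually produces, so stacking operators on top of the command's
// output DataFrame resolves correctly.
tables.filter(col("namespace") === "default").show()
```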

Does this PR introduce any user-facing change?

Yes.
The output schema of `SHOW TABLES` replaces the `database` column with `namespace`.

How was this patch tested?

Jenkins test.

@github-actions github-actions bot added the SQL label Jan 19, 2021
@SparkQA commented Jan 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38814/

@SparkQA commented Jan 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38814/

@SparkQA commented Jan 19, 2021

Test build #134229 has finished for PR 31245 at commit a28ac3f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member)

cc @MaxGekk and @cloud-fan

@AngersZhuuuu (Contributor) commented Jan 20, 2021

I am working on https://issues.apache.org/jira/browse/SPARK-33630 now, and the mismatch between the output attributes of ShowTables and ShowTablesCommand causes an error there. I think this PR is important for https://issues.apache.org/jira/browse/SPARK-33630.

@beliefer (Contributor, Author)

I am working on https://issues.apache.org/jira/browse/SPARK-33630 now, and the mismatch between the output attributes of ShowTables and ShowTablesCommand causes an error there. I think this PR is important for https://issues.apache.org/jira/browse/SPARK-33630.

Yes. Thanks for your comment.

Contributor

Seems you missed the `information` column?

Contributor Author

Normally, the output should be delivered by the logical nodes (`ShowTables` and `ShowTableExtended`).

Contributor

Normally, the output should be delivered by the logical nodes (`ShowTables` and `ShowTableExtended`).

Yea, got it, thanks.

@SparkQA commented Jan 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38840/

@MaxGekk (Member) left a comment

This PR can potentially break user code. We need to update the SQL migration guide, at least.

Member

We have unified tests for SHOW TABLES in *.ShowTablesSuite. Please put the related tests there.

Contributor Author

Thanks a lot!

@SparkQA commented Jan 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38840/

@SparkQA commented Jan 20, 2021

Test build #134254 has finished for PR 31245 at commit ff27d1f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38858/

@SparkQA commented Jan 20, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38858/

@SparkQA commented Jan 20, 2021

Test build #134271 has finished for PR 31245 at commit ca6291f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38900/

@SparkQA commented Jan 21, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38900/

@SparkQA commented Jan 21, 2021

Test build #134313 has finished for PR 31245 at commit 6f0196d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38919/

@SparkQA commented Jan 21, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38919/

@SparkQA commented Jan 21, 2021

Test build #134332 has finished for PR 31245 at commit 319e450.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer (Contributor, Author)

retest this please

@SparkQA commented Jan 21, 2021

Test build #134337 has finished for PR 31245 at commit 319e450.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38930/

@SparkQA commented Jan 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38930/

@SparkQA commented Feb 5, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39513/

@SparkQA commented Feb 5, 2021

Test build #134922 has finished for PR 31245 at commit 02c36fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


- In Spark 3.2, the auto-generated `Cast` (such as those added by type coercion rules) will be stripped when generating column alias names. E.g., `sql("SELECT floor(1)").columns` will be `FLOOR(1)` instead of `FLOOR(CAST(1 AS DOUBLE))`.

- In Spark 3.2, the output schema of `SHOW TABLES` becomes `namespace: string, tableName: string, isTemporary: boolean`. In Spark 3.1 or earlier, the `namespace` field was named `database` for the builtin catalog, and there is no `isTemporary` field for v2 catalogs. Since Spark 3.2, to restore the old schema with the builtin catalog, you can set `spark.sql.legacy.keepCommandOutputSchema` to `true`.

Contributor

Since Spark 3.2, to ... -> To ...

Contributor Author

OK


- In Spark 3.2, the output schema of `SHOW TABLES` becomes `namespace: string, tableName: string, isTemporary: boolean`. In Spark 3.1 or earlier, the `namespace` field was named `database` for the builtin catalog, and there is no `isTemporary` field for v2 catalogs. Since Spark 3.2, to restore the old schema with the builtin catalog, you can set `spark.sql.legacy.keepCommandOutputSchema` to `true`.

- In Spark 3.2, the output schema of `SHOW TABLE EXTENDED` becomes `namespace: string, tableName: string, isTemporary: boolean, information: string`. In Spark 3.1 or earlier, the `namespace` field was named `database` for the builtin catalog, and no change for the v2 catalogs. Since Spark 3.2, to restore the old schema with the builtin catalog, you can set `spark.sql.legacy.keepCommandOutputSchema` to `true`.

Contributor

ditto
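
For context, the legacy flag discussed in these migration-guide entries can be exercised like this (sketch only; assumes a spark-shell session with the builtin catalog):

```scala
// Sketch: restoring the pre-3.2 output schema (builtin catalog only).
spark.conf.set("spark.sql.legacy.keepCommandOutputSchema", "true")
spark.sql("SHOW TABLES").schema.fieldNames
// With the flag on, the first field should be named `database` again.
```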

case ShowTables(DatabaseInSessionCatalog(db), pattern) =>
  ShowTablesCommand(Some(db), pattern)
case ShowTables(DatabaseInSessionCatalog(db), pattern, output) =>
  val requiredOutput = if (conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)) {
Contributor

requiredOutput -> newOutput

Contributor Author

OK

partitionSpec @ (None | Some(UnresolvedPartitionSpec(_, _)))) =>
partitionSpec @ (None | Some(UnresolvedPartitionSpec(_, _))),
output) =>
val requiredOutput = if (conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)) {
Contributor

ditto

@SparkQA commented Feb 5, 2021

Test build #134930 has finished for PR 31245 at commit 28ad51e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39555/

@SparkQA commented Feb 7, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39555/

@SparkQA commented Feb 7, 2021

Test build #134972 has finished for PR 31245 at commit 3c73d4a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39570/

@SparkQA commented Feb 7, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39570/

@SparkQA commented Feb 7, 2021

Test build #134987 has finished for PR 31245 at commit 4cd60cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer (Contributor, Author) commented Feb 8, 2021

cc @cloud-fan

@cloud-fan (Contributor)

The last commit just updates a comment, merging to master, thanks!

@cloud-fan cloud-fan closed this in 2c243c9 Feb 8, 2021
@beliefer (Contributor, Author) commented Feb 8, 2021

@cloud-fan Thanks for your work! @HyukjinKwon @MaxGekk Thanks for your review!

@SparkQA commented Feb 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39598/

@SparkQA commented Feb 8, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39598/

@SparkQA commented Feb 8, 2021

Test build #135015 has finished for PR 31245 at commit 815d36b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
}

test("SPARK-34157 Unify output of SHOW TABLES and pass output attributes properly") {
Member

Let's keep the format consistent next time: SPARK-34157 :

Member

The consistent format is SPARK-34157: w/o any gaps ;-)

assert(sql("show tables").schema.fieldNames.deep ==
Seq("namespace", "tableName", "isTemporary"))
assert(sql("show table extended like 'tbl'").collect()(0).length == 4)
assert(sql("show table extended like 'tbl'").schema.fieldNames.deep ==
@HyukjinKwon (Member) commented Feb 8, 2021

Seems like this broke the Scala 2.13 build. I made a followup here: #31526

HyukjinKwon added a commit that referenced this pull request Feb 8, 2021
…using Array.deep

### What changes were proposed in this pull request?

This PR is a followup of #31245:

```
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala:112:53: value deep is not a member of Array[String]
[error]         assert(sql("show tables").schema.fieldNames.deep ==
[error]                                                     ^
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala:115:72: value deep is not a member of Array[String]
[error]         assert(sql("show table extended like 'tbl'").schema.fieldNames.deep ==
[error]                                                                        ^
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala:121:55: value deep is not a member of Array[String]
[error]           assert(sql("show tables").schema.fieldNames.deep ==
[error]                                                       ^
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala:124:74: value deep is not a member of Array[String]
[error]           assert(sql("show table extended like 'tbl'").schema.fieldNames.deep ==
[error]                                                                          ^
```

It broke the Scala 2.13 build. This PR works around the issue by using ScalaTest's `===`, which can compare `Array`s safely.

### Why are the changes needed?

To fix the build.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

CI in this PR should test it out.

Closes #31526 from HyukjinKwon/SPARK-34157.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
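
The Scala 2.13-safe assertion style from the followup can be sketched as follows (hypothetical snippet, not copied from the followup; `sql` as used in the suite above, inside a ScalaTest suite):

```scala
// Sketch: `Array.deep` was removed in Scala 2.13, so instead of
//   sql("show tables").schema.fieldNames.deep == Seq(...)
// the assertion uses ScalaTest's `===`, which compares arrays structurally:
assert(sql("show tables").schema.fieldNames ===
  Seq("namespace", "tableName", "isTemporary"))
```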