@beliefer (Contributor) commented Jan 19, 2021

What changes were proposed in this pull request?

The current implementation of some DDL commands does not unify the output and does not pass the output attributes properly to the physical command.
For example, the `ShowTables` logical plan outputs the attribute `namespace`, but `ShowTablesCommand` outputs the attribute `database`.

Following the query plan, this PR passes the output attributes from `ShowTables` to `ShowTablesCommand` and from `ShowTableExtended` to `ShowTablesCommand`.

Take `show tables` and `show table extended like 'tbl'` as examples.
The output before this PR:
show tables

database tableName isTemporary
default tbl false

If catalog is v2 session catalog, the output before this PR:

namespace tableName
default tbl

show table extended like 'tbl'

database tableName isTemporary information
default tbl false Database: default...

The output after this PR:
show tables

namespace tableName isTemporary
default tbl false

show table extended like 'tbl'

namespace tableName isTemporary information
default tbl false Database: default...

Why are the changes needed?

This PR has two benefits:
First, it unifies the output schema of `SHOW TABLES`.
Second, passing the output attributes keeps the expression IDs unchanged, which avoids bugs when more operators are applied on top of the command's output DataFrame.
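
To illustrate the expression-ID point, here is a minimal sketch (not from the PR itself; it assumes a running `spark-shell`, i.e. a `SparkSession` named `spark`, and the unified post-PR column names):

```scala
// Sketch only: assumes a live SparkSession `spark` (e.g. in spark-shell).
import org.apache.spark.sql.functions.col

val tables = spark.sql("SHOW TABLES")
// Because the physical command reuses the logical plan's output attributes,
// the expression ID resolved for `namespace` here matches the one the
// command actually produces, so stacking operators on top of the command's
// output DataFrame resolves correctly.
tables.filter(col("namespace") === "default").show()
```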

Does this PR introduce any user-facing change?

Yes.
The output schema of `SHOW TABLES` replaces the `database` column with `namespace`.

How was this patch tested?

Jenkins test.

@github-actions github-actions bot added the SQL label Jan 19, 2021
@SparkQA commented Jan 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38814/

@SparkQA commented Jan 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38814/

@SparkQA commented Jan 19, 2021

Test build #134229 has finished for PR 31245 at commit a28ac3f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member)

cc @MaxGekk and @cloud-fan

@AngersZhuuuu (Contributor) commented Jan 20, 2021

I am working on https://issues.apache.org/jira/browse/SPARK-33630 now, and the mismatch between the output attributes of ShowTables and ShowTablesCommand causes an error there. I think this PR is important for https://issues.apache.org/jira/browse/SPARK-33630.

@beliefer (Contributor, Author)

I am working on https://issues.apache.org/jira/browse/SPARK-33630 now, and the mismatch between the output attributes of ShowTables and ShowTablesCommand causes an error there. I think this PR is important for https://issues.apache.org/jira/browse/SPARK-33630.

Yes. Thanks for your comment.

Contributor

Seems you missed the `information` column?

Contributor Author

Normally, the output should be delivered by the logical nodes (`ShowTables` and `ShowTableExtended`).

Contributor

Normally, the output should be delivered by the logical nodes (`ShowTables` and `ShowTableExtended`).

Yea, got it, thanks.

@SparkQA commented Jan 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38840/

@MaxGekk (Member) left a comment

This PR can potentially break user code. We need to update the SQL migration guide, at least.

Member

We have unified tests for SHOW TABLES in *.ShowTablesSuite. Please put the related tests there.

Contributor Author

Thanks a lot!

@SparkQA commented Jan 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38840/

@SparkQA commented Jan 20, 2021

Test build #134254 has finished for PR 31245 at commit ff27d1f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38858/

@SparkQA commented Jan 20, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38858/

@SparkQA commented Jan 20, 2021

Test build #134271 has finished for PR 31245 at commit ca6291f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38900/

@SparkQA commented Jan 21, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38900/

@SparkQA commented Jan 21, 2021

Test build #134313 has finished for PR 31245 at commit 6f0196d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38919/

@SparkQA commented Jan 21, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38919/

@SparkQA commented Jan 21, 2021

Test build #134332 has finished for PR 31245 at commit 319e450.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer (Contributor, Author)

retest this please

@SparkQA commented Jan 21, 2021

Test build #134337 has finished for PR 31245 at commit 319e450.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38930/

@SparkQA commented Jan 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38930/

@SparkQA commented Feb 5, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39513/

@SparkQA commented Feb 5, 2021

Test build #134922 has finished for PR 31245 at commit 02c36fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


- In Spark 3.2, the auto-generated `Cast` (such as those added by type coercion rules) will be stripped when generating column alias names. E.g., `sql("SELECT floor(1)").columns` will be `FLOOR(1)` instead of `FLOOR(CAST(1 AS DOUBLE))`.

- In Spark 3.2, the output schema of `SHOW TABLES` becomes `namespace: string, tableName: string, isTemporary: boolean`. In Spark 3.1 or earlier, the `namespace` field was named `database` for the builtin catalog, and there is no `isTemporary` field for v2 catalogs. Since Spark 3.2, to restore the old schema with the builtin catalog, you can set `spark.sql.legacy.keepCommandOutputSchema` to `true`.

Contributor

Since Spark 3.2, to ... -> To ...

Contributor Author

OK


- In Spark 3.2, the output schema of `SHOW TABLES` becomes `namespace: string, tableName: string, isTemporary: boolean`. In Spark 3.1 or earlier, the `namespace` field was named `database` for the builtin catalog, and there is no `isTemporary` field for v2 catalogs. Since Spark 3.2, to restore the old schema with the builtin catalog, you can set `spark.sql.legacy.keepCommandOutputSchema` to `true`.

- In Spark 3.2, the output schema of `SHOW TABLE EXTENDED` becomes `namespace: string, tableName: string, isTemporary: boolean, information: string`. In Spark 3.1 or earlier, the `namespace` field was named `database` for the builtin catalog, and no change for the v2 catalogs. Since Spark 3.2, to restore the old schema with the builtin catalog, you can set `spark.sql.legacy.keepCommandOutputSchema` to `true`.

Contributor

ditto
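
For context, the legacy flag discussed in these migration-guide entries can be exercised like this (sketch only; assumes a spark-shell session with the builtin catalog):

```scala
// Sketch: restoring the pre-3.2 output schema (builtin catalog only).
spark.conf.set("spark.sql.legacy.keepCommandOutputSchema", "true")
spark.sql("SHOW TABLES").schema.fieldNames
// With the flag on, the first field should be named `database` again.
```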

case ShowTables(DatabaseInSessionCatalog(db), pattern) =>
  ShowTablesCommand(Some(db), pattern)
case ShowTables(DatabaseInSessionCatalog(db), pattern, output) =>
  val requiredOutput = if (conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)) {
Contributor

requiredOutput -> newOutput

Contributor Author

OK

partitionSpec @ (None | Some(UnresolvedPartitionSpec(_, _)))) =>
partitionSpec @ (None | Some(UnresolvedPartitionSpec(_, _))),
output) =>
val requiredOutput = if (conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)) {
Contributor

ditto

@SparkQA commented Feb 5, 2021

Test build #134930 has finished for PR 31245 at commit 28ad51e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39555/

@SparkQA commented Feb 7, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39555/

@SparkQA commented Feb 7, 2021

Test build #134972 has finished for PR 31245 at commit 3c73d4a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39570/

@SparkQA commented Feb 7, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39570/

@SparkQA commented Feb 7, 2021

Test build #134987 has finished for PR 31245 at commit 4cd60cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer (Contributor, Author) commented Feb 8, 2021

cc @cloud-fan

@cloud-fan (Contributor)

The last commit just updates a comment, merging to master, thanks!

@cloud-fan cloud-fan closed this in 2c243c9 Feb 8, 2021
@beliefer (Contributor, Author) commented Feb 8, 2021

@cloud-fan Thanks for your work! @HyukjinKwon @MaxGekk Thanks for your review!

@SparkQA commented Feb 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39598/

@SparkQA commented Feb 8, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39598/

@SparkQA commented Feb 8, 2021

Test build #135015 has finished for PR 31245 at commit 815d36b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
}

test("SPARK-34157 Unify output of SHOW TABLES and pass output attributes properly") {
Member

Let's keep the format consistent next time: SPARK-34157 :

Member

The consistent format is SPARK-34157: w/o any gaps ;-)

assert(sql("show tables").schema.fieldNames.deep ==
Seq("namespace", "tableName", "isTemporary"))
assert(sql("show table extended like 'tbl'").collect()(0).length == 4)
assert(sql("show table extended like 'tbl'").schema.fieldNames.deep ==
@HyukjinKwon (Member) commented Feb 8, 2021

Seems like this broke the Scala 2.13 build. I made a followup here: #31526

HyukjinKwon added a commit that referenced this pull request Feb 8, 2021
…using Array.deep

### What changes were proposed in this pull request?

This PR is a followup of #31245:

```
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala:112:53: value deep is not a member of Array[String]
[error]         assert(sql("show tables").schema.fieldNames.deep ==
[error]                                                     ^
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala:115:72: value deep is not a member of Array[String]
[error]         assert(sql("show table extended like 'tbl'").schema.fieldNames.deep ==
[error]                                                                        ^
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala:121:55: value deep is not a member of Array[String]
[error]           assert(sql("show tables").schema.fieldNames.deep ==
[error]                                                       ^
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala:124:74: value deep is not a member of Array[String]
[error]           assert(sql("show table extended like 'tbl'").schema.fieldNames.deep ==
[error]                                                                          ^
```

It broke the Scala 2.13 build. This PR works around the issue by using ScalaTest's `===`, which can compare `Array`s safely.

### Why are the changes needed?

To fix the build.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

CI in this PR should test it out.

Closes #31526 from HyukjinKwon/SPARK-34157.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
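
The Scala 2.13-safe assertion style from the followup can be sketched as follows (hypothetical snippet, not copied from the followup; `sql` as used in the suite above, inside a ScalaTest suite):

```scala
// Sketch: `Array.deep` was removed in Scala 2.13, so instead of
//   sql("show tables").schema.fieldNames.deep == Seq(...)
// the assertion uses ScalaTest's `===`, which compares arrays structurally:
assert(sql("show tables").schema.fieldNames ===
  Seq("namespace", "tableName", "isTemporary"))
```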