
Conversation

@MaxGekk
Member

@MaxGekk MaxGekk commented Feb 27, 2021

What changes were proposed in this pull request?

In this PR, I propose to generate "stable" output attributes for the logical node of the `DESCRIBE TABLE` command.

Why are the changes needed?

This fixes the issue demonstrated by the example:

val tbl = "testcat.ns1.ns2.tbl"
sql(s"CREATE TABLE $tbl (c0 INT) USING _")
val description = sql(s"DESCRIBE TABLE $tbl")
description.drop("comment")

The drop() method fails with the error:

org.apache.spark.sql.AnalysisException: Resolved attribute(s) col_name#102,data_type#103 missing from col_name#29,data_type#30,comment#31 in operator !Project [col_name#102, data_type#103]. Attribute(s) with the same name appear in the operation: col_name,data_type. Please check if the right attribute(s) are used.;
!Project [col_name#102, data_type#103]
+- LocalRelation [col_name#29, data_type#30, comment#31]

	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:51)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:50)
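
The failure happens because Catalyst matches attributes by their unique `ExprId`, not by name. A minimal, Spark-free sketch (plain Scala, all names hypothetical) of how freshly minted ids fail to resolve against a child's output:

```scala
object ExprIdMismatch {
  private var next = 0

  // A stand-in for Catalyst's AttributeReference: a name plus a unique id.
  case class Attr(name: String, id: Int)

  // An unstable `output`: every call mints attributes with fresh ids,
  // mimicking a node that recomputes its output on each access.
  def freshAttrs(): Seq[Attr] = Seq("col_name", "data_type").map { n =>
    next += 1
    Attr(n, next)
  }

  def main(args: Array[String]): Unit = {
    val childOutput = freshAttrs() // first access: col_name#1, data_type#2
    val projectRefs = freshAttrs() // second access: col_name#3, data_type#4

    // Resolution is by id, so every reference is "missing" from the child
    // even though the names match -- the shape of the error above.
    val missing = projectRefs.filterNot(a => childOutput.exists(_.id == a.id))
    assert(missing.map(_.name) == Seq("col_name", "data_type"))
    println("ok")
  }
}
```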

Does this PR introduce any user-facing change?

Yes. After the changes, `drop()`/`add()` work as expected:

description.drop("comment").show()
+---------------+---------+
|       col_name|data_type|
+---------------+---------+
|             c0|      int|
|               |         |
| # Partitioning|         |
|Not partitioned|         |
+---------------+---------+

How was this patch tested?

  1. Run the new test:
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *DataSourceV2SQLSuite"
  2. Run the existing test suite:
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *CatalogedDDLSuite"

@github-actions github-actions bot added the SQL label Feb 27, 2021
@SparkQA

SparkQA commented Feb 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40127/

@SparkQA

SparkQA commented Feb 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40127/

@SparkQA

SparkQA commented Feb 27, 2021

Test build #135546 has finished for PR 31676 at commit 82b2cee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Looks like we should fix all other instances too? But that could be done separately.

    isExtended: Boolean) extends Command {
  override def children: Seq[LogicalPlan] = Seq(relation)
- override def output: Seq[Attribute] = DescribeCommandSchema.describeTableAttributes()
+ override val output: Seq[Attribute] = DescribeCommandSchema.describeTableAttributes()
Contributor

@cloud-fan cloud-fan Mar 1, 2021


Shall we follow others like ShowTables and put the output as a parameter? Then it's more stable and the output won't change after copy/transformation.

Member Author


> Shall we follow others like ShowTables and put the output as a parameter?

Putting the output as a parameter doesn't solve any problem by itself.

> Then it's more stable ...

I would say "super stable", even if it is not necessary; see #31675.

Member Author


> Then it's more stable and the output won't change after copy/transformation.

OK, `val output` will be re-initialized on every `.copy()`. I will make it a case class parameter.
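
The `.copy()` point can be seen in a small, Spark-free sketch (hypothetical names, plain Scala): a `val` defined in the class body is re-evaluated for every new instance, including copies, while a constructor parameter is carried over unchanged by `.copy()`:

```scala
object StableOutputDemo {
  private var counter = 0
  private def freshId(): Int = { counter += 1; counter }

  // `output` is a body member: every instance, including copies,
  // recomputes it and so gets fresh ids.
  case class Unstable(name: String) {
    val output: Seq[Int] = Seq(freshId())
  }

  // `output` is a constructor parameter with a default: `.copy()` reuses
  // the existing value, so the ids stay stable across transformations.
  case class Stable(name: String, output: Seq[Int] = Seq(freshId()))

  def main(args: Array[String]): Unit = {
    val u = Unstable("t")
    assert(u.output != u.copy(name = "t2").output) // ids changed

    val s = Stable("t")
    assert(s.output == s.copy(name = "t2").output) // ids preserved
    println("ok")
  }
}
```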

@cloud-fan
Contributor

Also cc @beliefer @AngersZhuuuu, who worked on similar issues before.

@AngersZhuuuu
Contributor

AngersZhuuuu commented Mar 1, 2021

> Looks like we should fix all other instances too? But that could be done separately.

Yeah, I have fixed some of these, but there may still be incorrect instances:
https://issues.apache.org/jira/browse/SPARK-34576
https://issues.apache.org/jira/browse/SPARK-34577

@SparkQA

SparkQA commented Mar 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40163/

@SparkQA

SparkQA commented Mar 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40163/

@SparkQA

SparkQA commented Mar 1, 2021

Test build #135582 has finished for PR 31676 at commit dcb97af.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40174/

@SparkQA

SparkQA commented Mar 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40174/

@cloud-fan
Contributor

cloud-fan commented Mar 1, 2021

Thanks, merging to master/3.1! (It has many conflicts in branch-3.0 and may not be worth backporting.)

@cloud-fan cloud-fan closed this in 984ff39 Mar 1, 2021
cloud-fan pushed a commit that referenced this pull request Mar 1, 2021
Closes #31676 from MaxGekk/describe-table-drop-column.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 984ff39)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@SparkQA

SparkQA commented Mar 1, 2021

Test build #135593 has finished for PR 31676 at commit 05667cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021