
Conversation

@MaxGekk
Member

@MaxGekk MaxGekk commented Feb 27, 2021

What changes were proposed in this pull request?

In this PR, I propose to generate "stable" output attributes for the logical node of the `DESCRIBE TABLE` command.

Why are the changes needed?

This fixes the issue demonstrated by the example:

val tbl = "testcat.ns1.ns2.tbl"
sql(s"CREATE TABLE $tbl (c0 INT) USING _")
val description = sql(s"DESCRIBE TABLE $tbl")
description.drop("comment")

The drop() method fails with the error:

org.apache.spark.sql.AnalysisException: Resolved attribute(s) col_name#102,data_type#103 missing from col_name#29,data_type#30,comment#31 in operator !Project [col_name#102, data_type#103]. Attribute(s) with the same name appear in the operation: col_name,data_type. Please check if the right attribute(s) are used.;
!Project [col_name#102, data_type#103]
+- LocalRelation [col_name#29, data_type#30, comment#31]

	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:51)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:50)
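
The failure happens because Catalyst matches attributes by their unique `ExprId`, not by name. A minimal, Spark-free sketch (plain Scala, all names hypothetical) of how freshly minted ids fail to resolve against a child's output:

```scala
object ExprIdMismatch {
  private var next = 0

  // A stand-in for Catalyst's AttributeReference: a name plus a unique id.
  case class Attr(name: String, id: Int)

  // An unstable `output`: every call mints attributes with fresh ids,
  // mimicking a node that recomputes its output on each access.
  def freshAttrs(): Seq[Attr] = Seq("col_name", "data_type").map { n =>
    next += 1
    Attr(n, next)
  }

  def main(args: Array[String]): Unit = {
    val childOutput = freshAttrs() // first access: col_name#1, data_type#2
    val projectRefs = freshAttrs() // second access: col_name#3, data_type#4

    // Resolution is by id, so every reference is "missing" from the child
    // even though the names match -- the shape of the error above.
    val missing = projectRefs.filterNot(a => childOutput.exists(_.id == a.id))
    assert(missing.map(_.name) == Seq("col_name", "data_type"))
    println("ok")
  }
}
```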

Does this PR introduce any user-facing change?

Yes. After the changes, `drop()`/`add()` work as expected:

description.drop("comment").show()
+---------------+---------+
|       col_name|data_type|
+---------------+---------+
|             c0|      int|
|               |         |
| # Partitioning|         |
|Not partitioned|         |
+---------------+---------+

How was this patch tested?

  1. Run the new test:
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *DataSourceV2SQLSuite"
  2. Run the existing test suite:
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *CatalogedDDLSuite"

@github-actions github-actions bot added the SQL label Feb 27, 2021
@SparkQA

SparkQA commented Feb 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40127/

@SparkQA

SparkQA commented Feb 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40127/

@SparkQA

SparkQA commented Feb 27, 2021

Test build #135546 has finished for PR 31676 at commit 82b2cee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Looks like we should fix all other instances too? But that could be done separately.

    isExtended: Boolean) extends Command {
  override def children: Seq[LogicalPlan] = Seq(relation)
- override def output: Seq[Attribute] = DescribeCommandSchema.describeTableAttributes()
+ override val output: Seq[Attribute] = DescribeCommandSchema.describeTableAttributes()
Contributor

@cloud-fan cloud-fan Mar 1, 2021


Shall we follow others like ShowTables and put the output as a parameter? Then it's more stable and the output won't change after copy/transformation.

Member Author


> Shall we follow others like ShowTables and put the output as a parameter?

Putting the output as a parameter doesn't solve any problem by itself.

> Then it's more stable ...

I would say "super stable", even if it is not necessary; see #31675.

Member Author


> Then it's more stable and the output won't change after copy/transformation.

OK, `val output` will be re-initialized on every `.copy()`. I will make it a case class parameter.
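
The `.copy()` point can be seen in a small, Spark-free sketch (hypothetical names, plain Scala): a `val` defined in the class body is re-evaluated for every new instance, including copies, while a constructor parameter is carried over unchanged by `.copy()`:

```scala
object StableOutputDemo {
  private var counter = 0
  private def freshId(): Int = { counter += 1; counter }

  // `output` is a body member: every instance, including copies,
  // recomputes it and so gets fresh ids.
  case class Unstable(name: String) {
    val output: Seq[Int] = Seq(freshId())
  }

  // `output` is a constructor parameter with a default: `.copy()` reuses
  // the existing value, so the ids stay stable across transformations.
  case class Stable(name: String, output: Seq[Int] = Seq(freshId()))

  def main(args: Array[String]): Unit = {
    val u = Unstable("t")
    assert(u.output != u.copy(name = "t2").output) // ids changed

    val s = Stable("t")
    assert(s.output == s.copy(name = "t2").output) // ids preserved
    println("ok")
  }
}
```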

@cloud-fan
Contributor

Also cc @beliefer @AngersZhuuuu, who worked on similar issues before.

@AngersZhuuuu
Contributor

AngersZhuuuu commented Mar 1, 2021

> Looks like we should fix all other instances too? But that could be done separately.

Yeah, I have fixed some of these, but there may still be incorrect instances:
https://issues.apache.org/jira/browse/SPARK-34576
https://issues.apache.org/jira/browse/SPARK-34577

@SparkQA

SparkQA commented Mar 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40163/

@SparkQA

SparkQA commented Mar 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40163/

@SparkQA

SparkQA commented Mar 1, 2021

Test build #135582 has finished for PR 31676 at commit dcb97af.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40174/

@SparkQA

SparkQA commented Mar 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40174/

@cloud-fan
Contributor

cloud-fan commented Mar 1, 2021

Thanks, merging to master/3.1! (It has many conflicts in branch-3.0 and may not be worth backporting.)

@cloud-fan cloud-fan closed this in 984ff39 Mar 1, 2021
cloud-fan pushed a commit that referenced this pull request Mar 1, 2021
Closes #31676 from MaxGekk/describe-table-drop-column.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 984ff39)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@SparkQA

SparkQA commented Mar 1, 2021

Test build #135593 has finished for PR 31676 at commit 05667cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021