[SPARK-34561][SQL] Fix drop/add columns from/to a dataset of v2 DESCRIBE TABLE
#31676
Conversation
Kubernetes integration test starting

Kubernetes integration test status failure

Test build #135546 has finished for PR 31676 at commit
Looks like we should fix all other instances too? But could be done separately.
```diff
     isExtended: Boolean) extends Command {
   override def children: Seq[LogicalPlan] = Seq(relation)
-  override def output: Seq[Attribute] = DescribeCommandSchema.describeTableAttributes()
+  override val output: Seq[Attribute] = DescribeCommandSchema.describeTableAttributes()
```
Shall we follow others like ShowTables and put the output as a parameter? Then it's more stable and the output won't change after copy/transformation.
> Shall we follow others like ShowTables and put the output as a parameter?

Putting the output as a parameter doesn't solve any problems.

> Then it's more stable ...

I would say "super stable", even if it is not necessary; see #31675.
> Then it's more stable and the output won't change after copy/transformation.

OK. `val output` will be re-initialized on every `.copy()`. I will make it a case class parameter.
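The point about `.copy()` can be sketched outside Spark. The snippet below uses hypothetical stand-in classes (`Attr`, `DescribeBody`, `DescribeParam` — not Spark's actual `AttributeReference` or `DescribeRelation`) to show the difference: a `val` in the class body is re-evaluated on every construction, so `copy()` produces fresh ids, while a constructor parameter is carried over unchanged by `copy()`.

```scala
// Stand-in for Spark's AttributeReference, which carries a unique ExprId.
final case class Attr(name: String) {
  val id: Long = Attr.next()
}
object Attr {
  private var counter = 0L
  def next(): Long = { counter += 1; counter }
}

// Output computed in the class body: re-evaluated on every instantiation,
// including the instance created by copy().
final case class DescribeBody(table: String) {
  val output: Seq[Attr] = Seq(Attr("col_name"), Attr("data_type"))
}

// Output as a constructor parameter: copy() passes the current field value
// through, so the attribute ids stay stable across copies.
final case class DescribeParam(
    table: String,
    output: Seq[Attr] = Seq(Attr("col_name"), Attr("data_type")))

object CopyDemo extends App {
  val a = DescribeBody("t")
  val b = a.copy() // body val re-initialized: fresh ids
  println(a.output.map(_.id) == b.output.map(_.id)) // false

  val c = DescribeParam("t")
  val d = c.copy() // parameter carried over: same ids
  println(c.output.map(_.id) == d.output.map(_.id)) // true
}
```

This is why moving `output` into the parameter list keeps it stable across plan copies and transformations.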
also cc @beliefer @AngersZhuuuu who worked on similar issues before.

Yea, I have fixed some of these, but maybe there are still incorrect instances.
Kubernetes integration test starting

Kubernetes integration test status success

Test build #135582 has finished for PR 31676 at commit

Kubernetes integration test starting

Kubernetes integration test status success
thanks, merging to master/3.1! (it has many conflicts in branch-3.0 and may not be worth backporting).
[SPARK-34561][SQL] Fix drop/add columns from/to a dataset of v2 `DESCRIBE TABLE`
In the PR, I propose to generate "stable" output attributes per the logical node of the `DESCRIBE TABLE` command.
This fixes the issue demonstrated by the example:
```scala
val tbl = "testcat.ns1.ns2.tbl"
sql(s"CREATE TABLE $tbl (c0 INT) USING _")
val description = sql(s"DESCRIBE TABLE $tbl")
description.drop("comment")
```
The `drop()` method fails with the error:
```
org.apache.spark.sql.AnalysisException: Resolved attribute(s) col_name#102,data_type#103 missing from col_name#29,data_type#30,comment#31 in operator !Project [col_name#102, data_type#103]. Attribute(s) with the same name appear in the operation: col_name,data_type. Please check if the right attribute(s) are used.;
!Project [col_name#102, data_type#103]
+- LocalRelation [col_name#29, data_type#30, comment#31]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:51)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:50)
```
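The failure happens because the analyzer resolves columns by their unique attribute id, not by name: the `Dataset` captured attributes from one materialization of the plan's output, while the relation carries attributes from a later re-materialization. A minimal sketch (hypothetical `Attr` and `resolve` helpers, not Spark's actual API) of this id-based resolution:

```scala
// Hypothetical model of attribute resolution: columns match by id, not name.
final case class Attr(name: String, id: Long)

// Succeeds only if every wanted attribute id exists in the relation's output.
def resolve(wanted: Seq[Attr], available: Seq[Attr]): Either[String, Seq[Attr]] = {
  val ids = available.map(_.id).toSet
  val missing = wanted.filterNot(a => ids.contains(a.id))
  if (missing.isEmpty) Right(wanted)
  else Left("Resolved attribute(s) " +
    missing.map(a => s"${a.name}#${a.id}").mkString(",") + " missing")
}

object ResolveDemo extends App {
  // Attributes the relation actually produced.
  val firstOutput = Seq(Attr("col_name", 29), Attr("data_type", 30), Attr("comment", 31))
  // Attributes regenerated later, with new ids but the same names.
  val secondOutput = Seq(Attr("col_name", 102), Attr("data_type", 103))

  println(resolve(secondOutput, firstOutput))
  // Left(Resolved attribute(s) col_name#102,data_type#103 missing)
}
```

With stable output attributes, both sides hold the same ids and the projection resolves.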
Yes. After the changes, `drop()`/`add()` works as expected:
```scala
description.drop("comment").show()
+---------------+---------+
| col_name|data_type|
+---------------+---------+
| c0| int|
| | |
| # Partitioning| |
|Not partitioned| |
+---------------+---------+
```
1. Run new test:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *DataSourceV2SQLSuite"
```
2. Run existing test suite:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *CatalogedDDLSuite"
```
Closes #31676 from MaxGekk/describe-table-drop-column.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 984ff39)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Test build #135593 has finished for PR 31676 at commit
What changes were proposed in this pull request?

In the PR, I propose to generate "stable" output attributes per the logical node of the `DESCRIBE TABLE` command.

Why are the changes needed?

This fixes the issue demonstrated by the example above: the `drop()` method fails with an `AnalysisException`.

Does this PR introduce any user-facing change?

Yes. After the changes, `drop()`/`add()` works as expected.

How was this patch tested?

By running the new and existing test suites listed above.