[SPARK-34157][SQL] Unify output of SHOW TABLES and pass output attributes properly #31245
Conversation
Kubernetes integration test starting

Kubernetes integration test status failure

Test build #134229 has finished for PR 31245 at commit
Review threads on sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (outdated, resolved)
cc @MaxGekk and @cloud-fan

I am doing https://issues.apache.org/jira/browse/SPARK-33630 now and since …

Yes. Thanks for your comment.
Seems you missed the `information` column?
The output should normally be delivered by the logical nodes (`ShowTables` and `ShowTableExtended`).
> The output should normally be delivered by the logical nodes (`ShowTables` and `ShowTableExtended`).
Yea, got it, thanks.
Kubernetes integration test starting
MaxGekk left a comment:
This PR can potentially break user code. We need to update the SQL migration guide at least.
We have unified tests for SHOW TABLES in `*.ShowTablesSuite`. Please put the related tests there.
Thanks a lot!
Kubernetes integration test status success

Test build #134254 has finished for PR 31245 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

Test build #134271 has finished for PR 31245 at commit

Kubernetes integration test starting

Kubernetes integration test status success

Test build #134313 has finished for PR 31245 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

Test build #134332 has finished for PR 31245 at commit

retest this please

Test build #134337 has finished for PR 31245 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

Kubernetes integration test status success

Test build #134922 has finished for PR 31245 at commit
docs/sql-migration-guide.md (outdated)
> - In Spark 3.2, the auto-generated `Cast` (such as those added by type coercion rules) will be stripped when generating column alias names. E.g., `sql("SELECT floor(1)").columns` will be `FLOOR(1)` instead of `FLOOR(CAST(1 AS DOUBLE))`.
> - In Spark 3.2, the output schema of `SHOW TABLES` becomes `namespace: string, tableName: string, isTemporary: boolean`. In Spark 3.1 or earlier, the `namespace` field was named `database` for the builtin catalog, and there is no `isTemporary` field for v2 catalogs. Since Spark 3.2, to restore the old schema with the builtin catalog, you can set `spark.sql.legacy.keepCommandOutputSchema` to `true`.
Since Spark 3.2, to ... -> To ...
OK
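As a quick illustration of the migration note under review, here is a minimal sketch of opting back into the legacy schema. It assumes Spark 3.2 on the classpath and that the flag is settable at runtime; the object and app name are hypothetical, not code from this PR:

```scala
import org.apache.spark.sql.SparkSession

object ShowTablesSchemaDemo extends App {
  val spark = SparkSession.builder()
    .appName("show-tables-schema-demo")
    .master("local[*]")
    .getOrCreate()

  // Default Spark 3.2 schema: namespace, tableName, isTemporary.
  spark.sql("SHOW TABLES").printSchema()

  // Restore the pre-3.2 builtin-catalog schema (`database` instead of
  // `namespace`), per the migration note above.
  spark.conf.set("spark.sql.legacy.keepCommandOutputSchema", "true")
  spark.sql("SHOW TABLES").printSchema()

  spark.stop()
}
```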
docs/sql-migration-guide.md (outdated)
> - In Spark 3.2, the output schema of `SHOW TABLES` becomes `namespace: string, tableName: string, isTemporary: boolean`. In Spark 3.1 or earlier, the `namespace` field was named `database` for the builtin catalog, and there is no `isTemporary` field for v2 catalogs. Since Spark 3.2, to restore the old schema with the builtin catalog, you can set `spark.sql.legacy.keepCommandOutputSchema` to `true`.
> - In Spark 3.2, the output schema of `SHOW TABLE EXTENDED` becomes `namespace: string, tableName: string, isTemporary: boolean, information: string`. In Spark 3.1 or earlier, the `namespace` field was named `database` for the builtin catalog, and there is no change for the v2 catalogs. Since Spark 3.2, to restore the old schema with the builtin catalog, you can set `spark.sql.legacy.keepCommandOutputSchema` to `true`.
ditto
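For the `SHOW TABLE EXTENDED` entry, a small sanity check of the four-column schema it describes (a sketch; it assumes an active `SparkSession` named `spark` and an existing table `tbl`):

```scala
// Spark 3.2 schema per the migration note above:
// namespace, tableName, isTemporary, information.
val extended = spark.sql("SHOW TABLE EXTENDED LIKE 'tbl'")
assert(extended.schema.fieldNames.toSeq ==
  Seq("namespace", "tableName", "isTemporary", "information"))
```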
```diff
-    case ShowTables(DatabaseInSessionCatalog(db), pattern) =>
-      ShowTablesCommand(Some(db), pattern)
+    case ShowTables(DatabaseInSessionCatalog(db), pattern, output) =>
+      val requiredOutput = if (conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)) {
```
`requiredOutput` -> `newOutput`
OK
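Putting the hunk and the suggested rename together, the shape of the logic is roughly the following self-contained sketch. `Attr` is a hypothetical stand-in for Catalyst's `Attribute`; the real case arm lives inside the `ResolveSessionCatalog` rule, so this is a toy model, not the merged code:

```scala
// Toy model of the output rewriting discussed above: when the legacy flag
// is set, the first attribute keeps its old v1 name `database`.
final case class Attr(name: String) {
  def withName(newName: String): Attr = copy(name = newName)
}

def commandOutput(output: Seq[Attr], legacyKeepSchema: Boolean): Seq[Attr] =
  if (legacyKeepSchema) output.head.withName("database") +: output.tail
  else output

val v2Output = Seq(Attr("namespace"), Attr("tableName"), Attr("isTemporary"))
assert(commandOutput(v2Output, legacyKeepSchema = true).map(_.name) ==
  Seq("database", "tableName", "isTemporary"))
```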
```diff
-        partitionSpec @ (None | Some(UnresolvedPartitionSpec(_, _)))) =>
+        partitionSpec @ (None | Some(UnresolvedPartitionSpec(_, _))),
+        output) =>
+      val requiredOutput = if (conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)) {
```
ditto
Test build #134930 has finished for PR 31245 at commit

Kubernetes integration test starting

Kubernetes integration test status success

Test build #134972 has finished for PR 31245 at commit

Kubernetes integration test starting

Kubernetes integration test status success

Test build #134987 has finished for PR 31245 at commit
cc @cloud-fan

The last commit just updates a comment, merging to master, thanks!

@cloud-fan Thanks for your work! @HyukjinKwon @MaxGekk Thanks for your review!

Kubernetes integration test starting

Kubernetes integration test status success

Test build #135015 has finished for PR 31245 at commit
```scala
    }
  }

  test("SPARK-34157 Unify output of SHOW TABLES and pass output attributes properly") {
```
Let's keep the format consistent next time: SPARK-34157 :
The consistent format is SPARK-34157: w/o any gaps ;-)
```scala
    assert(sql("show tables").schema.fieldNames.deep ==
      Seq("namespace", "tableName", "isTemporary"))
    assert(sql("show table extended like 'tbl'").collect()(0).length == 4)
    assert(sql("show table extended like 'tbl'").schema.fieldNames.deep ==
```
Seems like this broke the Scala 2.13 build. I made a followup here: #31526
…using Array.deep

### What changes were proposed in this pull request?

This PR is a followup of #31245:

```
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala:112:53: value deep is not a member of Array[String]
[error]     assert(sql("show tables").schema.fieldNames.deep ==
[error]                                                 ^
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala:115:72: value deep is not a member of Array[String]
[error]     assert(sql("show table extended like 'tbl'").schema.fieldNames.deep ==
[error]                                                                    ^
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala:121:55: value deep is not a member of Array[String]
[error]     assert(sql("show tables").schema.fieldNames.deep ==
[error]                                                 ^
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala:124:74: value deep is not a member of Array[String]
[error]     assert(sql("show table extended like 'tbl'").schema.fieldNames.deep ==
[error]                                                                    ^
```

It broke the Scala 2.13 build. This PR works around it by using ScalaTest's `===`, which can compare `Array`s safely.

### Why are the changes needed?

To fix the build.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

CI in this PR should test it out.

Closes #31526 from HyukjinKwon/SPARK-34157.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
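Per the commit message above, the workaround swaps `Array.deep` (removed in Scala 2.13) for ScalaTest's `===`, which compares arrays structurally. A sketch of the corrected assertions as they would sit inside the suite (the exact merged change is in #31526):

```scala
// `Array.deep` is not available on Scala 2.13; ScalaTest's `===` compares
// arrays element-wise, so it is safe on both Scala 2.12 and 2.13.
assert(sql("show tables").schema.fieldNames ===
  Array("namespace", "tableName", "isTemporary"))
assert(sql("show table extended like 'tbl'").schema.fieldNames ===
  Array("namespace", "tableName", "isTemporary", "information"))
```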
### What changes were proposed in this pull request?

The current implementation of some DDL commands does not unify the output and does not pass the output attributes properly to the physical command.

For example, `ShowTables` outputs the attribute `namespace`, but `ShowTablesCommand` outputs the attribute `database`. Following the query plan, this PR passes the output attributes from `ShowTables` to `ShowTablesCommand` and from `ShowTableExtended` to `ShowTablesCommand`.

Take `show tables` and `show table extended like 'tbl'` as examples. [Screenshots comparing the output of `show tables` (and, for the v2 session catalog, `show table extended like 'tbl'`) before and after this PR omitted.]

### Why are the changes needed?

This PR has the following benefits:

First, it unifies the output schema of `SHOW TABLES`.

Second, passing the output attributes through keeps the expression IDs unchanged, which avoids bugs when we apply more operators on top of the command's output DataFrame (see the sketch below).
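To make the second benefit concrete, here is a minimal sketch (assuming an active `SparkSession` named `spark`) of stacking operators on the command's DataFrame, which is exactly the case stable expression IDs protect:

```scala
// Because ShowTablesCommand now reuses the attributes of the logical
// ShowTables node, columns referenced downstream resolve to the same
// expression IDs the command produced.
val tables = spark.sql("SHOW TABLES")
tables
  .filter(tables("isTemporary") === false)
  .select("namespace", "tableName")
  .show()
```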
### Does this PR introduce _any_ user-facing change?

Yes. The output schema of `SHOW TABLES` replaces `database` with `namespace`.

### How was this patch tested?

Jenkins tests.