Skip to content

Conversation

@beliefer
Copy link
Contributor

@beliefer beliefer commented May 5, 2021

What changes were proposed in this pull request?

Some command used to display metadata, such as: SHOW TABLES, SHOW TABLE EXTENDED,SHOW TBLPROPERTIES and so no.
If the output rows much than screen height, the output very unfriendly to developers.
So we should have a way to filter the output like the behavior of
WITH s AS (SHOW NAMESPACES) SELECT * FROM s WHERE namespace = 'query_ddl_namespace';

Why are the changes needed?

This PR provides a better way to display DDL when output rows much than screen height.

Does this PR introduce any user-facing change?

'Yes'. A new syntax.

How was this patch tested?

New tests.

@github-actions github-actions bot added the SQL label May 5, 2021
@SparkQA
Copy link

SparkQA commented May 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42695/

@SparkQA
Copy link

SparkQA commented May 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42695/

@SparkQA
Copy link

SparkQA commented May 5, 2021

Test build #138174 has finished for PR 32442 at commit aca1fcc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42711/

@SparkQA
Copy link

SparkQA commented May 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42711/

@SparkQA
Copy link

SparkQA commented May 6, 2021

Test build #138190 has finished for PR 32442 at commit 09a04b0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42726/

@SparkQA
Copy link

SparkQA commented May 6, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42727/

@SparkQA
Copy link

SparkQA commented May 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42726/

@SparkQA
Copy link

SparkQA commented May 6, 2021

Test build #138205 has finished for PR 32442 at commit 0235cb4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 7, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42751/

@SparkQA
Copy link

SparkQA commented May 7, 2021

Test build #138229 has finished for PR 32442 at commit 4f8b782.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

beliefer commented May 7, 2021

ping @cloud-fan

override def visitNamedQuery(ctx: NamedQueryContext): SubqueryAlias = withOrigin(ctx) {
val subQuery: LogicalPlan = plan(ctx.query).optionalMap(ctx.columnAliases)(
val logicalPlan = Option(ctx.query).map(plan).orElse(
Option(ctx.ddlStatementForQuery).map(visitDdlStatementForQuery)).get
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can call visitDdlQuery and don't need to create visitDdlStatementForQuery

case columns: ShowColumnsContext => visitShowColumns(columns)
case views: ShowViewsContext => visitShowViews(views)
case functions: ShowFunctionsContext => visitShowFunctions(functions)
case _ => throw QueryParsingErrors.unsupportedDdlStatementForQueryError(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't happen, and is an assert like error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove it.

: WITH namedQuery (',' namedQuery)*
;

ddlStatementForQuery
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about informationQueries?

// SPARK-28620
"postgreSQL/float4.sql",
// SPARK-35283
"cte-ddl.sql",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why it doesn't work in thriftserver?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the output schema of hive is different from spark

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because Hive doesn't support this syntax?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about adding a comment here to explain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about adding a comment here to explain?

OK

Copy link
Contributor Author

@beliefer beliefer May 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the output schema of some DDL in Hive is differing from Spark SQL, we exclude it.
For example, the output schema of SHOW TABLES is (namespace, tableName, isTemporary) in Hive, but (tableName) in Spark SQL.

@cloud-fan
Copy link
Contributor

The syntax looks good, cc @yaooqinn @wangyum @viirya @maropu

@wangyum
Copy link
Member

wangyum commented May 7, 2021

+1. This syntax looks good.

@maropu
Copy link
Member

maropu commented May 7, 2021

We accept SHOW XXX DDLs in common table exprs but we don't accept them in a FROM clause like SELECT * FROM (SHOW NAMESPACES)? This new feature itself looks good. Btw, is this PR related to #31548? It seems the motivation is the same, but the approaches/the jira numbers are different?

@viirya
Copy link
Member

viirya commented May 7, 2021

CTE looks better if you want to use self join

WITH s AS (SHOW TABLES) SELECT ... FROM s left JOIN s right ON left.xxx = right.xxx

Okay, sounds good. BTW, after #31548, we can also put it in FROM clause, right?

@viirya
Copy link
Member

viirya commented May 7, 2021

Does this PR introduce any user-facing change?
'No'. Just a new syntax.

I think this is a user-facing change.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay. We may also need to update sql syntax docs, e.g. docs/sql-ref-syntax-aux-show-tables.md .

@maropu
Copy link
Member

maropu commented May 7, 2021

Yea, a CTE is useful when it is referenced more than once like self-joins. But, in simple cases (e.g., filtering SHOW output rows as the description says), SELECT * FROM (SHOW XXX) WHERE ... seems easier, I think.

@beliefer
Copy link
Contributor Author

beliefer commented May 8, 2021

We accept SHOW XXX DDLs in common table exprs but we don't accept them in a FROM clause like SELECT * FROM (SHOW NAMESPACES)? This new feature itself looks good. Btw, is this PR related to #31548? It seems the motivation is the same, but the approaches/the jira numbers are different?

Yes, this PR related to #31548. A discussion #31548 (comment)

@beliefer
Copy link
Contributor Author

beliefer commented May 8, 2021

It would be nice to update the SQL doc, too.

I updated docs/sql-ref-syntax-qry-select-cte.md

@github-actions github-actions bot added the DOCS label May 8, 2021
@beliefer
Copy link
Contributor Author

beliefer commented May 8, 2021

Looks okay. We may also need to update sql syntax docs, e.g. docs/sql-ref-syntax-aux-show-tables.md .

I updated docs/sql-ref-syntax-qry-select-cte.md

@SparkQA
Copy link

SparkQA commented May 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42797/

@SparkQA
Copy link

SparkQA commented May 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42797/

@SparkQA
Copy link

SparkQA commented May 8, 2021

Test build #138275 has finished for PR 32442 at commit afed495.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

DROP TABLE test_show_table_properties;
DROP TABLE test_show_tables;
USE default;
DROP NAMESPACE query_ddl_namespace; No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a newline character.


informationQuery
: SHOW (DATABASES | NAMESPACES) ((FROM | IN) multipartIdentifier)? (LIKE? pattern=STRING)? #showNamespaces
| SHOW TABLES ((FROM | IN) multipartIdentifier)? (LIKE? pattern=STRING)? #showTables
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do not support SHOW TABLE EXTENDED?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

SHOW TABLES;
WITH s AS (SHOW TABLES) SELECT * FROM s;
WITH s AS (SHOW TABLES) SELECT * FROM s WHERE tableName = 'test_show_tables';
WITH s(ns, tn, t) AS (SHOW TABLES) SELECT * FROM s WHERE tn = 'test_show_tables';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add more tests? For example:

WITH s(ns, tn, t) AS (SHOW TABLES) SELECT tn FROM s;
WITH s(ns, tn, t) AS (SHOW TABLES) SELECT tn FROM s ORDER BY rn;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

struct<>
-- !query output
java.lang.ClassCastException
org.apache.spark.sql.catalyst.expressions.GenericInternalRow cannot be cast to org.apache.spark.sql.catalyst.expressions.UnsafeRow
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the reason yet. It seems we should fix this issue in another PR?
cc @cloud-fan @wangyum @maropu @viirya

@SparkQA
Copy link

SparkQA commented May 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42851/

@SparkQA
Copy link

SparkQA commented May 10, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42851/

@SparkQA
Copy link

SparkQA commented May 10, 2021

Test build #138330 has finished for PR 32442 at commit 171c6ce.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 10, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42854/

@SparkQA
Copy link

SparkQA commented May 10, 2021

Test build #138333 has finished for PR 32442 at commit 171c6ce.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer beliefer closed this May 17, 2021
cloud-fan added a commit that referenced this pull request Jun 9, 2021
… of caller sides

### What changes were proposed in this pull request?
Currently, Spark eagerly executes commands on the caller side of `QueryExecution`, which is a bit hacky as `QueryExecution` is not aware of it and leads to confusion.

For example, if you run `sql("show tables").collect()`, you will see two queries with identical query plans in the web UI.
![image](https://user-images.githubusercontent.com/3182036/121193729-a72d0480-c8a0-11eb-8b12-379019607ad5.png)
![image](https://user-images.githubusercontent.com/3182036/121193822-bc099800-c8a0-11eb-9d2a-34ab1329e2f7.png)
![image](https://user-images.githubusercontent.com/3182036/121193845-c0ce4c00-c8a0-11eb-96d0-ef604a4dfab0.png)

The first query is triggered at `Dataset.logicalPlan`, which eagerly executes the command.
The second query is triggered at `Dataset.collect`, which is the normal query execution.

From the web UI, it's hard to tell that these two queries are caused by eager command execution.

This PR proposes to move the eager command execution to `QueryExecution`, and turn the command plan to `CommandResult` to indicate that command has been executed already. Now `sql("show tables").collect()` still triggers two queries, but the quey plans are not identical. The second query becomes:
![image](https://user-images.githubusercontent.com/3182036/121194850-b3659180-c8a1-11eb-9abf-2980f84f089d.png)

In addition to the UI improvements, this PR also has other benefits:
1. Simplifies code as caller side no need to worry about eager command execution. `QueryExecution` takes care of it.
2. It helps #32442 , where there can be more plan nodes above commands, and we need to replace commands with something like local relation that produces unsafe rows.

### Why are the changes needed?
Explained above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
existing tests

Closes #32513 from beliefer/SPARK-35378.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants