[SPARK-34127][SQL] Support table valued command #31548

beliefer · 2021-02-11T02:59:01Z

What changes were proposed in this pull request?

Some commands are used to display metadata, such as SHOW TABLES, SHOW TBLPROPERTIES and so on.
If the output has more rows than fit on the screen, it is very unfriendly to developers.
So we should have a way to filter the output, like the behavior of SELECT ... FROM ... WHERE ....
The implementation can follow that of table-valued functions.

Why are the changes needed?

This PR provides a better way to display DDL output when it has more rows than fit on the screen.

Does this PR introduce any user-facing change?

'No'. Just a new syntax.

How was this patch tested?

Jenkins test.

SparkQA · 2021-02-11T04:03:35Z

Test build #135109 has finished for PR 31548 at commit cf6ca82.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-02-12T03:36:18Z

Test build #135117 has finished for PR 31548 at commit 23040df.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-02-12T13:54:06Z

Test build #135124 has finished for PR 31548 at commit 6bdeb3e.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-02-12T14:39:40Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39705/

SparkQA · 2021-02-12T16:16:00Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39705/

SparkQA · 2021-02-13T03:33:56Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39713/

SparkQA · 2021-02-13T04:08:37Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39713/

SparkQA · 2021-02-13T07:10:43Z

Test build #135132 has finished for PR 31548 at commit ce6f644.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

beliefer · 2021-02-13T11:46:02Z

cc @wangyum

wangyum · 2021-02-14T12:04:04Z

cc @cloud-fan

HyukjinKwon · 2021-02-16T04:56:58Z

...alyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedCommands.scala

+    case ShowTableProperties(_, _, _) => true
+    case ShowPartitions(_, _, _) => true
+    case ShowColumns(_, _, _) => true
+    // TODO   case ShowViews(_, _, _) => true


Can we file a JIRA and make this an ID'ed TODO, e.g. TODO(SPARK-XXXX): blah blah?

beliefer · 2021-02-16

If #31508 and #31519 are merged before this PR, I will implement this here. Otherwise, I will create two tickets later.

maropu · 2021-02-18T02:13:02Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala

+/**
+ * A table-valued command, e.g.
+ * {{{
+ *   select tableName from command("show tables");


Can't we add a new built-in function for this purpose instead? I think the implementation would be simpler than the current syntax-based approach.

beliefer · 2021-02-18

First, this PR is modeled on table-valued functions, so I call the syntax a table-valued command.
Second, users want to use filter clauses, GROUP BY, and so on with a table-valued command. In fact, the syntax-based approach adopted in this PR is much simpler than a built-in function.

wangyum · 2021-02-18

Yes. This implementation makes these commands easier to use:

1. Save the result to a table.
2. Query a specified column.
3. Filter / Group by / Order by a specified column.

This is the syntax of Teradata:

SELECT tbl.DatabaseName, tbl.TableName, SUM(spc.CurrentPerm)/1024.00 as TableSize
FROM DBC.TablesV tbl
JOIN DBC.TableSize spc
  ON tbl.DatabaseName = spc.DatabaseName AND tbl.TableName = spc.TableName
WHERE tbl.DatabaseName NOT IN (
    'All', 'Crashdumps', 'DBC', 'dbcmngr', 'Default', 'External_AP', 'EXTUSER',
    'LockLogShredder', 'PUBLIC', 'Sys_Calendar', 'SysAdmin', 'SYSBAR', 'SYSJDBC',
    'SYSLIB', 'SystemFe', 'SYSUDTLIB', 'SYSUIF', 'TD_SERVER_DB', 'TDStats',
    'TD_SYSGPL', 'TD_SYSXML', 'TDMaps', 'TDPUSER', 'TDQCD', 'tdwm', 'SQLJ',
    'TD_SYSFNLIB', 'SYSSPATIAL')
  AND TableKind = 'T'
GROUP BY tbl.DatabaseName, tbl.TableName
ORDER BY TableSize DESC;

SparkQA · 2021-02-19T08:01:49Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39842/

SparkQA · 2021-02-19T08:37:52Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39842/

SparkQA · 2021-02-19T09:02:24Z

Test build #135262 has finished for PR 31548 at commit 280f39c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-02-19T12:45:09Z

Test build #135265 has finished for PR 31548 at commit 7558dcb.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-02-20T04:01:38Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39866/

SparkQA · 2021-02-24T11:05:49Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39996/

SparkQA · 2021-02-24T12:41:46Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40002/

SparkQA · 2021-02-24T13:10:48Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40002/

SparkQA · 2021-02-24T16:27:44Z

Test build #135422 has finished for PR 31548 at commit 99676ab.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

beliefer · 2021-03-04T02:51:21Z

cc @cloud-fan

SparkQA · 2021-04-06T04:59:43Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41506/

SparkQA · 2021-04-06T04:59:44Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41506/

SparkQA · 2021-04-06T08:48:27Z

Test build #136929 has finished for PR 31548 at commit 399b028.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2021-04-06T14:33:21Z

The use case is totally valid, but we may need more discussion about how to design the API to implement this use case.

I have a few thoughts:

1. Use information schema. This is the most standard way, but unfortunately Spark follows Hive and creates a lot of special commands to provide metadata information (e.g. SHOW TABLES). It's a lot of effort to switch to information schema in Spark.
2. Allow creating views with these special commands, e.g. CREATE VIEW v AS SHOW TABLES.
3. Table-valued command, as proposed in this PR.

Personally I prefer option 2 as it's more predictable. We need extra effort for option 3 to define the behaviors of FROM COMMAND("abc") or FROM COMMAND("DROP TABLE"), while in option 2 we can carefully change the parser to only allow certain commands in CREATE VIEW.
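
To make the predictability concern concrete, here is a minimal sketch in plain Scala of the kind of restriction discussed here: only a fixed set of read-only metadata commands is accepted as a table source, and everything else is rejected. It is a toy model under stated assumptions. `CommandAllowlist`, `isAllowed`, `validate`, and the prefix list are hypothetical names chosen for illustration; none of them exist in Spark or in this PR.

```scala
// Toy model only: none of these names exist in Spark or in this PR.
object CommandAllowlist {
  // Hypothetical set of read-only metadata commands that may act as a table source.
  private val allowedPrefixes: Seq[String] = Seq(
    "SHOW TABLES",
    "SHOW TBLPROPERTIES",
    "SHOW PARTITIONS",
    "SHOW COLUMNS"
  )

  // Collapse whitespace and upper-case so "show   tables" matches "SHOW TABLES".
  private def normalize(command: String): String =
    command.trim.replaceAll("\\s+", " ").toUpperCase(java.util.Locale.ROOT)

  // A command is allowed if it equals an allowed prefix or extends it with arguments.
  def isAllowed(command: String): Boolean = {
    val normalized = normalize(command)
    allowedPrefixes.exists(p => normalized == p || normalized.startsWith(p + " "))
  }

  // Left carries the rejection reason; Right carries the trimmed command text.
  def validate(command: String): Either[String, String] =
    if (isAllowed(command)) Right(command.trim)
    else Left(s"Command is not allowed as a table source: $command")
}
```

With this sketch, `CommandAllowlist.validate("show tables")` yields a `Right`, while inputs such as `DROP TABLE` or `abc` yield a `Left` carrying the reason. The thread suggests making this decision in the parser; string-prefix matching is used here only to keep the example short.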

If we want to add information schema in the future, its tables are read-only views, so it helps if option 2 is done and we can already create views with SHOW TABLES, etc.

What do you think? cc @viirya @maropu @yaooqinn

yaooqinn · 2021-04-06T16:29:06Z

> Allow creating views with these special commands, e.g. CREATE VIEW v AS SHOW TABLES.

Hmm.. SHOW TABLES shows v here?

cloud-fan · 2021-04-06T17:04:10Z

> SHOW TABLES shows v here?

Ah good point. One way is to eagerly execute the command when creating the view, so v is excluded from the result, but it's a bit tricky to make the view not lazy.

Another idea is to allow SHOW TABLES etc. as subqueries, e.g. SELECT ... FROM (SHOW TABLES).
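
The self-reference problem raised above can be reproduced with a toy in-memory catalog. The following plain-Scala sketch is illustrative only: `ToyCatalog`, `createLazyView`, `createEagerView`, and `ViewDemo` are hypothetical names, and `showTables()` merely stands in for SHOW TABLES under the thread's assumption that it lists views alongside tables.

```scala
import scala.collection.mutable

// Toy in-memory catalog; this is not Spark's SessionCatalog.
final class ToyCatalog {
  private val tables = mutable.LinkedHashSet.empty[String]
  private val views = mutable.LinkedHashMap.empty[String, () => Seq[String]]

  def createTable(name: String): Unit = tables += name

  // Stand-in for SHOW TABLES: lists tables first, then views, in creation order.
  def showTables(): Seq[String] = tables.toSeq ++ views.keys.toSeq

  // Lazy view: the command is re-run on every read, so it sees the view itself.
  def createLazyView(name: String): Unit =
    views(name) = () => showTables()

  // Eager view: the command runs once, before the view is registered.
  def createEagerView(name: String): Unit = {
    val snapshot = showTables()
    views(name) = () => snapshot
  }

  def readView(name: String): Seq[String] = views(name)()
}

object ViewDemo {
  // Models CREATE VIEW v AS SHOW TABLES with ordinary lazy view semantics.
  def lazyResult(): Seq[String] = {
    val catalog = new ToyCatalog
    catalog.createTable("t1")
    catalog.createLazyView("v")
    catalog.readView("v")
  }

  // Models the eager alternative: the result is frozen at creation time.
  def eagerResult(): Seq[String] = {
    val catalog = new ToyCatalog
    catalog.createTable("t1")
    catalog.createEagerView("v")
    catalog.createTable("t2") // created after the snapshot, so the view never sees it
    catalog.readView("v")
  }
}
```

In this model `ViewDemo.lazyResult()` contains `v` itself, while `ViewDemo.eagerResult()` does not. The eager snapshot also misses `t2`, which was created after the view, and that staleness is one reason a non-lazy view is awkward.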

viirya · 2021-04-07T09:06:37Z

> Ah good point. One way is to eagerly execute the command when creating the view, so v is excluded from the result, but it's a bit tricky to make the view not lazy.
>
> Another idea is to allow SHOW TABLES etc. as subqueries, e.g. SELECT ... FROM (SHOW TABLES).

Making a non-lazy view sounds weird to me. We would then be creating another kind of special view just for this kind of command.

The subquery approach looks more promising to me.

SparkQA · 2021-04-07T09:24:06Z

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41584/

SparkQA · 2021-04-07T11:27:28Z

Test build #137006 has finished for PR 31548 at commit add16cb.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
trait FunctionRegistryBase[T]
trait SimpleFunctionRegistryBase[T] extends FunctionRegistryBase[T] with Logging
trait EmptyFunctionRegistryBase[T] extends FunctionRegistryBase[T]
trait FunctionRegistry extends FunctionRegistryBase[Expression]
trait TableFunctionRegistry extends FunctionRegistryBase[LogicalPlan]
case class ResolveTableValuedFunctions(catalog: SessionCatalog) extends Rule[LogicalPlan]

SparkQA · 2021-04-19T10:44:54Z

Test build #137585 has finished for PR 31548 at commit add16cb.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds the following public classes (experimental):
trait FunctionRegistryBase[T]
trait SimpleFunctionRegistryBase[T] extends FunctionRegistryBase[T] with Logging
trait EmptyFunctionRegistryBase[T] extends FunctionRegistryBase[T]
trait FunctionRegistry extends FunctionRegistryBase[Expression]
trait TableFunctionRegistry extends FunctionRegistryBase[LogicalPlan]
case class ResolveTableValuedFunctions(catalog: SessionCatalog) extends Rule[LogicalPlan]

SparkQA · 2021-04-27T07:27:16Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42508/

SparkQA · 2021-04-27T07:27:17Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42508/

SparkQA · 2021-04-27T11:08:24Z

Test build #137988 has finished for PR 31548 at commit 1d4a6b3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

beliefer · 2021-04-27T11:10:51Z

ping @cloud-fan

github-actions · 2021-08-06T00:08:55Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

Support table valued command

c3a2761

github-actions bot added the SQL label Feb 11, 2021

Add test case

cf6ca82

Update docs

23040df

github-actions bot added the DOCS label Feb 12, 2021

Update code

6bdeb3e

Update code

ce6f644

HyukjinKwon reviewed Feb 16, 2021

maropu reviewed Feb 18, 2021

beliefer added 4 commits February 19, 2021 12:27

Merge branch 'master' into SPARK-34127

ae2860a

Support show (views|functions)

a9d41d3

Put error

74aad6b

Improve error msg

280f39c

Update golden files

7558dcb

Update golden file

2eeb9d7

Remove Whitespace

99676ab

maropu mentioned this pull request Feb 28, 2021

[SPARK-33630][SQL] Support SHOW TABLES command as table valued function #31257

Closed

beliefer added 2 commits March 18, 2021 13:48

Merge branch 'master' into SPARK-34127

862b3ef

Merge branch 'master' into SPARK-34127

399b028

Merge branch 'master' into SPARK-34127

add16cb

Merge branch 'master' into SPARK-34127

1d4a6b3

maropu mentioned this pull request May 7, 2021

[SPARK-35283][SQL] Support query some DDL with CTES #32442

Closed

beliefer mentioned this pull request Jun 10, 2021

[SPARK-35283][SQL] Support query some DDL with CTES #32852

Closed

github-actions bot added the Stale label Aug 6, 2021

github-actions bot closed this Aug 7, 2021
