@beliefer
Contributor

@beliefer beliefer commented Feb 11, 2021

What changes were proposed in this pull request?

Some commands display metadata, such as SHOW TABLES, SHOW TBLPROPERTIES, and so on.
If the output has more rows than fit on the screen, it is very unfriendly to developers.
So we should have a way to filter the output, like the behavior of SELECT ... FROM ... WHERE ....
We could follow the implementation of table-valued functions.

Why are the changes needed?

This PR provides a better way to display DDL output when it has more rows than fit on the screen.

Does this PR introduce any user-facing change?

'No'. It only adds a new syntax.

How was this patch tested?

Jenkins test.

@github-actions github-actions bot added the SQL label Feb 11, 2021
@SparkQA

SparkQA commented Feb 11, 2021

Test build #135109 has finished for PR 31548 at commit cf6ca82.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions github-actions bot added the DOCS label Feb 12, 2021
@SparkQA

SparkQA commented Feb 12, 2021

Test build #135117 has finished for PR 31548 at commit 23040df.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 12, 2021

Test build #135124 has finished for PR 31548 at commit 6bdeb3e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39705/

@SparkQA

SparkQA commented Feb 12, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39705/

@SparkQA

SparkQA commented Feb 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39713/

@SparkQA

SparkQA commented Feb 13, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39713/

@SparkQA

SparkQA commented Feb 13, 2021

Test build #135132 has finished for PR 31548 at commit ce6f644.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Contributor Author

cc @wangyum

@wangyum
Member

wangyum commented Feb 14, 2021

cc @cloud-fan

case ShowTableProperties(_, _, _) => true
case ShowPartitions(_, _, _) => true
case ShowColumns(_, _, _) => true
// TODO case ShowViews(_, _, _) => true
Member

Can we file a JIRA and make it an ID'ed TODO, e.g. TODO(SPARK-XXXX): blah blah?

Contributor Author

If #31508 and #31519 are merged before this PR, I will implement the code here. Otherwise, I will create two tickets later.

Member

👌

/**
* A table-valued command, e.g.
* {{{
* select tableName from command("show tables");
Member

Can't we add a new built-in function for this purpose instead? I think the implementation would be simpler than the current syntax-based approach.

Contributor Author

First, this PR follows the design of table-valued functions, so I call the syntax a table-valued command.
Second, users want to use filter clauses, GROUP BY, and so on with a table-valued command. In fact, the syntax-based approach adopted in this PR is much simpler than a built-in function.

Member

Yes. This implementation makes these commands easier to use. Users can:

  1. Save the result to a table.
  2. Query a specified column.
  3. Filter / group by / order by a specified column.
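The three uses listed above can be sketched outside Spark. This is a minimal illustration, assuming the command output is available as plain rows; it uses Python's standard `sqlite3` module as a stand-in engine, and the table name `show_tables_result`, the column names, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical rows standing in for the output of a SHOW TABLES-style command.
# Column names and values are illustrative assumptions, not real Spark output.
SHOW_TABLES_ROWS = [
    ("sales", "orders", 0),
    ("sales", "customers", 0),
    ("staging", "orders_tmp", 1),
    ("staging", "events", 0),
]


def load_command_output(conn, rows):
    """Use 1: save the command result to a table."""
    conn.execute(
        "CREATE TABLE show_tables_result "
        "(namespace TEXT, tableName TEXT, isTemporary INTEGER)"
    )
    conn.executemany("INSERT INTO show_tables_result VALUES (?, ?, ?)", rows)


def table_names(conn):
    """Use 2: query a single column."""
    cur = conn.execute(
        "SELECT tableName FROM show_tables_result ORDER BY tableName"
    )
    return [name for (name,) in cur]


def permanent_tables_per_namespace(conn):
    """Use 3: filter, group by, and order by chosen columns."""
    cur = conn.execute(
        """
        SELECT namespace, COUNT(*) AS n
        FROM show_tables_result
        WHERE isTemporary = 0
        GROUP BY namespace
        ORDER BY n DESC, namespace
        """
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_command_output(conn, SHOW_TABLES_ROWS)
print(table_names(conn))
print(permanent_tables_per_namespace(conn))
```

Once the rows are addressable as a relation, all three uses reduce to ordinary SQL, which is the convenience being argued for here.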

This is the syntax of Teradata:

SELECT  tbl.DatabaseName,
        tbl.TableName,
        SUM(spc.CurrentPerm)/1024.00 as TableSize
FROM    DBC.TablesV tbl
JOIN    DBC.TableSize spc
ON  tbl.DatabaseName = spc.DatabaseName
AND tbl.TableName = spc.TableName
WHERE   tbl.DatabaseName NOT IN ('All', 'Crashdumps', 'DBC', 'dbcmngr', 
        'Default', 'External_AP', 'EXTUSER', 'LockLogShredder', 'PUBLIC',
        'Sys_Calendar', 'SysAdmin', 'SYSBAR', 'SYSJDBC', 'SYSLIB', 
        'SystemFe', 'SYSUDTLIB', 'SYSUIF', 'TD_SERVER_DB',  'TDStats',
        'TD_SYSGPL', 'TD_SYSXML', 'TDMaps', 'TDPUSER', 'TDQCD',
        'tdwm',  'SQLJ', 'TD_SYSFNLIB',  'SYSSPATIAL')
AND TableKind = 'T'
GROUP BY    tbl.DatabaseName,
            tbl.TableName
ORDER BY TableSize DESC;

@SparkQA

SparkQA commented Feb 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39842/

@SparkQA

SparkQA commented Feb 19, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39842/

@SparkQA

SparkQA commented Feb 19, 2021

Test build #135262 has finished for PR 31548 at commit 280f39c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 19, 2021

Test build #135265 has finished for PR 31548 at commit 7558dcb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39866/

@SparkQA

SparkQA commented Feb 24, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39996/

@SparkQA

SparkQA commented Feb 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40002/

@SparkQA

SparkQA commented Feb 24, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40002/

@SparkQA

SparkQA commented Feb 24, 2021

Test build #135422 has finished for PR 31548 at commit 99676ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Contributor Author

beliefer commented Mar 4, 2021

cc @cloud-fan

@SparkQA

SparkQA commented Apr 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41506/

@SparkQA

SparkQA commented Apr 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41506/

@SparkQA

SparkQA commented Apr 6, 2021

Test build #136929 has finished for PR 31548 at commit 399b028.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

The use case is totally valid, but we may need more discussion about how to design the API to implement this use case.

I have a few thoughts:

  1. Use information schema. This is the most standard way, but unfortunately Spark follows Hive and creates a lot of special commands to provide metadata information (e.g. SHOW TABLES). It's a lot of effort to switch to information schema in Spark.
  2. Allow creating views with these special commands, e.g. CREATE VIEW v AS SHOW TABLES.
  3. Table-valued command, as in this PR.

Personally I prefer option 2 as it's more predictable. We need extra effort for option 3 to define the behaviors of FROM COMMAND("abc") or FROM COMMAND("DROP TABLE"), while in option 2 we can carefully change the parser to only allow certain commands in CREATE VIEW.
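The concern about FROM COMMAND("abc") or FROM COMMAND("DROP TABLE") can be illustrated with a small allowlist check. This is only a sketch under stated assumptions: the function names and the allowlist contents are hypothetical, and a real implementation would restrict commands in the parser grammar rather than by string prefix.

```python
# Hypothetical allowlist of read-only commands that may act as a row source.
# The entries are illustrative assumptions, not what Spark's parser accepts.
READ_ONLY_COMMAND_PREFIXES = (
    "SHOW TABLES",
    "SHOW TBLPROPERTIES",
    "SHOW PARTITIONS",
    "SHOW COLUMNS",
)


def normalize(command: str) -> str:
    """Collapse whitespace and upper-case so prefix checks ignore case."""
    return " ".join(command.split()).upper()


def is_allowed_row_source(command: str) -> bool:
    """Return True only for commands that match the read-only allowlist."""
    normalized = normalize(command)
    return any(
        # Require an exact match or a following space, so that
        # "SHOW TABLESPACES" does not slip through as "SHOW TABLES".
        normalized == prefix or normalized.startswith(prefix + " ")
        for prefix in READ_ONLY_COMMAND_PREFIXES
    )


def check_row_source(command: str) -> str:
    """Return the command unchanged if allowed, otherwise raise ValueError."""
    if not is_allowed_row_source(command):
        raise ValueError(f"command is not allowed as a row source: {command!r}")
    return command


for candidate in ("show tables", "SHOW COLUMNS IN t", "abc", "DROP TABLE t"):
    print(candidate, "->", is_allowed_row_source(candidate))
```

With a string argument, this kind of validation has to be written and maintained separately from the grammar, which is the extra effort option 3 carries compared with allowing only specific commands in the parser.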

If we want to add an information schema in the future, its tables are read-only views, so it helps if option 2 is done and we can already create views with SHOW TABLES, etc.

What do you think? cc @viirya @maropu @yaooqinn

@yaooqinn
Member

yaooqinn commented Apr 6, 2021

Allow creating views with these special commands, e.g. CREATE VIEW v AS SHOW TABLES.

Hmm.. SHOW TABLES shows v here?

@cloud-fan
Contributor

SHOW TABLES shows v here?

Ah good point. One way is to eagerly execute the command when creating the view, so v is excluded from the result, but it's a bit tricky to make the view not lazy.
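A toy model makes the lazy versus eager difference concrete. This is a sketch only: the `Catalog` class and its methods are hypothetical and do not mirror Spark's catalog or view implementation.

```python
class Catalog:
    """Toy catalog; names and behavior are illustrative assumptions."""

    def __init__(self, tables):
        self._tables = list(tables)
        self._views = {}

    def create_table(self, name):
        self._tables.append(name)

    def show_tables(self):
        # Like SHOW TABLES, lists the tables and views known at call time.
        return sorted(self._tables + list(self._views))

    def create_lazy_view(self, name):
        # Lazy: the view stores the command and re-runs it on every read.
        self._views[name] = self.show_tables

    def create_eager_view(self, name):
        # Eager: run the command once, before the view is registered.
        snapshot = self.show_tables()
        self._views[name] = lambda: snapshot

    def read_view(self, name):
        return self._views[name]()


lazy = Catalog(["a", "b"])
lazy.create_lazy_view("v")
print(lazy.read_view("v"))   # the view lists itself

eager = Catalog(["a", "b"])
eager.create_eager_view("v")
eager.create_table("c")
print(eager.read_view("v"))  # v is excluded, but so is the later table c
```

The eager variant keeps v out of its own result, at the cost of freezing the result at creation time, which is why a non-lazy view behaves unlike any other view.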

Another idea is to allow SHOW TABLES etc. as subqueries, e.g. SELECT ... FROM (SHOW TABLES).

@viirya
Member

viirya commented Apr 7, 2021

Ah good point. One way is to eagerly execute the command when creating the view, so v is excluded from the result, but it's a bit tricky to make the view not lazy.

Another idea is to allow SHOW TABLES etc. as subqueries, e.g. SELECT ... FROM (SHOW TABLES).

Making a non-lazy view sounds weird to me. We would then be creating another kind of special view just for these commands.

The subquery approach looks more promising to me.

@SparkQA

SparkQA commented Apr 7, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41584/

@SparkQA

SparkQA commented Apr 7, 2021

Test build #137006 has finished for PR 31548 at commit add16cb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • trait FunctionRegistryBase[T]
      • trait SimpleFunctionRegistryBase[T] extends FunctionRegistryBase[T] with Logging
      • trait EmptyFunctionRegistryBase[T] extends FunctionRegistryBase[T]
      • trait FunctionRegistry extends FunctionRegistryBase[Expression]
      • trait TableFunctionRegistry extends FunctionRegistryBase[LogicalPlan]
      • case class ResolveTableValuedFunctions(catalog: SessionCatalog) extends Rule[LogicalPlan]

@SparkQA

SparkQA commented Apr 19, 2021

Test build #137585 has finished for PR 31548 at commit add16cb.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
      • trait FunctionRegistryBase[T]
      • trait SimpleFunctionRegistryBase[T] extends FunctionRegistryBase[T] with Logging
      • trait EmptyFunctionRegistryBase[T] extends FunctionRegistryBase[T]
      • trait FunctionRegistry extends FunctionRegistryBase[Expression]
      • trait TableFunctionRegistry extends FunctionRegistryBase[LogicalPlan]
      • case class ResolveTableValuedFunctions(catalog: SessionCatalog) extends Rule[LogicalPlan]

@SparkQA

SparkQA commented Apr 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42508/

@SparkQA

SparkQA commented Apr 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42508/

@SparkQA

SparkQA commented Apr 27, 2021

Test build #137988 has finished for PR 31548 at commit 1d4a6b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Contributor Author

ping @cloud-fan

@github-actions

github-actions bot commented Aug 6, 2021

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Aug 6, 2021
@github-actions github-actions bot closed this Aug 7, 2021