Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Feb 27, 2021

What changes were proposed in this pull request?

In the PR, I propose to generate unique attributes in the logical nodes of the SHOW TABLES command.

Also, this PR fixes similar issues in other logical nodes:

  • ShowTableExtended
  • ShowViews
  • ShowTableProperties
  • ShowFunctions
  • ShowColumns
  • ShowPartitions
  • ShowNamespaces

Why are the changes needed?

This fixes the issue which is demonstrated by the example below:

scala> val show1 = sql("SHOW TABLES IN ns1")
show1: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string ... 1 more field]

scala> val show2 = sql("SHOW TABLES IN ns2")
show2: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string ... 1 more field]

scala> show1.show
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|      ns1|     tbl1|      false|
+---------+---------+-----------+


scala> show2.show
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|      ns2|     tbl2|      false|
+---------+---------+-----------+


scala> show1.join(show2).where(show1("tableName") =!= show2("tableName")).show
org.apache.spark.sql.AnalysisException: Column tableName#17 are ambiguous. It's probably because you joined several Datasets together, and some of these Datasets are the same. This column points to one of the Datasets but Spark is unable to figure out which one. Please alias the Datasets with different names via `Dataset.as` before joining them, and specify the column using qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable this check.
  at org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin$.apply(DetectAmbiguousSelfJoin.scala:157)

Does this PR introduce any user-facing change?

Yes. After the changes, the example above works as expected:

scala> show1.join(show2).where(show1("tableName") =!= show2("tableName")).show
+---------+---------+-----------+---------+---------+-----------+
|namespace|tableName|isTemporary|namespace|tableName|isTemporary|
+---------+---------+-----------+---------+---------+-----------+
|      ns1|     tbl1|      false|      ns2|     tbl2|      false|
+---------+---------+-----------+---------+---------+-----------+

How was this patch tested?

By running the new test:

$  build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ShowTablesSuite"

@github-actions github-actions bot added the SQL label Feb 27, 2021
@SparkQA
Copy link

SparkQA commented Feb 27, 2021

Test build #135542 has finished for PR 31675 at commit 00e814c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40167/

@SparkQA
Copy link

SparkQA commented Mar 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40167/

@SparkQA
Copy link

SparkQA commented Mar 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40164/

@SparkQA
Copy link

SparkQA commented Mar 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40164/

@SparkQA
Copy link

SparkQA commented Mar 1, 2021

Test build #135584 has finished for PR 31675 at commit b9d4ce7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 1, 2021

Test build #135586 has finished for PR 31675 at commit c131cd8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

cloud-fan commented Mar 1, 2021

The change LGTM, but I'm wondering if we need to add a self-join test for every command that has output.

Shall we simply make this fix for all the existing commands that have output in the constructor parameter (should be less than 5) without tests as the issue is so obvious?

@AngersZhuuuu
Copy link
Contributor

For all changed output we can refer to https://issues.apache.org/jira/browse/SPARK-34156

@MaxGekk
Copy link
Member Author

MaxGekk commented Mar 1, 2021

Shall we simply make this fix for all the existing commands that have output in the constructor parameter (should be less than 5) without tests as the issue is so obvious?

I would propose to have one test for at least the command SHOW TABLES (to prevent future problems while refactoring/fixing like https://issues.apache.org/jira/browse/SPARK-34156), and just fix other logical nodes w/o tests.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 70f6267 Mar 1, 2021
@cloud-fan
Copy link
Contributor

@beliefer @AngersZhuuuu let's use this new approach when you add output to commands next time.

@SparkQA
Copy link

SparkQA commented Mar 1, 2021

Test build #135598 has finished for PR 31675 at commit db5d5c7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor

@beliefer @AngersZhuuuu let's use this new approach when you add output to commands next time.

Sure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants