Skip to content

Conversation

@gatorsmile
Copy link
Member

@gatorsmile gatorsmile commented Jun 1, 2016

What changes were proposed in this pull request?

IF NOT EXISTS in INSERT OVERWRITE should not support dynamic partitions. If we specify IF NOT EXISTS, the inserted statement is not shown in the table.

This PR is to issue an exception in this case, just like what Hive does. Also issue an exception if users specify IF NOT EXISTS if users do not specify any PARTITION specification.

How was this patch tested?

Added test cases into PlanParserSuite and InsertIntoHiveTableSuite

@SparkQA
Copy link

SparkQA commented Jun 1, 2016

Test build #59753 has finished for PR 13447 at commit 61f0632.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 1, 2016

Test build #59755 has finished for PR 13447 at commit 70d9cad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

cc @liancheng @hvanhovell @yhuai

val tableIdent = visitTableIdentifier(ctx.tableIdentifier)
val partitionKeys = Option(ctx.partitionSpec).map(visitPartitionSpec).getOrElse(Map.empty)

if (ctx.EXISTS != null && ctx.partitionSpec == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could enforce this in grammar.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do it. Thanks!

@hvanhovell
Copy link
Contributor

@gatorsmile the overall approach seems good to me. We are currently fixing this for the SQL codepath. I was wondering if are other codepaths that can cause this unwanted behavior? If there are then we should move the check to the InsertInto class.

@gatorsmile
Copy link
Member Author

I think your concern is valid. Will add an assert in InsertIntoTable.

So far, dynamic partitioning is used by the insertInto API. However, there is no way to specify IF NOT EXISTS in the other API insertInto.

data.write.mode(SaveMode.Overwrite).partitionBy("part").insertInto("partitioned")

The default value is always false, as shown below:

However, there still exist other bugs in insertInto. Will submit it later.

@SparkQA
Copy link

SparkQA commented Jun 2, 2016

Test build #59865 has finished for PR 13447 at commit be187d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 2, 2016

Test build #59873 has finished for PR 13447 at commit bbaad66.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Jun 7, 2016

Test build #60098 has finished for PR 13447 at commit bbaad66.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

@hvanhovell Does the latest code changes resolve your comment? Thanks!

@yhuai
Copy link
Contributor

yhuai commented Jun 10, 2016

Seems the behavior of the following query is weird (if it works). (the case without IF NOT EXISTS)

INSERT OVERWRITE TABLE table_with_partition
partition (p1='a',p2)
SELECT 'blarr3'

For this case, the number of column (1) of the query is less than the required number of columns (2). There should be an exception, right?

@gatorsmile
Copy link
Member Author

gatorsmile commented Jun 11, 2016

@yhuai You are right. This is not a good test case for verifying this. I will add a case like

      sql(
        """
          |INSERT OVERWRITE TABLE table_with_partition
          |partition (p1='a',p2) IF NOT EXISTS
          |SELECT 'blarr3', 'newPartition'
        """.stripMargin)

The above statement returns an empty result set, which is still a wrong answer.

Regarding your follow-up question, I also realized this is another bug we should capture, as shown above: #13447 (comment). My original plan is to submit a PR for addressing that issue after #12313 is merged. It sounds like we do not plan to do this in 2.0: #12313, I can do it now. Thanks!

@SparkQA
Copy link

SparkQA commented Jun 11, 2016

Test build #60344 has finished for PR 13447 at commit 84ed71a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Jun 16, 2016

Test build #60619 has finished for PR 13447 at commit 84ed71a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

cc @hvanhovell @yhuai Could you please review this PR again? Thanks!

@yhuai
Copy link
Contributor

yhuai commented Jun 17, 2016

Thanks. LGTM. Merging to master and branch 2.0.

asfgit pushed a commit that referenced this pull request Jun 17, 2016
…T OVERWRITE for DYNAMIC PARTITION

#### What changes were proposed in this pull request?
`IF NOT EXISTS` in `INSERT OVERWRITE` should not support dynamic partitions. If we specify `IF NOT EXISTS`, the inserted statement is not shown in the table.

This PR is to issue an exception in this case, just like what Hive does. Also issue an exception if users specify `IF NOT EXISTS` if users do not specify any `PARTITION` specification.

#### How was this patch tested?
Added test cases into `PlanParserSuite` and `InsertIntoHiveTableSuite`

Author: gatorsmile <gatorsmile@gmail.com>

Closes #13447 from gatorsmile/insertIfNotExist.

(cherry picked from commit e5d703b)
Signed-off-by: Yin Huai <yhuai@databricks.com>
@asfgit asfgit closed this in e5d703b Jun 17, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants