Conversation


@dongjoon-hyun dongjoon-hyun commented Mar 9, 2017

What changes were proposed in this pull request?

Since Spark 2.0.0, SET commands do not pass their values to the HiveClient. In most cases, Spark handles this well. However, for dynamic partition inserts, users run into the following misleading situation.

scala> spark.range(1001).selectExpr("id as key", "id as value").registerTempTable("t1001")

scala> sql("create table p (value int) partitioned by (key int)").show

scala> sql("insert into table p partition(key) select key, value from t1001")
org.apache.spark.SparkException:
Dynamic partition strict mode requires at least one static partition column.
To turn this off set hive.exec.dynamic.partition.mode=nonstrict

scala> sql("set hive.exec.dynamic.partition.mode=nonstrict")

scala> sql("insert into table p partition(key) select key, value from t1001")
org.apache.hadoop.hive.ql.metadata.HiveException:
Number of dynamic partitions created is 1001, which is more than 1000.
To solve this try to set hive.exec.max.dynamic.partitions to at least 1001.

scala> sql("set hive.exec.max.dynamic.partitions=1001")

scala> sql("set hive.exec.max.dynamic.partitions").show(false)
+--------------------------------+-----+
|key                             |value|
+--------------------------------+-----+
|hive.exec.max.dynamic.partitions|1001 |
+--------------------------------+-----+

scala> sql("insert into table p partition(key) select key, value from t1001")
org.apache.hadoop.hive.ql.metadata.HiveException:
Number of dynamic partitions created is 1001, which is more than 1000.
To solve this try to set hive.exec.max.dynamic.partitions to at least 1001.

The last error is identical to the previous one: the HiveClient does not know the new value 1001. While users can control hive.exec.dynamic.partition.mode, there is no way to change the default value of hive.exec.max.dynamic.partitions in the HiveClient with a SET command.

The root cause is that Hive parameters are passed to the HiveClient only when it is created. So the workaround is to pass --hiveconf when starting spark-shell. However, the value still cannot be changed from within spark-shell. We had better handle this case instead of showing misleading error messages that send users into an endless loop.
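The workaround from the description is a launch-time setting, shown here as a sketch (the value 1001 matches the example session above; since the option is only read when the HiveClient is created, it cannot be changed afterward inside the session):

```shell
# Set the Hive parameter at launch time, before the HiveClient is created.
spark-shell --hiveconf hive.exec.max.dynamic.partitions=1001
```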

How was this patch tested?

Manual.


SparkQA commented Mar 9, 2017

Test build #74247 has started for PR 17223 at commit a874455.

"hive.exec.max.dynamic.partitions",
"hive.exec.max.dynamic.partitions.pernode",
"hive.exec.max.created.files",
"hive.error.on.empty.partition"

@dongjoon-hyun
Member Author

Retest this please


SparkQA commented Mar 9, 2017

Test build #74255 has finished for PR 17223 at commit a874455.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Could you review this when you have some time, @gatorsmile?

@dongjoon-hyun
Member Author

Hi, @cloud-fan .
Could you review this when you have some time?

"hive.error.on.empty.partition"
).foreach { param =>
if (sqlConf.contains(param)) {
client.runSqlHive(s"set $param=${sqlConf.getConfString(param)}")
Member
Should we do it when users issue the SET command? Is this a general issue?

Member Author

That would be the best approach, since this is a general issue for all unhandled Hive param options. The reason to do it here is that SetCommand lives in sql/core and does not interact with the Hive client. Is there a way to invoke runSqlHive there?
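The forwarding idea from the diff above can be sketched roughly as follows. This is a hypothetical, self-contained stand-in, not Spark's real API: `FakeSqlConf` stands in for `SQLConf` and `FakeHiveClient` for the Hive client, while the parameter whitelist mirrors the one quoted in the diff.

```scala
import scala.collection.mutable

// Stand-in for SQLConf: a read-only view of the session's settings.
class FakeSqlConf(settings: Map[String, String]) {
  def contains(key: String): Boolean = settings.contains(key)
  def getConfString(key: String): String = settings(key)
}

// Stand-in for the Hive client: records every SQL string it is asked to run.
class FakeHiveClient {
  val executed = mutable.Buffer.empty[String]
  def runSqlHive(sql: String): Unit = executed += sql
}

// Whitelist of dynamic-partition parameters, as quoted in the diff above.
val dynamicPartitionParams = Seq(
  "hive.exec.max.dynamic.partitions",
  "hive.exec.max.dynamic.partitions.pernode",
  "hive.exec.max.created.files",
  "hive.error.on.empty.partition")

// Forward each whitelisted parameter the user has set to the Hive client.
def forwardHiveParams(sqlConf: FakeSqlConf, client: FakeHiveClient): Unit =
  dynamicPartitionParams.foreach { param =>
    if (sqlConf.contains(param)) {
      client.runSqlHive(s"set $param=${sqlConf.getConfString(param)}")
    }
  }
```

With a conf containing `hive.exec.max.dynamic.partitions -> 1001`, the stub client would receive `set hive.exec.max.dynamic.partitions=1001`, which is the statement the real patch issues via runSqlHive.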

@cloud-fan
Contributor

Since the Hive client is shared among all sessions, we can't set Hive confs dynamically without breaking session isolation. I think we should treat Hive confs as static SQL confs, and throw an exception when users try to change them.
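The direction suggested here can be sketched as below. This is an illustrative stand-in, not Spark's actual implementation: `staticHiveConfs` and `setConf` are hypothetical names, and the idea is simply to fail loudly on a SET of a static conf instead of silently ignoring the new value.

```scala
import scala.collection.mutable

// Hypothetical set of Hive confs treated as static (fixed at client creation).
val staticHiveConfs = Set(
  "hive.exec.max.dynamic.partitions",
  "hive.exec.max.dynamic.partitions.pernode",
  "hive.exec.max.created.files",
  "hive.error.on.empty.partition")

// Reject changes to static confs; accept everything else.
def setConf(settings: mutable.Map[String, String],
            key: String, value: String): Unit = {
  if (staticHiveConfs.contains(key)) {
    // Failing here avoids the misleading retry loop from the PR description.
    throw new UnsupportedOperationException(
      s"Cannot modify the value of a static config: $key")
  }
  settings(key) = value
}
```

Under this scheme, `SET hive.exec.max.dynamic.partitions=1001` would raise an error immediately, rather than appearing to succeed while the Hive client keeps the old limit.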

@dongjoon-hyun
Member Author

I see. That's the reason not to support this. Thank you, @cloud-fan.

@dongjoon-hyun
Member Author

I'll close this PR and JIRA issue.
