
Conversation

@cloud-fan (Contributor) commented Nov 8, 2016

What changes were proposed in this pull request?

The current semantics of the warehouse config:

  1. It's a static config, which means you can't change it once your Spark application is launched.
  2. Once a database is created, its location won't change even if the warehouse path config is changed.
  3. The default database is a special case: although its location is fixed, the locations of tables created in it are not. If a Spark app starts with warehouse path B (while the location of the default database is A) and users create a table tbl in the default database, its location will be B/tbl instead of A/tbl. If users then change the warehouse path config to C and create another table tbl2, its location will still be B/tbl2 instead of C/tbl2.

Rule 3 doesn't make sense, and I think we introduced it by mistake rather than intentionally. Data source tables don't follow rule 3 and treat the default database like a normal one.

This PR fixes Hive serde tables to make them consistent with data source tables.
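The consistent rule described above can be sketched as follows. This is a minimal, self-contained model, not Spark's actual implementation; the names `Database` and `defaultTablePath` are illustrative only. The point is that a table's default path derives from the recorded location of its database, so later changes to the warehouse config cannot move new tables in the default database.

```scala
// Illustrative model only (not Spark's real code): a database records its
// location once, at creation time (rule 2), and a table's default path is
// resolved against that recorded location, never against the current
// warehouse config.
object DefaultTablePath {
  final case class Database(name: String, locationUri: String)

  // Depends only on the database's recorded location, so changing the
  // warehouse path config later cannot affect new tables in `default`.
  def defaultTablePath(db: Database, table: String): String =
    s"${db.locationUri}/$table"
}
```

For example, if the default database was created under warehouse path A, `defaultTablePath(Database("default", "/warehouse/A"), "tbl")` yields `/warehouse/A/tbl` regardless of what the warehouse config is later set to.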

How was this patch tested?

HiveSparkSubmitSuite

@cloud-fan (Contributor, Author)

Note that this is not a complete fix; CTAS and InMemoryCatalog are still broken. I'm sending this PR to get some discussion.

Do we really need to make the warehouse path a session-scoped runtime config? Does this config make sense when users connect to a remote Hive metastore?

A Spark app is (most of the time) not a long-running service but a one-shot program. It's hard to define and reason about the semantics of the warehouse path config; shall we only enable it for a local metastore?

cc @yhuai @srowen @rxin

@rxin (Contributor) commented Nov 8, 2016

I don't think it makes sense for it to be session specific, at least for now ...

A Member commented on the diff:
Don't you just want to assert this?

(Disregard my earlier comment; this doesn't relate to recent changes I made to the warehouse path.)

@yhuai (Contributor) commented Nov 8, 2016

Yeah, the warehouse location should not be session-specific. Since we propagate it to Hive, it is shared by all sessions.
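For context, `spark.sql.warehouse.dir` being a static config means it can only be supplied when the application starts. A sketch of the assumed usage (the path is illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: the warehouse path is a static config, so it must be supplied
// when the SparkSession is built, not changed afterwards.
val spark = SparkSession.builder()
  .appName("warehouse-demo")
  .config("spark.sql.warehouse.dir", "/path/to/warehouse")
  .enableHiveSupport()
  .getOrCreate()

// Attempting to modify a static config at runtime is rejected:
// spark.conf.set("spark.sql.warehouse.dir", "/other/path")  // fails: static config
```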

@SparkQA commented Nov 8, 2016

Test build #68351 has finished for PR 15812 at commit fa61496.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan changed the title [SPARK-18360][SQL] warehouse path config should work for data source tables [SPARK-18360][SQL] default table path of tables in default database should depend on the location of default database Nov 16, 2016
@cloud-fan (Contributor, Author) commented on the diff:
Use TestHiveContext so that the Hive metadata is cleaned at the beginning.

@SparkQA commented Nov 16, 2016

Test build #68706 has finished for PR 15812 at commit bcd9d4b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 16, 2016

Test build #68716 has finished for PR 15812 at commit bd8fc95.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 17, 2016

Test build #68749 has finished for PR 15812 at commit 27be481.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 17, 2016

Test build #68784 has finished for PR 15812 at commit 3e2073c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai (Contributor) commented Nov 18, 2016

LGTM. Merging to master and branch 2.1.

asfgit pushed a commit that referenced this pull request Nov 18, 2016
…hould depend on the location of default database


Author: Wenchen Fan <wenchen@databricks.com>

Closes #15812 from cloud-fan/default-db.

(cherry picked from commit ce13c26)
Signed-off-by: Yin Huai <yhuai@databricks.com>
asfgit closed this in ce13c26 Nov 18, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…hould depend on the location of default database


Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#15812 from cloud-fan/default-db.