[SPARK-18360][SQL] default table path of tables in default database should depend on the location of default database #15812
Conversation
Note that it's not a complete fix (CTAS and …).

Do we really need to make the warehouse path a session-scoped runtime config? Does this config make sense when users connect to a remote Hive metastore? A Spark app is (most of the time) not a long-running service but a one-shot program. It's hard to define and reason about the semantics of the warehouse path config; shall we only enable it for a local metastore?

I don't think it makes sense for it to be session specific, at least for now ...
Don't you just want to assert this?
(Disregard my earlier comment; this doesn't relate to the recent changes I made to the warehouse path.)
Yea. The warehouse location should not be session specific. Since we will propagate it to Hive, it is shared by all sessions.
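The point above can be sketched with a toy model (plain Python, not Spark's actual API; the names `App` and `Session` are made up for illustration): all sessions share one external catalog, so a per-session warehouse path would let two sessions disagree on where the same logical table lives, which is why the path is kept at the application level.

```python
import os


class App:
    """Toy model: the warehouse path is fixed once for the whole application."""

    def __init__(self, warehouse_path):
        self.warehouse_path = warehouse_path  # one value, shared by all sessions


class Session:
    """Toy model: a session has no warehouse override of its own."""

    def __init__(self, app):
        self.app = app

    def table_path(self, db, table):
        # Every session resolves paths against the shared app-level config,
        # so all sessions agree on the physical location of a table.
        return os.path.join(self.app.warehouse_path, db + ".db", table)


app = App("/warehouse")
s1, s2 = Session(app), Session(app)
# Both sessions resolve the same table to the same path:
print(s1.table_path("default", "t"))  # /warehouse/default.db/t
assert s1.table_path("default", "t") == s2.table_path("default", "t")
```

If the warehouse path were session-scoped instead, the same `CREATE TABLE` issued from two sessions against one shared metastore could write to different directories, which is the inconsistency the reviewers are objecting to.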
Test build #68351 has finished for PR 15812 at commit
This method just copies the code that was previously in the else branch: https://github.com/apache/spark/pull/15812/files?diff=unified#diff-159191585e10542f013cb3a714f26075L208
Use TestHiveContext so that the Hive metadata is cleaned at the beginning.
LGTM. Merging to master and branch 2.1.
…hould depend on the location of default database

## What changes were proposed in this pull request?

The current semantics of the warehouse config:

1. It's a static config, which means you can't change it once your Spark application is launched.
2. Once a database is created, its location won't change even if the warehouse path config is changed.
3. The default database is a special case: although its location is fixed, the locations of tables created in it are not. If a Spark app starts with warehouse path B (while the location of the default database is A), and users create a table `tbl` in the default database, its location will be `B/tbl` instead of `A/tbl`. If users then change the warehouse path config to C and create another table `tbl2`, its location will still be `B/tbl2` instead of `C/tbl2`.

Rule 3 doesn't make sense and I think we made it by mistake, not intentionally. Data source tables don't follow rule 3 and treat the default database like normal ones. This PR fixes Hive serde tables to make them consistent with data source tables.

## How was this patch tested?

HiveSparkSubmitSuite

Author: Wenchen Fan <wenchen@databricks.com>

Closes #15812 from cloud-fan/default-db.

(cherry picked from commit ce13c26)
Signed-off-by: Yin Huai <yhuai@databricks.com>
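The buggy rule 3 and the fixed behavior can be sketched with a toy resolver (plain Python, not Spark's real catalog API; `Catalog`, `default_table_path_old`, and `default_table_path_new` are hypothetical names for illustration):

```python
import os


class Catalog:
    """Toy model of default-table-path resolution; not Spark's actual catalog."""

    def __init__(self, warehouse_path, db_locations):
        self.warehouse_path = warehouse_path  # current warehouse config (rule 1)
        self.db_locations = db_locations      # fixed at database creation (rule 2)

    def default_table_path_old(self, db, table):
        # Buggy rule 3: tables in the default database were resolved against
        # the *current* warehouse path, ignoring the database's own location.
        if db == "default":
            return os.path.join(self.warehouse_path, table)
        return os.path.join(self.db_locations[db], table)

    def default_table_path_new(self, db, table):
        # After the fix: every database, including default, uses its own
        # recorded location, matching the data source table behavior.
        return os.path.join(self.db_locations[db], table)


# App starts with warehouse path B while the default database lives at A:
cat = Catalog(warehouse_path="/B", db_locations={"default": "/A"})
print(cat.default_table_path_old("default", "tbl"))  # /B/tbl (surprising)
print(cat.default_table_path_new("default", "tbl"))  # /A/tbl (consistent)
```

This mirrors the A/B example in the description: before the fix, `tbl` lands under the current warehouse path B; after it, the table follows the default database's own location A, just like tables in any other database.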