[SPARK-49152][SQL][FOLLOWUP] table location string should be Hadoop Path string#47759
cloud-fan wants to merge 2 commits into apache:master
 case SetTableLocation(ResolvedV1TableIdentifier(ident), None, location) =>
   AlterTableSetLocationCommand(ident, None, location)

 // V2 catalog doesn't support setting partition location yet, we must use v1 command here.
This is not related to the table location, but it is also a followup of #47660 that fixes a missing case.
 private def qualifyLocInTableSpec(tableSpec: TableSpec): TableSpec = {
-  tableSpec.withNewLocation(tableSpec.location.map(loc => CatalogUtils.makeQualifiedPath(
-    CatalogUtils.stringToURI(loc), hadoopConf).toString))
+  val newLoc = tableSpec.location.map { loc =>
The code here follows SessionCatalog#makeQualifiedTablePath.
 /**
  * A reserved property to specify the location of the table. The files of the table
- * should be under this location.
+ * should be under this location. The location is a Hadoop Path string.
Thank you for the clarification.
dongjoon-hyun left a comment:
+1, this looks correct to me.
   c INT)
 USING parquet
-LOCATION 'file:///path/to/table'
+LOCATION 'file:/path/to/table'
I've investigated this locally. The new result is actually consistent with the production behavior (with Hive Metastore). What happens is:
- The CREATE TABLE command qualifies the table location and keeps it as a URI in CatalogTable#storage#locationUri.
- When saving the table into HMS, we have to turn the URI into a string by using new Path(uri).toString. The Hadoop Path string omits the authority component if it's not present, so file:///path becomes file:/path.
- When reading the table back from HMS, we turn the string back into a URI, but the authority component won't be filled in.
However, with InMemoryCatalog, we keep CatalogTable directly and there is no URI <-> string round trip. So the authority component is still there.
This doesn't actually matter, as an empty authority is equivalent to no authority in a URI string.
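To make the round trip above concrete, here is a minimal Scala sketch that uses only java.net.URI. The object `PathStringSketch` and its helper `toPathStyleString` are hypothetical: they imitate, in simplified form, how `org.apache.hadoop.fs.Path#toString` renders a URI (scheme, then `//authority` only when an authority is present, then the decoded path), and they omit edge cases the real class handles. The location value is an assumed example, not the PR's test data.

```scala
import java.net.URI

// Hypothetical helper, NOT Hadoop's API: a simplified imitation of how
// org.apache.hadoop.fs.Path#toString renders a URI. Edge cases handled
// by the real class are intentionally omitted.
object PathStringSketch {

  def toPathStyleString(uri: URI): String = {
    val sb = new StringBuilder
    // Scheme followed by ':' (e.g. "file:")
    if (uri.getScheme != null) sb.append(uri.getScheme).append(':')
    // "//authority" is emitted only when an authority is actually present,
    // so an empty authority ("file:///...") collapses to "file:/..."
    if (uri.getAuthority != null) sb.append("//").append(uri.getAuthority)
    // getPath returns the decoded path
    if (uri.getPath != null) sb.append(uri.getPath)
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    // Assumed example of a qualified location kept as a URI in the catalog
    val qualified = new URI("file:///path/to/table")
    // "Saving to HMS": URI -> string
    val stored = toPathStyleString(qualified)
    // "Reading back from HMS": string -> URI
    val readBack = new URI(stored)

    println(qualified)             // file:///path/to/table
    println(stored)                // file:/path/to/table
    println(readBack.getAuthority) // null
    // Scheme and path survive the round trip unchanged
    println(qualified.getScheme == readBack.getScheme &&
      qualified.getPath == readBack.getPath) // true
  }
}
```

Note that java.net.URI itself reports no authority for `file:///path/to/table` (`getAuthority` returns null) while `toString` still returns the original string, which is why the two forms differ only in how they are printed.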
Thank you for the details.
Merged to master.

Could you make a separate backporting PR to branch-3.5 to make sure all CIs pass, @cloud-fan?
Closes apache#47759 from cloud-fan/fix.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
This is a followup of #47660 to revert an unintentional behavior change: the table location string should be a Hadoop Path string, not a URI string, which escapes special characters.
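The difference between the two string forms can be sketched in Scala with java.net.URI alone. Here `EscapingSketch.toPathStyleString` is a hypothetical, simplified stand-in for `org.apache.hadoop.fs.Path#toString`, not the real class, and the path containing a space is an assumed example. java.net.URI's multi-argument constructor percent-escapes the space, so `URI#toString` yields `%20`, while the Path-style rendering is built from the decoded path.

```scala
import java.net.URI

// Hypothetical sketch: contrasts java.net.URI#toString (percent-escaped)
// with a simplified Hadoop-Path-style rendering (decoded path).
object EscapingSketch {

  // Simplified imitation of org.apache.hadoop.fs.Path#toString; not the real class.
  def toPathStyleString(uri: URI): String = {
    val scheme = if (uri.getScheme != null) uri.getScheme + ":" else ""
    val authority = if (uri.getAuthority != null) "//" + uri.getAuthority else ""
    val path = if (uri.getPath != null) uri.getPath else ""
    scheme + authority + path
  }

  def main(args: Array[String]): Unit = {
    // The 5-argument constructor (scheme, authority, path, query, fragment)
    // quotes illegal characters, so the space is stored as %20
    val uri = new URI("file", null, "/path/my table", null, null)
    println(uri.toString)           // file:/path/my%20table
    println(toPathStyleString(uri)) // file:/path/my table
  }
}
```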
Why are the changes needed?
To restore the previous behavior, which #47660 changed unintentionally.
Does this PR introduce any user-facing change?
No, the behavior change has not been released yet.
How was this patch tested?
New test.
Was this patch authored or co-authored using generative AI tooling?
No.