[SPARK-49152][SQL] V2SessionCatalog should use V2Command #47724
amaliujia wants to merge 3 commits into apache:branch-3.5
V2SessionCatalog should use V2Command when possible. This is because the session catalog can be overridden, so the overriding catalog should use v2 commands; otherwise the V1Command will still call the Hive metastore or the built-in session catalog.

User-facing change: No. Tested with: Existing tests. Generative AI tooling: NO.

Closes apache#47660 from amaliujia/create_table_v2.
Authored-by: Rui Wang <rui.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
    case ShowTableExtended(
      DatabaseInSessionCatalog(db),
      ResolvedV1Database(db),
    ResolvedV1Database(db),
…/ResolveSessionCatalog.scala
thanks, merging to 3.5!
### What changes were proposed in this pull request?

V2SessionCatalog should use V2Command when possible.

### Why are the changes needed?

This is because the session catalog can be overridden, so the overriding catalog should use v2 commands; otherwise the V1Command will still call the Hive metastore or the built-in session catalog.

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

NO

Closes #47724 from amaliujia/branch-3.5.

Lead-authored-by: Rui Wang <rui.wang@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
    object ResolvedV1Identifier {
      def unapply(resolved: LogicalPlan): Option[TableIdentifier] = resolved match {
        case ResolvedIdentifier(catalog, ident) if supportsV1Command(catalog) =>
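For readers following the thread, here is a minimal, self-contained sketch of how a guarded extractor like the one above can route a resolved identifier to a v1 or a v2 command. It does not use Spark's classes: `CatalogPlugin`, `DelegatingCatalogExtension`, `LogicalPlan`, and `ResolvedIdentifier` are simplified stand-ins, the `Resolver` class, its `customSessionCatalogConfigured` flag, and `route` are hypothetical, and the body of `supportsV1Command` is an assumption pieced together from this discussion rather than the actual implementation.

```scala
object V1RoutingSketch {
  // Simplified stand-ins for Spark's catalog types (NOT the real classes).
  trait CatalogPlugin { def name: String }
  // In Spark this is a class; a marker trait is enough for the sketch.
  trait DelegatingCatalogExtension extends CatalogPlugin

  val SessionCatalogName = "spark_catalog"

  final class BuiltInSessionCatalog extends CatalogPlugin {
    val name: String = SessionCatalogName
  }
  // Custom session catalog that delegates to the built-in one.
  final class DelegatingCustomCatalog extends DelegatingCatalogExtension {
    val name: String = SessionCatalogName
  }
  // Custom session catalog that does not extend the delegating base.
  final class StandaloneCustomCatalog extends CatalogPlugin {
    val name: String = SessionCatalogName
  }

  sealed trait LogicalPlan
  final case class ResolvedIdentifier(catalog: CatalogPlugin, ident: String)
      extends LogicalPlan

  // Hypothetical holder for the "custom session catalog configured" setting.
  final class Resolver(customSessionCatalogConfigured: Boolean) {
    // Assumed rule: v1 commands are allowed only for the session catalog,
    // and only when it is the built-in one or a delegating extension.
    private def supportsV1Command(catalog: CatalogPlugin): Boolean =
      catalog.name == SessionCatalogName &&
        (!customSessionCatalogConfigured ||
          catalog.isInstanceOf[DelegatingCatalogExtension])

    object ResolvedV1Identifier {
      def unapply(resolved: LogicalPlan): Option[String] = resolved match {
        case ResolvedIdentifier(catalog, ident) if supportsV1Command(catalog) =>
          Some(ident)
        case _ => None
      }
    }

    // The guarded extractor is tried first; everything else falls to v2.
    def route(plan: LogicalPlan): String = plan match {
      case ResolvedV1Identifier(ident) => s"v1 command for $ident"
      case ResolvedIdentifier(_, ident) => s"v2 command for $ident"
    }
  }

  def main(args: Array[String]): Unit = {
    val custom = new Resolver(customSessionCatalogConfigured = true)
    println(custom.route(ResolvedIdentifier(new DelegatingCustomCatalog, "db.t")))
    println(custom.route(ResolvedIdentifier(new StandaloneCustomCatalog, "db.t")))
  }
}
```

Under these assumptions, a custom session catalog that does not extend the delegating base no longer matches the v1 extractor, which is the behavior change discussed in the comments that follow.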
@amaliujia @cloud-fan
This change appears to have broken creating a V1 table through a V2_SESSION_CATALOG_IMPLEMENTATION such as Iceberg's SparkSessionCatalog.
Does Iceberg catalog extend DelegatingCatalogExtension?
We do want to use v2 commands for custom catalogs that do not extend DelegatingCatalogExtension
> Does Iceberg catalog extend DelegatingCatalogExtension?

Nope.

> We do want to use v2 commands for custom catalogs that do not extend DelegatingCatalogExtension

Even so, is it the right time to introduce such a behavior change in a bug fix release?
We can consider it a bug. People implementing DS V2 catalog APIs expect to see v2 commands so they can customize table behavior. And there is a backdoor: DelegatingCatalogExtension.
For Iceberg, it should be easy to work around this by extending DelegatingCatalogExtension? The Iceberg catalog can still keep all its methods unchanged and simply not use the delegate.
Iceberg's SparkSessionCatalog already extends a base class. There's no easy way to extend DelegatingCatalogExtension without a major refactoring.
We need to make either the iceberg BaseCatalog or the Spark DelegatingCatalogExtension an interface. It looks easier to make BaseCatalog an interface?
### What changes were proposed in this pull request?

This PR updates `DelegatingCatalogExtension` so that it's more extendable:

- `initialize` becomes not final, so that sub-classes can override it
- `delegate` becomes `protected`, so that sub-classes can access it

In addition, this PR fixes a mistake: `DelegatingCatalogExtension` is just a convenient default implementation; it's actually the `CatalogExtension` interface that indicates this catalog implementation will delegate requests to the Spark session catalog. #47724 should use `CatalogExtension` instead.

### Why are the changes needed?

Unblock the Iceberg extension.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #48257 from cloud-fan/catalog.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
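A rough sketch of what these two changes enable, assuming simplified stand-in types rather than Spark's actual Java classes: a subclass can override `initialize` and reach the `delegate`, and a catalog that already has its own base class can implement the `CatalogExtension` interface directly and still be recognized as delegating. `AuditingCatalog`, `VendorBaseCatalog`, `VendorSessionCatalog`, `delegatesToSessionCatalog`, and the string-based `loadTable` are all hypothetical names and simplifications introduced for illustration.

```scala
object CatalogExtensionSketch {
  // Simplified stand-ins (NOT Spark's real interfaces or signatures).
  trait CatalogPlugin {
    def name: String
    def loadTable(ident: String): String
  }

  // The interface that marks "this catalog delegates to the session catalog".
  trait CatalogExtension extends CatalogPlugin {
    def setDelegateCatalog(catalog: CatalogPlugin): Unit
  }

  final class BuiltInSessionCatalog extends CatalogPlugin {
    val name: String = "spark_catalog"
    def loadTable(ident: String): String = s"builtin:$ident"
  }

  // Convenience default implementation that forwards calls to the delegate.
  class DelegatingCatalogExtension extends CatalogExtension {
    protected var delegate: CatalogPlugin = null // protected: visible to sub-classes
    private var catalogName: String = null
    def setDelegateCatalog(catalog: CatalogPlugin): Unit = { delegate = catalog }
    def initialize(name: String): Unit = { catalogName = name } // not final
    def name: String = catalogName
    def loadTable(ident: String): String = delegate.loadTable(ident)
  }

  // Hypothetical sub-class that overrides initialize and uses the delegate.
  final class AuditingCatalog extends DelegatingCatalogExtension {
    var initialized: Boolean = false
    override def initialize(name: String): Unit = {
      super.initialize(name)
      initialized = true
    }
    override def loadTable(ident: String): String =
      "audited:" + delegate.loadTable(ident)
  }

  // Hypothetical catalog that already extends its own base class, so it
  // implements the interface directly instead of the default class.
  abstract class VendorBaseCatalog { def vendor: String = "vendor" }
  final class VendorSessionCatalog extends VendorBaseCatalog with CatalogExtension {
    private var sessionCatalog: CatalogPlugin = null
    val name: String = "spark_catalog"
    def setDelegateCatalog(catalog: CatalogPlugin): Unit = { sessionCatalog = catalog }
    def loadTable(ident: String): String =
      vendor + ":" + sessionCatalog.loadTable(ident)
  }

  // Checking the interface, not the default class, accepts both styles.
  def delegatesToSessionCatalog(catalog: CatalogPlugin): Boolean =
    catalog.isInstanceOf[CatalogExtension]

  def main(args: Array[String]): Unit = {
    val vendorCatalog = new VendorSessionCatalog
    vendorCatalog.setDelegateCatalog(new BuiltInSessionCatalog)
    println(vendorCatalog.loadTable("db.t"))
    println(delegatesToSessionCatalog(vendorCatalog))
  }
}
```

The point of the design, as the description states it, is that the interface carries the "delegates to the session catalog" meaning, so a check against the default class alone would miss catalogs like the hypothetical `VendorSessionCatalog`.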
(cherry picked from commit 339dd5b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>