[SPARK-29908][SQL] Alternative proposal for supporting partitioning through save for V2 tables #25833
Conversation
Test build #110927 has finished for PR 25833 at commit
Hi, @brkyvz. This seems to break the compilation. Could you take a look?
Another option is that a V2 DataSource doesn't need to extend
      extraOptions.toMap,
      orCreate = true) // Create the table if it doesn't exist

    case (other, _) =>
Why not use AppendData when mode is append?
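For reference, the reviewer's suggestion would roughly amount to the following sketch. The `planForMode` helper is hypothetical; `AppendData.byName` is the catalyst plan constructor in this timeframe, but the exact signatures shifted during the 3.0 development cycle:

```scala
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.catalyst.analysis.NamedRelation
import org.apache.spark.sql.catalyst.plans.logical.{AppendData, LogicalPlan}

// Hypothetical helper: route SaveMode.Append straight to the AppendData
// logical plan instead of going through the create-table path.
def planForMode(mode: SaveMode, table: NamedRelation, query: LogicalPlan): LogicalPlan =
  mode match {
    case SaveMode.Append => AppendData.byName(table, query)
    case _               => query // other modes go through the CTAS/replace paths
  }
```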
     */
    @Experimental
    -public interface TableCatalog extends CatalogPlugin {
    +public interface TableCatalog extends CatalogPlugin, SupportCreateTable {
Why do tables managed by a TableProvider not require invalidateTable?
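As a rough Scala sketch of the split this diff introduces (signatures are illustrative and mirror the methods `TableCatalog` already declares; `Identifier`, `Transform`, `Table`, and `CatalogPlugin` are assumed to come from the v2 catalog API, whose package paths moved during 3.0):

```scala
import java.util.{Map => JMap}
import org.apache.spark.sql.types.StructType

// Creation/drop responsibilities pulled out of TableCatalog.
trait SupportsCreateTable {
  def createTable(
      ident: Identifier,
      schema: StructType,
      partitions: Array[Transform],
      properties: JMap[String, String]): Table

  def dropTable(ident: Identifier): Boolean
}

// TableCatalog keeps lookup/alter/invalidate and inherits creation/drop.
trait TableCatalog extends CatalogPlugin with SupportsCreateTable {
  def loadTable(ident: Identifier): Table
  def invalidateTable(ident: Identifier): Unit
}
```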
    } else if (paths.isEmpty) {
      throw new IllegalArgumentException("Didn't specify the 'path' for file based table")
    }
    Identifier.of(Array.empty, paths.head)
This should be a different class, PathIdentifier, so that we can easily identify these and handle them separately.
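One possible shape for the suggested class (hypothetical sketch; the code above builds a plain `Identifier` with an empty namespace instead):

```scala
// Hypothetical PathIdentifier: same Identifier contract, but its type makes
// path-based tables easy to recognize at call sites.
case class PathIdentifier(path: String) extends Identifier {
  override def namespace(): Array[String] = Array.empty
  override def name(): String = path
}

// Callers could then match on it and handle path-based tables separately:
//   case PathIdentifier(p) => // resolve table by file path
//   case other             => // resolve through the catalog
```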
What changes were proposed in this pull request?
This is an alternative proposal to #25822 and #25651. The problem we're trying to solve is that when no catalog exists for a data source, there is no good way to create a V2 table with partitioning and table property information. Spark users have been using data source options to connect to data sources such as Kafka and JDBC tables, and it should remain possible to create tables that way.
This PR introduces two interfaces:
`SupportsCreateTable` and `SupportsIdentifierTranslation`. `SupportsCreateTable` contains the parts of `TableCatalog` that relate to creating and dropping tables. These are pulled out, and `TableCatalog` extends this interface. `SupportsIdentifierTranslation` is a way for data sources to go from data source options to an internal identifier that describes how to access that table. A `TableProvider` can extend `SupportsIdentifierTranslation` and `SupportsCreateTable` to support the creation of tables without requiring an explicit catalog. This would:
- fix `DataFrameWriter.save` when passing in partitioning information to data sources
- allow `ErrorIfExists` and `Ignore` to be supported for `DataFrameWriter.save`
- `DataFrameWriterV2`

Why are the changes needed?
`DataFrameWriter.save` is broken for all data sources that migrate from the DataSource V1 to V2 APIs and want to receive partitioning information and support different `SaveMode`s.

Does this PR introduce any user-facing change?
A data source that used DataSource V1 in Spark 2.4 can behave identically with DataSource V2 in Spark 3.0.
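Concretely, a V1-style write such as the following would keep working after the source migrates to V2 (the format name, path, and column are placeholders):

```scala
df.write
  .format("mySource")            // a V2 TableProvider (placeholder name)
  .mode(SaveMode.ErrorIfExists)  // modes beyond Append/Overwrite now supported
  .partitionBy("date")           // partitioning info reaches the V2 table
  .option("path", "/tmp/out")    // options translated to an internal Identifier
  .save()
```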
How was this patch tested?
Will add tests after comments