@brkyvz
Contributor

@brkyvz brkyvz commented Sep 18, 2019

What changes were proposed in this pull request?

This is an alternative proposal to #25822 and #25651. The problem we're trying to solve is that when a data source is used without a catalog, there is no good way to create a V2 table with partitioning and table property information. Spark users have been connecting to data sources such as Kafka and JDBC tables through data source options, and it should remain possible to create tables that way.

This PR introduces two interfaces: SupportsCreateTable and SupportsIdentifierTranslation. SupportsCreateTable contains the parts of TableCatalog that relate to creating and dropping tables; these are pulled out, and TableCatalog now extends this interface. SupportsIdentifierTranslation is a way for data sources to go from data source options to an internal identifier that describes how to access the table. A TableProvider can extend both SupportsIdentifierTranslation and SupportsCreateTable to support table creation without requiring an explicit catalog.
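As a rough illustration of how the two roles fit together, here is a minimal, self-contained Java sketch. The interface names `IdentifierTranslation` and `CreateTable`, their method signatures, and the `InMemoryProvider` class are all hypothetical stand-ins invented for this example; they are not the interfaces or signatures added by this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: maps data source options to an internal identifier.
interface IdentifierTranslation {
    String fromOptions(Map<String, String> options);
}

// Hypothetical stand-in: the create/drop subset of a table catalog.
interface CreateTable {
    boolean tableExists(String ident);
    void createTable(String ident, String[] partitionColumns, Map<String, String> properties);
    boolean dropTable(String ident);
}

// A provider that can create tables without an explicit catalog.
final class InMemoryProvider implements IdentifierTranslation, CreateTable {
    private final Map<String, String[]> partitioning = new HashMap<>();
    private final Map<String, Map<String, String>> tableProperties = new HashMap<>();

    @Override
    public String fromOptions(Map<String, String> options) {
        // Assumption: the identifier is derived from a 'path' option.
        String path = options.get("path");
        if (path == null) {
            throw new IllegalArgumentException("Missing 'path' option");
        }
        return path;
    }

    @Override
    public boolean tableExists(String ident) {
        return partitioning.containsKey(ident);
    }

    @Override
    public void createTable(String ident, String[] partitionColumns, Map<String, String> properties) {
        if (tableExists(ident)) {
            throw new IllegalStateException("Table already exists: " + ident);
        }
        // Keep defensive copies of the partitioning and property information.
        partitioning.put(ident, partitionColumns.clone());
        tableProperties.put(ident, new HashMap<>(properties));
    }

    @Override
    public boolean dropTable(String ident) {
        tableProperties.remove(ident);
        return partitioning.remove(ident) != null;
    }
}
```

The point of the sketch is only that the options are first translated to an identifier, and the identifier is then used for existence checks and creation, so partitioning and properties have somewhere to go even when no catalog is configured.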

This would:

  1. Fix the behavior for DataFrameWriter.save when passing in partitioning information to data sources
  2. Allow ErrorIfExists and Ignore to be supported for DataFrameWriter.save
  3. Open the path for supporting path based tables in DataFrameWriterV2
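To make the second point concrete: ErrorIfExists and Ignore can only be honored if the writer can ask whether the target table already exists. The following Java sketch shows one way the four save modes could be resolved once an existence check is available. The `Mode` and `Action` enums and the `resolve` method are hypothetical names for illustration; this is not the logic implemented in the PR.

```java
// Hypothetical mirror of the four save modes discussed above.
enum Mode { APPEND, OVERWRITE, ERROR_IF_EXISTS, IGNORE }

// Hypothetical set of outcomes a writer could plan.
enum Action { CREATE_AND_WRITE, APPEND, REPLACE, SKIP }

final class SaveModeSketch {
    // Decide what to do given the requested mode and whether the table exists.
    static Action resolve(Mode mode, boolean tableExists) {
        if (!tableExists) {
            // Every mode creates the table when it is missing.
            return Action.CREATE_AND_WRITE;
        }
        switch (mode) {
            case APPEND:
                return Action.APPEND;
            case OVERWRITE:
                return Action.REPLACE;
            case IGNORE:
                // Table exists: leave it untouched.
                return Action.SKIP;
            case ERROR_IF_EXISTS:
            default:
                throw new IllegalStateException("Table already exists");
        }
    }
}
```

Without the existence check, the last two branches cannot be distinguished from a plain write, which is the gap the list above describes.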

Why are the changes needed?

DataFrameWriter.save is broken for data sources that migrate from the DataSource V1 to the V2 APIs and that need to receive partitioning information and support different SaveModes.

Does this PR introduce any user-facing change?

A data source that was a DataSource V1 in Spark 2.4 can behave identically as a DataSource V2 in Spark 3.0.

How was this patch tested?

Tests will be added after the first round of comments.

@SparkQA

SparkQA commented Sep 18, 2019

Test build #110927 has finished for PR 25833 at commit e48a5c4.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • trait FileDataSourceV2 extends SupportIdentifierTranslation with DataSourceRegister

@dongjoon-hyun
Member

Hi, @brkyvz. This seems to break compilation. Could you take a look?

@brkyvz
Contributor Author

brkyvz commented Sep 18, 2019

Another option is that a V2 DataSource doesn't need to extend TableProvider for CreateTable and related operations to go through the V2SessionCatalog, and a DataSource can continue to reuse its V1 APIs.

extraOptions.toMap,
orCreate = true) // Create the table if it doesn't exist

case (other, _) =>
Contributor

Why not use AppendData when mode is append?

*/
@Experimental
public interface TableCatalog extends CatalogPlugin {
public interface TableCatalog extends CatalogPlugin, SupportCreateTable {
Contributor

Why do tables managed by a TableProvider not require invalidateTable?

} else if (paths.isEmpty) {
throw new IllegalArgumentException("Didn't specify the 'path' for file based table")
}
Identifier.of(Array.empty, paths.head)
Contributor

This should be a different class, PathIdentifier, so that we can easily identify these and handle them separately.
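One way to read this suggestion is sketched below in Java. The local `Ident` interface mimics the namespace-plus-name shape of an identifier, and `PathIdent`, `NamedIdent`, and `IdentDispatch` are hypothetical names invented here; none of this is code from the PR or from Spark.

```java
import java.util.Objects;

// Hypothetical local interface with a namespace-plus-name shape.
interface Ident {
    String[] namespace();
    String name();
}

// A dedicated type for path-based tables, so callers can tell them apart by type
// instead of guessing from an empty namespace.
final class PathIdent implements Ident {
    private final String path;

    PathIdent(String path) {
        this.path = Objects.requireNonNull(path, "path");
    }

    @Override
    public String[] namespace() {
        return new String[0];
    }

    @Override
    public String name() {
        return path;
    }
}

// An ordinary identifier that names a table inside a namespace.
final class NamedIdent implements Ident {
    private final String[] namespace;
    private final String name;

    NamedIdent(String[] namespace, String name) {
        this.namespace = namespace.clone();
        this.name = Objects.requireNonNull(name, "name");
    }

    @Override
    public String[] namespace() {
        return namespace.clone();
    }

    @Override
    public String name() {
        return name;
    }
}

final class IdentDispatch {
    // Path identifiers are recognized by type and handled separately.
    static String describe(Ident ident) {
        if (ident instanceof PathIdent) {
            return "path:" + ident.name();
        }
        return "table:" + ident.name();
    }
}
```

With a separate class, a table named `/data/events` in an empty namespace is no longer indistinguishable from a table stored at the path `/data/events`.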

@brkyvz brkyvz closed this Nov 11, 2019
@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-29127][SQL] Alternative proposal for supporting partitioning through save for V2 tables" to "[SPARK-29908][SQL] Alternative proposal for supporting partitioning through save for V2 tables" Nov 16, 2019