Skip to content

Conversation

@milenkovicm
Copy link
Contributor

@milenkovicm milenkovicm commented Feb 9, 2025

Which issue does this PR close?

Depends on: #1176
Closes #1164.

Rationale for this change

Add support for insert into INSERT INTO

What changes are included in this PR?

This PR enables INSERT INTO support in datafusion. The reason this feature lands at this time is lack of support for LogicalPlan::DML serialization in datafusion, which was addressed in apache/datafusion#14079.

Important fact to note is that LogicalPlan::DML uses table reference, pointing to a table which would be inserted into. Having table reference is a problem with ballista, as client application has two unsychrnonised session contexts, the first one client side, with all required table definitions, and one on the scheduler which does not have any table providers. This issue does not exist with remote (shared) schema providers which client and scheduler context have access to.

With most options to address this problem listed in #1164 I propose the simplest to implement.

Proposed solution implements custom LogicalPlan::Extension which have LogicalPlan::DML and serializes referenced TableProvider.
BallistaLogicalCodecs implements ser/de of this new extension. To implement BallistaLogicalCodecs uses LogicalPlan::TableScan serialisation to serialise TableProvider.
This is a bit of a hack, but its working.

On the scheduler side, extended logical plan will be intercepted, table provider deserialised and registered in the local session context, logical plan extension will be replaced with original DML plan.

This functionality could be disabled setting ballista.planner.dml_extension to false.

Proposed approach does look "hacky", it would make more sense to address this issue in datafusion LogicalPlan::DML and replace table reference with table source, then we need to extract logic for serialising table source from LogicalPlan::TableScan and re-use it. IMHO it would make sense address this issue there.

Are there any user-facing changes?

@milenkovicm milenkovicm force-pushed the feat_insert_into branch 2 times, most recently from d518dd8 to 6045c00 Compare February 9, 2025 15:19
@milenkovicm
Copy link
Contributor Author

after merging apache/datafusion#14631 this PR is irrelevant if we want to wait for datafusion v.46 for it

@milenkovicm milenkovicm closed this Mar 6, 2025
@milenkovicm milenkovicm deleted the feat_insert_into branch March 19, 2025 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DDL Statement Propagation (INSERT INTO support)

1 participant