feat: INSERT INTO support
#1177
Closed
Which issue does this PR close?
Depends on: #1176
Closes #1164.
Rationale for this change
Add support for INSERT INTO.
What changes are included in this PR?
This PR enables INSERT INTO support in datafusion. This feature lands only now because datafusion previously lacked support for LogicalPlan::DML serialization, which was addressed in apache/datafusion#14079.
An important fact to note is that LogicalPlan::DML uses a table reference pointing to the table to be inserted into. Having a table reference is a problem for ballista, as the client application has two unsynchronised session contexts: one on the client side, with all required table definitions, and one on the scheduler, which does not have any table providers. This issue does not exist with remote (shared) schema providers that both the client and scheduler contexts have access to.
With most options to address this problem listed in #1164, I propose the simplest one to implement.
The proposed solution implements a custom LogicalPlan::Extension which wraps LogicalPlan::DML and serializes the referenced TableProvider. BallistaLogicalCodecs implements ser/de of this new extension. To do so, BallistaLogicalCodecs uses LogicalPlan::TableScan serialisation to serialise the TableProvider.
This is a bit of a hack, but it works.
On the scheduler side, the extended logical plan is intercepted, the table provider is deserialised and registered in the local session context, and the logical plan extension is replaced with the original DML plan.
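The wrap-and-unwrap flow described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Plan`, `TableProvider`, `SessionContext`, `wrap_dml` and `unwrap_dml` are hypothetical stand-ins invented for illustration, not the code in this PR and not DataFusion's or Ballista's real types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a serialisable table provider.
#[derive(Debug, Clone, PartialEq)]
struct TableProvider {
    location: String,
}

/// Hypothetical stand-in for a logical plan (not DataFusion's `LogicalPlan`).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// DML node that only names its target table.
    Dml { table: String },
    /// Extension node carrying the DML plan plus the provider it refers to.
    DmlExtension { dml: Box<Plan>, provider: TableProvider },
}

/// Hypothetical session context: table name -> provider.
#[derive(Default)]
struct SessionContext {
    tables: HashMap<String, TableProvider>,
}

/// Client side: wrap a DML plan together with the provider it references.
/// Returns `None` if the client context does not know the table.
fn wrap_dml(plan: Plan, client: &SessionContext) -> Option<Plan> {
    match plan {
        Plan::Dml { table } => {
            let provider = client.tables.get(&table)?.clone();
            Some(Plan::DmlExtension {
                dml: Box::new(Plan::Dml { table }),
                provider,
            })
        }
        // Non-DML plans pass through unchanged.
        other => Some(other),
    }
}

/// Scheduler side: register the shipped provider locally and
/// hand back the original DML plan.
fn unwrap_dml(plan: Plan, scheduler: &mut SessionContext) -> Plan {
    match plan {
        Plan::DmlExtension { dml, provider } => {
            if let Plan::Dml { table } = &*dml {
                scheduler.tables.insert(table.clone(), provider);
            }
            *dml
        }
        other => other,
    }
}
```

In this toy model the scheduler context starts empty and only learns about the target table when the extension node arrives, which mirrors the unsynchronised-contexts problem described earlier.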
This functionality can be disabled by setting ballista.planner.dml_extension to false.
The proposed approach does look "hacky"; it would make more sense to address this issue in datafusion's LogicalPlan::DML by replacing the table reference with a table source. We would then need to extract the logic for serialising a table source from LogicalPlan::TableScan and re-use it. IMHO it would make sense to address this issue there.
Are there any user-facing changes?