Efficient merge_insert

Open
No due date
Last updated Apr 8, 2026

We are reimplementing merge insert to be more efficient, particularly in memory use. The refactor replaces the implementation that manually manipulates streams with one that uses DataFusion to generate and optimize the whole plan.

PRD: Retire v1 Code Path

Three categories of operations still fall back to v1:

  1. WhenMatched::DoNothing (find-or-create pattern)
  2. Partial schema upserts (source has subset of target columns)
  3. Upserts when a scalar index exists on the join key

Goal: Migrate all three to v2, then delete v1. This is a correctness/parity migration, not a performance optimization pass.
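The three fallback categories above can be sketched as a small predicate. This is a minimal illustration only: `MergeRequest`, `FallbackReason`, and `v1_fallback_reasons` are hypothetical names invented for this sketch, and the real eligibility check (`can_use_create_plan`) may be shaped quite differently.

```rust
// Illustrative only: every type here is a hypothetical stand-in, not Lance's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WhenMatched {
    UpdateAll,
    DoNothing,
}

/// Hypothetical summary of the request properties that decide the code path.
struct MergeRequest {
    when_matched: WhenMatched,
    /// True when the source carries only a subset of the target's columns.
    source_is_partial_schema: bool,
    /// True when a scalar index exists on the join key.
    join_key_has_scalar_index: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum FallbackReason {
    DoNothing,
    PartialSchema,
    ScalarIndexOnJoinKey,
}

/// Collects every reason a request would still be routed to v1.
/// An empty result means the request is eligible for v2.
fn v1_fallback_reasons(req: &MergeRequest) -> Vec<FallbackReason> {
    let mut reasons = Vec::new();
    if req.when_matched == WhenMatched::DoNothing {
        reasons.push(FallbackReason::DoNothing);
    }
    if req.source_is_partial_schema {
        reasons.push(FallbackReason::PartialSchema);
    }
    if req.join_key_has_scalar_index {
        reasons.push(FallbackReason::ScalarIndexOnJoinKey);
    }
    reasons
}

fn main() {
    let find_or_create = MergeRequest {
        when_matched: WhenMatched::DoNothing,
        source_is_partial_schema: false,
        join_key_has_scalar_index: true,
    };
    let plain_upsert = MergeRequest {
        when_matched: WhenMatched::UpdateAll,
        source_is_partial_schema: false,
        join_key_has_scalar_index: false,
    };
    println!("find-or-create: {:?}", v1_fallback_reasons(&find_or_create));
    println!("plain upsert:   {:?}", v1_fallback_reasons(&plain_upsert));
}
```

Returning a list of reasons rather than a single boolean makes it easy to see which slices still block a given request. Once all three reasons are gone, the predicate itself can be deleted along with v1.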

Vertical Slices

  1. DoNothing on v2 — Add DoNothing to can_use_create_plan eligibility. Simplest slice; establishes the pattern.
  2. Partial schema upsert on v2 — Fill missing columns via projection in the logical plan. Write full rows. Does NOT replicate v1's column-rewrite optimization (future ticket).
  3. Research spike: indexed join strategy — Time-boxed evaluation of ScalarIndexJoinExec (#3480) vs finger search (#4648) vs wrapping v1 logic. Output: design for slice 4.
  4. Scalar-indexed joins on v2 — Implement approach from spike. Remove scalar index fallback. Retain use_index as escape hatch.
  5. Remove v1 code — Delete Merger, v1 branch in execute_uncommitted_impl, can_use_create_plan, v1-only helpers.
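Slice 2's "fill missing columns via projection" can be pictured as a per-column decision over the target schema. The sketch below uses only the standard library and does not build a DataFusion plan. `ColumnSource` and `plan_projection` are hypothetical names. It also assumes that a column missing from the source keeps the existing target value for matched rows, which the description above does not spell out.

```rust
/// Where each target column's value comes from in the full row that gets written.
#[derive(Debug, PartialEq, Eq)]
enum ColumnSource {
    /// The column is present in the source; take the incoming value.
    Source,
    /// The column is absent from the source; keep the existing target value
    /// (an assumption made for this sketch).
    ExistingTarget,
}

/// Builds a projection in target-schema order so that full rows can be written.
/// Returns an error if the source has a column the target does not know about.
fn plan_projection(
    target_columns: &[&str],
    source_columns: &[&str],
) -> Result<Vec<(String, ColumnSource)>, String> {
    for col in source_columns {
        if !target_columns.contains(col) {
            return Err(format!(
                "source column `{}` is not in the target schema",
                col
            ));
        }
    }
    Ok(target_columns
        .iter()
        .map(|col| {
            let origin = if source_columns.contains(col) {
                ColumnSource::Source
            } else {
                ColumnSource::ExistingTarget
            };
            (col.to_string(), origin)
        })
        .collect())
}

fn main() {
    // The source only carries `id` and `score`; `name` must be filled in.
    let plan = plan_projection(&["id", "name", "score"], &["id", "score"])
        .expect("source columns are a subset of the target schema");
    for (column, origin) in &plan {
        println!("{column}: {origin:?}");
    }
}
```

Because every output row carries all target columns, this matches the "write full rows" approach and deliberately skips v1's column-rewrite optimization.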

Existing Issue Disposition

| Issue | Action |
| --- | --- |
| #4194 (optimize merge insert with delete) | Split into two post-retirement tickets: WhenMatched::Delete optimization and WhenNotMatchedBySource::Delete optimization |
| #4193 (partial schema / TakeExec) | Reframe as follow-up optimization after slice 2 |
| #4266 (TakeExec logical node) | Deferred until #4193 optimization |
| #3480 (indexed merge insert) | Spike (slice 3) updates description; slice 4 implements |
| #4583 (streaming vs materialized) | Out of scope |
| #4648 (finger search) | Feeds into slice 3 research |

Out of Scope

  • Ergonomics improvements (milestone 8)
  • Performance optimizations: TakeExec column-rewrite avoidance, streaming vs materialized inputs, join order optimization
26% complete
