Copy/upsert data across tables in bulk (INSERT ... SELECT) #27320

@roji

This issue tracks introducing an API to copy data inside the database, from one table (or several) into a destination table. Note this is different from bulk importing data from the client into the database (#27333).

This was split off from #795 (bulk update/delete).

Basic API

All SQL databases support a variant of the INSERT statement which accepts a query instead of a list of values:

INSERT INTO x SELECT * FROM y

The column list can be specified (INSERT INTO x (a, b) SELECT ...) or omitted (INSERT INTO x SELECT ...). If it's omitted, the subquery must return the exact number of columns in the destination table, with the correct types. Since it's problematic to rely on table column ordering (e.g. it can't be changed after creation), we should probably force the user to always explicitly provide the column list.
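To illustrate the "always require a column list" rule, here is a minimal sketch of the SQL text a translator could emit. `InsertSelectSql` and its `Build` method are hypothetical names, not EF Core APIs, and the ANSI double-quote identifier quoting is an assumption (a real provider would use its own quoting rules).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InsertSelectSql
{
    // Hypothetical helper (not an EF Core API): emits `INSERT INTO t (cols) SELECT ...`
    // and refuses to emit the column-less form that depends on table column order.
    public static string Build(string table, IReadOnlyList<string> columns, string selectSql)
    {
        if (columns == null || columns.Count == 0)
            throw new ArgumentException("An explicit column list is required.", nameof(columns));

        var columnList = string.Join(", ", columns.Select(c => Quote(c)));
        return $"INSERT INTO {Quote(table)} ({columnList}) {selectSql}";
    }

    // ANSI-style quoting, with embedded double quotes doubled. Assumed for the sketch only.
    private static string Quote(string identifier) =>
        "\"" + identifier.Replace("\"", "\"\"") + "\"";
}
```

Calling `InsertSelectSql.Build("Blogs2", new[] { "Name", "Url" }, "SELECT \"Name\", \"Url\" FROM \"Blogs1\"")` returns `INSERT INTO "Blogs2" ("Name", "Url") SELECT "Name", "Url" FROM "Blogs1"`, while an empty column list throws instead of falling back to positional matching.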

Basic proposals:

// Variant 1: begin from the source table, flow data through the LINQ operators to the destination.
// The naming corresponds to the SQL (`INSERT INTO`), no ambiguity with change-tracking operations on DbSet (like with Update).
ctx.Blogs1.Where(...).InsertInto(ctx.Blogs2, b => b.Name)

// Variant 1.1: Naming-wise, we could make it extra-explicit that it's a bulk operation.
// Also more consistent with BulkDelete/BulkUpdate (with or without `Into` suffix).
// May be slightly ambiguous with bulk import (from client).
ctx.Blogs1.Where(...).BulkInsert(ctx.Blogs2, b => new { b.Name, b.Url })

// Variant 2: we can flip the order, but this adds nesting which seems unnecessary:
ctx.Blogs2.InsertFrom(ctx.Blogs1.Where(...), b => new { b.Name, b.Url })

Static column compatibility checking

It would be great to statically enforce that the column list matches the incoming columns from the source table, e.g. with the following signature:

public static void InsertInto<TSource, TDestination>(
    this IQueryable<TSource> source,
    DbSet<TDestination> destination,
    Expression<Func<TDestination, TSource>> columnSelector)
    where TDestination : class
{
}

This works great for a single column:

ctx.Blogs.Select(b => b.Name).InsertInto(ctx.Customers, c => c.Foo1);

With multiple columns, this fails if the anonymous type's field names differ:

ctx.Blogs.Select(b => new { b.Name, b.Url }).InsertInto(ctx.Customers, c => new { c.Foo1, c.Foo2 });

Requiring the source's and column's anonymous types to have the same field names seems... problematic (we really do want to project across different columns).
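The reason for the failure is that the compiler only unifies two anonymous types when their property names, types and order all match, so `new { b.Name, b.Url }` and `new { c.Foo1, c.Foo2 }` cannot both be `TSource`. The sketch below reproduces this without EF Core: `IQueryable<TDestination>` stands in for `DbSet<TDestination>`, and `StaticCheckSketch`, `Blog`, `Customer` and `Demo` are illustrative names only.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class StaticCheckSketch
{
    public class Blog { public string Name { get; set; } = ""; public string Url { get; set; } = ""; }
    public class Customer { public string Foo1 { get; set; } = ""; public string Foo2 { get; set; } = ""; }

    // Same shape as the proposed signature; returns the inferred TSource so we can inspect it.
    public static Type InsertInto<TSource, TDestination>(
        this IQueryable<TSource> source,
        IQueryable<TDestination> destination,
        Expression<Func<TDestination, TSource>> columnSelector)
        => typeof(TSource);

    public static bool Demo()
    {
        var blogs = Array.Empty<Blog>().AsQueryable();
        var customers = Array.Empty<Customer>().AsQueryable();

        // Compiles: the destination selector reuses the source's property names,
        // so both lambdas produce the same anonymous type.
        var inferred = blogs
            .Select(b => new { b.Name, b.Url })
            .InsertInto(customers, c => new { Name = c.Foo1, Url = c.Foo2 });

        // Does not compile: { Name, Url } and { Foo1, Foo2 } are unrelated types,
        // so TSource cannot be inferred.
        // blogs.Select(b => new { b.Name, b.Url }).InsertInto(customers, c => new { c.Foo1, c.Foo2 });

        return inferred == new { Name = "", Url = "" }.GetType()
            && inferred != new { Foo1 = "", Foo2 = "" }.GetType();
    }
}
```

`StaticCheckSketch.Demo()` returns `true`. Renaming the destination members to match the source (`Name = c.Foo1`) does make the call compile, but that is exactly the awkward requirement described above.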

If we had value tuples in expression trees (yet again), this would work quite well:

ctx.Blogs.Select(b => (b.Name, b.Url)).InsertInto(ctx.Customers, c => (c.Foo1, c.Foo2));

In any case, if we don't want this to depend on value tuple syntax, we could give up compile-time enforcement with the following signature:

public static void InsertInto<TSource, TDestination, TColumns>(
    this IQueryable<TSource> source,
    DbSet<TDestination> destination,
    Expression<Func<TDestination, TColumns>> columnSelector)
    where TDestination : class
{
}

... and the query would fail at runtime if things are mismatched.
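As a rough idea of what that runtime check could look like, the sketch below compares the shapes of the two projections by walking their expression trees. Everything here is an assumption for illustration: `RuntimeColumnCheck`, `Lambda`, `ProjectedTypes` and `EnsureCompatible` are hypothetical names, the source projection is passed in as a lambda rather than extracted from the query, and only CLR types are compared (a real implementation would more likely compare store types).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class RuntimeColumnCheck
{
    public class Blog { public string Name { get; set; } = ""; public string Url { get; set; } = ""; }
    public class Customer { public string Foo1 { get; set; } = ""; public string Foo2 { get; set; } = ""; public int Age { get; set; } }

    // Lets the compiler infer the result type, so no Convert node wraps the lambda body.
    public static LambdaExpression Lambda<T, TResult>(Expression<Func<T, TResult>> selector) => selector;

    // Flattens `x => x.A` or `x => new { x.A, x.B }` into the CLR types it projects, in order.
    public static IReadOnlyList<Type> ProjectedTypes(LambdaExpression selector)
    {
        switch (selector.Body)
        {
            case MemberExpression member:
                return new[] { member.Type };
            case NewExpression created:
                return created.Arguments.Select(a => a.Type).ToArray();
            default:
                throw new NotSupportedException("Unsupported selector: " + selector.Body);
        }
    }

    // Throws if the source projection and the destination column list disagree in count or type.
    public static void EnsureCompatible(LambdaExpression sourceProjection, LambdaExpression columnSelector)
    {
        var source = ProjectedTypes(sourceProjection);
        var destination = ProjectedTypes(columnSelector);

        if (source.Count != destination.Count)
            throw new InvalidOperationException(
                $"Source projects {source.Count} column(s) but {destination.Count} destination column(s) were listed.");

        for (var i = 0; i < source.Count; i++)
        {
            if (source[i] != destination[i])
                throw new InvalidOperationException(
                    $"Column {i}: cannot insert {source[i].Name} into {destination[i].Name}.");
        }
    }
}
```

With this, pairing `Lambda((Blog b) => new { b.Name, b.Url })` with `Lambda((Customer c) => new { c.Foo1, c.Foo2 })` passes even though the member names differ, while pairing it with `Lambda((Customer c) => new { c.Foo1, c.Age })` throws an `InvalidOperationException` for column 1.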

Finally, note that if we want to, we could have a specific overload for copying between tables mapped to shared-type entity types; in this case no column list is necessary:

ctx.Blogs1.InsertInto(ctx.Blogs2);

Fancier examples

// With navigation:
ctx.Blogs1
    .Where(b => b.Posts.Any())
    .Select(b => new { Blog = b, FirstPost = b.Posts.OrderBy(p => p.Popularity).First() })
    .Select(x => new { x.Blog.Url, x.Blog.Name, x.FirstPost.Title, x.FirstPost.Author })
    .InsertInto(ctx.Foo, f => new { f.Url, f.Name, f.Title, f.Author });

// Fetching generated columns back (RETURNING/OUTPUT):
var ids = ctx.Customers
    .Select(...)
    .InsertInto(
        ctx.Blogs,
        columnSelector: b => new { b.Name, b.Url },
        returningSelector: b => b.Id)
    .ToList();

// Insert or ignore:
ctx.Customers.Select(...).InsertIntoOrIgnore(
    ctx.Blogs2,
    insertColumnSelector: b => new { b.Name, b.Url },
    uniquenessColumnSelector: b => b.Name);

// Insert or update (UPSERT). The update action can access both the existing row and the excluded row (the one we attempted to insert):
ctx.Customers.Select(...).InsertIntoOrUpdate(
    ctx.Blogs,
    insertColumnSelector: b => new { b.Name, b.Url },
    uniquenessColumnSelector: b => b.Name,
    updateAction: (existing, excluded) => new Blog
    {
        Name = existing.Name + "_updated",
        Url = excluded.Url
    });

// The same UPSERT with generated column fetch:
ctx.Customers.Select(...).InsertIntoOrUpdate(
    ctx.Blogs,
    insertColumnSelector: b => new { b.Name, b.Url },
    uniquenessColumnSelector: b => b.Name,
    updateAction: (existing, excluded) => new Blog
    {
        Name = existing.Name + "_updated",
        Url = excluded.Url
    },
    returningSelector: b => b.Id);


// With sproc as input on SQL Server (generates `INSERT ... EXECUTE`):
ctx.SprocSet.InsertInto(ctx.DbSetMappedToSproc, b => new { ... });

// With common table expression (depends on #26486):
EF.Functions.With(...).InsertInto(ctx.Blogs, b => new { ... });
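To make the `existing`/`excluded` pair in the UPSERT examples concrete: on PostgreSQL it could plausibly map to the target table name and the `EXCLUDED` pseudo-table of `ON CONFLICT ... DO UPDATE`. The sketch below only assembles such a statement as text. `PostgresUpsertSketch` is a hypothetical helper, and the table names, column names and SELECT text in the usage note are assumptions derived from the entity names above, not SQL that EF Core generates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PostgresUpsertSketch
{
    // Hypothetical helper: builds INSERT ... SELECT ... ON CONFLICT ... DO UPDATE ... RETURNING.
    // `updates` pairs each column to update with the SQL expression assigned to it.
    public static string Build(
        string table,
        IReadOnlyList<string> insertColumns,
        string selectSql,
        string uniquenessColumn,
        IReadOnlyList<(string Column, string ValueSql)> updates,
        string returningColumn)
    {
        var columns = string.Join(", ", insertColumns.Select(c => Quote(c)));
        var assignments = string.Join(", ", updates.Select(u => $"{Quote(u.Column)} = {u.ValueSql}"));

        return $"INSERT INTO {Quote(table)} ({columns}) {selectSql} "
             + $"ON CONFLICT ({Quote(uniquenessColumn)}) DO UPDATE SET {assignments} "
             + $"RETURNING {Quote(returningColumn)}";
    }

    // PostgreSQL-style identifier quoting, with embedded double quotes doubled.
    private static string Quote(string identifier) =>
        "\"" + identifier.Replace("\"", "\"\"") + "\"";
}
```

For the UPSERT with generated column fetch, passing `"Blogs"`, `new[] { "Name", "Url" }`, `"SELECT c.\"Name\", c.\"Url\" FROM \"Customers\" AS c"`, `"Name"`, `new[] { ("Name", "\"Blogs\".\"Name\" || '_updated'"), ("Url", "EXCLUDED.\"Url\"") }` and `"Id"` yields `INSERT INTO "Blogs" ("Name", "Url") SELECT c."Name", c."Url" FROM "Customers" AS c ON CONFLICT ("Name") DO UPDATE SET "Name" = "Blogs"."Name" || '_updated', "Url" = EXCLUDED."Url" RETURNING "Id"`. Other databases express the same idea differently (e.g. MERGE), so the LINQ shape would need a provider-specific translation.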

Additional notes

  • SQL Server supports INSERT ... EXECUTE for copying the results of a sproc.
  • All providers support WITH with INSERT (WITH ... AS x INSERT INTO Y SELECT * FROM x) - this is important for recursive WITH queries. PostgreSQL even allows capturing the results of an UPDATE ... RETURNING with WITH, and inserting them into a table.
  • Just like with bulk update/delete, INSERT supports RETURNING/OUTPUT which returns columns; this is useful to get back auto-generated columns which aren't updated by the query (e.g. the ID). So like for update/delete, we'd have an overload which accepts an expression to determine the columns to return, and returns an IQueryable with those columns.

Documentation

Community implementations
