Skip to content

Query: Include for both navigations of One-To-Many relationship should use split query #12775

@smitpatel

Description

@smitpatel

Currently when we include a collection navigation, we send 2 queries to the server.
e.g.
db.Blogs.Include(b => b.Posts).ToList();
Generates

      SELECT [b].[Id]
      FROM [Blogs] AS [b]
      ORDER BY [b].[Id]

      SELECT [b.Posts].[Id], [b.Posts].[BlogId]
      FROM [Post] AS [b.Posts]
      INNER JOIN (
          SELECT [b0].[Id]
          FROM [Blogs] AS [b0]
      ) AS [t] ON [b.Posts].[BlogId] = [t].[Id]
      ORDER BY [t].[Id]

We do above because of the cardinality. i.e. When each blog has 10 related post, doing single query would cause each blog to be repeated 10 times in the join causing unnecessary data fetch from the server. It also causes "dreaded cartesian product" when you are doing multiple includes since each navigation would add additional join and more duplicated data. So we chose this design in EF Core.

But when you do include on other way round on same relationship.
e.g.
db.Set<Post>().Include(p => p.Blog).ToList();
Generates

  SELECT [p].[Id], [p].[BlogId], [p.Blog].[Id]
  FROM [Post] AS [p]
  LEFT JOIN [Blogs] AS [p.Blog] ON [p].[BlogId] = [p.Blog].[Id]

Here we are duplicating the Blogs for each post which has same parent. While we do not materialize each blog due to identity resolution, we still fetch the data from server and create value buffer.
We could actually use split query like first one and get related Blog (parent) in separate query without causing duplication.

Same is applicable for correlated subquery.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions