Add Enumerable.*By operators (DistinctBy, ExceptBy, IntersectBy, UnionBy, MinBy, MaxBy) #27687

@GSPP

I propose adding LINQ methods that follow the *By pattern. For example:

    IEnumerable<TSource> DistinctBy<TSource, TKey>(
         IEnumerable<TSource> source,
         Func<TSource, TKey> keySelector,
         IEqualityComparer<TKey> comparer = null)

This method would behave like Distinct except that equality is determined based on the key provided by keySelector. The key could be any value including anonymous types and value tuples.
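The behavior described above can be sketched roughly as follows. This is only an illustration, not the framework implementation: the name DistinctBySketch is hypothetical (the suffix avoids clashing with any existing method), and it assumes that the first element seen for each key is the one that is kept.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class DistinctBySketchExtensions
{
    // Hypothetical name; a sketch of the proposed semantics, not the real API.
    public static IEnumerable<TSource> DistinctBySketch<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey>? comparer = null)
    {
        // Validate eagerly; the enumeration itself stays lazy.
        if (source is null) throw new ArgumentNullException(nameof(source));
        if (keySelector is null) throw new ArgumentNullException(nameof(keySelector));
        return Iterate(source, keySelector, comparer);
    }

    private static IEnumerable<TSource> Iterate<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey>? comparer)
    {
        // A null comparer makes HashSet fall back to the default equality comparer.
        var seenKeys = new HashSet<TKey>(comparer);
        foreach (var item in source)
        {
            // Add returns false when the key was already seen, so later duplicates are skipped.
            if (seenKeys.Add(keySelector(item)))
                yield return item;
        }
    }
}
```

Because the key is an ordinary value, a call such as people.DistinctBySketch(p => (p.FirstName, p.LastName)) would deduplicate by a value tuple without any custom comparer (people and its properties are made up for this example).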

A motivating case could be this:

    IEnumerable<Order> allOrders = GetTransactions();
    IEnumerable<int> completedOrderIDs = GetCompletedOrderIDs();
    IEnumerable<Order> remainingOrders =
        allOrders.ExceptBy(completedOrderIDs, order => order.ID, orderID => orderID);

Logic like this is reasonably common in business code. It is not easy to implement efficiently without ExceptBy. In particular, the following is undesirable because it has quadratic cost and enumerates completedOrderIDs once per order:

	IEnumerable<Order> remainingOrders = allOrders.Where(o => !completedOrderIDs.Contains(o.ID));
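For comparison, a linear-time version of the same logic might look like the sketch below. It is only an illustration under several assumptions: the Order class is a made-up stand-in, ExceptBySketch is a hypothetical name, and the method mirrors the set semantics of Except, so elements of the first sequence that repeat a key are returned only once.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Order type used in the motivating example.
public sealed class Order
{
    public Order(int id) => ID = id;
    public int ID { get; }
}

public static class ExceptBySketchExtensions
{
    // Hypothetical name; heterogeneous element types with one key selector per sequence.
    public static IEnumerable<TFirst> ExceptBySketch<TFirst, TSecond, TKey>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TKey> keySelectorFirst,
        Func<TSecond, TKey> keySelectorSecond,
        IEqualityComparer<TKey>? comparer = null)
    {
        // Enumerate the second sequence exactly once and remember its keys.
        var excludedKeys = new HashSet<TKey>(comparer);
        foreach (var item in second)
            excludedKeys.Add(keySelectorSecond(item));

        // Each lookup is O(1) on average, so the whole pass is linear.
        // Adding yielded keys to the set also drops later duplicates (assumed semantics).
        foreach (var item in first)
        {
            if (excludedKeys.Add(keySelectorFirst(item)))
                yield return item;
        }
    }
}
```

With this in place, allOrders.ExceptBySketch(completedOrderIDs, order => order.ID, orderID => orderID) enumerates each input once and preserves the order of allOrders.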

In the past I have had a need for *By methods many times so I have written them myself. A web search reveals great interest. I believe there is a strong case for adding methods like this.

There has been interest on this issue tracker as well.

Proposed API

@eiriktsarpalis has provided the following shape for API review. Please refer to my original proposal below for reference.

namespace System.Linq
{
    public static class Enumerable
    {
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
        
        public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
        
        public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
        
        public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
        
        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
        
        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
        
        // Missing min & max overloads accepting custom comparers added for completeness
        public static TResult Min<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
        public static TResult Max<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
    }
}

and equivalent Queryable APIs:

namespace System.Linq
{
    public static class Queryable
    {
        public static IQueryable<TSource> DistinctBy<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> keySelector);
        public static IQueryable<TSource> DistinctBy<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);

        public static IQueryable<TSource> ExceptBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector);
        public static IQueryable<TSource> ExceptBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);

        public static IQueryable<TSource> IntersectBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector);
        public static IQueryable<TSource> IntersectBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);

        public static IQueryable<TSource> UnionBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector);
        public static IQueryable<TSource> UnionBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);

        public static TSource MinBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector);
        public static TSource MinBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector, IComparer<TResult>? comparer);

        public static TSource MaxBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector);
        public static TSource MaxBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector, IComparer<TResult>? comparer);

        // Missing min & max overloads accepting custom comparers added for completeness
        public static TResult Min<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector, IComparer<TResult>? comparer);
        public static TResult Max<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector, IComparer<TResult>? comparer);
    }
}

EDIT @eiriktsarpalis: the key change in this amendment is that the ExceptBy and IntersectBy overloads do not allow heterogeneous element types for the second parameter. This makes them less of a join-like construct and more consistent with the existing Except and Intersect methods and with the proposed signature for UnionBy. Please follow the conversation after this comment for more details on the issue.

Open Questions

  • The ExceptBy and IntersectBy methods can be generalized by admitting heterogeneous element types in the second parameter. This enables applications like the one cited in the first example, at the cost of requiring a separate keySelector argument for the second collection. Note that this generalization is not admissible in the UnionBy case.
  • Should the *By methods accept custom comparers for the key types in addition to key selector lambdas? While there is certainly precedent for similar methods in LINQ, there is also an element of over-engineering here: the natural equality/ordering semantics of ad-hoc key projections are almost always sufficient. YAGNI.
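To make the comparer question more concrete, the sketch below contrasts the two styles using only existing LINQ methods. GroupBy followed by First stands in for DistinctBy here; this is an illustration, not the proposed implementation, and the class and method names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyComparerSketch
{
    // Style 1: identity key plus a custom equality comparer for the key type.
    public static List<string> DistinctWithComparer(IEnumerable<string> names) =>
        names.GroupBy(name => name, StringComparer.OrdinalIgnoreCase)
             .Select(group => group.First())
             .ToList();

    // Style 2: normalize inside the key projection and rely on default equality.
    public static List<string> DistinctWithProjection(IEnumerable<string> names) =>
        names.GroupBy(name => name.ToUpperInvariant())
             .Select(group => group.First())
             .ToList();
}
```

For simple inputs such as "Alice", "ALICE", "Bob" both methods keep "Alice" and "Bob". The projection style allocates a normalized string per element, which is one practical reason comparer overloads may still be worth having.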

Original API Proposal

Here is the API proposal. The methods come in overloads with and without a comparer. I kept the existing naming conventions and argument ordering.

namespace System.Linq
{
    public static class Enumerable
    {
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer);

        public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst);
        public static IEnumerable<TSource1> ExceptBy<TSource1, TSource2, TKey>(this IEnumerable<TSource1> first, IEnumerable<TSource2> second, Func<TSource1, TKey> keySelectorFirst, Func<TSource2, TKey> keySelectorSecond);
        public static IEnumerable<TSource1> ExceptBy<TSource1, TSource2, TKey>(this IEnumerable<TSource1> first, IEnumerable<TSource2> second, Func<TSource1, TKey> keySelectorFirst, Func<TSource2, TKey> keySelectorSecond, IEqualityComparer<TKey> comparer);

        public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst);
        public static IEnumerable<TSource1> IntersectBy<TSource1, TSource2, TKey>(this IEnumerable<TSource1> first, IEnumerable<TSource2> second, Func<TSource1, TKey> keySelectorFirst, Func<TSource2, TKey> keySelectorSecond);
        public static IEnumerable<TSource1> IntersectBy<TSource1, TSource2, TKey>(this IEnumerable<TSource1> first, IEnumerable<TSource2> second, Func<TSource1, TKey> keySelectorFirst, Func<TSource2, TKey> keySelectorSecond, IEqualityComparer<TKey> comparer);

        public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer);

        public static TResult Max<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer);
        public static TResult Max<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer, TResult defaultValue);

        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer);
        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer, TSource defaultValue);

        public static TResult Min<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer);
        public static TResult Min<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer, TResult defaultValue);

        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer);
        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer, TSource defaultValue);
    }
}

Further notes:

  1. DistinctBy: The order of the output elements should be documented to be the same as the order of the input. Any reasonable implementation that comes to mind does it this way. For compatibility reasons this could never be changed after the first version ships anyway. Distinct already behaves this way and should be documented similarly if it is not already.
  2. For ExceptBy and IntersectBy the same is true for the first input. Only items from the first input are ever returned and their order can be kept. Except does the same thing today.
  3. For UnionBy I'm not sure about the order.
  4. For MinBy/MaxBy it should be documented that the first element in the sequence with the minimum/maximum key is returned.
  5. MinBy/MaxBy can take a defaultValue which is used in case the sequence is empty. Overloads without default value throw in that case.
  6. I added overloads to Min/Max to bring them on par with the functionality added by MinBy/MaxBy (comparer and defaultValue).
  7. Note that the input element types for ExceptBy and IntersectBy can be different. This is because we only ever return items of the first sequence. We need two key selectors in that case. There is also a simpler overload in which the second sequence contains the keys directly, so only one key selector is needed.
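Notes 4 and 5 can be illustrated with a rough MinBy sketch. The names are hypothetical, and the code only shows the intended tie-breaking and empty-sequence behavior under the assumption that a strict "less than" comparison is used; it is not meant as the final implementation.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class MinBySketchExtensions
{
    // Throws on an empty sequence, as described in note 5.
    public static TSource MinBySketch<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> selector,
        IComparer<TKey>? comparer = null)
    {
        if (TryFindMin(source, selector, comparer, out var best)) return best;
        throw new InvalidOperationException("Sequence contains no elements.");
    }

    // Returns defaultValue instead of throwing when the sequence is empty.
    public static TSource MinBySketch<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> selector,
        IComparer<TKey>? comparer,
        TSource defaultValue)
        => TryFindMin(source, selector, comparer, out var best) ? best : defaultValue;

    private static bool TryFindMin<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> selector,
        IComparer<TKey>? comparer,
        out TSource best)
    {
        comparer ??= Comparer<TKey>.Default;
        using var enumerator = source.GetEnumerator();
        if (!enumerator.MoveNext())
        {
            best = default!;
            return false;
        }

        best = enumerator.Current;
        var bestKey = selector(best);
        while (enumerator.MoveNext())
        {
            var key = selector(enumerator.Current);
            // Strict comparison: on ties the earlier element is kept (note 4).
            if (comparer.Compare(key, bestKey) < 0)
            {
                best = enumerator.Current;
                bestKey = key;
            }
        }
        return true;
    }
}
```

A MaxBy sketch would be identical except for the direction of the comparison.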

Labels

  • api-approved: API was approved in API review, it can be implemented
  • api-suggestion: Early API idea and discussion, it is NOT ready for implementation
  • area-System.Linq
  • help wanted: [up-for-grabs] Good issue for external contributors
