Description
I propose to add LINQ methods of the pattern *By. For example:
IEnumerable<TSource> DistinctBy<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IEqualityComparer<TKey> comparer = null)
This method would behave like Distinct except that equality is determined based on the key provided by keySelector. The key could be any value including anonymous types and value tuples.
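The behavior described above can be sketched with a HashSet of keys. This is only an illustration under stated assumptions, not the proposed implementation: the name DistinctBySketch and the containing classes are hypothetical, chosen so the code cannot clash with any DistinctBy that a given framework version might already ship.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctBySketchExtensions
{
    // Hypothetical name; yields the first element seen for each distinct key.
    public static IEnumerable<TSource> DistinctBySketch<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        return Iterate(source, keySelector, comparer);
    }

    private static IEnumerable<TSource> Iterate<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        // A null comparer makes HashSet fall back to EqualityComparer<TKey>.Default.
        var seen = new HashSet<TKey>(comparer);
        foreach (var item in source)
        {
            // Add returns false when the key is already present.
            if (seen.Add(keySelector(item)))
                yield return item;
        }
    }
}

public static class DistinctByDemo
{
    public static void Main()
    {
        var people = new[]
        {
            (Name: "Ann", City: "Oslo"),
            (Name: "Bob", City: "Rome"),
            (Name: "Cid", City: "Oslo"),
        };

        var onePerCity = people.DistinctBySketch(p => p.City).Select(p => p.Name);
        Console.WriteLine(string.Join(", ", onePerCity)); // prints "Ann, Bob"
    }
}
```

Because elements are yielded as they are first encountered, this sketch preserves input order and enumerates the source only once.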
A motivating case could be this:
IEnumerable<Order> allOrders = GetTransactions();
IEnumerable<int> completedOrderIDs = GetCompletedOrderIDs();
IEnumerable<Order> remainingOrders =
    allOrders.ExceptBy(completedOrderIDs, order => order.ID, orderID => orderID);
Logic like that is reasonably common in business logic code. It is not easy to implement without ExceptBy. In particular, the following is undesirable because it leads to quadratic cost and repeated enumeration:
IEnumerable<Order> remainingOrders = allOrders.Where(o => !completedOrderIDs.Contains(o.ID));
In the past I have had a need for *By methods many times so I have written them myself. A web search reveals great interest. I believe there is a strong case for adding methods like this.
There has been interest on this issue tracker as well:
- Feature Request: Overload for Distinct to receive a func. dotnet/runtime#27665
- Consider adding more LINQ operators from Ix.NET dotnet/runtime#19522
- Add commonly required Enumerable methods (DistinctBy, ExceptBy, AsChunked, ...) dotnet/runtime#14753
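Returning to the motivating example: the Where/Contains version rescans completedOrderIDs for every order, whereas a hash-based operator can enumerate each input exactly once. The following is a rough sketch of that idea under stated assumptions. ExceptBySketch, the Order class, and the demo class are hypothetical names used only for illustration, and argument validation is omitted for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Order
{
    public int ID { get; set; }
}

public static class ExceptBySketchExtensions
{
    // Hypothetical name; two key selectors allow different element types.
    public static IEnumerable<TFirst> ExceptBySketch<TFirst, TSecond, TKey>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TKey> keySelectorFirst,
        Func<TSecond, TKey> keySelectorSecond,
        IEqualityComparer<TKey> comparer = null)
    {
        // Collect the keys to exclude in one pass over the second input.
        var excluded = new HashSet<TKey>(comparer);
        foreach (var item in second)
            excluded.Add(keySelectorSecond(item));

        // One pass over the first input. Using Add (rather than Contains)
        // mirrors the set semantics of Except: a key from the first input
        // is yielded at most once.
        foreach (var item in first)
        {
            if (excluded.Add(keySelectorFirst(item)))
                yield return item;
        }
    }
}

public static class ExceptByDemo
{
    public static void Main()
    {
        var allOrders = Enumerable.Range(1, 4).Select(i => new Order { ID = i });
        var completedOrderIDs = new[] { 2, 4, 2 };

        var remaining = allOrders.ExceptBySketch(
            completedOrderIDs, order => order.ID, orderID => orderID);

        Console.WriteLine(string.Join(", ", remaining.Select(o => o.ID))); // prints "1, 3"
    }
}
```

With n orders and m completed IDs, this does O(n + m) expected work, compared with O(n * m) for the Where/Contains form when completedOrderIDs is a plain sequence.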
Proposed API
@eiriktsarpalis has provided the following shape for API review. Please refer to my original proposal below for reference.
namespace System.Linq
{
    public static class Enumerable
    {
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
        public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
        public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
        public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
        // Missing min & max overloads accepting custom comparers added for completeness
        public static TResult Min<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
        public static TResult Max<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
    }
}
and equivalent Queryable APIs:
namespace System.Linq
{
    public static class Queryable
    {
        public static IQueryable<TSource> DistinctBy<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> keySelector);
        public static IQueryable<TSource> DistinctBy<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);
        public static IQueryable<TSource> ExceptBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector);
        public static IQueryable<TSource> ExceptBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);
        public static IQueryable<TSource> IntersectBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector);
        public static IQueryable<TSource> IntersectBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);
        public static IQueryable<TSource> UnionBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector);
        public static IQueryable<TSource> UnionBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);
        public static TSource MinBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector);
        public static TSource MinBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector, IComparer<TResult>? comparer);
        public static TSource MaxBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector);
        public static TSource MaxBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector, IComparer<TResult>? comparer);
        // Missing min & max overloads accepting custom comparers added for completeness
        public static TResult Min<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector, IComparer<TResult>? comparer);
        public static TResult Max<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector, IComparer<TResult>? comparer);
    }
}
EDIT @eiriktsarpalis: the key change in this amendment is that the ExceptBy and IntersectBy overloads do not allow heterogeneous element types for the second parameter. This makes it less of a join-like construct and more compatible with both the existing Except and Intersect methods as well as the proposed signature for UnionBy. Please follow the conversation after this comment for more details on the issue.
Open Questions
- The ExceptBy and IntersectBy methods can be generalized by admitting heterogeneous element types in the second parameter. This enables applications like the one cited in the first example, at the cost of requiring a separate keySelector argument for the second collection. Note that this generalization is not admissible in the UnionBy case.
- Should the *By methods accept custom comparers for the key types in addition to key selector lambdas? While there is certainly precedent for similar methods in LINQ, there is also an element of over-engineering here: the natural equality/ordering semantics of ad-hoc key projections are almost always sufficient. YAGNI.
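As a small illustration of the comparer question, the existing GroupBy operator already shows both sides: it has an overload taking an IEqualityComparer<TKey> (the precedent), yet an ad-hoc key projection can often absorb the same logic by normalizing the key. The sketch below uses only existing LINQ APIs; the KeyComparerDemo class and its sample data are made up for the example.

```csharp
using System;
using System.Linq;

public static class KeyComparerDemo
{
    private static readonly (string Name, string City)[] Rows =
    {
        ("ann", "Oslo"),
        ("Ann", "oslo"),
        ("Bob", "Rome"),
    };

    // Value tuples compare member by member, so a composite key needs no comparer.
    public static int GroupsByTupleKey() =>
        Rows.GroupBy(r => (r.Name, r.City)).Count();

    // Precedent in LINQ: a key selector combined with a custom key comparer.
    public static int GroupsByNameWithComparer() =>
        Rows.GroupBy(r => r.Name, StringComparer.OrdinalIgnoreCase).Count();

    // Alternative without a comparer: normalize inside the key projection.
    public static int GroupsByNormalizedName() =>
        Rows.GroupBy(r => r.Name.ToUpperInvariant()).Count();

    public static void Main()
    {
        Console.WriteLine(GroupsByTupleKey());         // prints 3
        Console.WriteLine(GroupsByNameWithComparer()); // prints 2
        Console.WriteLine(GroupsByNormalizedName());   // prints 2
    }
}
```

Whether the normalizing projection is an acceptable substitute depends on the case: it works for simple keys like this one, but it cannot express every comparer.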
Original API Proposal
Here is the API proposal. All of these methods come with an overload with and without comparer. I kept the existing naming conventions and argument ordering.
namespace System.Linq
{
    public static class Enumerable
    {
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer);
        public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst);
        public static IEnumerable<TSource1> ExceptBy<TSource1, TSource2, TKey>(this IEnumerable<TSource1> first, IEnumerable<TSource2> second, Func<TSource1, TKey> keySelectorFirst, Func<TSource2, TKey> keySelectorSecond);
        public static IEnumerable<TSource1> ExceptBy<TSource1, TSource2, TKey>(this IEnumerable<TSource1> first, IEnumerable<TSource2> second, Func<TSource1, TKey> keySelectorFirst, Func<TSource2, TKey> keySelectorSecond, IEqualityComparer<TKey> comparer);
        public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst);
        public static IEnumerable<TSource1> IntersectBy<TSource1, TSource2, TKey>(this IEnumerable<TSource1> first, IEnumerable<TSource2> second, Func<TSource1, TKey> keySelectorFirst, Func<TSource2, TKey> keySelectorSecond);
        public static IEnumerable<TSource1> IntersectBy<TSource1, TSource2, TKey>(this IEnumerable<TSource1> first, IEnumerable<TSource2> second, Func<TSource1, TKey> keySelectorFirst, Func<TSource2, TKey> keySelectorSecond, IEqualityComparer<TKey> comparer);
        public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer);
        public static TResult Max<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer);
        public static TResult Max<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer, TResult defaultValue);
        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer);
        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer, TSource defaultValue);
        public static TResult Min<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer);
        public static TResult Min<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer, TResult defaultValue);
        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer);
        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer, TSource defaultValue);
    }
}
Further notes:
- DistinctBy: The order of the output elements should be documented to be the same order as the input. Any reasonable implementation that comes to mind does it this way. For compatibility reasons this could never be changed after the first version ships anyway. Distinct should be similarly documented if not already done. Distinct already behaves this way.
- For ExceptBy and IntersectBy the same is true for the first input. Only items from the first input are ever returned and their order can be kept. Except does the same thing today.
- For UnionBy I'm not sure about the order.
- For MinBy/MaxBy it should be documented that the first element in the sequence with the minimum/maximum key is returned.
- MinBy/MaxBy can take a defaultValue which is used in case the sequence is empty. Overloads without default value throw in that case.
- I added overloads to Min/Max to bring them on par with the functionality added by MinBy/MaxBy (comparer and defaultValue).
- Note that the input element types for ExceptBy and IntersectBy can be different. This is because we only ever return items of the first sequence. We need two key selectors in that case. There's a simpler overload that has only one element type as well.
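The MinBy notes above (first element wins on ties, throw on an empty sequence unless a defaultValue is supplied) can be sketched as follows. This is an illustration of the semantics described in the original proposal only, not of any shipped API; MinBySketch and the surrounding class names are hypothetical, and argument validation is omitted.

```csharp
using System;
using System.Collections.Generic;

public static class MinBySketchExtensions
{
    // Hypothetical name; throws when the sequence is empty.
    public static TSource MinBySketch<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> selector,
        IComparer<TKey> comparer = null)
    {
        if (!TryMinBy(source, selector, comparer, out var result))
            throw new InvalidOperationException("Sequence contains no elements");
        return result;
    }

    // Hypothetical overload; returns defaultValue when the sequence is empty.
    public static TSource MinBySketch<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> selector,
        IComparer<TKey> comparer,
        TSource defaultValue)
    {
        return TryMinBy(source, selector, comparer, out var result) ? result : defaultValue;
    }

    private static bool TryMinBy<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> selector,
        IComparer<TKey> comparer,
        out TSource result)
    {
        comparer = comparer ?? Comparer<TKey>.Default;
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext())
            {
                result = default(TSource);
                return false;
            }

            result = e.Current;
            var bestKey = selector(result);
            while (e.MoveNext())
            {
                var key = selector(e.Current);
                // Strict comparison keeps the first element among equal keys.
                if (comparer.Compare(key, bestKey) < 0)
                {
                    result = e.Current;
                    bestKey = key;
                }
            }
            return true;
        }
    }
}

public static class MinByDemo
{
    public static void Main()
    {
        var words = new[] { "pear", "fig", "kiwi", "yam" };
        Console.WriteLine(words.MinBySketch(w => w.Length)); // prints "fig"

        var none = new string[0].MinBySketch(w => w.Length, null, "none");
        Console.WriteLine(none); // prints "none"
    }
}
```

A MaxBy sketch would be identical except for the direction of the comparison.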