Optimize ImmutableHashSet<T>.IsProperSubsetOf to avoid unnecessary allocations#127368
Optimize ImmutableHashSet<T>.IsProperSubsetOf to avoid unnecessary allocations#127368aw0lid wants to merge 1 commit intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-collections |
| using (var e = new ImmutableHashSet<T>.Enumerator(origin.Root)) | ||
| { | ||
| while (e.MoveNext()) | ||
| { | ||
| if (!otherAsImmutableHashSet.Contains(e.Current)) | ||
| { | ||
| return false; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
| using (var e = new ImmutableHashSet<T>.Enumerator(origin.Root)) | |
| { | |
| while (e.MoveNext()) | |
| { | |
| if (!otherAsImmutableHashSet.Contains(e.Current)) | |
| { | |
| return false; | |
| } | |
| } | |
| } | |
| foreach (T value in origin) | |
| { | |
| if (!otherAsImmutableHashSet.Contains(value)) | |
| { | |
| return false; | |
| } | |
| } | |
Any reason why all of the loops are manually deconstructed instead of using foreach?
There was a problem hiding this comment.
In the existing implementation, the code iterates over other and queries the ImmutableHashSet, which costs ImmutableHashSet and query the HashSet<T> instead. This shifts the lookup cost to
Also, origin (the MutationInput) does not implement IEnumerable<T>, so a direct foreach is not possible. I had to deconstruct it this way to access the underlying nodes while maintaining the performance gains.
That said, I am waiting for my other PR (#126309) to be merged so I can leverage its logic here and further refine this implementation like following this:
private static bool IsProperSubsetOf(IEnumerable<T> other, MutationInput origin)
{
Requires.NotNull(other, nameof(other));
if (origin.Root.IsEmpty)
{
return other.Any();
}
if (other is ICollection<T> otherAsICollectionGeneric)
{
// We check for < instead of != because other is not guaranteed to be a set, it could be a collection with duplicates.
if (otherAsICollectionGeneric.Count <= origin.Count)
{
return false;
}
switch (other)
{
case ImmutableHashSet<T> otherAsImmutableHashSet:
if (origin.EqualityComparer.Equals(otherAsImmutableHashSet.KeyComparer))
{
return SetEqualsWithImmutableHashset(otherAsImmutableHashSet, origin);
}
break;
case HashSet<T> otherAsHashset:
if (origin.EqualityComparer.Equals(otherAsHashset.Comparer))
{
return SetEqualsWithHashset(otherAsHashset, origin);
}
break;
}
}
else if (other is ICollection otherAsICollection && otherAsICollection.Count <= origin.Count)
{
return false;
}
var otherSet = new HashSet<T>(other, origin.EqualityComparer);
if (otherSet.Count <= origin.Count)
{
return false;
}
return SetEqualsWithHashset(otherSet, origin);
}| return IsProperSubsetOfFastPath(otherAsICollectionGeneric, this.Origin); | ||
| } | ||
|
|
||
| else if (other is ICollection otherAsICollection && otherAsICollection.Count <= this.Origin.Count) |
There was a problem hiding this comment.
Is it common for types to implement ICollection but not ICollection<T> if they're already implementing IEnumerable<T>?
There was a problem hiding this comment.
True, it’s not very common, but these types still exist and are in use. Since we can easily capture the Count property from them to trigger an early exit, I thought: why not include them? It broadens the optimization's reach with very little overhead
Part of #127279
Summary
ImmutableHashSet<T>.IsProperSubsetOfalways creates a new intermediateHashSet<T>for theothercollection, leading to avoidable allocations and GC pressure, especially for large datasetsOptimization Logic
O(1) Pre-Scan: Immediately returns false if other is an
ICollectionwith a smaller or equal Count. By performing this validation upfront, the need for tracking variables likematchesandextraFoundis eliminated, as any complete match is now mathematically guaranteed to be a proper subset.Fast-Path Pattern Matching: Detects
ImmutableHashSet<T>andHashSet<T>to bypass intermediate allocations.Comparer Guard: Validates
EqualityComparercompatibility before triggering fast paths to ensure logical consistency.Short-Circuit Validation: Re-validates$O(n)$ enumeration.
Countwithin specialized paths for an immediate exit beforeReverse-Lookup Strategy: An architectural shift where the ImmutableHashSet (The Source) iterates and queries the other collection if was Hashset. This leverages the O(1) lookup of the HashSet instead of the O(log N) lookup of the immutable tree.
Zero-Allocation Execution: Direct iteration over compatible collections, eliminating the costly
new HashSet<T>(other)fallback.Deferred fallback: Reserves the expensive allocation solely for general
IEnumerabletypes.Click to expand Benchmark Source Code
Click to expand Benchmark Results
Benchmark Results (Before Optimization)
Benchmark Results (After Optimization)
Performance Analysis Summary (100,000 Elements)