Optimize ImmutableHashSet<T>.SetEquals to avoid unnecessary allocations #126309
aw0lid wants to merge 1 commit into dotnet:main
Gentle ping in case this fell through the cracks.
@dotnet/area-system-collections for secondary review
Gentle ping in case this fell through the cracks.
if (other is HashSet<T> otherAsHashSet)
{
    if (otherAsHashSet.Comparer == origin.EqualityComparer)
HashSet<T> likewise has optimizations that kick in if the same equality comparer is used. There, it calls the Equals(object?) method of the equality comparer, so that it can detect that the comparers will give the same results even if they aren't exactly the same instance.
I experimented with using origin.EqualityComparer.Equals(otherAsHashSet.Comparer) instead of reference equality (==) to cover different comparer instances, but the results showed clear performance regressions in the fast paths.
| Method | Time with == | Time with Equals() | Slowdown |
|---|---|---|---|
| BCL HashSet (Smaller Count) | 2.859 ns | 7.764 ns | +171.5% |
| HashSet (Diff Comparer - Small) | 2.917 ns | 7.740 ns | +165.3% |
| ImmutableHashSet (Larger Count) | 3.740 ns | 4.914 ns | +31.4% |
| ImmutableHashSet (Smaller Count) | 3.983 ns | 5.001 ns | +25.5% |
In my opinion, the best practice is to rely on EqualityComparer<T>.Default (which is a singleton) or to share a single comparer reference when dealing with large datasets.
Therefore, I believe we should not sacrifice raw performance in these paths (a slowdown of up to about 170%) just to cover cases where the user does not follow optimal performance practices, especially since correctness is still fully guaranteed by the fallback path, with the time penalty paid only by callers that use distinct comparer instances.
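To make the distinction in this thread concrete, here is a minimal sketch of two comparer instances that are different references but report themselves equal through Equals(object). The OrdinalIgnoreCaseComparer and ComparerCheckDemo types are hypothetical and exist only for illustration; they are not part of the PR or the BCL.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical comparer (an assumption for illustration): every instance behaves
// identically, and Equals(object) is overridden to say so.
public sealed class OrdinalIgnoreCaseComparer : IEqualityComparer<string>
{
    public bool Equals(string? x, string? y) =>
        string.Equals(x, y, StringComparison.OrdinalIgnoreCase);

    public int GetHashCode(string obj) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(obj);

    // Any two instances of this type produce the same results.
    public override bool Equals(object? obj) => obj is OrdinalIgnoreCaseComparer;

    public override int GetHashCode() => typeof(OrdinalIgnoreCaseComparer).GetHashCode();
}

public static class ComparerCheckDemo
{
    // Returns the outcome of both checks for two sets built with separate instances.
    public static (bool SameReference, bool EqualsCall) Compare()
    {
        var a = new HashSet<string>(new OrdinalIgnoreCaseComparer());
        var b = new HashSet<string>(new OrdinalIgnoreCaseComparer());

        // The reference check is the cheaper test but misses equivalent instances.
        bool sameReference = ReferenceEquals(a.Comparer, b.Comparer);

        // The one-argument call binds to object.Equals, which the comparer overrides.
        bool equalsCall = a.Comparer.Equals(b.Comparer);

        return (sameReference, equalsCall);
    }
}
```

With a reference check, sets like these would take the fallback path; with an Equals(object) check they would qualify for the fast path, at the cost measured in the table above.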
eiriktsarpalis left a comment
This change is adding a whole lot of runtime type checks. Is there tangible evidence (e.g. in the form of microbenchmarks) showing improvement here (both when other is a set and, more importantly, when it is not)?
As the benchmark results indicate, there is no performance regression even in the fallback paths. This shows that the added runtime type checks do not measurably impact performance, while providing large gains in the optimized paths.
What do the numbers show when comparing small collections (0-10 elements) or collections that are not equal?
Fixes #90986, Part of #127279
Summary
ImmutableHashSet<T>.SetEquals always creates a new intermediate HashSet<T> for the other collection, leading to avoidable allocations and GC pressure, especially for large datasets.
Optimization Logic
- Returns false if other is an ICollection with a smaller Count, avoiding any overhead.
- Fast paths for ImmutableHashSet<T> and HashSet<T> to bypass intermediate allocations.
- Checks EqualityComparer compatibility before triggering fast paths to ensure logical consistency.
- Checks Count within specialized paths for an immediate exit before the new HashSet<T>(other) fallback.
- … IEnumerable types.
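The optimization logic above can be sketched roughly as follows. This is an illustrative approximation rather than the PR's actual code: SetEqualsSketch, FastSetEquals, and ContainsAll are hypothetical names, the sketch works through the public KeyComparer property instead of the internal origin.EqualityComparer seen in the diff, and it assumes the generic ICollection<T> for the early exit.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Collections.Immutable;

public static class SetEqualsSketch
{
    // Hypothetical helper (an assumption): the PR changes ImmutableHashSet<T> internals instead.
    public static bool FastSetEquals<T>(ImmutableHashSet<T> source, IEnumerable<T> other)
    {
        if (ReferenceEquals(source, other)) return true;

        switch (other)
        {
            // Same comparer instance: both sides hold distinct items under the same
            // equality, so equal counts plus containment imply set equality.
            case ImmutableHashSet<T> immutable
                when ReferenceEquals(immutable.KeyComparer, source.KeyComparer):
                return immutable.Count == source.Count && ContainsAll(source, immutable);

            case HashSet<T> mutable
                when ReferenceEquals(mutable.Comparer, source.KeyComparer):
                return mutable.Count == source.Count && ContainsAll(source, mutable);

            // Fewer items than the set, duplicates or not, can never cover it.
            case ICollection<T> collection when collection.Count < source.Count:
                return false;
        }

        // Fallback: the existing implementation, which handles any IEnumerable<T>.
        return source.SetEquals(other);
    }

    private static bool ContainsAll<T>(ImmutableHashSet<T> source, IEnumerable<T> items)
    {
        foreach (T item in items)
        {
            if (!source.Contains(item)) return false;
        }
        return true;
    }
}
```

Because every fast path either returns a definite answer or defers to SetEquals, a comparer mismatch only costs the type checks and never changes the result.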
Benchmark Results (Before Optimization)
Benchmark Results (After Optimization)
Performance Analysis Summary (100,000 Elements)