Skip to content

Conversation

@2A5F
Copy link
Contributor

@2A5F 2A5F commented May 29, 2025

Modified ArraySortHelper and added ArraySortHelper<T, TComparer> to allow generic struct TComparer without memory allocation

Span.Sort<T, TComparer> will not box now

#39466
#39543

@jkotas
Copy link
Member

jkotas commented May 31, 2025

@EgorBot -intel

using BenchmarkDotNet.Attributes;

public class Bench
{
    int[] a = new int[10000];
    Comparison<int> c = (int x, int y) => x - y;

    [IterationSetup]
    public void IterationSetup() { for (int i = 0; i < a.Length; i++) a[i] = i; }

    [Benchmark]
    public void Sort() => Array.Sort(a, c);
}

@jkotas jkotas added area-System.Memory tenet-performance Performance related issue and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 31, 2025
@jkotas
Copy link
Member

jkotas commented May 31, 2025

Could you please share perf numbers that demonstrate the performance improvements?

The one micro-benchmark that I have kicked off above shows ~10% regression: EgorBot/runtime-utils#370 (comment) .

@2A5F
Copy link
Contributor Author

2A5F commented May 31, 2025

No, I haven't done any performance testing.
Maybe passing TComparer by ref causes performance overhead.
Can this test bot output jit diffs?

@2A5F 2A5F marked this pull request as draft May 31, 2025 09:29
@adamsitnik
Copy link
Member

@2A5F You can use the benchmarks from the performance repo (we use them for ensuring there are no regressions):

https://github.com/dotnet/performance/blob/3499787dbbde3807402516462453e67097066d4a/src/benchmarks/micro/libraries/System.Collections/Sort.cs#L53-L54

You can both trace files and disassembler output:

https://github.com/dotnet/performance/tree/main/src/benchmarks/micro#quick-start

@2A5F 2A5F force-pushed the avoid-sort-allocations branch from c46122b to 1b22a69 Compare June 2, 2025 15:07
@2A5F
Copy link
Contributor Author

2A5F commented Jun 2, 2025

@adamsitnik Thanks for the notice
@jkotas After testing, I think it is impossible to unify the generic paths without regression, so for now I will only use generic in the Span.Sort<TComparer>.


Test Code

base on https://github.com/dotnet/performance/blob/3499787dbbde3807402516462453e67097066d4a/src/benchmarks/micro/libraries/System.Collections/Sort.cs

[Benchmark]
public void Span_ComparerStruct() => _arrays[_iterationIndex++].AsSpan(0, Size).Sort(new ComparableComparerStruct());

private readonly struct ComparableComparerStruct : IComparer<T>
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public int Compare(T x, T y) => x.CompareTo(y);
}

All test on

BenchmarkDotNet v0.14.1-nightly.20250107.205, Windows 10 (10.0.20348.1906) (VMware)
AMD Ryzen 9 7950X 4.49GHz, 16 CPU, 32 logical and 32 physical cores
.NET SDK 10.0.100-preview.6.25281.103
  [Host]     : .NET 10.0.0 (10.0.25.28203), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-VBHUQD : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-BXQXFN : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

PowerPlanMode=00000000-0000-0000-0000-000000000000  InvocationCount=5000  IterationTime=250ms
MaxIterationCount=20  MinIterationCount=15  MinWarmupIterationCount=6
UnrollFactor=1  WarmupCount=-1

Int32

Method Job Toolchain Size Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Code Size Allocated Alloc Ratio
Span_ComparerStruct Job-VBHUQD pr 512 3.562 μs 0.1579 μs 0.1819 μs 3.547 μs 3.312 μs 3.931 μs 0.30 0.02 - 1,126 B - 0.00
Span_ComparerStruct Job-BXQXFN main 512 11.943 μs 0.2768 μs 0.3188 μs 11.825 μs 11.523 μs 12.474 μs 1.00 0.04 - 1,910 B 88 B 1.00

BigStruct

Method Job Toolchain Size Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Code Size Allocated Alloc Ratio
Span_ComparerStruct Job-VBHUQD pr 512 4.677 μs 0.1230 μs 0.1417 μs 4.645 μs 4.362 μs 4.949 μs 0.34 0.01 - 1,352 B - 0.00
Span_ComparerStruct Job-BXQXFN main 512 13.659 μs 0.2965 μs 0.3173 μs 13.594 μs 13.215 μs 14.288 μs 1.00 0.03 - 2,426 B 88 B 1.00

IntClass

Method Job Toolchain Size Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Code Size Allocated Alloc Ratio
Span_ComparerStruct Job-VBHUQD pr 512 23.89 μs 0.541 μs 0.601 μs 23.92 μs 22.59 μs 24.86 μs 0.92 0.03 - 2,958 B - 0.00
Span_ComparerStruct Job-BXQXFN main 512 26.00 μs 0.519 μs 0.556 μs 26.01 μs 24.97 μs 26.85 μs 1.00 0.03 - 968 B 88 B 1.00

IntStruct

Method Job Toolchain Size Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Code Size Allocated Alloc Ratio
Span_ComparerStruct Job-VBHUQD pr 512 3.054 μs 0.1060 μs 0.1221 μs 3.006 μs 2.880 μs 3.293 μs 0.26 0.01 - 1,102 B - 0.00
Span_ComparerStruct Job-BXQXFN main 512 11.784 μs 0.2308 μs 0.2159 μs 11.787 μs 11.312 μs 12.174 μs 1.00 0.03 - 1,916 B 88 B 1.00

String

Method Job Toolchain Size Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Code Size Allocated Alloc Ratio
Span_ComparerStruct Job-VBHUQD pr 512 153.6 μs 1.99 μs 1.76 μs 153.6 μs 150.0 μs 156.8 μs 1.00 0.02 - 2,951 B - 0.00
Span_ComparerStruct Job-BXQXFN main 512 154.3 μs 2.19 μs 2.05 μs 153.6 μs 151.8 μs 158.0 μs 1.00 0.02 - 968 B 88 B 1.00

@2A5F
Copy link
Contributor Author

2A5F commented Jun 2, 2025

Ideas and todo:

  1. Consider get a pointer to interface method to simulate a value type delegate
  2. Delegate struct wrapper has 1% regression (this test result may be invalid), maybe this is acceptable, or try take out the pointer and rewrap it
  3. Null ICompear<T> path can replace to Compear<T>.Default to avoid interface virtual call

  1. Not feasible, C# does not have syntax to get instance method pointers. Even if use InlineIL.Fody to get it, the benchmark is not as good as the delegate.
  2. Since 1 is not feasible, then 2 is also not feasible.

@andrewjsaid
Copy link
Contributor

To avoid duplicate code, could you do something like this:

struct DelegateWrappedComparer<T> : IComparer<T>
{
    private Comparer<T> _comparer;

    public int Compare(T? x, T? y) => _comparer.Compare(x, y);
}

And point the "old" version of the method to the new one, with this wrapper?

Hope this is clear.

@2A5F
Copy link
Contributor Author

2A5F commented Jun 3, 2025

@andrewjsaid For Comparer<T>, I'm not sure if this makes sense. Perhaps a struct wrapper can convert an interface virtual call into an abstract class virtual call; however public apis uses only IComparer<T> not Comparer<T>. And Comparer<T>.Default have some jit magic (I don't know what it is, but the test results are like this) so dose not need struct wrapper.


Delegate is Comparison<T> and the test results have regression EgorBot/runtime-utils#370 (comment)


This test has expired/is invalid. I will retest the new code later when I have time.

(benchmarks are really slow)

@andrewjsaid
Copy link
Contributor

andrewjsaid commented Jun 3, 2025

My reasoning was that by using the struct wrapper, methods like the below would be specialized for that struct and the JIT would be able to inline the call to DelegateWrappedComparer<T>.CompareTo thus making the SwapIfGreater<TComparer>(...) and SwapIfGreater(...)` equivalent.

private static void SwapIfGreater<TComparer>(Span<T> keys, TComparer comparer, int i, int j)
  where TComparer : IComparer<T>, allows ref struct

However on further reflection I note that currently since SwapIfGreater is not generic, by specializing it per DelegateWrappedComparer<T> this would mean that instead of having a single re-usable method if T is a class (via __CANON) the JIT would have to create a new copy for each T which can create significant bloat. Not sure if worth the maintenance cost of duplicate methods.

@2A5F
Copy link
Contributor Author

2A5F commented Jun 3, 2025

However on further reflection I note that currently since SwapIfGreater is not generic, by specializing it per DelegateWrappedComparer this would mean that instead of having a single re-usable method if T is a class (via __CANON) the JIT would have to create a new copy for each T which can create significant bloat. Not sure if worth the maintenance cost of duplicate methods.

#39466 (comment)

@andrewjsaid
Copy link
Contributor

Oops already suggested 🤦
Too many issues / PRs with similar names I missed that one

Thanks for the link

@2A5F 2A5F requested a review from thaystg as a code owner September 25, 2025 04:26
@dotnet-policy-service dotnet-policy-service bot added the linkable-framework Issues associated with delivering a linker friendly framework label Sep 25, 2025
@2A5F 2A5F force-pushed the avoid-sort-allocations branch from 6c02cb3 to 25b1914 Compare September 25, 2025 04:32
@2A5F
Copy link
Contributor Author

2A5F commented Sep 25, 2025

I accidentally pushed to the wrong branch; the tags and review may need to be adjusted.

@jkotas
Copy link
Member

jkotas commented Sep 25, 2025

Considering that, should we close this PR? or is there any suggestion to improve the regressed case?

The throughput should be fixable by introducing even more code duplication. (A separate question is whether the code duplication is worth it.)

@tarekgh
Copy link
Member

tarekgh commented Oct 1, 2025

I assume this is related?

https://github.com/dotnet/runtime/pull/116109/checks?check_run_id=51639673842

System.InvalidOperationException : Failed to compare two elements in the array.
---- System.NullReferenceException : Object reference not set to an instance of an object

CallStack
   at System.Collections.Generic.ArraySortHelperForTComparer`3[[System.IntPtr, System.Private.CoreLib, Version=10.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Int32, System.Private.CoreLib, Version=10.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Collections.Generic.IComparer`1[[System.IntPtr, System.Private.CoreLib, Version=10.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], System.Private.CoreLib, Version=10.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Sort(Span`1 keys, Span`1 values, IComparer`1 comparer) in /_/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/ArraySortHelper.cs:line 1367
   at System.MemoryExtensions.Sort[IntPtr,Int32,IComparer`1](Span`1 keys, Span`1 items, IComparer`1 comparer) in /_/src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.cs:line 4393
   at System.MemoryExtensions.Sort[IntPtr,Int32](Span`1 keys, Span`1 items) in /_/src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.cs:line 4363
   at System.Numerics.Tensors.TensorShape..ctor(IntPtr linearLength, ReadOnlySpan`1 lengths, ReadOnlySpan`1 strides, TensorFlags flags) in /_/src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorShape.cs:line 180
   at System.Numerics.Tensors.TensorShape.Create[Int32](Int32[] array, Int32 start, ReadOnlySpan`1 lengths, ReadOnlySpan`1 strides) in /_/src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorShape.cs:line 893
   at System.Numerics.Tensors.ReadOnlyTensorSpan`1[[System.Int32, System.Private.CoreLib, Version=10.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]..ctor(Int32[] array, Int32 start, ReadOnlySpan`1 lengths, ReadOnlySpan`1 strides) in /_/src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/ReadOnlyTensorSpan_1.cs:line 102
   at System.Numerics.Tensors.Tests.ReadOnlyTensorSpanTests.ReadOnlyTensorSpanArrayConstructorTests() in /_/src/libraries/System.Numerics.Tensors/tests/ReadOnlyTensorSpanTests.cs:line 322
   at System.Reflection.MethodBaseInvoker.InterpretedInvoke_Method(Object obj, IntPtr* args) in /_/src/mono/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.Mono.cs:line 22
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr) in /_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.cs:line 57
----- Inner Stack Trace -----
   at System.Collections.Generic.ArraySortHelperForTComparer`3[[System.IntPtr, System.Private.CoreLib, Version=10.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Int32, System.Private.CoreLib, Version=10.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Collections.Generic.IComparer`1[[System.IntPtr, System.Private.CoreLib, Version=10.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], System.Private.CoreLib, Version=10.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].SwapIfGreaterWithValues(Span`1 keys, Span`1 values, IComparer`1 comparer, Int32 i, Int32 j) in /_/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/ArraySortHelper.cs:line 1392
   at System.Collections.Generic.ArraySortHelperForTComparer`3[[System.IntPtr, System.Private.CoreLib, Version=10.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Int32, System.Private.CoreLib, Version=10.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Collections.Generic.IComparer`1[[System.IntPtr, System.Private.CoreLib, Version=10.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], System.Private.CoreLib, Version=10.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].IntroSort(Span`1 keys, Span`1 values, Int32 depthLimit, IComparer`1 comparer) in /_/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/ArraySortHelper.cs:line 1429
   at System.Collections.Generic.ArraySortHelperForTComparer`3[[System.IntPtr, System.Private.CoreLib, Version=10.0.0.0, Cul

@jkotas
Copy link
Member

jkotas commented Oct 1, 2025

Looks like a problem with #116109 (comment) . The special casing is needed to substitute null comparer with a default one.

@jkotas
Copy link
Member

jkotas commented Oct 1, 2025

@EgorBot -intel

using BenchmarkDotNet.Attributes;

public class Bench
{
    int[] a = new int[10000];
    int[] b = new int[10000];
    MyComparer c = new MyComparer();

    [IterationSetup]
    public void IterationSetup() { for (int i = 0; i < a.Length; i++) a[i] = i; }

    [Benchmark]
    public void Sort() => Array.Sort(a, b, c);

    class MyComparer : IComparer<int>
    {
        public int Compare(int x, int y) => x-y;
    }   
}

@tarekgh
Copy link
Member

tarekgh commented Oct 2, 2025

Looks the benchmark numbers shared by @EgorBo not showing any regression. @jkotas what do you think?

@jkotas
Copy link
Member

jkotas commented Oct 3, 2025

@EgorBot -intel -arm

using BenchmarkDotNet.Attributes;

public class Bench
{
    int[] a = new int[10000];
    int[] b = new int[10000];
    MyComparer<string> c = new MyComparer<string>();

    [IterationSetup]
    public void IterationSetup() { for (int i = 0; i < a.Length; i++) a[i] = i; }

    [Benchmark]
    public void Sort() => Array.Sort(a, b, c);

    struct MyComparer<T> : IComparer<int>
    {
        T state;
        long a,b,c,d,e,f;

        public int Compare(int x, int y) => x-y;
    }   
}

@jkotas
Copy link
Member

jkotas commented Oct 3, 2025

LGTM (with the caveats listed in #116109 (comment)).

@tarekgh tarekgh merged commit 1c955dd into dotnet:main Oct 3, 2025
144 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Nov 3, 2025
@jkotas
Copy link
Member

jkotas commented Nov 20, 2025

@EgorBot -intel -arm -commit 1c955dd vs previous

using BenchmarkDotNet.Attributes;

public class Bench
{
    int[] a = new int[10000];
    int[] b = new int[10000];
    MyComparer<string> c = new MyComparer<string>();

    [IterationSetup]
    public void IterationSetup() { for (int i = 0; i < a.Length; i++) a[i] = i; }

    [Benchmark]
    public void Sort() => MemoryExtensions.Sort(a, b, c);

    struct MyComparer<T> : IComparer<int>
    {
        T state;
        long a,b,c,d,e,f;

        public int Compare(int x, int y) => x-y;
    }   
}

@jkotas
Copy link
Member

jkotas commented Nov 20, 2025

@EgorBot -intel -arm -commit 1c955dd vs previous

using BenchmarkDotNet.Attributes;

public class Bench
{
    int[] a = new int[10000];
    int[] b = new int[10000];
    MyComparer c = new MyComparer();

    [IterationSetup]
    public void IterationSetup() { for (int i = 0; i < a.Length; i++) a[i] = i; }

    [Benchmark]
    public void Sort() => MemoryExtensions.Sort(a, b, c);

    class MyComparer : IComparer<int>
    {
        public int Compare(int x, int y) => x-y;
    }   
}

@jkotas
Copy link
Member

jkotas commented Nov 28, 2025

Looks the EgorBot/runtime-utils#506 (comment) shared by @EgorBo not showing any regression. @jkotas what do you think?

@tarekgh The benchmark had mistake and it was not actually testing the code changed in this PR.

The updated results show up to ~3x regressions for comparers that are larger structs: EgorBot/runtime-utils#549 (comment) .

I think it is still fine to keep this change. Larger structs are frowned upon, so it should be ok to regress comparers that are larger structs and improve comparers that are small structs.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Memory community-contribution Indicates that the PR has been added by a community member linkable-framework Issues associated with delivering a linker friendly framework tenet-performance Performance related issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants