Closed
Labels
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
tenet-performance: Performance related issue
Description
The following benchmark compares the cost of instantiating and passing a ValueTuple<int, int>, a ValueTuple<int, int, int>, and two user-defined structs that are functionally equivalent to those ValueTuple types.
public class Test
{
    private int sum = 0;

    [Benchmark]
    public int ValueTuple2()
    {
        sum = 0;
        for (int i = 0; i < 100; i++)
        {
            M(new ValueTuple<int, int>(i, i));
        }
        return sum;
    }

    [Benchmark]
    public int ValueTuple3()
    {
        sum = 0;
        for (int i = 0; i < 100; i++)
        {
            M(new ValueTuple<int, int, int>(i, i, i));
        }
        return sum;
    }

    [Benchmark]
    public int ValueTupleIsh2()
    {
        sum = 0;
        for (int i = 0; i < 100; i++)
        {
            M(new ValueTupleIsh<int, int>(i, i));
        }
        return sum;
    }

    [Benchmark]
    public int ValueTupleIsh3()
    {
        sum = 0;
        for (int i = 0; i < 100; i++)
        {
            M(new ValueTupleIsh<int, int, int>(i, i, i));
        }
        return sum;
    }

    public void M((int, int) s)
    {
        sum += s.Item1 + s.Item2;
    }

    public void M((int, int, int) s)
    {
        sum += s.Item1 + s.Item2;
    }

    public void M(ValueTupleIsh<int, int> s)
    {
        sum += s.Item1 + s.Item2;
    }

    public void M(ValueTupleIsh<int, int, int> s)
    {
        sum += s.Item1 + s.Item2;
    }
}

public struct ValueTupleIsh<T1, T2>
{
    public T1 Item1;
    public T2 Item2;

    public ValueTupleIsh(T1 item1, T2 item2)
    {
        Item1 = item1;
        Item2 = item2;
    }
}

public struct ValueTupleIsh<T1, T2, T3>
{
    public T1 Item1;
    public T2 Item2;
    public T3 Item3;

    public ValueTupleIsh(T1 item1, T2 item2, T3 item3)
    {
        Item1 = item1;
        Item2 = item2;
        Item3 = item3;
    }
}
Data
This gives the following:
BenchmarkDotNet v0.13.6, Windows 11 (10.0.22621.1555/22H2/2022Update/SunValley2)
11th Gen Intel Core i7-11850H 2.50GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK 7.0.202
[Host] : .NET 6.0.15 (6.0.1523.11507), X64 RyuJIT AVX2
DefaultJob : .NET 6.0.15 (6.0.1523.11507), X64 RyuJIT AVX2
| Method | Mean | Error | StdDev |
|---|---|---|---|
| ValueTuple2 | 113.4 ns | 1.45 ns | 1.28 ns |
| ValueTuple3 | 661.5 ns | 7.37 ns | 6.90 ns |
| ValueTupleIsh2 | 112.0 ns | 1.22 ns | 1.08 ns |
| ValueTupleIsh3 | 111.5 ns | 1.03 ns | 0.86 ns |
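As you can see, instantiating/passing a ValueTuple<int, int, int> is significantly slower than a ValueTuple<int, int>: roughly 5.8x (661.5 ns vs. 113.4 ns). This slowdown is not seen with the equivalent user-defined struct, ValueTupleIsh<int, int, int>, which runs in 111.5 ns.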
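Since the comparison rests on the two three-field types being functionally equivalent, a quick sanity check may be useful. The following is a minimal sketch, not part of the original benchmark: `EquivalenceCheck`, `SumValueTuple3`, and `SumValueTupleIsh3` are hypothetical names, and the sum is kept in a local rather than a field so that the snippet needs no BenchmarkDotNet reference.

```csharp
using System;

// Same shape as the three-field struct from the benchmark above.
public struct ValueTupleIsh<T1, T2, T3>
{
    public T1 Item1;
    public T2 Item2;
    public T3 Item3;

    public ValueTupleIsh(T1 item1, T2 item2, T3 item3)
    {
        Item1 = item1;
        Item2 = item2;
        Item3 = item3;
    }
}

// Hypothetical helper class; not part of the original issue.
public static class EquivalenceCheck
{
    // Same loop body as ValueTuple3, with the sum kept in a local.
    public static int SumValueTuple3()
    {
        int sum = 0;
        for (int i = 0; i < 100; i++)
        {
            var s = new ValueTuple<int, int, int>(i, i, i);
            sum += s.Item1 + s.Item2;
        }
        return sum;
    }

    // Same loop body as ValueTupleIsh3, with the sum kept in a local.
    public static int SumValueTupleIsh3()
    {
        int sum = 0;
        for (int i = 0; i < 100; i++)
        {
            var s = new ValueTupleIsh<int, int, int>(i, i, i);
            sum += s.Item1 + s.Item2;
        }
        return sum;
    }

    public static void Main()
    {
        // Both variants add 2 * i for i in [0, 100), so the sums must match.
        Console.WriteLine(SumValueTuple3() == SumValueTupleIsh3());
    }
}
```

If both variants agree, the timing gap cannot come from the benchmarks doing different work, which points at code generation rather than at the test itself.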
Analysis
From SharpLab, we see:
Test.ValueTuple2()
L0000: xor eax, eax
L0002: mov [rcx+8], eax
L0005: mov edx, eax
L0007: add edx, [rcx+8]
L000a: add edx, eax
L000c: mov [rcx+8], edx
L000f: inc eax
L0011: cmp eax, 0x64
L0014: jl short L0005
L0016: mov eax, [rcx+8]
L0019: ret
Test.ValueTuple3()
L0000: sub rsp, 0x28
L0004: vzeroupper
L0007: xor eax, eax
L0009: mov [rcx+8], eax
L000c: nop [rax]
L0010: vxorps xmm0, xmm0, xmm0
L0014: vmovupd [rsp+0x18], xmm0
L001a: mov [rsp+0x18], eax
L001e: mov [rsp+0x1c], eax
L0022: mov [rsp+0x20], eax
L0026: vmovupd xmm0, [rsp+0x18]
L002c: vmovupd [rsp+8], xmm0
L0032: mov edx, [rcx+8]
L0035: add edx, [rsp+8]
L0039: add edx, [rsp+0xc]
L003d: mov [rcx+8], edx
L0040: inc eax
L0042: cmp eax, 0x64
L0045: jl short L0010
L0047: mov eax, [rcx+8]
L004a: add rsp, 0x28
L004e: ret
Test.ValueTupleIsh2()
L0000: xor eax, eax
L0002: mov [rcx+8], eax
L0005: mov edx, eax
L0007: add edx, [rcx+8]
L000a: add edx, eax
L000c: mov [rcx+8], edx
L000f: inc eax
L0011: cmp eax, 0x64
L0014: jl short L0005
L0016: mov eax, [rcx+8]
L0019: ret
Test.ValueTupleIsh3()
L0000: xor eax, eax
L0002: mov [rcx+8], eax
L0005: mov edx, eax
L0007: add edx, [rcx+8]
L000a: add edx, eax
L000c: mov [rcx+8], edx
L000f: inc eax
L0011: cmp eax, 0x64
L0014: jl short L0005
L0016: mov eax, [rcx+8]
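L0019: ret
I'm not sure why the runtime generates such different code for the ValueTuple<int, int, int> case, but it doesn't seem to be helping! Instead of keeping the fields in registers as in the other three methods, the loop zeroes a stack slot, stores the three fields to it, copies the whole struct to a second stack slot, and then reads Item1 and Item2 back from memory on every iteration.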
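One difference between the two three-field types that may be worth ruling out is their declared layout. This is a hypothesis to test, not something this issue establishes: as far as I know, the BCL ValueTuple types are declared with `[StructLayout(LayoutKind.Auto)]`, whereas a plain C# struct such as ValueTupleIsh defaults to sequential layout. The sketch below only reports size and layout kind for each type; `LayoutProbe` and `Describe` are hypothetical names.

```csharp
using System;
using System.Runtime.CompilerServices;

// Same shape as the three-field struct from the benchmark above.
public struct ValueTupleIsh<T1, T2, T3>
{
    public T1 Item1;
    public T2 Item2;
    public T3 Item3;

    public ValueTupleIsh(T1 item1, T2 item2, T3 item3)
    {
        Item1 = item1;
        Item2 = item2;
        Item3 = item3;
    }
}

// Hypothetical diagnostic helper; not part of the original issue.
public static class LayoutProbe
{
    // Reports the managed size and the declared layout kind of a struct type.
    public static string Describe<T>() where T : struct
    {
        Type t = typeof(T);
        string layout = t.IsAutoLayout ? "Auto"
                      : t.IsLayoutSequential ? "Sequential"
                      : "Explicit";
        return $"{t.Name}: size={Unsafe.SizeOf<T>()}, layout={layout}";
    }

    public static void Main()
    {
        Console.WriteLine(Describe<ValueTuple<int, int>>());
        Console.WriteLine(Describe<ValueTuple<int, int, int>>());
        Console.WriteLine(Describe<ValueTupleIsh<int, int, int>>());
    }
}
```

If the two three-field types report the same size but a different layout kind, that would at least be a concrete difference to raise with the JIT team. Whether it actually affects how the JIT treats the struct here is an open question.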
Interestingly, the difference disappears if double is used instead of int for all generic type parameters.
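The double variant is not shown in this issue. A minimal sketch of what it might look like, assuming only the type arguments and the type of `sum` change, is below. `DoubleVariant` and `ValueTuple3Double` are hypothetical names, and the `[Benchmark]` attribute is left off so the snippet compiles without BenchmarkDotNet; add it back to measure the method.

```csharp
using System;

// Hypothetical double-based counterpart of the Test class; not from the original issue.
public class DoubleVariant
{
    private double sum = 0;

    // Mirrors ValueTuple3, with double in place of int for every type argument.
    public double ValueTuple3Double()
    {
        sum = 0;
        for (int i = 0; i < 100; i++)
        {
            M(new ValueTuple<double, double, double>(i, i, i));
        }
        return sum;
    }

    // Like the original M overloads, this reads only Item1 and Item2.
    public void M((double, double, double) s)
    {
        sum += s.Item1 + s.Item2;
    }

    public static void Main()
    {
        Console.WriteLine(new DoubleVariant().ValueTuple3Double());
    }
}
```

Comparing the disassembly of this method with that of ValueTuple3 above would show whether the extra stack traffic is specific to the int instantiation.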
Regression?
Unsure