Instantiating ValueTuple<int, int, int> significantly more expensive than ValueTuple<int, int> or equivalent struct #89170

@canton7

Description

See the following benchmark, which compares the performance of instantiating and passing a ValueTuple<int, int>, ValueTuple<int, int, int>, and two structs which are functionally equivalent to ValueTuple.

public class Test
{
    private int sum = 0;

    [Benchmark]
    public int ValueTuple2()
    {
        sum = 0;
        for (int i = 0; i < 100; i++)
        {
            M(new ValueTuple<int, int>(i, i));
        }
        return sum;
    }

    [Benchmark]
    public int ValueTuple3()
    {
        sum = 0;
        for (int i = 0; i < 100; i++)
        {
            M(new ValueTuple<int, int, int>(i, i, i));
        }
        return sum;
    }

    [Benchmark]
    public int ValueTupleIsh2()
    {
        sum = 0;
        for (int i = 0; i < 100; i++)
        {
            M(new ValueTupleIsh<int, int>(i, i));
        }
        return sum;
    }

    [Benchmark]
    public int ValueTupleIsh3()
    {
        sum = 0;
        for (int i = 0; i < 100; i++)
        {
            M(new ValueTupleIsh<int, int, int>(i, i, i));
        }
        return sum;
    }

    public void M((int, int) s)
    {
        sum += s.Item1 + s.Item2;
    }

    public void M((int, int, int) s)
    {
        sum += s.Item1 + s.Item2;
    }

    public void M(ValueTupleIsh<int, int> s)
    {
        sum += s.Item1 + s.Item2;
    }

    public void M(ValueTupleIsh<int, int, int> s)
    {
        sum += s.Item1 + s.Item2;
    }
}

public struct ValueTupleIsh<T1, T2>
{
    public T1 Item1;
    public T2 Item2;

    public ValueTupleIsh(T1 item1, T2 item2)
    {
        Item1 = item1;
        Item2 = item2;
    }
}

public struct ValueTupleIsh<T1, T2, T3>
{
    public T1 Item1;
    public T2 Item2;
    public T3 Item3;

    public ValueTupleIsh(T1 item1, T2 item2, T3 item3)
    {
        Item1 = item1;
        Item2 = item2;
        Item3 = item3;
    }
}

Data

This gives the following:

BenchmarkDotNet v0.13.6, Windows 11 (10.0.22621.1555/22H2/2022Update/SunValley2)
11th Gen Intel Core i7-11850H 2.50GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK 7.0.202
  [Host]     : .NET 6.0.15 (6.0.1523.11507), X64 RyuJIT AVX2
  DefaultJob : .NET 6.0.15 (6.0.1523.11507), X64 RyuJIT AVX2
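| Method         |     Mean |   Error |  StdDev |
|--------------- |---------:|--------:|--------:|
| ValueTuple2    | 113.4 ns | 1.45 ns | 1.28 ns |
| ValueTuple3    | 661.5 ns | 7.37 ns | 6.90 ns |
| ValueTupleIsh2 | 112.0 ns | 1.22 ns | 1.08 ns |
| ValueTupleIsh3 | 111.5 ns | 1.03 ns | 0.86 ns |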

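As you can see, instantiating/passing a ValueTuple<int, int, int> is significantly slower than a ValueTuple<int, int>: 661.5 ns versus 113.4 ns per 100 iterations, roughly 5.8x. This slowdown is not seen with an equivalent struct, where the two-field and three-field versions both take about 112 ns.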

Analysis

From SharpLab, we see:

Test.ValueTuple2()
    L0000: xor eax, eax
    L0002: mov [rcx+8], eax
    L0005: mov edx, eax
    L0007: add edx, [rcx+8]
    L000a: add edx, eax
    L000c: mov [rcx+8], edx
    L000f: inc eax
    L0011: cmp eax, 0x64
    L0014: jl short L0005
    L0016: mov eax, [rcx+8]
    L0019: ret

Test.ValueTuple3()
    L0000: sub rsp, 0x28
    L0004: vzeroupper
    L0007: xor eax, eax
    L0009: mov [rcx+8], eax
    L000c: nop [rax]
    L0010: vxorps xmm0, xmm0, xmm0
    L0014: vmovupd [rsp+0x18], xmm0
    L001a: mov [rsp+0x18], eax
    L001e: mov [rsp+0x1c], eax
    L0022: mov [rsp+0x20], eax
    L0026: vmovupd xmm0, [rsp+0x18]
    L002c: vmovupd [rsp+8], xmm0
    L0032: mov edx, [rcx+8]
    L0035: add edx, [rsp+8]
    L0039: add edx, [rsp+0xc]
    L003d: mov [rcx+8], edx
    L0040: inc eax
    L0042: cmp eax, 0x64
    L0045: jl short L0010
    L0047: mov eax, [rcx+8]
    L004a: add rsp, 0x28
    L004e: ret

Test.ValueTupleIsh2()
    L0000: xor eax, eax
    L0002: mov [rcx+8], eax
    L0005: mov edx, eax
    L0007: add edx, [rcx+8]
    L000a: add edx, eax
    L000c: mov [rcx+8], edx
    L000f: inc eax
    L0011: cmp eax, 0x64
    L0014: jl short L0005
    L0016: mov eax, [rcx+8]
    L0019: ret

Test.ValueTupleIsh3()
    L0000: xor eax, eax
    L0002: mov [rcx+8], eax
    L0005: mov edx, eax
    L0007: add edx, [rcx+8]
    L000a: add edx, eax
    L000c: mov [rcx+8], edx
    L000f: inc eax
    L0011: cmp eax, 0x64
    L0014: jl short L0005
    L0016: mov eax, [rcx+8]
    L0019: ret

I'm not sure why the runtime has significantly different codegen for the ValueTuple<int, int, int> case, but it doesn't seem to be helping!
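To make the gap in the listings concrete: in the three fast cases the loop body is just a load of sum, two adds of the loop counter, and a store, so the struct never exists in memory. In the ValueTuple3 listing the tuple is instead zeroed on the stack, written field by field, copied to a second stack slot with a 16-byte vector move, and only then read back. A minimal sketch of what the fast loop amounts to in C#, next to the tuple version it should be equivalent to, is below. The class and method names are hypothetical and not part of the benchmark, and a local is used in place of the sum field for brevity.

```csharp
using System;

public static class ScalarEquivalent
{
    // Hypothetical helper: hand-written equivalent of the loop the JIT emits for
    // ValueTuple2, ValueTupleIsh2 and ValueTupleIsh3 (no struct is materialized).
    // The real listings keep the running total in the sum field at [rcx+8].
    public static int SumWithoutTuple()
    {
        int sum = 0;
        for (int i = 0; i < 100; i++)
        {
            // Corresponds to: mov edx, eax / add edx, [rcx+8] / add edx, eax / mov [rcx+8], edx
            sum += i + i;
        }
        return sum;
    }

    // The same computation routed through a real ValueTuple<int, int, int>,
    // mirroring what ValueTuple3 does after M is inlined.
    public static int SumWithTuple()
    {
        int sum = 0;
        for (int i = 0; i < 100; i++)
        {
            var s = new ValueTuple<int, int, int>(i, i, i);
            sum += s.Item1 + s.Item2; // Item3 is never read, as in the benchmark
        }
        return sum;
    }

    public static void Main()
    {
        // Both loops compute the same value; only the generated code differs.
        Console.WriteLine(SumWithoutTuple() == SumWithTuple());
    }
}
```

Both methods return the same result, so the extra stack traffic in ValueTuple3 is pure overhead rather than something the semantics require.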

Interestingly, the difference disappears if double is used instead of int as all generic type parameters.
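For anyone wanting to reproduce the double observation, a sketch of the variant is below. It assumes the only change is swapping int for double in the tuple's type arguments, the M overload, and the sum field; the class and method names are hypothetical, and the [Benchmark] attribute is left off so the snippet compiles without BenchmarkDotNet.

```csharp
using System;

public class DoubleVariant
{
    private double sum = 0;

    // Same shape as ValueTuple3 above, with double as every type argument.
    // Add [Benchmark] when running under BenchmarkDotNet.
    public double ValueTuple3Double()
    {
        sum = 0;
        for (int i = 0; i < 100; i++)
        {
            M(new ValueTuple<double, double, double>(i, i, i));
        }
        return sum;
    }

    public void M((double, double, double) s)
    {
        // Item3 is ignored, matching the int overloads in the original benchmark.
        sum += s.Item1 + s.Item2;
    }

    public static void Main()
    {
        Console.WriteLine(new DoubleVariant().ValueTuple3Double());
    }
}
```

Whether the sum field also needs to be double for the difference to disappear is not something the measurements above establish; it is changed here only so the arithmetic stays in one type.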

Regression?

Unsure

Metadata

    Labels

    area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
    tenet-performance: Performance related issue
