dsyme commented Jun 1, 2021

This micro-perf analysis by Bartosz Adamczewski (@badamczewski01) revealed that we are emitting backward branches for match targets reachable via multiple paths in the decision tree, e.g.

match x with
| Pattern1
| Pattern2 -> <target-code>
| ...

This also applies to some instances of boolean logic, as shown here: https://twitter.com/badamczewski01/status/1399460065254547458

if (x = 1 || x = 2) then <target-code> else ...

In the first example we emitted <target-code> where it was first encountered (Pattern1), so the success path for Pattern2 branched backwards to that target code. The backward branch evidently has a perf cost in branch prediction and instruction decoding on typical microprocessors. The second example is similar.

This PR changes to only emit the target code at the last pattern that matches, not the first.
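
For concreteness, here is a minimal F# sketch of the two source shapes discussed above. The function names `viaOrPattern` and `viaBooleanOr` are made up for illustration and do not come from the compiler or its tests. In both, a single target is reachable via two tests, which is the situation where the placement of the target code decides whether the second test branches backward or forward.

```fsharp
// Illustration only: two source shapes where one target is reachable via two tests.
// The function names are hypothetical and not taken from the PR.

// Or-pattern: both `1` and `2` lead to the same target expression.
let viaOrPattern x =
    match x with
    | 1
    | 2 -> "low"
    | _ -> "other"

// Boolean disjunction: either comparison leads to the same `then` branch.
let viaBooleanOr x =
    if x = 1 || x = 2 then "low" else "other"

// Both shapes compute the same result; only the emitted branch layout is at issue.
for x in 0 .. 3 do
    printfn "%d: %s %s" x (viaOrPattern x) (viaBooleanOr x)
```

The semantics are identical before and after the change; what differs is only where the shared target is placed in the emitted IL.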

NOTE: I had to reduce the size limit for one of the "large record" tests from 1000 to 970 on .NET Core because of some change in the compiler's stack usage. I don't think this indicates anything wrong, though I don't know the specific reason why stack usage increased. In general, all the EmittedIL tests show improved generated code (unnecessary branches removed, etc.). One of the neighbouring tests was already disabled, and I re-enabled it at size 970.


I did a quick perf test using ad hoc timing techniques, following the code in the tweet:

// Runs `f n` in a tight nested loop and prints the elapsed wall-clock time.
let perf s n f =
    let t = System.Diagnostics.Stopwatch()
    t.Start()
    for i in 0 .. 1000 do
        for h in 0 .. 1000000 do
            f n |> ignore
    t.Stop()
    printfn "PERF: %s : %d" s t.ElapsedMilliseconds

let condition x =
    if x = 1 || x = 2 then 1 elif x = 3 || x = 4 then 2 else 0

perf "warmup" 0 condition
perf "n = 0" 0 condition
perf "n = 1" 1 condition
perf "n = 2" 2 condition
perf "n = 3" 3 condition
perf "n = 4" 4 condition

main branch compiler:

PERF: n = 1 : 2147
PERF: n = 2 : 3218 // slow
PERF: n = 3 : 2686
PERF: n = 4 : 3794 // slow
TOTAL: 11845

this PR:

PERF: n = 1 : 2740 // slower, but more uniform between n=1 and n=2
PERF: n = 2 : 2261 // much improved, and more uniform between n=1 and n=2
PERF: n = 3 : 3309
PERF: n = 4 : 2745 // improved, and more uniform between n=3 and n=4
TOTAL: 11055

It is true that the n=1 and n=3 cases have degraded in this quick test. However, the results now match what C# does (the generated code is now the same), and overall the timings have become more uniform because the larger variance caused by the backward branches is avoided. I think it's the right change to make given that we're now matching C#.
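
As a rough check on the "more uniform" reading, the small F# sketch below recomputes the totals and the slowest-minus-fastest spread from the eight timings listed above. The `spread` helper is an assumption introduced here as a crude proxy for uniformity; it is not part of the original test.

```fsharp
// Timings in milliseconds copied from the two result listings above (n = 1..4).
let mainBranch = [ 2147; 3218; 2686; 3794 ]
let thisPr     = [ 2740; 2261; 3309; 2745 ]

// Spread = slowest case minus fastest case; a rough proxy for "uniformity".
let spread (xs: int list) = List.max xs - List.min xs

printfn "main: total %d, spread %d" (List.sum mainBranch) (spread mainBranch)
printfn "PR:   total %d, spread %d" (List.sum thisPr) (spread thisPr)
// main: total 11845, spread 1647
// PR:   total 11055, spread 1048
```

These are single runs on one machine, so the numbers indicate a direction rather than a precise effect size.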

Note that the slowdowns reported by Bartosz in the tweet were more dramatic than those shown here. I think that's because my home machine is using an old Xeon processor (yes, I need a new one).

  • Update baselines

  • Verify the expected perf improvement in this branch.

BEFORE:

.method public static int32  condition(int32 x) cil managed
{
  // Code size       26 (0x1a)
  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  ldc.i4.1
  IL_0002:  bne.un.s   IL_0006
  IL_0004:  ldc.i4.1
  IL_0005:  ret
  IL_0006:  ldarg.0
  IL_0007:  ldc.i4.2
  IL_0008:  bne.un.s   IL_000c
  IL_000a:  br.s       IL_0004        // backward branch
  IL_000c:  ldarg.0
  IL_000d:  ldc.i4.3
  IL_000e:  bne.un.s   IL_0012
  IL_0010:  ldc.i4.2
  IL_0011:  ret
  IL_0012:  ldarg.0
  IL_0013:  ldc.i4.4
  IL_0014:  bne.un.s   IL_0018
  IL_0016:  br.s       IL_0010
  IL_0018:  ldc.i4.0
  IL_0019:  ret
} // end of method A::condition

AFTER (updated!)

  IL_0000:  ldarg.0
  IL_0001:  ldc.i4.1
  IL_0002:  beq.s      IL_0008
  IL_0004:  ldarg.0
  IL_0005:  ldc.i4.2
  IL_0006:  bne.un.s   IL_000a
  IL_0008:  ldc.i4.1
  IL_0009:  ret
  IL_000a:  ldarg.0
  IL_000b:  ldc.i4.3
  IL_000c:  beq.s      IL_0012
  IL_000e:  ldarg.0
  IL_000f:  ldc.i4.4
  IL_0010:  bne.un.s   IL_0014
  IL_0012:  ldc.i4.2
  IL_0013:  ret
  IL_0014:  ldc.i4.0
  IL_0015:  ret

dsyme changed the title from "Prefer forward branches for decision tree targets reachable via via multiple ways" to "Prefer forward branches to decision tree targets" on Jun 1, 2021

dsyme commented Jun 1, 2021

Note that the slowdowns reported by Bartosz in the tweet were more dramatic than those shown here.

I'm going to work on this a little more. The codegen in the second case is still not perfect, see the corresponding C# codegen here:

https://sharplab.io/#v2:C4LglgNgPgAgTARgLACgYGYAE9MGFMDeqmJ2WYAdsJgLIAUl1AHgJSHGmlgBmmdTmALyDMCTFCiYBwzHDYdOimAHZRAbgWcAphADOWzJsUkefaSKwSpQkQBZ5KY0tVwNj4zv2H3Tkiszobk6aAL6oIUA

        IL_0000: ldarg.1
        IL_0001: ldc.i4.1
        IL_0002: beq.s IL_0008

        IL_0004: ldarg.1
        IL_0005: ldc.i4.2
        IL_0006: bne.un.s IL_000a

        IL_0008: ldc.i4.1
        IL_0009: ret

        IL_000a: ldarg.1
        IL_000b: ldc.i4.3
        IL_000c: beq.s IL_0012

        IL_000e: ldarg.1
        IL_000f: ldc.i4.4
        IL_0010: bne.un.s IL_0014

        IL_0012: ldc.i4.2
        IL_0013: ret

        IL_0014: ldc.i4.3
        IL_0015: ret

dsyme commented Jun 1, 2021

I did some further work and the code for if (x = 1 || x = 2) then ... is now identical to C#.

As expected, a whole bunch of baselines need updating. Here is a list of the first round of them for reference:

2021-06-01T18:18:36.0805760Z CodeGen\EmittedIL\GeneratedIterators (GenIter01.fs -) -- failed
2021-06-01T18:18:36.0815887Z CodeGen\EmittedIL\GeneratedIterators (GenIter02.fs -) -- failed
2021-06-01T18:18:36.0818775Z CodeGen\EmittedIL\GeneratedIterators (GenIter03.fs -) -- failed
2021-06-01T18:18:36.0823673Z CodeGen\EmittedIL\GeneratedIterators (GenIter04.fs -) -- failed
2021-06-01T18:18:36.1054804Z CodeGen\EmittedIL\InequalityComparison (if (x > y) then ... else ...) -- failed
2021-06-01T18:18:41.6798713Z CodeGen\EmittedIL\ListExpressionStepping (ListExpressionSteppingTest1.fs -) -- failed
2021-06-01T18:18:41.6800379Z CodeGen\EmittedIL\ListExpressionStepping (ListExpressionSteppingTest2.fs -) -- failed
2021-06-01T18:18:41.6801314Z CodeGen\EmittedIL\ListExpressionStepping (ListExpressionSteppingTest3.fs -) -- failed
2021-06-01T18:18:41.6802123Z CodeGen\EmittedIL\ListExpressionStepping (ListExpressionSteppingTest4.fs -) -- failed
2021-06-01T18:18:41.6802935Z CodeGen\EmittedIL\ListExpressionStepping (ListExpressionSteppingTest5.fs -) -- failed
2021-06-01T18:18:41.6803774Z CodeGen\EmittedIL\ListExpressionStepping (ListExpressionSteppingTest6.fs -) -- failed
2021-06-01T18:18:29.3494705Z CodeGen\EmittedIL\CCtorDUWithMember (CCtorDUWithMember01.fs -) -- failed
2021-06-01T18:18:56.9232810Z CodeGen\EmittedIL\Misc (AnonRecd.fs) -- failed
2021-06-01T18:18:56.9247055Z CodeGen\EmittedIL\Misc (CodeGenRenamings01.fs -) -- failed
2021-06-01T18:18:56.9343146Z CodeGen\EmittedIL\Misc (EntryPoint01.fs) -- failed
2021-06-01T18:18:56.9359901Z CodeGen\EmittedIL\Misc (EqualsOnUnions01.fs -) -- failed
2021-06-01T18:18:56.9532526Z CodeGen\EmittedIL\Misc (IfThenElse01.fs) -- failed
2021-06-01T18:18:56.9560892Z CodeGen\EmittedIL\Misc (LetIfThenElse01.fs -) -- failed
2021-06-01T18:18:56.9580032Z CodeGen\EmittedIL\Misc (Lock01.fs -) -- failed
2021-06-01T18:18:56.9654414Z CodeGen\EmittedIL\Misc (Seq_for_all01.fs) -- failed
2021-06-01T18:18:56.9673569Z CodeGen\EmittedIL\Misc (StructsAsArrayElements01.fs -) -- failed
2021-06-01T18:18:56.9728429Z CodeGen\EmittedIL\Misc (TryWith_NoFilterBlocks01.fs) -- failed
2021-06-01T18:18:56.9783557Z CodeGen\EmittedIL\Misc (Structs01.fs -) -- failed
2021-06-01T18:18:56.9817744Z CodeGen\EmittedIL\Misc (Structs02.fs -) -- failed
2021-06-01T18:18:57.0006555Z CodeGen\EmittedIL\Misc (GeneralizationOnUnions01.fs) -- failed
2021-06-01T18:18:57.1932730Z CodeGen\EmittedIL\QueryExpressionStepping (Linq101Aggregates01.fs - CodeGen) -- failed
2021-06-01T18:18:57.1987955Z CodeGen\EmittedIL\QueryExpressionStepping (Linq101ElementOperators01.fs - CodeGen) -- failed
2021-06-01T18:18:57.2080815Z CodeGen\EmittedIL\QueryExpressionStepping (Linq101Joins01.fs - CodeGen) -- failed
2021-06-01T18:18:57.2107180Z CodeGen\EmittedIL\QueryExpressionStepping (Linq101Ordering01.fs - CodeGen) -- failed
2021-06-01T18:18:57.2136342Z CodeGen\EmittedIL\QueryExpressionStepping (Linq101Partitioning01.fs - CodeGen) -- failed
2021-06-01T18:18:57.2192864Z CodeGen\EmittedIL\QueryExpressionStepping (Linq101Quantifiers01.fs - CodeGen) -- failed
2021-06-01T18:18:57.2229203Z CodeGen\EmittedIL\QueryExpressionStepping (Linq101Select01.fs - CodeGen) -- failed
2021-06-01T18:18:57.2376926Z CodeGen\EmittedIL\QueryExpressionStepping (Linq101SetOperators01.fs - CodeGen) -- failed
2021-06-01T18:18:57.2529621Z CodeGen\EmittedIL\QueryExpressionStepping (Linq101Where01.fs - CodeGen) -- failed
2021-06-01T18:18:57.3027540Z CodeGen\EmittedIL\SeqExpressionStepping (SeqExpressionSteppingTest1.fs -) -- failed
2021-06-01T18:18:57.3043637Z CodeGen\EmittedIL\SeqExpressionStepping (SeqExpressionSteppingTest2.fs -) -- failed
2021-06-01T18:18:57.3059261Z CodeGen\EmittedIL\SeqExpressionStepping (SeqExpressionSteppingTest3.fs -) -- failed
2021-06-01T18:18:57.3091176Z CodeGen\EmittedIL\SeqExpressionStepping (SeqExpressionSteppingTest4.fs -) -- failed
2021-06-01T18:18:57.3115392Z CodeGen\EmittedIL\SeqExpressionStepping (SeqExpressionSteppingTest5.fs -) -- failed
2021-06-01T18:18:57.3161785Z CodeGen\EmittedIL\SeqExpressionStepping (SeqExpressionSteppingTest6.fs -) -- failed
2021-06-01T18:18:57.3189831Z CodeGen\EmittedIL\SeqExpressionStepping (SeqExpressionSteppingTest7.fs -) -- failed
2021-06-01T18:18:57.3718141Z CodeGen\EmittedIL\SeqExpressionTailCalls (SeqExpressionTailCalls01.fs -) -- failed
2021-06-01T18:18:57.3758408Z CodeGen\EmittedIL\SeqExpressionTailCalls (SeqExpressionTailCalls02.fs -) -- failed
2021-06-01T18:18:57.4704783Z CodeGen\EmittedIL\StaticInit (StaticInit_Struct01.fs -) -- failed
2021-06-01T18:18:57.4725286Z CodeGen\EmittedIL\StaticInit (StaticInit_Class01.fs -) -- failed
2021-06-01T18:18:57.5129646Z CodeGen\EmittedIL\SteppingMatch (SteppingMatch03.fs) -- failed
2021-06-01T18:18:57.5393494Z CodeGen\EmittedIL\SteppingMatch (SteppingMatch04.fs) -- failed
2021-06-01T18:18:57.5500292Z CodeGen\EmittedIL\SteppingMatch (SteppingMatch05.fs) -- failed
2021-06-01T18:18:57.5690359Z CodeGen\EmittedIL\SteppingMatch (SteppingMatch06.fs -) -- failed
2021-06-01T18:18:57.5713454Z CodeGen\EmittedIL\SteppingMatch (SteppingMatch07.fs -) -- failed
2021-06-01T18:18:57.5810766Z CodeGen\EmittedIL\SteppingMatch (SteppingMatch09.fs) -- failed
2021-06-01T18:19:18.3021365Z CodeGen\EmittedIL\TestFunctions (TestFunction16.fs -) -- failed
2021-06-01T18:19:18.3069715Z CodeGen\EmittedIL\TestFunctions (TestFunction17.fs -) -- failed
2021-06-01T18:19:18.3106746Z CodeGen\EmittedIL\TestFunctions (TestFunction21.fs -) -- failed
2021-06-01T18:19:18.3190270Z CodeGen\EmittedIL\TestFunctions (TestFunction3b.fs -) -- failed
2021-06-01T18:19:18.3228101Z CodeGen\EmittedIL\TestFunctions (TestFunction3c.fs -) -- failed
2021-06-01T18:19:18.3291150Z CodeGen\EmittedIL\TestFunctions (TestFunction8.fs) -- failed
2021-06-01T18:19:18.3355678Z CodeGen\EmittedIL\TestFunctions (TestFunction9.fs) -- failed
2021-06-01T18:19:18.3374826Z CodeGen\EmittedIL\TestFunctions (TestFunction9b.fs) -- failed
2021-06-01T18:19:18.3378883Z CodeGen\EmittedIL\TestFunctions (TestFunction9b1.fs) -- failed
2021-06-01T18:19:18.3472270Z CodeGen\EmittedIL\TestFunctions (TestFunction9b2.fs) -- failed
2021-06-01T18:19:18.3480985Z CodeGen\EmittedIL\TestFunctions (TestFunction9b3.fs) -- failed
2021-06-01T18:19:18.3613618Z CodeGen\EmittedIL\TestFunctions (TestFunction22f.fs) -- failed
2021-06-01T18:19:18.3615767Z CodeGen\EmittedIL\TestFunctions (TestFunction22g.fs) -- failed
2021-06-01T18:19:18.3627411Z CodeGen\EmittedIL\TestFunctions (TestFunction24.fs -) -- failed

@badamczewski

This is a test for singular calls. Some of the results that are under a nanosecond should be discarded (1 | 3), since a method call can cost up to 4 cycles. For these cases, I will construct a better test: since the JIT prefers early exits on conditions, we can create a large condition chain and force it to execute a set amount of instructions before we start measuring performance.

    [DisassemblyDiagnoser]
    [HardwareCounters(
        HardwareCounter.BranchMispredictions,
        HardwareCounter.BranchInstructions)]
    public class Bench
    {
        Random rnd = new Random();

        [Benchmark]
        [Arguments(1)]
        [Arguments(2)]
        [Arguments(3)]
        [Arguments(4)]
        [Arguments(5)]
        [Arguments(6)]
        [Arguments(7)]
        public int CSharp(int x)
        {
            return Cond(x);
        }

        [Benchmark]
        [Arguments(1)]
        [Arguments(2)]
        [Arguments(3)]
        [Arguments(4)]
        [Arguments(5)]
        [Arguments(6)]
        [Arguments(7)]
        public int FSharp(int x)
        {
            return FSharpCond.Bench.condition_1(x);
        }

        //
        // FSharp will not inline the code so we shouldn't either.
        //
        [MethodImpl(MethodImplOptions.NoInlining)]
        public static int Cond(int x)
        {
            if      (x == 1 || x == 2) return 1;
            else if (x == 3 || x == 4) return 2;
            else if (x == 5 || x == 6) return 3;
            else return 4;
        }
    }

namespace FSharpCond

module Bench =
    let condition_1 x =
        if  (x = 1 || x = 2) then 1
        elif(x = 3 || x = 4) then 2
        elif(x = 5 || x = 6) then 3
        else 4

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19042.985 (20H2/October2020Update)
Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=5.0.100
  [Host]     : .NET 5.0.0 (5.0.20.51904), X64 RyuJIT
  DefaultJob : .NET 5.0.0 (5.0.20.51904), X64 RyuJIT

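| Method | x | Mean      | Error     | StdDev    | Median    | BranchInstructions/Op | BranchMispredictions/Op | Code Size |
|--------|---|-----------|-----------|-----------|-----------|-----------------------|-------------------------|-----------|
| CSharp | 1 | 1.2706 ns | 0.0736 ns | 0.1706 ns | 1.2856 ns | 3                     | -0                      | 45 B      |
| FSharp | 1 | 0.9039 ns | 0.0621 ns | 0.1603 ns | 0.8465 ns | 3                     | 0                       | 45 B      |
| CSharp | 2 | 1.0056 ns | 0.0624 ns | 0.0553 ns | 1.0035 ns | 4                     | 0                       | 45 B      |
| FSharp | 2 | 1.9008 ns | 0.0365 ns | 0.0285 ns | 1.9044 ns | 4                     | 0                       | 45 B      |
| CSharp | 3 | 1.7767 ns | 0.0602 ns | 0.0533 ns | 1.7752 ns | 5                     | 0                       | 45 B      |
| FSharp | 3 | 1.8348 ns | 0.0802 ns | 0.1922 ns | 1.8334 ns | 5                     | 0                       | 45 B      |
| CSharp | 4 | 1.7199 ns | 0.1252 ns | 0.3673 ns | 1.6882 ns | 6                     | -0                      | 45 B      |
| FSharp | 4 | 2.5129 ns | 0.0458 ns | 0.0406 ns | 2.4982 ns | 6                     | 0                       | 45 B      |
| CSharp | 5 | 1.6120 ns | 0.0804 ns | 0.1298 ns | 1.5964 ns | 6                     | -0                      | 45 B      |
| FSharp | 5 | 2.0366 ns | 0.0832 ns | 0.2176 ns | 2.0407 ns | 6                     | 0                       | 45 B      |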

@badamczewski

This is a random test where the branch predictor will have a hard time guessing what the correct branch should be; since backward branches are usually reserved for loops on my CPU, it will get more backward branches wrong.

   [DisassemblyDiagnoser]
   [HardwareCounters(
        HardwareCounter.BranchMispredictions,
        HardwareCounter.BranchInstructions)]
   public class Bench3_Random
   {
       Random rnd = new Random(12345678);
       const int size = 1024 * 1024;
       private int[] s = new int[size];

       [GlobalSetup]
       public void Setup()
       {
           for(int i = 0; i < size; i++)
           {
               s[i] = rnd.Next(0, 8);
           }
       }

       [Benchmark]
       public void CSharp()
       {
           var _s = s;
           for (int i = 0; i < _s.Length; i++)
               Cond(_s[i]);
       }

       [Benchmark]
       public void FSharp()
       {
           var _s = s;
           for (int i = 0; i < _s.Length; i++)
               FSharpCond.Bench.condition_1(_s[i]);
       }

       //
       // FSharp will not inline the code so we shouldn't either.
       //
       [MethodImpl(MethodImplOptions.NoInlining)]
       public static int Cond(int x)
       {
           if (x == 1 || x == 2) return 1;
           else if (x == 3 || x == 4) return 2;
           else if (x == 5 || x == 6) return 3;
           else return 4;
       }
   }

namespace FSharpCond

module Bench =
    let condition_1 x =
        if  (x = 1 || x = 2) then 1
        elif(x = 3 || x = 4) then 2
        elif(x = 5 || x = 6) then 3
        else 4

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19042.985 (20H2/October2020Update)
Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=5.0.100
  [Host]     : .NET 5.0.0 (5.0.20.51904), X64 RyuJIT
  DefaultJob : .NET 5.0.0 (5.0.20.51904), X64 RyuJIT

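| Method | Mean     | Error    | StdDev   | BranchMispredictions/Op | BranchInstructions/Op | Code Size |
|--------|----------|----------|----------|-------------------------|-----------------------|-----------|
| CSharp | 10.71 ms | 0.087 ms | 0.081 ms | 870,758                 | 8,687,070             | 100 B     |
| FSharp | 11.78 ms | 0.097 ms | 0.086 ms | 878,005                 | 8,697,378             | 100 B     |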

@badamczewski

This bench shows what really happens when we are dealing with a mix of forward and backward branches: the performance alternates between identical to C# and slower than C#. This is because, to reach the condition for x = 11, we first have to cross the backward branch. On most Intel CPUs, if the predictor has no state for a branch yet, it will predict forward branches as not taken and backward branches as taken. New AMD CPUs assume that unseen branches are not taken, but since they use a deep dynamic predictor, it fares extremely poorly with a small number of alternating branches:

https://developer.amd.com/wordpress/media/2013/12/55723_SOG_Fam_17h_Processors_3.00.pdf
https://www.agner.org/optimize/microarchitecture.pdf

This doesn't explain why the benchmark still shows the difference on a fully initialized branch predictor that guesses correctly 100% of the time, but it might be that dynamic predictors, and CPUs in general, don't like mixing branch directions.
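
To make the static rule described above concrete, here is a toy F# model of "backward taken, forward not taken". It is a sketch under the assumption that no history is available for any branch. The `Branch` record, the `staticMispredicts` helper and the two traces are hypothetical: the traces were written by hand from the `Cond` and `condition_2` disassembly in this thread for x = 2, and real predictors are far more elaborate.

```fsharp
// Toy model of the static rule described above: with no history for a branch,
// predict backward branches as taken and forward branches as not taken.
// The type, helper and traces are hypothetical and written by hand.
type Branch =
    { Backward: bool   // true if the branch target precedes the branch
      Taken: bool }    // true if the branch actually jumped to its target

// A static prediction is wrong exactly when direction and outcome disagree.
let staticMispredicts (trace: Branch list) =
    trace |> List.filter (fun b -> b.Backward <> b.Taken) |> List.length

// Conditional branches executed for x = 2.
// C#-style layout: `je` (forward, falls through), `jne` (forward, falls through).
let csharpX2 =
    [ { Backward = false; Taken = false }
      { Backward = false; Taken = false } ]

// Old F# layout: `jne` (forward, taken), then `je` (backward, taken).
let fsharpX2 =
    [ { Backward = false; Taken = true }
      { Backward = true; Taken = true } ]

printfn "C# layout:     %d static mispredictions" (staticMispredicts csharpX2)
printfn "old F# layout: %d static mispredictions" (staticMispredicts fsharpX2)
```

This only models the cold, no-history case. The measurements in this thread report 0 mispredictions once the predictor is warm, so the sketch should not be read as an explanation of the timings.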

    [DisassemblyDiagnoser]
    [HardwareCounters(
     HardwareCounter.BranchMispredictions,
     HardwareCounter.BranchInstructions)]
    public class Bench2
    {

        [Benchmark]
        [Arguments(10)]
        [Arguments(11)]
        [Arguments(12)]
        [Arguments(13)]
        [Arguments(14)]
        [Arguments(15)]
        public int CSharp(int x)
        {
            return Cond(x);
        }

        [Benchmark]
        [Arguments(10)]
        [Arguments(11)]
        [Arguments(12)]
        [Arguments(13)]
        [Arguments(14)]
        [Arguments(15)]
        public int FSharp(int x)
        {
            return FSharpCond.Bench.condition_2(x);
        }

        //
        // FSharp will not inline the code so we shouldn't either.
        //
        [MethodImpl(MethodImplOptions.NoInlining)]
        public static int Cond(int x)
        {
            if (x == 1 || x == 2) return 1;
            else if (x == 3 || x == 4) return 2;
            else if (x == 5 || x == 6) return 3;
            else if (x == 7 || x == 8) return 4;
            else if (x == 9 || x == 10) return 5;
            else if (x == 11 || x == 12) return 6;
            else if (x == 13 || x == 14) return 7;

            else return 8;
        }
    }

    let condition_2 x =
        if  (x = 1 || x = 2)   then 1
        elif(x = 3 || x = 4)   then 2
        elif(x = 5 || x = 6)   then 3
        elif(x = 7 || x = 8)   then 4
        elif(x = 9 || x = 10)  then 5
        elif(x = 11 || x = 12) then 6
        elif(x = 13 || x = 14) then 7

        else 8

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19042.985 (20H2/October2020Update)
Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=5.0.100
  [Host]     : .NET 5.0.0 (5.0.20.51904), X64 RyuJIT
  DefaultJob : .NET 5.0.0 (5.0.20.51904), X64 RyuJIT

| Method | x  | Mean     | Error     | StdDev    | Median   | Code Size | BranchInstructions/Op | BranchMispredictions/Op |
|--------|----|----------|-----------|-----------|----------|-----------|-----------------------|-------------------------|
| CSharp | 10 | 4.076 ns | 0.0889 ns | 0.0788 ns | 4.086 ns | 129 B     | 13                    | 0                       |
| FSharp | 10 | 6.316 ns | 0.1699 ns | 0.3273 ns | 6.231 ns | 129 B     | 13                    | 0                       |
| CSharp | 11 | 4.851 ns | 0.1008 ns | 0.1200 ns | 4.882 ns | 129 B     | 14                    | 0                       |
| FSharp | 11 | 4.772 ns | 0.1043 ns | 0.0975 ns | 4.793 ns | 129 B     | 14                    | -0                      |
| CSharp | 12 | 4.872 ns | 0.1405 ns | 0.2386 ns | 4.863 ns | 129 B     | 15                    | 0                       |
| FSharp | 12 | 6.937 ns | 0.1841 ns | 0.5162 ns | 6.862 ns | 129 B     | 15                    | 0                       |
| CSharp | 13 | 5.902 ns | 0.1668 ns | 0.4838 ns | 5.836 ns | 129 B     | 16                    | 0                       |
| FSharp | 13 | 5.859 ns | 0.1587 ns | 0.4125 ns | 5.854 ns | 129 B     | 16                    | 0                       |
| CSharp | 14 | 4.913 ns | 0.1296 ns | 0.1858 ns | 4.907 ns | 129 B     | 17                    | 0                       |
| FSharp | 14 | 6.984 ns | 0.1839 ns | 0.4224 ns | 6.921 ns | 129 B     | 17                    | 0                       |
| CSharp | 15 | 5.413 ns | 0.1901 ns | 0.5605 ns | 5.386 ns | 129 B     | 16                    | 0                       |
| FSharp | 15 | 5.591 ns | 0.1620 ns | 0.4649 ns | 5.437 ns | 129 B     | 16                    | 0                       |

.NET 5.0.0 (5.0.20.51904), X64 RyuJIT

; FSharpCondBench.Bench2.CSharp(Int32)
       mov       ecx,edx
       jmp       near ptr FSharpCondBench.Bench2.Cond(Int32)
; Total bytes of code 7
; FSharpCondBench.Bench2.Cond(Int32)
       cmp       ecx,1
       je        short M01_L00
       cmp       ecx,2
       jne       short M01_L01
M01_L00:
       mov       eax,1
       ret
M01_L01:
       cmp       ecx,3
       je        short M01_L02
       cmp       ecx,4
       jne       short M01_L03
M01_L02:
       mov       eax,2
       ret
M01_L03:
       cmp       ecx,5
       je        short M01_L04
       cmp       ecx,6
       jne       short M01_L05
M01_L04:
       mov       eax,3
       ret
M01_L05:
       cmp       ecx,7
       je        short M01_L06
       cmp       ecx,8
       jne       short M01_L07
M01_L06:
       mov       eax,4
       jmp       short M01_L14
M01_L07:
       cmp       ecx,9
       je        short M01_L08
       cmp       ecx,0A
       jne       short M01_L09
M01_L08:
       mov       eax,5
       jmp       short M01_L14
M01_L09:
       cmp       ecx,0B
       je        short M01_L10
       cmp       ecx,0C
       jne       short M01_L11
M01_L10:
       mov       eax,6
       jmp       short M01_L14
M01_L11:
       cmp       ecx,0D
       je        short M01_L12
       cmp       ecx,0E
       jne       short M01_L13
M01_L12:
       mov       eax,7
       jmp       short M01_L14
M01_L13:
       mov       eax,8
M01_L14:
       ret
; Total bytes of code 122

.NET 5.0.0 (5.0.20.51904), X64 RyuJIT

; FSharpCondBench.Bench2.FSharp(Int32)
       mov       ecx,edx
       jmp       near ptr FSharpCond.Bench.condition_2(Int32)
; Total bytes of code 7
; FSharpCond.Bench.condition_2(Int32)
       cmp       ecx,1
       jne       short M01_L01
M01_L00:
       mov       eax,1
       ret
M01_L01:
       cmp       ecx,2
       je        short M01_L00
       cmp       ecx,3
       jne       short M01_L03
M01_L02:
       mov       eax,2
       ret
M01_L03:
       cmp       ecx,4
       je        short M01_L02
       cmp       ecx,5
       jne       short M01_L05
M01_L04:
       mov       eax,3
       ret
M01_L05:
       cmp       ecx,6
       je        short M01_L04
       cmp       ecx,7
       jne       short M01_L07
M01_L06:
       mov       eax,4
       jmp       short M01_L14
M01_L07:
       cmp       ecx,8
       je        short M01_L06
       cmp       ecx,9
       jne       short M01_L09
M01_L08:
       mov       eax,5
       jmp       short M01_L14
M01_L09:
       cmp       ecx,0A
       je        short M01_L08
       cmp       ecx,0B
       jne       short M01_L11
M01_L10:
       mov       eax,6
       jmp       short M01_L14
M01_L11:
       cmp       ecx,0C
       je        short M01_L10
       cmp       ecx,0D
       jne       short M01_L13
M01_L12:
       mov       eax,7
       jmp       short M01_L14
M01_L13:
       cmp       ecx,0E
       je        short M01_L12
       mov       eax,8
M01_L14:
       ret
; Total bytes of code 122

@dsyme
Contributor Author

dsyme commented Jun 1, 2021

@badamczewski Just to check, are those results on this branch, or main, or a release version in the .NET SDK?

Also what's the high level summary of the results please :)

@badamczewski

@dsyme, this is the release version of .NET 5. The high-level summary is that there's a performance gain when conditions are composed of forward branches on Intel CPUs (Skylake, Haswell) and most likely on AMD CPUs as well 🙂
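
For readers following the disassembly above, here is a rough sketch of the kind of source that would produce it. It is reconstructed from the assembly (pairs of values map to 1 through 7, anything else to 8), so the name `condition2` and the exact form are assumptions, not the actual test code. With the release compiler, the second test of each pair (`je short M01_L00` after the `M01_L00` label) jumps backward to the shared target, while the C# version only branches forward.

```fsharp
// Hypothetical reconstruction inferred from the disassembly, not the real test source.
// Each `a || b` condition shares one target, which is where the branch layout matters.
let condition2 (x: int) =
    if x = 1 || x = 2 then 1
    elif x = 3 || x = 4 then 2
    elif x = 5 || x = 6 then 3
    elif x = 7 || x = 8 then 4
    elif x = 9 || x = 10 then 5
    elif x = 11 || x = 12 then 6
    elif x = 13 || x = 14 then 7
    else 8

// Quick check of the mapping for a few inputs.
for x in [ 1; 2; 7; 14; 15 ] do
    printfn "condition2 %d = %d" x (condition2 x)
```

The result is the same either way; only the layout of the emitted branches differs between the two compilers.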

@dsyme
Contributor Author

dsyme commented Jun 2, 2021

Codewise this is ready (once green). We should add a BenchmarkDotNet perf suite project to test this sort of thing out.

@cartermp
Contributor

cartermp commented Jun 2, 2021

This would be a good place to add it, likely a new project to the solution: https://github.com/dotnet/fsharp/tree/main/tests/benchmarks

@badamczewski

@dsyme this kind of branch organization is considered good practice. That being said, it would be great to test this on an AMD Ryzen CPU as well.

@dsyme
Contributor Author

dsyme commented Jun 2, 2021

I added a benchmark. Here are the perf results on my (old) Xeon processor. With the new compiler the F# code is identical to the C#, so those results are uninteresting to list; the differences show up in the old results below.

OLD:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Xeon CPU E5-1620 0 3.60GHz, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.104
  [Host]     : .NET Core 5.0.6 (CoreCLR 5.0.621.22011, CoreFX 5.0.621.22011), X64 RyuJIT DEBUG
  DefaultJob : .NET Core 5.0.6 (CoreCLR 5.0.621.22011, CoreFX 5.0.621.22011), X64 RyuJIT


| Method | x |     Mean |     Error |    StdDev | Code Size | BranchInstructions/Op | BranchMispredictions/Op |
|------- |-- |---------:|----------:|----------:|----------:|----------------------:|------------------------:|
| CSharp | 1 | 1.568 ns | 0.0154 ns | 0.0144 ns |      45 B |                     2 |                       0 |
| FSharp | 1 | 1.100 ns | 0.0195 ns | 0.0173 ns |      45 B |                     2 |                       0 |
| CSharp | 2 | 1.054 ns | 0.0335 ns | 0.0314 ns |      45 B |                     3 |                       0 |
| FSharp | 2 | 2.168 ns | 0.0109 ns | 0.0085 ns |      45 B |                     3 |                       0 |
| CSharp | 3 | 2.088 ns | 0.0118 ns | 0.0111 ns |      45 B |                     3 |                       0 |
| FSharp | 3 | 1.897 ns | 0.0102 ns | 0.0095 ns |      45 B |                     3 |                       0 |
| CSharp | 4 | 1.590 ns | 0.0237 ns | 0.0222 ns |      45 B |                     4 |                       0 |
| FSharp | 4 | 2.983 ns | 0.0119 ns | 0.0105 ns |      45 B |                     4 |                       0 |
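
A benchmark class that yields columns like the ones in the table above might look roughly like the following. This is a minimal sketch, not the project added in this PR: the type and function names are hypothetical, and it assumes the BenchmarkDotNet and BenchmarkDotNet.Diagnostics.Windows packages are referenced (the hardware counters are Windows-only and typically need an elevated prompt).

```fsharp
// Sketch only: names are hypothetical and the NuGet packages
// BenchmarkDotNet and BenchmarkDotNet.Diagnostics.Windows are assumed.
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Diagnosers
open BenchmarkDotNet.Running

module Code =
    // Same shape as the condition used in the PR description.
    let condition (x: int) =
        if x = 1 || x = 2 then 1
        elif x = 3 || x = 4 then 2
        else 0

// DisassemblyDiagnoser adds the "Code Size" column; HardwareCounters adds
// the BranchInstructions/Op and BranchMispredictions/Op columns.
[<DisassemblyDiagnoser>]
[<HardwareCounters(HardwareCounter.BranchInstructions, HardwareCounter.BranchMispredictions)>]
type CondBenchmarks() =
    // One benchmark run per input value.
    [<Params(1, 2, 3, 4)>]
    member val x = 0 with get, set

    [<Benchmark>]
    member this.FSharp() = Code.condition this.x

[<EntryPoint>]
let main _ =
    BenchmarkRunner.Run<CondBenchmarks>() |> ignore
    0
```

A C# counterpart would live in a separate C# project and be called from a second `[<Benchmark>]` method, which is presumably how the CSharp rows are produced.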

@dsyme
Contributor Author

dsyme commented Jun 2, 2021

This is now ready

We can integrate it simply on the basis that we now generate the same code as C# for conditions of the form if A || B then ... It would also be good to get the expected perf improvement measured on recent Intel and AMD processors, as mentioned by @badamczewski.

To do that (this is for Windows; step 1 will need adjustment on Linux):

  1. uncomment the lines in tests/benchmarks/MicroPerf/MicroPerf.fsproj that switch to the old F# compiler https://github.com/dotnet/fsharp/pull/11619/files#diff-99a320aaa85a277dc6a5ece2ecdd7243de305dde9020d61160f181a3e3af5c30R17#

  2. build -c Release

  3. Run artifacts\bin\MicroPerf\Release\net5.0\MicroPerf.exe

  4. paste results below including processor description.

thanks

@dominikprzywara

NEW

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET Core SDK=5.0.104
  [Host]     : .NET Core 5.0.6 (CoreCLR 5.0.621.22011, CoreFX 5.0.621.22011), X64 RyuJIT DEBUG
  DefaultJob : .NET Core 5.0.6 (CoreCLR 5.0.621.22011, CoreFX 5.0.621.22011), X64 RyuJIT


| Method | x |      Mean |     Error |    StdDev | Code Size | BranchInstructions/Op | BranchMispredictions/Op |
|------- |-- |----------:|----------:|----------:|----------:|----------------------:|------------------------:|
| CSharp | 1 | 0.6229 ns | 0.0033 ns | 0.0028 ns |      45 B |                     2 |                       0 |
| FSharp | 1 | 0.6217 ns | 0.0032 ns | 0.0030 ns |      45 B |                     2 |                       0 |
| CSharp | 2 | 0.6732 ns | 0.0030 ns | 0.0028 ns |      45 B |                     2 |                       0 |
| FSharp | 2 | 0.6759 ns | 0.0037 ns | 0.0033 ns |      45 B |                     2 |                       0 |
| CSharp | 3 | 0.8321 ns | 0.0046 ns | 0.0043 ns |      45 B |                     2 |                       0 |
| FSharp | 3 | 1.0849 ns | 0.0043 ns | 0.0038 ns |      45 B |                     2 |                       0 |
| CSharp | 4 | 0.8783 ns | 0.0057 ns | 0.0053 ns |      45 B |                     2 |                       0 |
| FSharp | 4 | 0.8782 ns | 0.0034 ns | 0.0030 ns |      45 B |                     2 |                       0 |

OLD

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET Core SDK=5.0.104
  [Host]     : .NET Core 5.0.6 (CoreCLR 5.0.621.22011, CoreFX 5.0.621.22011), X64 RyuJIT DEBUG
  DefaultJob : .NET Core 5.0.6 (CoreCLR 5.0.621.22011, CoreFX 5.0.621.22011), X64 RyuJIT


| Method | x |      Mean |     Error |    StdDev | Code Size | BranchInstructions/Op | BranchMispredictions/Op |
|------- |-- |----------:|----------:|----------:|----------:|----------------------:|------------------------:|
| CSharp | 1 | 0.6199 ns | 0.0049 ns | 0.0043 ns |      45 B |                     2 |                       0 |
| FSharp | 1 | 0.7116 ns | 0.0033 ns | 0.0029 ns |      45 B |                     2 |                       0 |
| CSharp | 2 | 0.6744 ns | 0.0055 ns | 0.0051 ns |      45 B |                     2 |                       0 |
| FSharp | 2 | 1.0823 ns | 0.0051 ns | 0.0047 ns |      45 B |                     2 |                       0 |
| CSharp | 3 | 1.0834 ns | 0.0034 ns | 0.0032 ns |      45 B |                     2 |                       0 |
| FSharp | 3 | 0.9333 ns | 0.0292 ns | 0.0274 ns |      45 B |                     2 |                       0 |
| CSharp | 4 | 0.6257 ns | 0.0025 ns | 0.0022 ns |      45 B |                     2 |                       0 |
| FSharp | 4 | 1.2912 ns | 0.0043 ns | 0.0041 ns |      45 B |                     3 |                       0 |
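
As a rough way to read the two Ryzen 9 5900X tables above, the old/new ratio of the FSharp means can be computed directly. The numbers are copied from those tables; the variable names are illustrative only, and single-run nanosecond timings like these are noisy, so the ratios are indicative at best.

```fsharp
// Mean times in ns for the FSharp rows, copied from the OLD and NEW tables above.
let oldMeans = [ 1, 0.7116; 2, 1.0823; 3, 0.9333; 4, 1.2912 ]
let newMeans = [ 1, 0.6217; 2, 0.6759; 3, 1.0849; 4, 0.8782 ]

// A ratio above 1 means the new compiler's code was faster for that input.
let speedups =
    List.map2 (fun (x, o) (_, n) -> x, o / n) oldMeans newMeans

for (x, r) in speedups do
    printfn "x = %d: old/new = %.2f" x r
```

On these numbers the inputs that hit the second test of each pair (x = 2 and x = 4) improve the most, while x = 3 comes out slower in the new run, which suggests measurement noise or code alignment effects rather than a codegen difference.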

@kerams
Contributor

kerams commented Jun 2, 2021

@dsyme, @badamczewski, here you go, a 2019 Ryzen CPU. It looks a bit wild, but I wasn't really using the PC at the time.

Old:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19043
AMD Ryzen 7 3700X, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.300-preview.21258.4
  [Host]     : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT DEBUG
  DefaultJob : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

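| Method | x |      Mean |     Error |    StdDev | Code Size | BranchInstructions/Op | BranchMispredictions/Op |
|------- |-- |----------:|----------:|----------:|----------:|----------------------:|------------------------:|
| CSharp | 1 | 0.6838 ns | 0.0091 ns | 0.0085 ns |      45 B |                     2 |                       0 |
| FSharp | 1 | 0.4287 ns | 0.0011 ns | 0.0009 ns |      45 B |                     2 |                       0 |
| CSharp | 2 | 0.4741 ns | 0.0021 ns | 0.0017 ns |      45 B |                     2 |                       0 |
| FSharp | 2 | 0.8899 ns | 0.0055 ns | 0.0049 ns |      45 B |                     2 |                       0 |
| CSharp | 3 | 0.8931 ns | 0.0022 ns | 0.0019 ns |      45 B |                     3 |                       0 |
| FSharp | 3 | 0.6774 ns | 0.0057 ns | 0.0053 ns |      45 B |                     2 |                       0 |
| CSharp | 4 | 0.7042 ns | 0.0146 ns | 0.0114 ns |      45 B |                     2 |                       0 |
| FSharp | 4 | 1.3445 ns | 0.0008 ns | 0.0007 ns |      45 B |                     2 |                       0 |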

New:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19043
AMD Ryzen 7 3700X, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.300-preview.21258.4
  [Host]     : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT DEBUG
  DefaultJob : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

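| Method | x |      Mean |     Error |    StdDev | Code Size | BranchInstructions/Op | BranchMispredictions/Op |
|------- |-- |----------:|----------:|----------:|----------:|----------------------:|------------------------:|
| CSharp | 1 | 1.1253 ns | 0.0009 ns | 0.0008 ns |      45 B |                     2 |                       0 |
| FSharp | 1 | 1.1480 ns | 0.0021 ns | 0.0016 ns |      45 B |                     2 |                       0 |
| CSharp | 2 | 0.4583 ns | 0.0007 ns | 0.0006 ns |      45 B |                     2 |                       0 |
| FSharp | 2 | 0.4763 ns | 0.0053 ns | 0.0050 ns |      45 B |                     2 |                       0 |
| CSharp | 3 | 0.9002 ns | 0.0021 ns | 0.0016 ns |      45 B |                     2 |                       0 |
| FSharp | 3 | 0.9221 ns | 0.0021 ns | 0.0020 ns |      45 B |                     2 |                       0 |
| CSharp | 4 | 1.1160 ns | 0.0009 ns | 0.0008 ns |      45 B |                     2 |                       0 |
| FSharp | 4 | 0.6995 ns | 0.0039 ns | 0.0033 ns |      45 B |                     2 |                       0 |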

New, run 2:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19043
AMD Ryzen 7 3700X, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.300-preview.21258.4
  [Host]     : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT DEBUG
  DefaultJob : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

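| Method | x |      Mean |     Error |    StdDev | Code Size | BranchInstructions/Op | BranchMispredictions/Op |
|------- |-- |----------:|----------:|----------:|----------:|----------------------:|------------------------:|
| CSharp | 1 | 0.6976 ns | 0.0019 ns | 0.0016 ns |      45 B |                     2 |                       0 |
| FSharp | 1 | 0.6832 ns | 0.0005 ns | 0.0004 ns |      45 B |                     2 |                       0 |
| CSharp | 2 | 0.4548 ns | 0.0008 ns | 0.0007 ns |      45 B |                     2 |                       0 |
| FSharp | 2 | 0.4559 ns | 0.0038 ns | 0.0035 ns |      45 B |                     2 |                       0 |
| CSharp | 3 | 0.9780 ns | 0.0309 ns | 0.0274 ns |      45 B |                     2 |                       0 |
| FSharp | 3 | 0.8945 ns | 0.0056 ns | 0.0047 ns |      45 B |                     2 |                       0 |
| CSharp | 4 | 0.6736 ns | 0.0013 ns | 0.0010 ns |      45 B |                     2 |                       0 |
| FSharp | 4 | 0.6983 ns | 0.0063 ns | 0.0053 ns |      45 B |                     3 |                       0 |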

If anyone wants to run this with a preview of VS, the correct tool path is C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\Common7\IDE\CommonExtensions\Microsoft\FSharp\Tools. I also had to disable virtualization in the BIOS; otherwise the benchmark would not run.

@cartermp
Contributor

cartermp commented Jun 2, 2021

@dsyme more failures :)

@badamczewski

@dominikprzywara @kerams Could you also run the benchmark from the comment linked at the end of this post?
@dsyme The bench project should use the updated test code if possible.

Ryzen CPUs are very fast, so they need to execute more instructions per test for the results to be stable. The linked test forces the code to run more instructions per test without changing the branching code in any way:

#11619 (comment)

@kerams
Contributor

kerams commented Jun 2, 2021

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19043
AMD Ryzen 7 3700X, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.300-preview.21258.4
  [Host]     : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT DEBUG
  DefaultJob : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

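| Method |  x |     Mean |     Error |    StdDev | Code Size | BranchInstructions/Op | BranchMispredictions/Op |
|------- |--- |---------:|----------:|----------:|----------:|----------------------:|------------------------:|
| CSharp | 10 | 2.355 ns | 0.0430 ns | 0.0402 ns |     129 B |                     4 |                       0 |
| FSharp | 10 | 2.056 ns | 0.0170 ns | 0.0151 ns |     129 B |                     4 |                       0 |
| CSharp | 11 | 3.032 ns | 0.0341 ns | 0.0319 ns |     129 B |                     5 |                       0 |
| FSharp | 11 | 2.131 ns | 0.0260 ns | 0.0230 ns |     129 B |                     4 |                       0 |
| CSharp | 12 | 2.310 ns | 0.0142 ns | 0.0125 ns |     129 B |                     5 |                       0 |
| FSharp | 12 | 2.338 ns | 0.0474 ns | 0.0396 ns |     129 B |                     5 |                       0 |
| CSharp | 13 | 2.609 ns | 0.0305 ns | 0.0270 ns |     129 B |                     5 |                       0 |
| FSharp | 13 | 2.510 ns | 0.0167 ns | 0.0156 ns |     129 B |                     5 |                       0 |
| CSharp | 14 | 2.296 ns | 0.0273 ns | 0.0255 ns |     129 B |                     5 |                       0 |
| FSharp | 14 | 2.293 ns | 0.0158 ns | 0.0140 ns |     129 B |                     5 |                       0 |
| CSharp | 15 | 2.306 ns | 0.0183 ns | 0.0171 ns |     129 B |                     5 |                       0 |
| FSharp | 15 | 2.228 ns | 0.0158 ns | 0.0148 ns |     129 B |                     5 |                       0 |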

Looks good to me. The generated assembly is below.

.NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

; TaskPerf.Benchmarks.CSharp(Int32)
       mov       ecx,edx
       jmp       near ptr MicroPerfCSharp.Cond(Int32)
; Total bytes of code 7
; MicroPerfCSharp.Cond(Int32)
       cmp       ecx,1
       je        short M01_L00
       cmp       ecx,2
       jne       short M01_L01
M01_L00:
       mov       eax,1
       ret
M01_L01:
       cmp       ecx,3
       je        short M01_L02
       cmp       ecx,4
       jne       short M01_L03
M01_L02:
       mov       eax,2
       ret
M01_L03:
       cmp       ecx,5
       je        short M01_L04
       cmp       ecx,6
       jne       short M01_L05
M01_L04:
       mov       eax,3
       ret
M01_L05:
       cmp       ecx,7
       je        short M01_L06
       cmp       ecx,8
       jne       short M01_L07
M01_L06:
       mov       eax,4
       jmp       short M01_L14
M01_L07:
       cmp       ecx,9
       je        short M01_L08
       cmp       ecx,0A
       jne       short M01_L09
M01_L08:
       mov       eax,5
       jmp       short M01_L14
M01_L09:
       cmp       ecx,0B
       je        short M01_L10
       cmp       ecx,0C
       jne       short M01_L11
M01_L10:
       mov       eax,6
       jmp       short M01_L14
M01_L11:
       cmp       ecx,0D
       je        short M01_L12
       cmp       ecx,0E
       jne       short M01_L13
M01_L12:
       mov       eax,7
       jmp       short M01_L14
M01_L13:
       mov       eax,8
M01_L14:
       ret
; Total bytes of code 122

.NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

; TaskPerf.Benchmarks.FSharp(Int32)
       mov       ecx,edx
       jmp       near ptr TaskPerf.Code.condition_2(Int32)
; Total bytes of code 7
; TaskPerf.Code.condition_2(Int32)
       cmp       ecx,1
       je        short M01_L00
       cmp       ecx,2
       jne       short M01_L01
M01_L00:
       mov       eax,1
       ret
M01_L01:
       cmp       ecx,3
       je        short M01_L02
       cmp       ecx,4
       jne       short M01_L03
M01_L02:
       mov       eax,2
       ret
M01_L03:
       cmp       ecx,5
       je        short M01_L04
       cmp       ecx,6
       jne       short M01_L05
M01_L04:
       mov       eax,3
       ret
M01_L05:
       cmp       ecx,7
       je        short M01_L06
       cmp       ecx,8
       jne       short M01_L07
M01_L06:
       mov       eax,4
       jmp       short M01_L14
M01_L07:
       cmp       ecx,9
       je        short M01_L08
       cmp       ecx,0A
       jne       short M01_L09
M01_L08:
       mov       eax,5
       jmp       short M01_L14
M01_L09:
       cmp       ecx,0B
       je        short M01_L10
       cmp       ecx,0C
       jne       short M01_L11
M01_L10:
       mov       eax,6
       jmp       short M01_L14
M01_L11:
       cmp       ecx,0D
       je        short M01_L12
       cmp       ecx,0E
       jne       short M01_L13
M01_L12:
       mov       eax,7
       jmp       short M01_L14
M01_L13:
       mov       eax,8
M01_L14:
       ret
; Total bytes of code 122

.NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

; TaskPerf.Benchmarks.CSharp(Int32)
       mov       ecx,edx
       jmp       near ptr MicroPerfCSharp.Cond(Int32)
; Total bytes of code 7
; MicroPerfCSharp.Cond(Int32)
       cmp       ecx,1
       je        short M01_L00
       cmp       ecx,2
       jne       short M01_L01
M01_L00:
       mov       eax,1
       ret
M01_L01:
       cmp       ecx,3
       je        short M01_L02
       cmp       ecx,4
       jne       short M01_L03
M01_L02:
       mov       eax,2
       ret
M01_L03:
       cmp       ecx,5
       je        short M01_L04
       cmp       ecx,6
       jne       short M01_L05
M01_L04:
       mov       eax,3
       ret
M01_L05:
       cmp       ecx,7
       je        short M01_L06
       cmp       ecx,8
       jne       short M01_L07
M01_L06:
       mov       eax,4
       jmp       short M01_L14
M01_L07:
       cmp       ecx,9
       je        short M01_L08
       cmp       ecx,0A
       jne       short M01_L09
M01_L08:
       mov       eax,5
       jmp       short M01_L14
M01_L09:
       cmp       ecx,0B
       je        short M01_L10
       cmp       ecx,0C
       jne       short M01_L11
M01_L10:
       mov       eax,6
       jmp       short M01_L14
M01_L11:
       cmp       ecx,0D
       je        short M01_L12
       cmp       ecx,0E
       jne       short M01_L13
M01_L12:
       mov       eax,7
       jmp       short M01_L14
M01_L13:
       mov       eax,8
M01_L14:
       ret
; Total bytes of code 122

.NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

; TaskPerf.Benchmarks.FSharp(Int32)
       mov       ecx,edx
       jmp       near ptr TaskPerf.Code.condition_2(Int32)
; Total bytes of code 7
; TaskPerf.Code.condition_2(Int32)
       cmp       ecx,1
       je        short M01_L00
       cmp       ecx,2
       jne       short M01_L01
M01_L00:
       mov       eax,1
       ret
M01_L01:
       cmp       ecx,3
       je        short M01_L02
       cmp       ecx,4
       jne       short M01_L03
M01_L02:
       mov       eax,2
       ret
M01_L03:
       cmp       ecx,5
       je        short M01_L04
       cmp       ecx,6
       jne       short M01_L05
M01_L04:
       mov       eax,3
       ret
M01_L05:
       cmp       ecx,7
       je        short M01_L06
       cmp       ecx,8
       jne       short M01_L07
M01_L06:
       mov       eax,4
       jmp       short M01_L14
M01_L07:
       cmp       ecx,9
       je        short M01_L08
       cmp       ecx,0A
       jne       short M01_L09
M01_L08:
       mov       eax,5
       jmp       short M01_L14
M01_L09:
       cmp       ecx,0B
       je        short M01_L10
       cmp       ecx,0C
       jne       short M01_L11
M01_L10:
       mov       eax,6
       jmp       short M01_L14
M01_L11:
       cmp       ecx,0D
       je        short M01_L12
       cmp       ecx,0E
       jne       short M01_L13
M01_L12:
       mov       eax,7
       jmp       short M01_L14
M01_L13:
       mov       eax,8
M01_L14:
       ret
; Total bytes of code 122

.NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

; TaskPerf.Benchmarks.CSharp(Int32)
       mov       ecx,edx
       jmp       near ptr MicroPerfCSharp.Cond(Int32)
; Total bytes of code 7
; MicroPerfCSharp.Cond(Int32)
       cmp       ecx,1
       je        short M01_L00
       cmp       ecx,2
       jne       short M01_L01
M01_L00:
       mov       eax,1
       ret
M01_L01:
       cmp       ecx,3
       je        short M01_L02
       cmp       ecx,4
       jne       short M01_L03
M01_L02:
       mov       eax,2
       ret
M01_L03:
       cmp       ecx,5
       je        short M01_L04
       cmp       ecx,6
       jne       short M01_L05
M01_L04:
       mov       eax,3
       ret
M01_L05:
       cmp       ecx,7
       je        short M01_L06
       cmp       ecx,8
       jne       short M01_L07
M01_L06:
       mov       eax,4
       jmp       short M01_L14
M01_L07:
       cmp       ecx,9
       je        short M01_L08
       cmp       ecx,0A
       jne       short M01_L09
M01_L08:
       mov       eax,5
       jmp       short M01_L14
M01_L09:
       cmp       ecx,0B
       je        short M01_L10
       cmp       ecx,0C
       jne       short M01_L11
M01_L10:
       mov       eax,6
       jmp       short M01_L14
M01_L11:
       cmp       ecx,0D
       je        short M01_L12
       cmp       ecx,0E
       jne       short M01_L13
M01_L12:
       mov       eax,7
       jmp       short M01_L14
M01_L13:
       mov       eax,8
M01_L14:
       ret
; Total bytes of code 122

.NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

; TaskPerf.Benchmarks.FSharp(Int32)
       mov       ecx,edx
       jmp       near ptr TaskPerf.Code.condition_2(Int32)
; Total bytes of code 7
; TaskPerf.Code.condition_2(Int32)
       cmp       ecx,1
       je        short M01_L00
       cmp       ecx,2
       jne       short M01_L01
M01_L00:
       mov       eax,1
       ret
M01_L01:
       cmp       ecx,3
       je        short M01_L02
       cmp       ecx,4
       jne       short M01_L03
M01_L02:
       mov       eax,2
       ret
M01_L03:
       cmp       ecx,5
       je        short M01_L04
       cmp       ecx,6
       jne       short M01_L05
M01_L04:
       mov       eax,3
       ret
M01_L05:
       cmp       ecx,7
       je        short M01_L06
       cmp       ecx,8
       jne       short M01_L07
M01_L06:
       mov       eax,4
       jmp       short M01_L14
M01_L07:
       cmp       ecx,9
       je        short M01_L08
       cmp       ecx,0A
       jne       short M01_L09
M01_L08:
       mov       eax,5
       jmp       short M01_L14
M01_L09:
       cmp       ecx,0B
       je        short M01_L10
       cmp       ecx,0C
       jne       short M01_L11
M01_L10:
       mov       eax,6
       jmp       short M01_L14
M01_L11:
       cmp       ecx,0D
       je        short M01_L12
       cmp       ecx,0E
       jne       short M01_L13
M01_L12:
       mov       eax,7
       jmp       short M01_L14
M01_L13:
       mov       eax,8
M01_L14:
       ret
; Total bytes of code 122

.NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

; TaskPerf.Benchmarks.CSharp(Int32)
       mov       ecx,edx
       jmp       near ptr MicroPerfCSharp.Cond(Int32)
; Total bytes of code 7
; MicroPerfCSharp.Cond(Int32)
       cmp       ecx,1
       je        short M01_L00
       cmp       ecx,2
       jne       short M01_L01
M01_L00:
       mov       eax,1
       ret
M01_L01:
       cmp       ecx,3
       je        short M01_L02
       cmp       ecx,4
       jne       short M01_L03
M01_L02:
       mov       eax,2
       ret
M01_L03:
       cmp       ecx,5
       je        short M01_L04
       cmp       ecx,6
       jne       short M01_L05
M01_L04:
       mov       eax,3
       ret
M01_L05:
       cmp       ecx,7
       je        short M01_L06
       cmp       ecx,8
       jne       short M01_L07
M01_L06:
       mov       eax,4
       jmp       short M01_L14
M01_L07:
       cmp       ecx,9
       je        short M01_L08
       cmp       ecx,0A
       jne       short M01_L09
M01_L08:
       mov       eax,5
       jmp       short M01_L14
M01_L09:
       cmp       ecx,0B
       je        short M01_L10
       cmp       ecx,0C
       jne       short M01_L11
M01_L10:
       mov       eax,6
       jmp       short M01_L14
M01_L11:
       cmp       ecx,0D
       je        short M01_L12
       cmp       ecx,0E
       jne       short M01_L13
M01_L12:
       mov       eax,7
       jmp       short M01_L14
M01_L13:
       mov       eax,8
M01_L14:
       ret
; Total bytes of code 122

.NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

; TaskPerf.Benchmarks.FSharp(Int32)
       mov       ecx,edx
       jmp       near ptr TaskPerf.Code.condition_2(Int32)
; Total bytes of code 7
; TaskPerf.Code.condition_2(Int32)
       cmp       ecx,1
       je        short M01_L00
       cmp       ecx,2
       jne       short M01_L01
M01_L00:
       mov       eax,1
       ret
M01_L01:
       cmp       ecx,3
       je        short M01_L02
       cmp       ecx,4
       jne       short M01_L03
M01_L02:
       mov       eax,2
       ret
M01_L03:
       cmp       ecx,5
       je        short M01_L04
       cmp       ecx,6
       jne       short M01_L05
M01_L04:
       mov       eax,3
       ret
M01_L05:
       cmp       ecx,7
       je        short M01_L06
       cmp       ecx,8
       jne       short M01_L07
M01_L06:
       mov       eax,4
       jmp       short M01_L14
M01_L07:
       cmp       ecx,9
       je        short M01_L08
       cmp       ecx,0A
       jne       short M01_L09
M01_L08:
       mov       eax,5
       jmp       short M01_L14
M01_L09:
       cmp       ecx,0B
       je        short M01_L10
       cmp       ecx,0C
       jne       short M01_L11
M01_L10:
       mov       eax,6
       jmp       short M01_L14
M01_L11:
       cmp       ecx,0D
       je        short M01_L12
       cmp       ecx,0E
       jne       short M01_L13
M01_L12:
       mov       eax,7
       jmp       short M01_L14
M01_L13:
       mov       eax,8
M01_L14:
       ret
; Total bytes of code 122

.NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

; TaskPerf.Benchmarks.CSharp(Int32)
       mov       ecx,edx
       jmp       near ptr MicroPerfCSharp.Cond(Int32)
; Total bytes of code 7
; MicroPerfCSharp.Cond(Int32)
       cmp       ecx,1
       je        short M01_L00
       cmp       ecx,2
       jne       short M01_L01
M01_L00:
       mov       eax,1
       ret
M01_L01:
       cmp       ecx,3
       je        short M01_L02
       cmp       ecx,4
       jne       short M01_L03
M01_L02:
       mov       eax,2
       ret
M01_L03:
       cmp       ecx,5
       je        short M01_L04
       cmp       ecx,6
       jne       short M01_L05
M01_L04:
       mov       eax,3
       ret
M01_L05:
       cmp       ecx,7
       je        short M01_L06
       cmp       ecx,8
       jne       short M01_L07
M01_L06:
       mov       eax,4
       jmp       short M01_L14
M01_L07:
       cmp       ecx,9
       je        short M01_L08
       cmp       ecx,0A
       jne       short M01_L09
M01_L08:
       mov       eax,5
       jmp       short M01_L14
M01_L09:
       cmp       ecx,0B
       je        short M01_L10
       cmp       ecx,0C
       jne       short M01_L11
M01_L10:
       mov       eax,6
       jmp       short M01_L14
M01_L11:
       cmp       ecx,0D
       je        short M01_L12
       cmp       ecx,0E
       jne       short M01_L13
M01_L12:
       mov       eax,7
       jmp       short M01_L14
M01_L13:
       mov       eax,8
M01_L14:
       ret
; Total bytes of code 122

.NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

; TaskPerf.Benchmarks.FSharp(Int32)
       mov       ecx,edx
       jmp       near ptr TaskPerf.Code.condition_2(Int32)
; Total bytes of code 7
; TaskPerf.Code.condition_2(Int32)
       cmp       ecx,1
       je        short M01_L00
       cmp       ecx,2
       jne       short M01_L01
M01_L00:
       mov       eax,1
       ret
M01_L01:
       cmp       ecx,3
       je        short M01_L02
       cmp       ecx,4
       jne       short M01_L03
M01_L02:
       mov       eax,2
       ret
M01_L03:
       cmp       ecx,5
       je        short M01_L04
       cmp       ecx,6
       jne       short M01_L05
M01_L04:
       mov       eax,3
       ret
M01_L05:
       cmp       ecx,7
       je        short M01_L06
       cmp       ecx,8
       jne       short M01_L07
M01_L06:
       mov       eax,4
       jmp       short M01_L14
M01_L07:
       cmp       ecx,9
       je        short M01_L08
       cmp       ecx,0A
       jne       short M01_L09
M01_L08:
       mov       eax,5
       jmp       short M01_L14
M01_L09:
       cmp       ecx,0B
       je        short M01_L10
       cmp       ecx,0C
       jne       short M01_L11
M01_L10:
       mov       eax,6
       jmp       short M01_L14
M01_L11:
       cmp       ecx,0D
       je        short M01_L12
       cmp       ecx,0E
       jne       short M01_L13
M01_L12:
       mov       eax,7
       jmp       short M01_L14
M01_L13:
       mov       eax,8
M01_L14:
       ret
; Total bytes of code 122

@dsyme
Contributor Author

dsyme commented Jun 2, 2021

> @dsyme more failures :)

Should be fixed now.

@dominikprzywara

Sorry for the delay :)

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET Core SDK=5.0.104
  [Host]     : .NET Core 5.0.6 (CoreCLR 5.0.621.22011, CoreFX 5.0.621.22011), X64 RyuJIT DEBUG
  DefaultJob : .NET Core 5.0.6 (CoreCLR 5.0.621.22011, CoreFX 5.0.621.22011), X64 RyuJIT

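| Method | x | Mean | Error | StdDev | Code Size | BranchInstructions/Op | BranchMispredictions/Op |
|--------|----|----------|-----------|-----------|-----------|-----------------------|-------------------------|
| CSharp | 10 | 1.238 ns | 0.0121 ns | 0.0113 ns | 129 B | 4 | 0 |
| FSharp | 10 | 1.630 ns | 0.0051 ns | 0.0043 ns | 129 B | 4 | 0 |
| CSharp | 11 | 1.836 ns | 0.0060 ns | 0.0056 ns | 129 B | 5 | 0 |
| FSharp | 11 | 2.101 ns | 0.0551 ns | 0.0516 ns | 129 B | 4 | 0 |
| CSharp | 12 | 1.891 ns | 0.0095 ns | 0.0084 ns | 129 B | 5 | 0 |
| FSharp | 12 | 1.825 ns | 0.0058 ns | 0.0051 ns | 129 B | 4 | 0 |
| CSharp | 13 | 2.445 ns | 0.0090 ns | 0.0085 ns | 129 B | 5 | 0 |
| FSharp | 13 | 2.443 ns | 0.0055 ns | 0.0046 ns | 129 B | 5 | 0 |
| CSharp | 14 | 1.636 ns | 0.0059 ns | 0.0052 ns | 129 B | 5 | 0 |
| FSharp | 14 | 1.632 ns | 0.0073 ns | 0.0068 ns | 129 B | 5 | 0 |
| CSharp | 15 | 2.030 ns | 0.0059 ns | 0.0055 ns | 129 B | 5 | 0 |
| FSharp | 15 | 2.039 ns | 0.0122 ns | 0.0114 ns | 129 B | 5 | 0 |
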
ASM

.NET Core 5.0.6 (CoreCLR 5.0.621.22011, CoreFX 5.0.621.22011), X64 RyuJIT

; TaskPerf.Benchmarks.CSharp(Int32)
       mov       ecx,edx
       jmp       near ptr MicroPerfCSharp.Cond(Int32)
; Total bytes of code 7
; MicroPerfCSharp.Cond(Int32)
       cmp       ecx,1
       je        short M01_L00
       cmp       ecx,2
       jne       short M01_L01
M01_L00:
       mov       eax,1
       ret
M01_L01:
       cmp       ecx,3
       je        short M01_L02
       cmp       ecx,4
       jne       short M01_L03
M01_L02:
       mov       eax,2
       ret
M01_L03:
       cmp       ecx,5
       je        short M01_L04
       cmp       ecx,6
       jne       short M01_L05
M01_L04:
       mov       eax,3
       ret
M01_L05:
       cmp       ecx,7
       je        short M01_L06
       cmp       ecx,8
       jne       short M01_L07
M01_L06:
       mov       eax,4
       jmp       short M01_L14
M01_L07:
       cmp       ecx,9
       je        short M01_L08
       cmp       ecx,0A
       jne       short M01_L09
M01_L08:
       mov       eax,5
       jmp       short M01_L14
M01_L09:
       cmp       ecx,0B
       je        short M01_L10
       cmp       ecx,0C
       jne       short M01_L11
M01_L10:
       mov       eax,6
       jmp       short M01_L14
M01_L11:
       cmp       ecx,0D
       je        short M01_L12
       cmp       ecx,0E
       jne       short M01_L13
M01_L12:
       mov       eax,7
       jmp       short M01_L14
M01_L13:
       mov       eax,8
M01_L14:
       ret
; Total bytes of code 122

.NET Core 5.0.6 (CoreCLR 5.0.621.22011, CoreFX 5.0.621.22011), X64 RyuJIT

; TaskPerf.Benchmarks.FSharp(Int32)
       mov       ecx,edx
       jmp       near ptr TaskPerf.Code.condition_2(Int32)
; Total bytes of code 7
; TaskPerf.Code.condition_2(Int32)
       cmp       ecx,1
       je        short M01_L00
       cmp       ecx,2
       jne       short M01_L01
M01_L00:
       mov       eax,1
       ret
M01_L01:
       cmp       ecx,3
       je        short M01_L02
       cmp       ecx,4
       jne       short M01_L03
M01_L02:
       mov       eax,2
       ret
M01_L03:
       cmp       ecx,5
       je        short M01_L04
       cmp       ecx,6
       jne       short M01_L05
M01_L04:
       mov       eax,3
       ret
M01_L05:
       cmp       ecx,7
       je        short M01_L06
       cmp       ecx,8
       jne       short M01_L07
M01_L06:
       mov       eax,4
       jmp       short M01_L14
M01_L07:
       cmp       ecx,9
       je        short M01_L08
       cmp       ecx,0A
       jne       short M01_L09
M01_L08:
       mov       eax,5
       jmp       short M01_L14
M01_L09:
       cmp       ecx,0B
       je        short M01_L10
       cmp       ecx,0C
       jne       short M01_L11
M01_L10:
       mov       eax,6
       jmp       short M01_L14
M01_L11:
       cmp       ecx,0D
       je        short M01_L12
       cmp       ecx,0E
       jne       short M01_L13
M01_L12:
       mov       eax,7
       jmp       short M01_L14
M01_L13:
       mov       eax,8
M01_L14:
       ret
; Total bytes of code 122
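
For readers who want to map the disassembly back to source: the F# source of `TaskPerf.Code.condition_2` is not shown in this thread, so the following is only a sketch reconstructed from the compare/branch pattern above (pairs of equality tests sharing one target, with 8 as the fallback). The function body is an assumption; only the name and the constants come from the listing.

```fsharp
// Hypothetical reconstruction from the x64 listing above; the real
// benchmark source may differ (for example, it could use `match`).
let condition_2 (x: int) : int =
    if x = 1 || x = 2 then 1
    elif x = 3 || x = 4 then 2
    elif x = 5 || x = 6 then 3
    elif x = 7 || x = 8 then 4
    elif x = 9 || x = 10 then 5
    elif x = 11 || x = 12 then 6
    elif x = 13 || x = 14 then 7
    else 8

// The benchmarked inputs 10..15 exercise the last three pairs and the fallback.
for x in 10 .. 15 do
    printfn "condition_2 %d = %d" x (condition_2 x)
```

Each `||` pair corresponds to one `cmp`/`je`/`cmp`/`jne` group in the listing, and every conditional jump targets a later label, which is the forward-only layout this PR aims for.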

@dsyme dsyme requested review from TIHan and cartermp June 4, 2021 00:42
Contributor

@TIHan TIHan left a comment

The change is pretty short; the majority of the changes are updates to baseline files.
Awesome.

@dsyme dsyme merged commit 63e4741 into main Jun 4, 2021
@KevinRansom KevinRansom deleted the fwdbranches branch June 30, 2021 19:08
@KevinRansom KevinRansom restored the fwdbranches branch June 30, 2021 19:08
@KevinRansom KevinRansom deleted the fwdbranches branch June 30, 2021 19:08

7 participants