[TieredCompilation] Cold methods with hot loops may run slower with tiering #11006

@kouvel

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

internal static class Program
{
    private const int HistoryCount = 8;
    private const int InnerIterationCount = 256;
    private static readonly TimeSpan s_ts500ms = TimeSpan.FromMilliseconds(500);

    private static void Main()
    {
        var sw = new Stopwatch();
        var history = new Queue<double>(HistoryCount);
        var list = new List<int>(InnerIterationCount);
        for (int outerIteration = -1; outerIteration < HistoryCount; ++outerIteration)
        {
            var duration = s_ts500ms;
            int iterations = 0;
            TimeSpan elapsed;
            sw.Restart();
            do
            {
                // ---
                list.Clear();
                for (int innerIteration = 0; innerIteration < InnerIterationCount; ++innerIteration)
                    list.Add(innerIteration);
                // ---
                ++iterations;
            } while ((iterations & 0xf) != 0 || (elapsed = sw.Elapsed) < duration);

            if (outerIteration < 0)
                continue;

            var iterationsPerMs = iterations / elapsed.TotalMilliseconds;
            if (history.Count >= HistoryCount)
                history.Dequeue();
            history.Enqueue(iterationsPerMs);
            Console.WriteLine($"{iterationsPerMs,10:0.00} {history.Average(),10:0.00}");
        }
    }
}

Average iterations per ms with tiering disabled: 2775.05
Average iterations per ms with tiering enabled: 2045.84 (about 26% lower)

A comparison of PerfView profiles shows that, with tiering enabled, List<int>.Add is not inlined into Main and accounts for most of the exclusive time:

Name                                                                               	Inc %	     Inc	Exc %	   Exc
 test!Program.Main()                                                               	 97.7	   4,485	 29.9	 1,375
+ system.private.corelib!System.Collections.Generic.List`1[System.Int32].Add(Int32)	 66.9	   3,072	 66.8	 3,069

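The JITStats summary shows that the only JIT trigger for Main is FG (foreground). With tiering enabled, a foreground JIT produces tier 0 (minopts) code, which does no inlining. There is no TC trigger for Main, so it was never rejitted at tier 1. Main is called only once, so it stays cold by call count even though its loop is hot, and the loop keeps running in tier 0 code.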
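
To make the "cold method, hot loop" situation concrete, here is a toy model of call-count-based tier promotion. It is only a sketch and not the runtime's implementation: ToyMethod, PromotionThreshold, and the threshold value are hypothetical names and numbers chosen for illustration, and the real runtime compiles tier 1 code in the background rather than switching instantly.

```csharp
using System;

// Toy model of call-count-based tier promotion. This is NOT the runtime's
// implementation; the threshold value and all names here are assumptions.
internal sealed class ToyMethod
{
    private const int PromotionThreshold = 30; // assumed value, for illustration only
    private int _callCount;

    public string Name { get; }
    public int Tier { get; private set; } // 0 = quick unoptimized code, 1 = optimized

    public ToyMethod(string name) => Name = name;

    // Promotion is considered only at method entry. Time spent looping
    // inside a single call never changes the tier in this model.
    public void Call(int loopIterations)
    {
        if (Tier == 0 && ++_callCount >= PromotionThreshold)
            Tier = 1;
        // The loop body would run here, at whatever tier the method has.
    }
}

internal static class ToyTieringDemo
{
    private static void Main()
    {
        var main = new ToyMethod("Main");
        main.Call(loopIterations: 1_000_000); // one call, very hot loop

        var runIteration = new ToyMethod("RunIteration");
        for (int i = 0; i < 1_000_000; ++i)
            runIteration.Call(loopIterations: 256); // many calls, short loop

        Console.WriteLine($"{main.Name}: tier {main.Tier}");                 // Main: tier 0
        Console.WriteLine($"{runIteration.Name}: tier {runIteration.Tier}"); // RunIteration: tier 1
    }
}
```

In this model the method called once never leaves tier 0 no matter how long its loop runs, while the method called repeatedly is promoted.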

A workaround is to move the iteration code into a separate method:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

internal static class Program
{
    private const int HistoryCount = 8;
    private const int InnerIterationCount = 256;
    private static readonly TimeSpan s_ts500ms = TimeSpan.FromMilliseconds(500);

    private static void Main()
    {
        var sw = new Stopwatch();
        var history = new Queue<double>(HistoryCount);
        var list = new List<int>(InnerIterationCount);
        for (int outerIteration = -1; outerIteration < HistoryCount; ++outerIteration)
        {
            var duration = s_ts500ms;
            int iterations = 0;
            TimeSpan elapsed;
            sw.Restart();
            do
            {
                // ---
                RunIteration(list);
                // ---
                ++iterations;
            } while ((iterations & 0xf) != 0 || (elapsed = sw.Elapsed) < duration);

            if (outerIteration < 0)
                continue;

            var iterationsPerMs = iterations / elapsed.TotalMilliseconds;
            if (history.Count >= HistoryCount)
                history.Dequeue();
            history.Enqueue(iterationsPerMs);
            Console.WriteLine($"{iterationsPerMs,10:0.00} {history.Average(),10:0.00}");
        }
    }

    private static void RunIteration(List<int> list)
    {
        list.Clear();
        for (int innerIteration = 0; innerIteration < InnerIterationCount; ++innerIteration)
            list.Add(innerIteration);
    }
}

Average iterations per ms with tiering disabled: 2775.55
Average iterations per ms with tiering enabled: 2728.70 (about 2% lower)

The PerfView profile now shows that most of the time is exclusive to RunIteration, as expected:

Name                                                                                	Inc %	     Inc	Exc %	   Exc	    First	      Last
 test!Program.Main()                                                                	 98.0	   4,490	  0.5	    22	1,567.832	 6,074.464
+ test!Program.RunIteration(class System.Collections.Generic.List`1)                	 97.0	   4,443	 93.5	 4,283	1,568.696	 6,074.464
|+ system.private.corelib!System.Collections.Generic.List`1[System.Int32].Add(Int32)	  3.5	     159	  3.5	   159	1,569.678	 1,792.351

List.Add still shows up as a separate frame. Those samples must come from the period when RunIteration was still running tier 0 code, as the JITStats summary shows:

Start (msec)  JitTime msec  IL Size  Native Size  Method Name                                                    Trigger
1,568.151     0.1           30       74           Program.RunIteration(class System.Collections.Generic.List`1)  FG
1,791.821     0.6           30       78           Program.RunIteration(class System.Collections.Generic.List`1)  TC

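The last sample in List.Add in the profile was at 1,792.351. The tier 1 JIT for RunIteration was initiated at 1,791.821 and, with a JIT time of 0.6 ms, would have completed at around 1,792.421, which is consistent with List.Add disappearing from the samples once the tier 1 code was in use.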

Other workarounds:

  • For benchmarks where each iteration of the benchmark is very short (a few milliseconds or less), use something like BenchmarkDotNet, where tiering would occur during the piloting or warmup phases and would not affect the measured phase. If each iteration of the benchmark takes longer, the number of warmup iterations may be increased to allow enough time for tiering to occur before measurement begins.
  • Disable tier 0 JIT (environment variable COMPlus_TieredCompilation_DisableTier0Jit=1, or <DisableTier0Jit>true</DisableTier0Jit> in the project file). In this mode, methods that don't have pregenerated code are optimized when first jitted. This may be useful as a global workaround for a suite of benchmarks that contains several cold methods with hot loops. For apps, it avoids the worst case, where a cold method jitted at tier 0 contains a hot loop that runs for a long time. A long-running hot loop could still run in a cold method that has not yet been jitted at tier 1, but it would be running optimized pregenerated code, so the perf may be reasonable and the issue less severe.
  • Attribute methods expected to contain hot code with MethodImplOptions.AggressiveOptimization. In the first example above, that would be:
      [MethodImpl(MethodImplOptions.AggressiveOptimization)]
      private static void Main()
      {
          ...
      }
  • Turn off tiered compilation (environment variable COMPlus_TieredCompilation=0, or <TieredCompilation>false</TieredCompilation> in the project file) for such benchmarks.
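
For a hand-rolled benchmark, the warmup idea from the first bullet can be sketched as below. This is a minimal sketch and not how BenchmarkDotNet works internally: WarmupSketch, MeasureIterationsPerMs, and the 500 ms durations are assumptions for illustration. The workload is passed as a delegate so that the hot code lives in a method that is called many times, and the warmup phase is discarded so that any tier 0 period falls outside the measurement.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

internal static class WarmupSketch
{
    // Calls `body` repeatedly for `warmup` (results discarded), then for
    // `measure`, and returns calls per millisecond over the measured phase only.
    internal static double MeasureIterationsPerMs(Action body, TimeSpan warmup, TimeSpan measure)
    {
        // Warmup phase: gives the runtime calls and time to tier up `body`.
        var sw = Stopwatch.StartNew();
        while (sw.Elapsed < warmup)
            body();

        // Measured phase.
        long iterations = 0;
        TimeSpan elapsed;
        sw.Restart();
        do
        {
            body();
            ++iterations;
        } while ((elapsed = sw.Elapsed) < measure);

        return iterations / elapsed.TotalMilliseconds;
    }

    private static void Main()
    {
        const int innerIterationCount = 256;
        var list = new List<int>(innerIterationCount);

        double iterationsPerMs = MeasureIterationsPerMs(
            () =>
            {
                list.Clear();
                for (int i = 0; i < innerIterationCount; ++i)
                    list.Add(i);
            },
            warmup: TimeSpan.FromMilliseconds(500),   // assumed duration
            measure: TimeSpan.FromMilliseconds(500)); // assumed duration

        Console.WriteLine($"{iterationsPerMs,10:0.00}");
    }
}
```

If a single call of the workload takes long enough that few calls fit into the warmup window, the warmup duration would need to be increased accordingly.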

Considerations:

  • Consider optimizing loops at tier 0, or methods containing loops. Data needs to be collected on how this would affect startup performance.
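  • Longer-term: A proper fix would probably involve at least some portions of what OSR (on-stack replacement) involves, so that a method that is already running a loop can switch to optimized code.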
