```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

internal static class Program
{
    private const int HistoryCount = 8;
    private const int InnerIterationCount = 256;
    private static readonly TimeSpan s_ts500ms = TimeSpan.FromMilliseconds(500);

    private static void Main()
    {
        var sw = new Stopwatch();
        var history = new Queue<double>(HistoryCount);
        var list = new List<int>(InnerIterationCount);

        // Outer iteration -1 is a warmup pass whose result is discarded.
        for (int outerIteration = -1; outerIteration < HistoryCount; ++outerIteration)
        {
            var duration = s_ts500ms;
            int iterations = 0;
            TimeSpan elapsed;
            sw.Restart();
            do
            {
                // ---
                list.Clear();
                for (int innerIteration = 0; innerIteration < InnerIterationCount; ++innerIteration)
                    list.Add(innerIteration);
                // ---
                ++iterations;
            } while ((iterations & 0xf) != 0 || (elapsed = sw.Elapsed) < duration); // check the clock every 16 iterations

            if (outerIteration < 0)
                continue;

            var iterationsPerMs = iterations / elapsed.TotalMilliseconds;
            if (history.Count >= HistoryCount)
                history.Dequeue();
            history.Enqueue(iterationsPerMs);
            // Current measurement, then the running average over the last HistoryCount measurements.
            Console.WriteLine($"{iterationsPerMs,10:0.00} {history.Average(),10:0.00}");
        }
    }
}
```
Average iterations per ms:

- Tiering disabled: 2775.05
- Tiering enabled: 2045.84
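In relative terms, tiering costs this benchmark roughly a quarter of its throughput (ratio computed from the two averages above):

```math
\frac{2045.84}{2775.05} \approx 0.737 \quad\Longrightarrow\quad \text{about } 26\% \text{ fewer iterations per ms with tiering enabled}
```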
A comparison of PerfView profiles shows that some inlining is not happening: with tiering enabled, `List<int>.Add` is called from `Main` instead of being inlined into it:
| Name | Inc % | Inc | Exc % | Exc |
|---|--:|--:|--:|--:|
| `test!Program.Main()` | 97.7 | 4,485 | 29.9 | 1,375 |
| ``+ system.private.corelib!System.Collections.Generic.List`1[System.Int32].Add(Int32)`` | 66.9 | 3,072 | 66.8 | 3,069 |
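The JITStats summary shows that the only JIT trigger for `Main` is `FG` (foreground). With tiering enabled, that is a tier 0 (minopts) JIT, which does not do inlining. There is no `TC` trigger to indicate a tier 1 JIT for `Main`. Promotion to tier 1 is driven by call counts, and `Main` is called only once, so its hot loop keeps running tier 0 code for the whole run.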
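To make that reasoning concrete, here is a toy model, assumed for illustration and not CoreCLR's actual implementation: a method is promoted to tier 1 once it has been *called* a threshold number of times, and iterations of a loop inside the method do not count toward that threshold. `ToyTieringModel` and its members are hypothetical names, the threshold of 30 used below is an assumed value, and the model ignores the runtime's startup delay and background JIT. `"RunIteration"` stands for any method that the hot loop calls once per iteration.

```csharp
using System.Collections.Generic;

// Hypothetical toy model of call-count-driven tiering; NOT the CoreCLR implementation.
// It ignores the runtime's startup delay and the asynchronous (background) tier 1 JIT.
internal sealed class ToyTieringModel
{
    private readonly int _callCountThreshold;
    private readonly Dictionary<string, int> _callCounts = new Dictionary<string, int>();

    internal ToyTieringModel(int callCountThreshold)
    {
        _callCountThreshold = callCountThreshold;
    }

    // Records one call (method entry). Loop iterations inside a method are
    // deliberately not counted: only calls advance the counter.
    internal void RecordCall(string methodName)
    {
        _callCounts.TryGetValue(methodName, out int count);
        _callCounts[methodName] = count + 1;
    }

    // A method is modeled as promoted to tier 1 once it has been called often enough.
    internal bool IsPromotedToTier1(string methodName)
    {
        return _callCounts.TryGetValue(methodName, out int count) && count >= _callCountThreshold;
    }

    // Main is entered once; the helper is entered once per loop iteration.
    internal static (bool mainPromoted, bool runIterationPromoted) Simulate(int callCountThreshold, int iterations)
    {
        var model = new ToyTieringModel(callCountThreshold);
        model.RecordCall("Main");
        for (int i = 0; i < iterations; ++i)
            model.RecordCall("RunIteration");
        return (model.IsPromotedToTier1("Main"), model.IsPromotedToTier1("RunIteration"));
    }
}
```

`ToyTieringModel.Simulate(callCountThreshold: 30, iterations: 1000)` returns `(false, true)`: in this model `Main` stays at tier 0 no matter how long its loop runs, while the per-iteration helper crosses the threshold.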
A workaround is to move the iteration code into a separate method, which is called once per iteration and can therefore be promoted to tier 1:
```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

internal static class Program
{
    private const int HistoryCount = 8;
    private const int InnerIterationCount = 256;
    private static readonly TimeSpan s_ts500ms = TimeSpan.FromMilliseconds(500);

    private static void Main()
    {
        var sw = new Stopwatch();
        var history = new Queue<double>(HistoryCount);
        var list = new List<int>(InnerIterationCount);

        // Outer iteration -1 is a warmup pass whose result is discarded.
        for (int outerIteration = -1; outerIteration < HistoryCount; ++outerIteration)
        {
            var duration = s_ts500ms;
            int iterations = 0;
            TimeSpan elapsed;
            sw.Restart();
            do
            {
                // ---
                RunIteration(list);
                // ---
                ++iterations;
            } while ((iterations & 0xf) != 0 || (elapsed = sw.Elapsed) < duration); // check the clock every 16 iterations

            if (outerIteration < 0)
                continue;

            var iterationsPerMs = iterations / elapsed.TotalMilliseconds;
            if (history.Count >= HistoryCount)
                history.Dequeue();
            history.Enqueue(iterationsPerMs);
            Console.WriteLine($"{iterationsPerMs,10:0.00} {history.Average(),10:0.00}");
        }
    }

    // Called once per iteration, so call counting can promote it to tier 1,
    // where calls such as List<int>.Add are eligible for inlining.
    private static void RunIteration(List<int> list)
    {
        list.Clear();
        for (int innerIteration = 0; innerIteration < InnerIterationCount; ++innerIteration)
            list.Add(innerIteration);
    }
}
```
Average iterations per ms:

- Tiering disabled: 2775.55
- Tiering enabled: 2728.70
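Relative to the tiering-disabled run, the workaround brings throughput back to within about 2% (ratio computed from the two averages above):

```math
\frac{2728.70}{2775.55} \approx 0.983 \quad\Longrightarrow\quad \text{about } 1.7\% \text{ fewer iterations per ms with tiering enabled}
```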
The PerfView profile now shows that, as expected, most of the time is spent exclusively in `RunIteration`:
| Name | Inc % | Inc | Exc % | Exc | First | Last |
|---|--:|--:|--:|--:|--:|--:|
| `test!Program.Main()` | 98.0 | 4,490 | 0.5 | 22 | 1,567.832 | 6,074.464 |
| ``+ test!Program.RunIteration(class System.Collections.Generic.List`1)`` | 97.0 | 4,443 | 93.5 | 4,283 | 1,568.696 | 6,074.464 |
| ``\|+ system.private.corelib!System.Collections.Generic.List`1[System.Int32].Add(Int32)`` | 3.5 | 159 | 3.5 | 159 | 1,569.678 | 1,792.351 |
`List.Add` still shows up; those samples must come from the period when `RunIteration` was still running at tier 0, as the JITStats summary shows:
| Start (msec) | JitTime (msec) | IL Size | Native Size | Method Name | Trigger |
|--:|--:|--:|--:|---|---|
| 1,568.151 | 0.1 | 30 | 74 | ``Program.RunIteration(class System.Collections.Generic.List`1)`` | FG |
| 1,791.821 | 0.6 | 30 | 78 | ``Program.RunIteration(class System.Collections.Generic.List`1)`` | TC |
The last sample in `List.Add` in the profile was at `1,792.351`. The tier 1 JIT for `RunIteration` was initiated at `1,791.821` and would have completed at around `1,792.421`.
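Putting the JITStats and profile timestamps side by side (all in msec, using the `TC` row's start time and JIT time):

```math
\begin{aligned}
t_{\text{tier 1 ready}} &\approx 1791.821 + 0.6 = 1792.421 \\
t_{\text{last List.Add sample}} &= 1792.351 < t_{\text{tier 1 ready}} \\
t_{\text{tier 1 ready}} - t_{\text{tier 0 JIT start}} &\approx 1792.421 - 1568.151 \approx 224
\end{aligned}
```

So every `List.Add` sample falls before tier 1 code for `RunIteration` could have been in use, and `RunIteration` ran tier 0 code for roughly the first 224 msec.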
Other workarounds:
- For benchmarks where each iteration of the benchmark is very short (a few milliseconds or less), use something like BenchmarkDotNet, where tiering would occur during the piloting or warmup phases and would not affect the measured phase. If each iteration of the benchmark takes longer, the number of warmup iterations may be increased to allow enough time for tiering to occur before measurement begins.
- Disable the tier 0 JIT (in the environment, `COMPlus_TieredCompilation_DisableTier0Jit=1`, or in the project file, `<DisableTier0Jit>true</DisableTier0Jit>`). In this mode, methods that don't have pregenerated code would be optimized from the start. This may be useful as a global workaround for a suite of benchmarks that may contain several cold methods with hot loops. For apps, it would avoid the worst case, where a cold method jitted at tier 0 contains a hot loop that runs for a long time. A long-running hot loop could still execute in a cold method that has not yet been jitted at tier 1, but that method would be running optimized pregenerated code, so performance may be reasonable and the issue less severe.
- Attribute methods expected to contain hot code with `MethodImplOptions.AggressiveOptimization`. In the first example above, that would be:

  ```csharp
  // Requires: using System.Runtime.CompilerServices;
  [MethodImpl(MethodImplOptions.AggressiveOptimization)]
  private static void Main()
  {
      // ... (body unchanged from the first example)
  }
  ```
- Turn off tiered compilation (in the environment, `COMPlus_TieredCompilation=0`, or in the project file, `<TieredCompilation>false</TieredCompilation>`) for benchmarks of this kind.
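The warmup idea behind the BenchmarkDotNet suggestion can also be sketched without a benchmarking library. This is a minimal, hypothetical helper (`WarmupHarness`, `MeasureIterationsPerMs`, and `RunFor` are illustrative names, not a library API): it calls the benchmark body, which should be its own method as in the `RunIteration` workaround, repeatedly for a warmup period and only then times it. Whether the body has actually reached tier 1 by the end of warmup depends on the runtime's tiering timing, so the warmup length is an assumption to tune. The harness's own loop is marked `AggressiveOptimization` (.NET Core 3.0 or later) so that it is not itself a cold method with a hot loop stuck at tier 0.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

internal static class WarmupHarness
{
    // Hypothetical helper: runs 'iteration' for 'warmup' and discards the result,
    // then reports iterations per millisecond over 'measure'. Because 'iteration'
    // is a separate method that is called many times, call counting has a chance
    // to promote it to tier 1 before the measured phase begins.
    internal static double MeasureIterationsPerMs(Action iteration, TimeSpan warmup, TimeSpan measure)
    {
        RunFor(iteration, warmup); // warmup phase: result discarded
        var (iterations, elapsed) = RunFor(iteration, measure);
        return iterations / elapsed.TotalMilliseconds;
    }

    // Optimized up front so the harness loop itself does not run as tier 0 code.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    private static (long Iterations, TimeSpan Elapsed) RunFor(Action iteration, TimeSpan duration)
    {
        var sw = Stopwatch.StartNew();
        long iterations = 0;
        TimeSpan elapsed;
        do
        {
            iteration();
            ++iterations;
        } while ((iterations & 0xf) != 0 || (elapsed = sw.Elapsed) < duration); // check the clock every 16 calls
        return (iterations, elapsed);
    }
}
```

For example, `WarmupHarness.MeasureIterationsPerMs(RunIteration, TimeSpan.FromSeconds(1), TimeSpan.FromMilliseconds(500))` would time a parameterless `RunIteration` method after one second of warmup.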
Considerations:
- Consider optimizing loops at tier 0, or entire methods that contain loops. Data needs to be collected on how this would affect startup performance.
- Longer term: a proper fix would probably involve at least some portions of what OSR (on-stack replacement) involves.