Consider switching to using Thread::YieldProcessorNormalized() instead of YieldProcessor() (after calling Thread::EnsureYieldProcessorNormalizedInitialized() before the spin loop)
Issue https://github.com/dotnet/coreclr/issues/13388 notes that, starting with the Intel Skylake architecture, the pause delay has increased significantly. As a result, any spin loop that makes a large number of YieldProcessor() calls per spin iteration, or that runs a large number of spin iterations over YieldProcessor() (such as enter_spin_lock), may be significantly less efficient on Skylake and newer processors.
PR coreclr#13670 ("Add normalized equivalent of YieldProcessor, retune some spin loops") added Thread::YieldProcessorNormalized(), which measures the delay per call and normalizes it across different processors. Because a change in the processor's pause delay would otherwise require retuning spin loops such as the one above for each processor, the idea is that switching to YieldProcessorNormalized() also requires retuning, but only once; from then on, the spin loop should hopefully behave similarly on different processors.
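The normalization idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the CoreCLR implementation: `RawYield`, `YieldsPerNormalizedYield`, `YieldNormalized`, and `SpinUntilSet` are hypothetical names, the raw yield is replaced by a portable compiler fence, and the per-yield cost is passed in rather than measured.

```cpp
#include <algorithm>
#include <atomic>
#include <cmath>

// Hypothetical stand-in for the raw processor yield. Real code would issue
// the processor's pause instruction; a compiler fence keeps this portable.
inline void RawYield() { std::atomic_signal_fence(std::memory_order_seq_cst); }

// Hypothetical normalization step: how many raw yields approximate one
// normalized yield, given a measured per-yield cost and a fixed target cost.
// Both values are assumed inputs (for example, measured once at startup).
inline unsigned YieldsPerNormalizedYield(double measuredNsPerYield,
                                         double targetNsPerNormalizedYield)
{
    if (measuredNsPerYield <= 0.0)
        return 1; // no usable measurement: fall back to a single raw yield
    double ratio = targetNsPerNormalizedYield / measuredNsPerYield;
    return static_cast<unsigned>(std::max(1.0, std::round(ratio)));
}

// One normalized yield: repeat the raw yield the precomputed number of times.
inline void YieldNormalized(unsigned yieldsPerNormalizedYield)
{
    for (unsigned i = 0; i < yieldsPerNormalizedYield; ++i)
        RawYield();
}

// Spin until `flag` is set or the spin budget runs out. The budget is
// expressed in normalized yields, so it is tuned once rather than per CPU.
inline bool SpinUntilSet(const std::atomic<bool>& flag, unsigned maxSpins,
                         unsigned yieldsPerNormalizedYield)
{
    for (unsigned spin = 0; spin < maxSpins; ++spin)
    {
        if (flag.load(std::memory_order_acquire))
            return true;
        YieldNormalized(yieldsPerNormalizedYield);
    }
    return flag.load(std::memory_order_acquire);
}
```

With illustrative numbers, a processor whose raw yield costs a quarter of the target would issue four raw yields per normalized yield, while a processor whose raw yield already exceeds the target would issue one, so the same `maxSpins` value gives a more comparable total delay on both.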
YieldProcessorNormalizedWithBackOff() increases the delay up to a maximum (also scaled for the processor) that translates to a delay of about 900 cycles. The observation from other testing was that, significantly beyond that point, Sleep(0) typically produces more efficient spinning by letting other threads run. This function may or may not be useful here, but it is available if it turns out to be.
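A capped back-off schedule of this kind might look like the sketch below. The doubling growth rate and the name `BackOffYieldCount` are assumptions made for illustration; the text above only states that the delay increases up to a processor-scaled maximum.

```cpp
#include <algorithm>

// Hypothetical back-off schedule: the number of normalized yields to issue
// on a given spin iteration doubles each iteration (an assumed growth rate)
// and is clamped to maxYieldCount, which the caller would scale per CPU.
inline unsigned BackOffYieldCount(unsigned spinIteration, unsigned maxYieldCount)
{
    // Shifting by the bit width or more is undefined, so clamp early.
    if (spinIteration >= 31)
        return maxYieldCount;
    return std::min(1u << spinIteration, maxYieldCount);
}
```

Once the schedule reaches the cap, a caller following the observation above would stop lengthening the busy-wait and let other threads run instead, for example via Sleep(0).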
For instance, in enter_spin_lock with server GC, the max spin count on a 4-core (8-thread) Skylake processor would be equivalent to a total delay of 1 M cycles, which is about 300 us. On a pre-Skylake processor with the same core/thread count, the total delay is about 140 K cycles, which is about 40 us.
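The cycle-to-time conversions above can be checked with a small helper. The clock frequencies used here (roughly 3.3 GHz and 3.5 GHz) are assumptions inferred from the quoted figures, not values stated in the issue, and `CyclesToMicroseconds` is a hypothetical name.

```cpp
// Convert a cycle count to microseconds at a given clock frequency.
// One GHz is 1000 cycles per microsecond.
inline double CyclesToMicroseconds(double cycles, double clockGHz)
{
    return cycles / (clockGHz * 1000.0);
}
```

Under those assumed frequencies, 1 M cycles at 3.3 GHz comes to about 303 us and 140 K cycles at 3.5 GHz comes to 40 us, which matches the approximate figures quoted above.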