Skip to content

threading: lock-free fast path for SemaphoreSlim.WaitAsync#125452

Open
thomhurst wants to merge 4 commits intodotnet:mainfrom
thomhurst:semaphoreslim-cas
Open

threading: lock-free fast path for SemaphoreSlim.WaitAsync#125452
thomhurst wants to merge 4 commits intodotnet:mainfrom
thomhurst:semaphoreslim-cas

Conversation

@thomhurst
Copy link

Use a lock-free CAS fast path in SemaphoreSlim.WaitAsync to skip the Monitor lock when a permit is immediately available, improving uncontended throughput

Copilot AI review requested due to automatic review settings March 11, 2026 18:18
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Mar 11, 2026
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.

@thomhurst
Copy link
Author

@EgorBot -intel -amd -arm

  using System.Threading;
  using System.Threading.Tasks;
  using BenchmarkDotNet.Attributes;

  [MemoryDiagnoser]
  public class SemaphoreSlimUncontended
  {
      private SemaphoreSlim _sem = new SemaphoreSlim(1, 1);

      [Benchmark]
      public async Task WaitAsync_Release()
      {
          await _sem.WaitAsync();
          _sem.Release();
      }
  }

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a lock-free fast path in SemaphoreSlim.WaitAsync that attempts to acquire an available permit via CAS, avoiding taking m_lockObjAndDisposed when uncontended.

Changes:

  • Added a CAS-based fast path to decrement m_currentCount when a permit appears immediately available.
  • Added special-case handling to keep AvailableWaitHandle state consistent if it’s initialized concurrently during the fast-path acquire.

@EgorBo
Copy link
Member

EgorBo commented Mar 11, 2026

Doesn't lock itself has fast paths for that?

Use Interlocked.Add to apply a relative delta to m_currentCount rather
than writing back an absolute snapshot-derived value, so concurrent
lock-free decrements from the WaitAsync fast path are not overwritten.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment on lines +688 to +691
if (current > 0
&& Volatile.Read(ref m_asyncHead) is null
&& Volatile.Read(ref m_waitCount) == 0
&& Interlocked.CompareExchange(ref m_currentCount, current - 1, current) == current)
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fast-path gating reads m_asyncHead and m_waitCount outside the lock, but both fields are updated under the lock without volatile/Volatile.Write semantics. These checks can legally observe stale values, letting the CAS decrement run while a Release/Wait* locked path is concurrently releasing waiters / updating counts, which can corrupt accounting (e.g., negative m_currentCount or too many waiters completed). Please make the “no waiters” signal reliable (e.g., make m_waitCount and m_asyncHead volatile and ensure all writes are volatile, or add a dedicated volatile/Interlocked state flag used by the fast path), and consider making in-lock decrements of m_currentCount CAS-based so they can’t be invalidated by a concurrent lock-free acquire.

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_asyncHead doesn't need volatile writes. When m_asyncHead is non-null, the invariant guarantees m_currentCount == 0 — so the current > 0 guard independently blocks the CAS. A stale non-null read is conservatively safe (just an unnecessary slow-path trip). A stale null read when actually non-null can't cause a double-grant because m_currentCount would also be 0.

m_waitCount stale reads are a real concern, but the root fix isn't volatile writes — it's making the in-lock check-then-decrement atomic. Even with Volatile.Write on m_waitCount, a fast-path thread that read m_waitCount = 0 before WaitCore's write can still have its CAS execute between WaitCore's line 404 check and the decrement. Replaced both in-lock plain decrements (WaitCore and WaitAsyncCore slow path) with the same CAS-loop pattern used in the fast path. Both threads' operations now serialize on m_currentCount atomically — only one wins, eliminating the double-grant.

Replace plain --m_currentCount with a CAS loop to prevent a double grant
when the lock-free WaitAsync fast path decrements m_currentCount between
the > 0 check and the decrement in the slow path.

WaitCore is safe because m_waitCount++ on lock entry blocks the CAS guard
for its entire critical section. WaitAsyncCore has no such protection.
Apply the same CAS-loop pattern to WaitCore's m_currentCount decrement
that was applied to WaitAsyncCore in the previous commit. A fast-path
thread that read m_waitCount = 0 before WaitCore's m_waitCount++ can
still race with WaitCore's check-at-404 / decrement-at-407 sequence.
The CAS loop serializes both operations on m_currentCount atomically.
Copilot AI review requested due to automatic review settings March 11, 2026 19:36
@thomhurst
Copy link
Author

@EgorBot -intel -amd -arm

  using System.Threading;
  using System.Threading.Tasks;
  using BenchmarkDotNet.Attributes;

  [MemoryDiagnoser]
  public class SemaphoreSlimUncontended
  {
      private SemaphoreSlim _sem = new SemaphoreSlim(1, 1);

      [Benchmark]
      public async Task WaitAsync_Release()
      {
          await _sem.WaitAsync();
          _sem.Release();
      }
  }

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment on lines 401 to +408
// that synchronous waiter succeeds so that they have a chance to release.
Debug.Assert(!waitSuccessful || m_currentCount > 0,
"If the wait was successful, there should be count available.");
if (m_currentCount > 0)
// Use CAS rather than a plain decrement: the lock-free fast path in WaitAsync
// can decrement m_currentCount concurrently (it holds no lock).
int currentCount = m_currentCount;
while (currentCount > 0 && Interlocked.CompareExchange(ref m_currentCount, currentCount - 1, currentCount) != currentCount)
currentCount = m_currentCount;
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Debug.Assert that assumes waitSuccessful implies m_currentCount > 0 is no longer valid now that WaitAsync can decrement m_currentCount concurrently without taking the lock. In Debug builds this can fire spuriously (e.g., WaitUntilCountOrTimeout observed a positive count, but the lock-free fast path acquired the last permit before the assertion executes). Consider removing this assert or rewriting it to assert something that remains true under concurrent lock-free decrements (e.g., assert only after the CAS-based acquire attempt succeeds).

Copilot uses AI. Check for mistakes.
Comment on lines +685 to +706
// Fast path: try a lock-free acquire; falls through to the lock if it fails.
// Skipped when m_waitHandle is non-null to keep its state consistent under the lock.
if (m_waitHandle is null)
{
int current = m_currentCount;
// The waiter checks are best-effort: a sync waiter incrementing m_waitCount inside
// the lock may not be visible yet, but the CAS will fail if the count has changed.
if (current > 0
&& Volatile.Read(ref m_asyncHead) is null
&& Volatile.Read(ref m_waitCount) == 0
&& Interlocked.CompareExchange(ref m_currentCount, current - 1, current) == current)
{
// Handle the rare race where AvailableWaitHandle was initialised concurrently.
if (current == 1 && m_waitHandle is not null)
{
lock (m_lockObjAndDisposed)
{
if (m_waitHandle is not null && m_currentCount == 0)
m_waitHandle.Reset();
}
}
return Task.FromResult(true);
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change introduces a lock-free acquire path with special handling for the race where AvailableWaitHandle is initialized concurrently (the current == 1 && m_waitHandle is not null branch). There doesn't appear to be existing test coverage around SemaphoreSlim.AvailableWaitHandle state transitions under concurrent WaitAsync/initialization; adding a stress/regression test for this scenario would help prevent subtle signaling regressions (e.g., wait handle remaining signaled when the count reaches 0 via the fast path).

Copilot uses AI. Check for mistakes.
@thomhurst
Copy link
Author

@EgorBo I think just by not entering the lock we can save some time: EgorBot/Benchmarks#31

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-System.Threading community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants