Skip to content

perf: eliminate locks from mock invocation and verification hot paths#5422

Merged
thomhurst merged 6 commits intomainfrom
perf/mock-engine-lock-free-hot-path
Apr 7, 2026
Merged

perf: eliminate locks from mock invocation and verification hot paths#5422
thomhurst merged 6 commits intomainfrom
perf/mock-engine-lock-free-hot-path

Conversation

@thomhurst
Copy link
Copy Markdown
Owner

Summary

  • Lock-free call recording: Replace the shared MockEngine.Lock with pre-allocated per-member CallRecordBuffer that uses lightweight per-buffer locks, eliminating contention between setup and invocation phases
  • Lock-free AddBehavior: Use Interlocked.CompareExchange for the common single-behavior setup case, avoiding Lock object allocation and lock acquisition entirely
  • Single-pass verification: Count and mark verified in one iteration instead of two, with zero-copy buffer access eliminating .ToArray() allocation on the happy path

These changes target the three mock benchmark categories where TUnit.Mocks currently loses to source-generated competitors:

Benchmark Gap vs Best Rival Root Cause Addressed
Callback 60% slower than Imposter Shared lock in StoreCallRecord + Lock alloc in AddBehavior
Setup 18% slower than Mockolate Lock acquisition in AddBehavior for single-behavior case
Verification 13% slower than Imposter Two-pass iteration + .ToArray() copy allocation

All 822 mock tests pass (761 runtime + 31 source-gen + 30 analyzer).

Test plan

  • TUnit.Mocks.Tests — 761 passed (includes concurrent mock access tests)
  • TUnit.Mocks.SourceGenerator.Tests — 31 passed
  • TUnit.Mocks.Analyzers.Tests — 30 passed
  • CI mock benchmark workflow to measure actual improvement

@codacy-production
Copy link
Copy Markdown

codacy-production bot commented Apr 6, 2026

Not up to standards ⛔

🔴 Issues 3 minor

Alerts:
⚠ 3 issues (≤ 0 issues of at least minor severity)

Results:
3 new issues

Category Results
CodeStyle 3 minor

View in Codacy

🟢 Metrics 25 complexity

Metric Results
Complexity 25

View in Codacy

TIP This summary will be updated as you push new changes. Give us feedback

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: perf/mock-engine-lock-free-hot-path

The motivation here is clear and the benchmark data is well-presented. The approach — per-buffer locks, CAS for single-behavior setup, and single-pass verification — is the right direction. However, there are several correctness issues that need addressing before merge.


🚨 Critical: Bounds mismatch between calls and counts arrays

File: MockEngine.csStoreCallRecord fast path

var calls  = _callsByMemberId;       // non-volatile read — could be generation N+1
var counts = _callCountByMemberId;   // non-volatile read — could be generation N

if (calls is not null && counts is not null && (uint)memberId < (uint)calls.Length)
{
    var buffer = calls[memberId];
    if (buffer is not null)
    {
        buffer.Add(record);
        Interlocked.Increment(ref counts[memberId]);  // OOB if counts is old array
        return record;
    }
}

The bounds check only guards against calls.Length, not counts.Length. EnsureCallArrayCapacity writes _callsByMemberId = newByMember then _callCountByMemberId = newCounts as two separate non-volatile stores. A racing reader could see the new (larger) calls array while still seeing the old (smaller) counts array. If memberId is in range of the new array but not the old, Interlocked.Increment(ref counts[memberId]) throws IndexOutOfRangeException.

Fix: Bundle both arrays into a single struct/record and replace them atomically:

private volatile CallArrays? _callArrays;  // single volatile reference swap

private sealed class CallArrays
{
    public readonly CallRecordBuffer?[] Calls;
    public readonly int[] Counts;
    public CallArrays(int size) { Calls = new CallRecordBuffer?[size]; Counts = new int[size]; }
}

This eliminates the two-array TOCTOU entirely — a single Volatile.Read gives you a consistent snapshot of both.


🚨 Critical: "Optimistic mark + unmark" corrupts previously-verified records

File: CallVerificationBuilder.csWasCalled

var matchingCount = CountAndMarkBuffer(buffer, markVerified: true);   // marks ALL matching

if (!times.Matches(matchingCount))
{
    CountAndMarkBuffer(buffer, markVerified: false);  // unmarks ALL matching — including ones
                                                      // verified by a PRIOR WasCalled call!
    throw ...;
}

CountAndMarkBuffer with markVerified: false unconditionally sets IsVerified = false on every matching record. If a record was already verified by a previous WasCalled assertion, the failed second assertion will un-verify it, causing VerifyNoOtherCalls() to report a false failure.

The original two-pass approach was correct: count first, then mark only if the count passes. Restoring that contract is the right fix here. The allocation of GetCallsFor that this avoids is only on the throw path anyway — the exception itself is far more expensive than an array allocation.

// Safe single-pass approach: count first, never unmark
var matchingCount = CountMatchingBuffer(buffer);

if (!times.Matches(matchingCount))
{
    throw new MockVerificationException(...);
}

// Only mark after successful assertion
MarkMatchingBuffer(buffer);

⚠️ Significant: CAS race in AddBehavior can leave behaviors split across both fields

File: MethodSetup.csAddBehavior

Consider this interleaving:

  1. Thread A: Volatile.Read(ref _behaviors) is nulltrue
  2. Thread B (in AddBehaviorSlow): promotes to list, Volatile.Write(ref _behaviors, [existing]), then Volatile.Write(ref _singleBehavior, null)
  3. Thread A: CAS on _singleBehavior with expected null — succeeds (B just cleared it!)
  4. Thread A returns. Now _singleBehavior is set AND _behaviors is non-null.

GetNextBehavior checks _singleBehavior first and returns it early, so the behaviors in _behaviors are silently ignored.

The fix is to re-check _behaviors after a successful CAS, or check both conditions atomically under the lock:

public void AddBehavior(IBehavior behavior)
{
    // CAS is only safe if _behaviors is still null after the exchange.
    // Re-read _behaviors after CAS to detect the race.
    if (Volatile.Read(ref _behaviors) is null &&
        Interlocked.CompareExchange(ref _singleBehavior, behavior, null) is null &&
        Volatile.Read(ref _behaviors) is null)  // double-check after CAS
    {
        return;
    }
    AddBehaviorSlow(behavior);
}

ℹ️ Minor: lock(this) in CallRecordBuffer

File: CallRecordBuffer.csAdd

lock(this) on internal sealed class is low-risk since no external code can grab the monitor, but it's a well-known antipattern. A private dedicated _lock object is the idiomatic replacement and costs one extra pointer. This is a style preference — not a blocker.


ℹ️ Minor: _callsByMemberId non-volatile field now accessed without lock

The field was always lock-protected before; the fast path now reads it without Volatile.Read. The JIT is allowed to hoist this load. Marking the field volatile (or using Volatile.Read) restores the happens-before guarantee that the previous lock provided for free.


What's good

  • The CallRecordBuffer design (pre-allocated, per-member lock, volatile count publish) is solid and the API surface is clean.
  • Splitting StoreCallRecord into fast/slow paths is idiomatic and [MethodImpl(MethodImplOptions.NoInlining)] on the slow path is the right call.
  • Replacing the shared MockEngine.Lock with per-buffer locks is the correct architectural move; the contention analysis in the PR description is accurate.
  • The CollectInto / ForEach zero-copy iteration API on CallRecordBuffer is a nice improvement over the previous AddRange(list) pattern.

The issues above are real correctness risks in concurrent test runs. The two critical ones (bounds mismatch and optimistic unmark) are straightforward to fix and the overall approach is worth landing once they're addressed.

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 6, 2026

Code Review: perf: eliminate locks from mock invocation and verification hot paths

Overall this is a well-motivated performance PR with a clear benchmark-driven rationale. The design is largely sound, but there are several correctness and design concerns worth addressing before merge.


1. GetSnapshot() has a torn read hazard in CallRecordBuffer

The lock-free snapshot in GetSnapshot() reads _count and _items in two separate volatile reads:

internal (CallRecord?[] Items, int Count) GetSnapshot()
{
    var count = Volatile.Read(ref _count);
    var items = Volatile.Read(ref _items);
    return (items, count);
}

Between the two reads, a concurrent Grow() + Add() can replace _items with a larger array while _count has already been incremented past what the old array contains. The result would be a count that reflects the new array but items pointing to the old, shorter array — leading to an IndexOutOfRangeException or accessing stale null slots beyond the original array capacity.

Fix: Read _items first, then _count. A writer always increments _count last (inside the lock, after updating _items), so reading in that order gives a safe lower-bound count that is always consistent with the captured array reference:

internal (CallRecord?[] Items, int Count) GetSnapshot()
{
    var items = Volatile.Read(ref _items);   // read array reference first
    var count = Volatile.Read(ref _count);   // count is always <= items.Length at this point
    return (items, count);
}

2. AddBehavior CAS has a silent data loss window

The fast-path CAS in AddBehavior has a subtle hole:

if (Volatile.Read(ref _behaviors) is null
    && Interlocked.CompareExchange(ref _singleBehavior, behavior, null) is null)
{
    if (Volatile.Read(ref _behaviors) is null)
        return;   // early return — behavior "installed"
}
AddBehaviorSlow(behavior);

If a second concurrent AddBehavior call also wins the CAS (impossible for the same CAS, but consider the "lost CAS" case) — actually the larger problem is the double-check on _behaviors. When the first CAS succeeds and _behaviors is still null at the double-check, the method returns. But _singleBehavior is now set and _behaviors is null. Then a concurrent AddBehaviorSlow runs, reads _singleBehavior as null (not yet set because of CPU reordering — _singleBehavior is no longer volatile after this PR), initializes _behaviors to [], and adds the new behavior, losing the first behavior entirely.

Root cause: Removing volatile from _singleBehavior and _behaviors (lines 25–26) means the CAS in the fast path provides sequential consistency only for _singleBehavior, not for _behaviors. The Volatile.Read guards are correct but the non-volatile field declaration means the JIT / CPU is free to cache old values between the calls to Volatile.Read.

The fields must remain volatile, or all accesses must go through Volatile.Read/Volatile.Write (which the fast path CAS already does). Since Interlocked.CompareExchange on a non-volatile field is valid, the deeper issue is that AddBehaviorSlow reads _singleBehavior with Volatile.Read but the window between Volatile.Read(ref _behaviors) returning null and reading _singleBehavior is not atomic, so a concurrent CAS could succeed in between.

This is a narrow race but it is a real correctness issue for concurrent setup scenarios.


3. _callsByMemberId is no longer volatile but is read lock-free in several places

In the original code _callsByMemberId was implicitly guarded by Lock. The new code reads it without a lock in StoreCallRecord (fast path), GetCallsFor, GetCallBufferFor, MarkCallsVerified, GetAllCalls, GetUnverifiedCalls, GetDiagnostics, and CollectCallRecords. However the field is declared as:

private CallRecordBuffer?[]? _callsByMemberId;

There is no volatile keyword and no Volatile.Read when this field is read. In StoreCallRecord, the local copy is taken with a plain read:

var calls = _callsByMemberId;

Without volatile, the JIT is allowed to hoist this read out of loops or cache it in a register, meaning a freshly-allocated array written in StoreCallRecordSlow might never be visible on the fast path. _callCountByMemberId is volatile (correct), but _callsByMemberId is not. This should either be declared volatile or all reads should use Volatile.Read.


4. Optimistic mark-then-unmark in WasCalled has observable side effects

var matchingCount = CountAndMarkBuffer(buffer, markVerified: true);

if (!times.Matches(matchingCount))
{
    // Unmark on failure — rare path
    CountAndMarkBuffer(buffer, markVerified: false);
    ...
    throw new MockVerificationException(...);
}

The comment acknowledges the unmark-on-failure path. However, this creates a TOCTOU window: another concurrent verification that runs between the optimistic mark and the unmark will observe calls as already verified and skip them. In single-threaded test teardown this is unlikely, but VerifyAll and VerifyNoOtherCalls are often called from cleanup hooks that could run concurrently across tests. The previous two-pass approach (count first, mark second) was safer because it never left a partially-verified state visible.

If the performance win is important, consider separating the pass into count-only (no writes), then mark only on success — matching the old behavior but without the ToArray() allocation, by iterating the buffer directly.


5. GetAllCalls and GetUnverifiedCalls are no longer protected by a lock

These two methods previously held Lock for the duration of CollectCallRecords. The new code calls them lock-free. While individual buffer accesses via GetSnapshot() are safe (see issue #1 above), the outer loop over _callsByMemberId is not atomic:

foreach (var buffer in _callsByMemberId)
{
    buffer?.CollectInto(target, filter);
}

If EnsureCallArrayCapacity replaces _callsByMemberId with a new larger array during this loop (via StoreCallRecordSlow), the foreach is iterating over a snapshot of the old array reference (captured by the foreach enumerator), which is fine. But if a concurrent setup call triggers EnsureCallArrayCapacity, the new buffers for newly-added members will not appear in this snapshot. This is probably acceptable for a test framework, but should be documented explicitly.


Minor / Design suggestions

  • GetCallBufferFor is internal: exposing the mutable buffer directly to CallVerificationBuilder tightly couples the two types. A VerifyWithMatchers(IArgumentMatcher[], Times, ...) method on MockEngine would keep the buffer encapsulated.

  • Predicate<T>Func<T, bool> change in CollectCallRecords: functionally identical but inconsistent with other uses of Predicate<T> in the codebase. Not a blocker, but pick one and be consistent.

  • CallRecordBuffer initial capacity of 4: Fine for most cases. Worth a note that this is a trade-off — lower capacity means more Grow() calls for heavily-exercised members.


Summary

Issue Severity
GetSnapshot() torn read (_count before _items) High — can cause OOBE or stale null slots
_callsByMemberId missing volatile on lock-free reads High — can cause invisible writes on some JIT/CPU combos
_singleBehavior/_behaviors removed volatile keyword Medium — CAS provides ordering for those fields but dependent reads in slow path may not see updates
Optimistic mark-then-unmark in WasCalled Low — narrow race in concurrent teardown
GetAllCalls/GetUnverifiedCalls lock removal Low — acceptable with a documentation note

The performance direction is correct. Fixing the GetSnapshot() read order and adding volatile back to _callsByMemberId (or using Volatile.Read consistently) would make this safe to merge.

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: perf/mock-engine-lock-free-hot-path

Summary

This PR replaces TUnit.Mocks' shared MockEngine.Lock in the invocation hot path with three complementary lock-free strategies:

  1. CallArrays wrapper — Bundles CallRecordBuffer?[] and int[] into a single volatile object reference so both arrays are swapped atomically, eliminating the TOCTOU bug from the previous separate volatile array fields.
  2. CallRecordBuffer — Per-member append-only buffer with its own small _syncRoot. Writers lock only their buffer; readers take a lock-free snapshot via Volatile.Read.
  3. AddBehavior CAS fast path — Uses Interlocked.CompareExchange to avoid allocating and acquiring a Lock object when only a single behavior is registered (the common case).

The prior review comment about the two-array TOCTOU bug was correctly diagnosed, and this PR directly resolves it with the CallArrays wrapper. That fix is solid.


Positive Aspects

CallArrays atomic swap is the right fix. Bundling both arrays into one volatile reference is the textbook pattern for this problem. Since EnsureCallArrayCapacity is always called under Lock, the swap itself is safe, and readers outside the lock get a consistent generation.

CallRecordBuffer.GetSnapshot memory ordering intent is sound. The code attempts to exploit the "writer stores element before incrementing count" invariant, which is a well-known lock-free pattern.

[MethodImpl(MethodImplOptions.NoInlining)] on slow paths keeps hot-path JIT output cleaner and prevents lock-allocation code from being inlined.

Removal of .ToArray() in CallVerificationBuilder on the happy path (when count matches) is a genuine allocation win for the "test passes" case.


Issues Found

1. GetSnapshot read order is inverted — potential torn read on Grow (Medium)

File: TUnit.Mocks/Verification/CallRecordBuffer.cs, lines 68–70

var items = Volatile.Read(ref _items);   // reads items FIRST
var count = Volatile.Read(ref _count);   // reads count SECOND

The comment says "Read items first, then count. A writer always updates items before incrementing count." However, consider the Grow path:

  1. Grow() sets _items = newItems (the new, larger array)
  2. Add writes items[count] = record
  3. Add increments _count

A reader that snaps _items (pre-grow, smaller array) and then snaps _count (post-grow, larger count) can produce a snapshot where count > items.Length. Passing this to Array.Copy(items, result, count) will throw ArgumentException or silently read stale/garbage entries.

The safe pattern for this classic "array + count" lock-free snapshot is the reverse: read count first, then items. A reader that sees count == N is guaranteed to see an _items that was at least length N at that point.

Alternatively, use Volatile.Write(ref _items, newItems) inside Grow() to publish the new array with a store-release fence before any subsequent count increment is visible.

// Safe: read count first, then items
var count = Volatile.Read(ref _count);
var items = Volatile.Read(ref _items);
return (items, count);

2. CAS fast path in AddBehavior has a double-dispatch window (Low)

File: TUnit.Mocks/Setup/MethodSetup.cs, lines 76–87

if (Volatile.Read(ref _behaviors) is null                    // (A)
    && Interlocked.CompareExchange(ref _singleBehavior, behavior, null) is null)  // (B)
{
    if (Volatile.Read(ref _behaviors) is null)               // (C)
        return;
}
AddBehaviorSlow(behavior);

Consider this interleaving during concurrent setup:

  • Thread 2 runs AddBehaviorSlow fully: promotes to list, sets _behaviors = [existing], clears _singleBehavior.
  • Thread 1 passes (A) (saw null), CAS (B) succeeds, but (C) now sees _behaviors != null.
  • Thread 1 calls AddBehaviorSlow and adds behavior to the list.

BUT between (B) and (C), _singleBehavior = behavior is briefly visible to GetNextBehavior, which reads it without a lock. A concurrent invocation thread could dispatch behavior during this window, then dispatch it again once it's in _behaviors. In practice, mock setup is usually single-threaded before invocations, but the invariant is wrong. Additionally, AddBehaviorSlow captures _singleBehavior to seed the list — if this CAS value was already captured, behavior could appear twice.

3. CallRecordBuffer uses object lock instead of Lock — inconsistent (Minor)

File: TUnit.Mocks/Verification/CallRecordBuffer.cs, line 16

private readonly object _syncRoot = new();

MockEngine uses C# 13's System.Threading.Lock type, which provides better diagnostics and a faster implementation on .NET 9+. CallRecordBuffer's per-buffer lock is acquired on every write (the hot path), so this is the most impactful place to use Lock. Since the project targets net8.0;net9.0;net10.0, changing to Lock would be consistent and deliver additional benefit on newer runtimes.

4. Duplicate <summary> XML doc in CallRecordBuffer.cs (Minor)

File: TUnit.Mocks/Verification/CallRecordBuffer.cs, lines 56–64

/// <summary>
/// Returns the current items array and count for lock-free iteration.
/// ...
/// </summary>
/// <summary>                        ← invalid; silently ignored by XML doc processor
/// Read items first, then count. ...
/// </summary>

The second <summary> block should be a <remarks> element.

5. _matchers.Length == 0 guard in CountMatchingBuffer is always false at that call site (Cosmetic)

File: TUnit.Mocks/Verification/CallVerificationBuilder.cs, line 117

The _matchers.Length == 0 fast-exit inside the loop is only reached from the else branch of WasCalled, which already handles the zero-matchers case via GetCallCountFor. The condition is always false here, adding cognitive overhead. The guard can be removed or hoisted outside the loop.


Summary Table

# Severity Location Issue
1 Medium CallRecordBuffer.cs:68-70 GetSnapshot reads _items before _count; concurrent Grow can produce torn snapshot with pre-grow array and post-grow count
2 Low MethodSetup.cs:76-87 CAS fast path window where _singleBehavior is set but _behaviors is already a list; concurrent reader could double-dispatch
3 Minor CallRecordBuffer.cs:16 Uses object lock instead of Lock (C# 13), inconsistent with rest of codebase
4 Minor CallRecordBuffer.cs:56-64 Duplicate <summary> tags; second should be <remarks>
5 Cosmetic CallVerificationBuilder.cs:117 _matchers.Length == 0 check in loop is always false at this call site

Recommendation

The core architectural change (atomic CallArrays swap, per-buffer locking, lock-free single-behavior CAS) is sound and a meaningful improvement over the previous lock-everywhere approach. Issue #1 (GetSnapshot read order) is the only one that affects memory safety correctness and should be fixed before merge — the Grow path creates a subtle torn-read hazard. Issues #2 and #3 are worth addressing. The rest are cosmetic.

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up Review: perf/mock-engine-lock-free-hot-path

Thank you for the four follow-up commits. Most of the previously raised issues have been addressed correctly. Here is a status summary and the two remaining issues.


Issues Addressed Since Prior Reviews

Prior Issue Status
Two-array TOCTOU (_callsByMemberId / _callCountByMemberId) Fixed — bundled into CallArrays with a single volatile reference swap
Optimistic mark-then-unmark corrupting prior verifications Fixed — restored count-first, mark-on-success pattern
GetSnapshot() torn read (count before items) Fixed — now reads _items before _count, consistent with writer ordering
lock(this) in CallRecordBuffer Fixed — replaced with private _syncRoot field
CAS race in AddBehavior when _behaviors is promoted concurrently Fixed — double-check after CAS (Volatile.Read(ref _behaviors) is null on re-entry)
_callsByMemberId non-volatile fast-path read Fixed — consolidated into CallArrays which is volatile
Duplicate <summary> XML tag in CallRecordBuffer Not fixed — still present (minor, see below)

Remaining Issue 1: GetSnapshot() read order is still the Grow-unsafe order (Medium)

File: TUnit.Mocks/Verification/CallRecordBuffer.cs, lines 68–74

The comment says:

"Read items first, then count. A writer always updates items before incrementing count (inside the lock), so this order guarantees count <= items.Length."

The code matches this intent:

var items = Volatile.Read(ref _items);
var count = Volatile.Read(ref _count);

However, the Grow path undermines the guarantee. Inside Add (under the lock):

  1. Grow() sets _items = newItems (the new, larger array)
  2. items[count] = record writes into newItems
  3. _count = count + 1

A reader that snapshots _items before Grow sets it (so it gets the old, shorter array), then snapshots _count after step 3, can observe count > items.Length. The Array.Copy(items, result, count) in ToArray() will then throw ArgumentException.

The safe order here is the reverse: read count first, then items. Because the writer increments _count only after writing into _items (whether grown or not), a reader that sees count == N is guaranteed that an _items array of at least length N was published before that count was visible.

Note that the previous review already flagged this exact scenario and the fix that was applied reads items first — which is the unsafe ordering for the Grow path. Please swap the two Volatile.Read lines:

internal (CallRecord?[] Items, int Count) GetSnapshot()
{
    var count = Volatile.Read(ref _count);   // read count first
    var items = Volatile.Read(ref _items);   // items.Length is always >= count at this point
    return (items, count);
}

Remaining Issue 2: Duplicate <summary> XML doc tag (Minor)

File: TUnit.Mocks/Verification/CallRecordBuffer.cs, lines 56–64

/// <summary>
/// Returns the current items array and count for lock-free iteration.
/// ...
/// </summary>
/// <summary>                      ← second <summary> is invalid XML doc
/// Read items first, then count...
/// </summary>

The second <summary> is silently discarded by the XML doc processor and will generate a compiler warning (CS1570/CS1591 depending on settings). The second block should be <remarks>:

/// <summary>
/// Returns the current items array and count for lock-free iteration.
/// The returned array and count form a consistent snapshot: all items at
/// indices 0..count-1 are guaranteed to be non-null.
/// </summary>
/// <remarks>
/// Read count first, then items. A writer always updates items before incrementing
/// count (inside the lock), so this order guarantees count &lt;= items.Length.
/// </remarks>

Remaining Issue 3: _matchers.Length == 0 guard in CountMatchingBuffer is dead code (Cosmetic)

File: TUnit.Mocks/Verification/CallVerificationBuilder.cs, line 117

if (_matchers.Length == 0 || MatchesArguments(record.Arguments))

CountMatchingBuffer is only reached from the else branch of WasCalled — the _matchers.Length == 0 fast path exits earlier via GetCallCountFor. Inside this loop the guard is always false, adding cognitive overhead without effect. The condition can be dropped:

if (MatchesArguments(record.Arguments))

The same applies to the identical guard in MarkMatchingBuffer.


What Looks Good

The core design is clean and well-structured:

  • CallArrays atomic swap correctly eliminates all TOCTOU races between the two old separate array fields.
  • CallRecordBuffer pre-allocation during setup ensures the invocation hot path never locks or allocates.
  • The AddBehavior CAS fast path with double-check is correctly reasoned and documented.
  • The non-volatile field rationale (CS0420 with Interlocked.CompareExchange) is sound and the <remarks> explaining it are helpful.
  • The CollectCallRecords snapshot-semantics documentation is appropriate: concurrent capacity growth during verification is documented as acceptable.
  • Restoring count-first, mark-on-success in CallVerificationBuilder eliminates the partially-verified-state bug entirely.

Issue 1 (GetSnapshot read order) is the only remaining correctness concern. It is the exact same issue as was raised in the second review pass — the fix that was applied is unfortunately the wrong ordering for the Grow path. The other two items are cosmetic.

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up Review: perf/mock-engine-lock-free-hot-path (latest commit: 9e56739)

This is a follow-up to the three prior review passes. The latest commit addresses all remaining flagged issues correctly.


All Previously Raised Issues — Now Resolved

Prior Issue Status
Two-array TOCTOU (_callsByMemberId / _callCountByMemberId) FixedCallArrays atomic swap
Optimistic mark-then-unmark corrupting prior verifications Fixed — count-first, mark-on-success
GetSnapshot() read order (items before count — unsafe for Grow path) Fixed — now reads count first, then items
Duplicate <summary> XML tag Fixed — second block changed to <remarks>
Dead _matchers.Length == 0 guard in CountMatchingBuffer / MarkMatchingBuffer Fixed — removed from both methods
CAS race in AddBehavior when _behaviors is promoted concurrently Fixed — double-check after CAS
_callsByMemberId non-volatile fast-path read Fixed — consolidated into volatile CallArrays
Duplicate lock(this) antipattern in CallRecordBuffer Fixed — private _syncRoot field

One Remaining Minor Point

CallRecordBuffer._syncRoot uses object instead of System.Threading.Lock

CallRecordBuffer.Add is called on the hot invocation path. The second review pass flagged that MockEngine uses the C# 13 System.Threading.Lock type throughout, but CallRecordBuffer uses a plain object for its per-buffer lock:

private readonly object _syncRoot = new();

On .NET 9+, System.Threading.Lock avoids the kernel transition and provides better tooling diagnostics. Since the project multi-targets net8.0;net9.0;net10.0 and CallRecordBuffer is the new hot-path lock, this is worth changing:

private readonly Lock _syncRoot = new();

No lock statement changes needed — C# 13's lock keyword handles both object and Lock transparently. This was raised in the prior review and is the only remaining item.


What Looks Good

The GetSnapshot fix is correct. The writer inside the lock always increments _count after writing into _items (directly or via Grow). The lock's release fence ensures both writes are visible to any subsequent Volatile.Read. A reader that snapshots count first sees value N, then reads _items — which must have been at least length N at the time count was published. The documentation in <remarks> now accurately describes this invariant.

The overall design is clean and ready to merge pending the Lock consistency fix.

Replace the shared MockEngine.Lock with per-member lock-free call recording
and lock-free verification reads. This targets the three benchmark categories
where TUnit.Mocks loses to source-generated competitors (Imposter, Mockolate):

**Callback (60% slower than Imposter)**
- StoreCallRecord no longer acquires MockEngine.Lock on the hot path
- CallRecordBuffer uses a lightweight per-buffer lock instead of the shared
  engine lock, eliminating contention between setup and invocation
- Buffers are pre-created during setup so invocations never allocate or
  initialize under lock

**Setup (18% slower than Mockolate)**
- AddBehavior uses Interlocked.CompareExchange for the common single-behavior
  case, avoiding Lock object allocation and lock acquisition entirely

**Verification (13% slower than Imposter)**
- Single-pass verification: count and mark verified in one iteration instead
  of two. Only unmarks on failure (rare path).
- GetCallBufferFor returns the internal buffer directly, eliminating the
  .ToArray() copy allocation on the verification happy path
- GetCallsFor, MarkCallsVerified, GetAllCalls, GetUnverifiedCalls are now
  lock-free reads

All 822 mock tests pass (761 + 31 + 30).
- Replace lock(this) with private _syncRoot object in CallRecordBuffer
- Add GetSnapshot() to cache array reference for tight iteration loops,
  eliminating N volatile reads in CountAndMarkBuffer
- Remove misleading AggressiveInlining on Add (JIT can't inline lock bodies)
- Add double-check after CAS in AddBehavior to prevent race with concurrent
  AddBehaviorSlow on the same MethodSetup
- Fix inaccurate "single-writer"/"lock-free" comment on _callsByMemberId
- Remove unnecessary WHAT-style doc comments
… restore count-first verification

- Bundle _callsByMemberId/_callCountByMemberId into single volatile CallArrays
  reference to eliminate TOCTOU race between the two fields
- Fix GetSnapshot() read order: items before count ensures count <= items.Length
  after Grow() swaps the array
- Restore count-first, mark-on-success verification pattern to avoid corrupting
  prior verifications on assertion failure
…hot safety

- Document why _singleBehavior/_behaviors aren't volatile (CS0420 with
  Interlocked.CompareExchange) and that all accesses use Volatile.Read/Write
- Document CAS full-barrier safety in AddBehavior fast path
- Document CollectCallRecords snapshot semantics for concurrent capacity growth
- GetSnapshot: read count before items (count-first is safe because writer
  updates items/Grow before incrementing count — seeing new count guarantees
  items.Length >= count)
- Fix duplicate <summary> tag → <remarks>
- Remove dead _matchers.Length == 0 guard in CountMatchingBuffer/MarkMatchingBuffer
  (already handled by the fast path in WasCalled)
Consistent with MockEngine; faster on .NET 9+ and better diagnostics.
@thomhurst thomhurst force-pushed the perf/mock-engine-lock-free-hot-path branch from 9e56739 to 51fb8b6 Compare April 7, 2026 18:59
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up Review: perf/mock-engine-lock-free-hot-path (final pass)

This review picks up from the four previous passes. All issues identified in prior rounds have been addressed in the current code. Here is the final status.


All Previously Raised Issues — Now Resolved

Prior Issue Status
Two-array TOCTOU (_callsByMemberId / _callCountByMemberId) FixedCallArrays atomic swap
Optimistic mark-then-unmark corrupting prior verifications Fixed — count-first, mark-on-success
GetSnapshot() read order (items before count — unsafe for Grow path) Fixed — reads count first, then items; <remarks> accurately documents the invariant
Duplicate <summary> XML tag Fixed — second block is now <remarks>
Dead _matchers.Length == 0 guard in CountMatchingBuffer / MarkMatchingBuffer Fixed — removed from both methods
CAS race in AddBehavior when _behaviors is promoted concurrently Fixed — double-check after CAS, falls through to AddBehaviorSlow which clears the CAS-set _singleBehavior
_callsByMemberId non-volatile fast-path read Fixed — consolidated into volatile CallArrays
lock(this) antipattern in CallRecordBuffer Fixed — private _syncRoot field
CallRecordBuffer._syncRoot using object instead of System.Threading.Lock Fixed — now private readonly Lock _syncRoot = new()

Code Correctness Verification

GetSnapshot read order is now correct. Reads _count first (line 69), then _items (line 70). The writer inside the lock always increments _count after writing into _items (whether via direct write or Grow), and the lock's release fence ensures both stores are visible before any subsequent Volatile.Read. A reader that sees count == N is therefore guaranteed to see an _items with Length >= N. The <remarks> accurately documents this invariant.

AddBehaviorSlow correctly handles the CAS-then-fall-through case. When the fast-path CAS wins _singleBehavior but the double-check reveals _behaviors != null (a concurrent promotion happened), AddBehaviorSlow is called. Inside the lock it sees _behaviors != null, skips the promote-to-list step, calls _behaviors!.Add(behavior), and then Volatile.Write(ref _singleBehavior, null) — which correctly clears the CAS value that was transiently set. The behavior is not lost and does not appear twice.

CallArrays atomic swap eliminates all TOCTOU. EnsureCallArrayCapacity is always called under Lock, so the swap is safe. All lock-free readers take a single volatile read of _callArrays and get a consistent generation of both Buffers and Counts.


One Very Minor Observation (not a blocker)

AddBehavior window where _singleBehavior is transiently visible before the double-check.

When the fast-path CAS succeeds (line 84) but _behaviors is already non-null (detected at line 88), there is a brief window between lines 84 and 88 where _singleBehavior is set and _behaviors is also non-null. A concurrent GetNextBehavior reading in this window would return _singleBehavior (the fast path) and not look at _behaviors. In practice, setup is nearly always single-threaded and concurrent invocations before setup is complete are unsupported usage, so this is an academic concern rather than a practical one. The double-check correctly reconciles the state before returning, and the subsequent AddBehaviorSlow clears _singleBehavior. Documenting the "setup is not thread-safe" contract at the AddBehavior level (if not already documented elsewhere) would be the appropriate response — no code change required.


Summary

This PR is well-crafted. The motivation is benchmark-driven, the design decisions are documented with accurate reasoning, all identified correctness issues from prior reviews have been addressed, and the final implementation is consistent and clean. The CallArrays atomic swap, per-buffer Lock, correct GetSnapshot ordering, and count-first-mark-on-success verification pattern together form a coherent and safe lock-free design for the hot paths.

Ready to merge.

@thomhurst thomhurst merged commit 630182a into main Apr 7, 2026
14 of 15 checks passed
@thomhurst thomhurst deleted the perf/mock-engine-lock-free-hot-path branch April 7, 2026 20:23
@claude claude bot mentioned this pull request Apr 9, 2026
1 task
intellitect-bot pushed a commit to IntelliTect/EssentialCSharp.Web that referenced this pull request Apr 9, 2026
Updated [TUnit](https://github.com/thomhurst/TUnit) from 1.29.0 to
1.30.0.

<details>
<summary>Release notes</summary>

_Sourced from [TUnit's
releases](https://github.com/thomhurst/TUnit/releases)._

## 1.30.0

<!-- Release notes generated using configuration in .github/release.yml
at v1.30.0 -->

## What's Changed
### Other Changes
* perf: eliminate locks from mock invocation and verification hot paths
by @​thomhurst in thomhurst/TUnit#5422
* feat: TUnit0074 analyzer for redundant hook attributes on overrides by
@​thomhurst in thomhurst/TUnit#5459
* fix(mocks): respect generic type argument accessibility (#​5453) by
@​thomhurst in thomhurst/TUnit#5460
* fix(mocks): skip inaccessible internal accessors when mocking
Azure.Response by @​thomhurst in
thomhurst/TUnit#5461
* fix: apply CultureAttribute and STAThreadExecutorAttribute to hooks
(#​5452) by @​thomhurst in thomhurst/TUnit#5463
### Dependencies
* chore(deps): update tunit to 1.29.0 by @​thomhurst in
thomhurst/TUnit#5446
* chore(deps): update react to ^19.2.5 by @​thomhurst in
thomhurst/TUnit#5457
* chore(deps): update opentelemetry to 1.15.2 by @​thomhurst in
thomhurst/TUnit#5456
* chore(deps): update dependency qs to v6.15.1 by @​thomhurst in
thomhurst/TUnit#5458


**Full Changelog**:
thomhurst/TUnit@v1.29.0...v1.30.0

Commits viewable in [compare
view](thomhurst/TUnit@v1.29.0...v1.30.0).
</details>

[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=TUnit&package-manager=nuget&previous-version=1.29.0&new-version=1.30.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant