
perf(engine): reduce lock contention in scheduling and hook caches (#5686)#5693

Merged
thomhurst merged 2 commits into main from perf/5686-locks on Apr 24, 2026


Conversation

@thomhurst (Owner)

Summary

Closes #5686.

A CPU profile of ~1000-test runs showed Monitor.Enter_Slowpath at 4.33% exclusive time and a 48.5% GetQueuedCompletionStatus idle cliff. Two structural issues were addressed:

  • Per-key hook caches: BeforeHookTaskCache.GetOrCreateBeforeClassTask and AfterHookPairTracker.GetOrCreateAfterClassTask guarded all classes behind a single Lock, serialising unrelated classes. Replaced with the existing ThreadSafeDictionary<,>.GetOrAdd pattern (already used for GetOrCreateBeforeAssemblyTask), which internally uses Lazy<T> with ExecutionAndPublication and guarantees single-execution per key with no shared monitor. A lock-free TryGetValue fast path keeps cache-hit calls closure-allocation-free.
  • Bounded unlimited-parallelism path: TestScheduler.ExecuteTestsAsync called Parallel.ForEachAsync without MaxDegreeOfParallelism on the branch taken when --maximum-parallel-tests is unset. The default is ProcessorCount, but large test sets still queue near-simultaneous continuations and saturate IOCP. Added an explicit MaxDegreeOfParallelism = Environment.ProcessorCount * 2, converging with the already-bounded limited path.
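The per-key caching pattern the first bullet describes can be sketched as follows. This is a minimal illustration of ThreadSafeDictionary-style GetOrAdd over Lazy<T>, not the actual TUnit types; the class and member names (HookTaskCache, GetOrCreate) are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class HookTaskCache
{
    // A dictionary of Lazy<Task> gives per-key single execution without
    // any monitor shared across unrelated keys (classes).
    private readonly ConcurrentDictionary<Type, Lazy<Task>> _tasks = new();

    public Task GetOrCreate(Type classType, Func<Task> factory)
    {
        // Lock-free fast path: a cache hit allocates neither a Lazy<T>
        // nor a closure.
        if (_tasks.TryGetValue(classType, out var existing))
        {
            return existing.Value;
        }

        // ExecutionAndPublication guarantees the factory runs at most
        // once per key, even when several threads race past the fast path;
        // losers observe the winner's Lazy and await the same Task.
        var lazy = _tasks.GetOrAdd(
            classType,
            _ => new Lazy<Task>(factory, LazyThreadSafetyMode.ExecutionAndPublication));
        return lazy.Value;
    }
}
```

Note that GetOrAdd may invoke the value factory more than once under contention, but only one Lazy<T> wins publication, and only the winner's inner factory ever executes — which is why the Lazy wrapper, not the dictionary, carries the single-execution guarantee.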

Item 3 in the issue (EventReceiverRegistry single-lock) was left for a follow-up as it's marked lower priority.

Test plan

  • dotnet build TUnit.Engine/TUnit.Engine.csproj across net8.0/net9.0/net10.0/netstandard2.0 — 0 warnings, 0 errors (no trimmer complaints on IL2077 annotation flow through the new closure).
  • TUnit.UnitTests (net10.0): 180/180 pass.
  • TUnit.Engine.Tests (net10.0): 245/247 pass; the 2 failures are pre-existing environmental (FSharp/VB integration tests require those sub-projects to be built locally and are unrelated to this change).
  • TUnit.TestProject/HookExecutorTests (net10.0): 5/5 pass — verifies Before/After class hooks still execute correctly through the new pattern.

…5686)

Three hot-path contention points observed in CPU profiles of ~1000-test
runs (Monitor.Enter_Slowpath at 4.33% exclusive, large
GetQueuedCompletionStatus idle stalls):

1. BeforeHookTaskCache.GetOrCreateBeforeClassTask and
   AfterHookPairTracker.GetOrCreateAfterClassTask guarded all classes
   behind one shared Lock, serialising unrelated classes. Replaced with
   the existing ThreadSafeDictionary<,>.GetOrAdd pattern (already used
   for GetOrCreateBeforeAssemblyTask), which internally uses Lazy<T> with
   ExecutionAndPublication to guarantee single-execution per key without
   a shared monitor. Keeps a lock-free TryGetValue fast path so the
   common cache-hit case doesn't allocate a closure.

2. TestScheduler.ExecuteTestsAsync unlimited branch called
   Parallel.ForEachAsync without MaxDegreeOfParallelism, which defaults
   to ProcessorCount but still queues near-simultaneous continuations on
   large test sets, saturating the IOCP thread. Explicit cap at
   Environment.ProcessorCount * 2 converges with the already-bounded
   limited-parallelism path.

EventReceiverRegistry single-lock (item 3 in the issue) left for a
follow-up as it's lower priority and the two above cover the bulk of
the observed contention.
@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented Apr 24, 2026

Up to standards ✅

🟢 Issues: 0 new issues (View in Codacy)
🟢 Metrics: complexity -7 (View in Codacy)


Copy link
Copy Markdown
Contributor

@claude Bot left a comment


Code Review

Good performance-focused PR addressing real contention hotspots identified via CPU profiling. The hook cache change is solid. One behavioral concern worth discussing before merging.


TestScheduler.cs — Behavioral Regression: Unlimited < Default

Concern: UnlimitedPathMaxDop = ProcessorCount * 2 creates a counterintuitive inversion.

The default path (no --maximum-parallel-tests flag) returns ProcessorCount * 4 (line 492). But a user who explicitly opts for unlimited (--maximum-parallel-tests 0) now gets ProcessorCount * 2 — which is more restrictive than doing nothing. This is a silent regression for anyone who has opted in to unlimited parallelism.

// Default path (no flag): ProcessorCount * 4
var defaultLimit = Environment.ProcessorCount * 4;

// Explicit "unlimited" (--maximum-parallel-tests 0): ProcessorCount * 2  ← now more restrictive
private static readonly int UnlimitedPathMaxDop = Environment.ProcessorCount * 2;

The intent is correct (prevent IOCP saturation), but the cap should be at least as high as the default (ProcessorCount * 4), or ideally configurable. A better approach would be to unify the concept: if the goal is a sensible upper bound for async I/O-bound workloads, apply it consistently in GetMaxParallelism rather than as a second cap in ExecuteTestsAsync. For instance, mapping int.MaxValue to ProcessorCount * 4 in GetMaxParallelism itself would mean users see it in one place and ExecuteTestsAsync stays DOP-agnostic.
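The unification this paragraph proposes could look like the sketch below. It is illustrative only — the method shape and names (GetMaxParallelism taking a nullable requested value) are assumptions, not the actual TUnit signature — but it shows "unlimited" being normalised in one place so ExecuteTestsAsync can stay DOP-agnostic.

```csharp
using System;

internal static class ParallelismDefaults
{
    // Hypothetical sketch of the suggestion: map both "no flag" and
    // explicit "unlimited" (0 / int.MaxValue) to the same upper bound,
    // so opting into unlimited can never be stricter than the default.
    public static int GetMaxParallelism(int? requested)
    {
        var defaultLimit = Environment.ProcessorCount * 4;

        if (requested is null or <= 0 or int.MaxValue)
        {
            return defaultLimit;
        }

        // An explicit positive value is honoured as-is, preserving an
        // escape hatch for users who genuinely want more concurrency.
        return requested.Value;
    }
}
```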

Incorrect comment: The current comment says "Parallel.ForEachAsync defaults to ProcessorCount" — this is wrong. When MaxDegreeOfParallelism is -1 (the .NET default), Parallel.ForEachAsync is truly unbounded, not capped at ProcessorCount. The actual problem is the unbounded depth, which the fix correctly addresses, but the comment should say "truly unlimited" not "defaults to ProcessorCount".


BeforeHookTaskCache / AfterHookPairTracker — Solid Improvement

Removing the shared _classLock and delegating to ThreadSafeDictionary.GetOrAdd is the right call. LazyThreadSafetyMode.ExecutionAndPublication inside ThreadSafeDictionary guarantees single-execution per key without serializing unrelated classes. The pattern is already established for the assembly-level tasks, so this is a welcome consistency fix.

The outer TryGetValue fast path before calling GetOrAdd is justified: ThreadSafeDictionary.GetOrAdd allocates a Lazy<T> instance before calling ConcurrentDictionary.GetOrAdd, so the outer check avoids that allocation plus the closure on the common cache-hit path. The comment accurately explains this.

One minor note: the closure passed to GetOrAdd captures cancellationToken from the calling thread. In a race where two threads both miss the TryGetValue fast path, the winning factory uses whichever thread's token got there first. This mirrors the old lock-based behaviour and is fine — just worth knowing if debugging a cancellation edge case.

The [UnconditionalSuppressMessage("Trimming", "IL2077")] suppressions are valid: the [DynamicallyAccessedMembers] annotation is present at the public method boundary, so the trimmer has the information it needs even though it doesn't flow into the lambda. The justification comments make this clear enough for future maintainers.


Summary

| Area | Assessment |
| --- | --- |
| Hook caches (Before/After class) | ✅ Correct, clear, consistent with existing patterns |
| TestScheduler UnlimitedPathMaxDop | ⚠️ Cap value creates "unlimited < default" inversion; comment is factually wrong |
| AOT / trimming annotations | ✅ Suppressions are valid and well-justified |

The hook cache change can merge as-is. I'd suggest revisiting the DOP cap value (or moving the cap into GetMaxParallelism) before this lands to avoid surprising users who opted into unlimited.

- Align unlimited path with default path: both now cap at
  ProcessorCount * 4 so opting into unlimited isn't a regression.
- Correct comment about Parallel.ForEachAsync default behaviour.
Contributor

@claude Bot left a comment


Follow-up Review

Previous feedback has been fully addressed in the second commit. Here's the updated assessment.


Previous Concerns — All Resolved ✅

"Unlimited < Default" inversion: Fixed. The 0-means-unlimited case now maps to defaultLimit = ProcessorCount * 4 (same as the default path) via the refactored GetMaxParallelism. The separate UnlimitedPathMaxDop = ProcessorCount * 2 constant is gone.

Incorrect Parallel.ForEachAsync comment: Fixed. The comment now correctly states "truly unbounded" rather than "defaults to ProcessorCount".


Architecture After Second Commit

The simplification is good. ExecuteTestsAsync is now:

private Task ExecuteTestsAsync(AbstractExecutableTest[] tests, CancellationToken cancellationToken)
    => ExecuteWithGlobalLimitAsync(tests, _maxParallelismSemaphore.Value, cancellationToken);

Single code path, no conditional branching, no dead netstandard2.0 fallback path lingering. The async keyword removal is also correct — no need for a state machine when you're just forwarding a task.


One Remaining Consideration (Non-blocking)

True "unlimited" is no longer possible. Users who set --maximum-parallel-tests 0 or TUNIT_MAX_PARALLEL_TESTS=0 explicitly to opt into unlimited parallelism (e.g., network-bound test suites with high latency where more concurrency is beneficial) now silently get ProcessorCount * 4. The log message tells them this happened, which is good. But if any user genuinely needs beyond ProcessorCount * 4, the escape hatch is now to pass an explicit large number (e.g., --maximum-parallel-tests 64). Worth a brief mention in release notes or docs, but not a blocker.

The SemaphoreSlim is now always allocated (previously null for the unlimited case). The overhead of WaitAsync/Release at ProcessorCount * 4 slots is negligible in practice — it will rarely block on typical machines — so this trade-off is fine.
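The always-allocated semaphore pattern discussed here amounts to a bounded fan-out loop. The sketch below is an assumption about what ExecuteWithGlobalLimitAsync-style code looks like — the real body is not shown in this PR excerpt, and AbstractExecutableTest's ExecuteAsync entry point is assumed.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

internal static class GlobalLimitSketch
{
    // Bound the number of in-flight tests with a SemaphoreSlim whose
    // slot count is the resolved max parallelism (e.g. ProcessorCount * 4).
    public static async Task ExecuteWithGlobalLimitAsync(
        AbstractExecutableTest[] tests,
        SemaphoreSlim semaphore,
        CancellationToken cancellationToken)
    {
        var running = new List<Task>(tests.Length);
        foreach (var test in tests)
        {
            // Blocks (asynchronously) only once all slots are in use,
            // which is what prevents the continuation flood.
            await semaphore.WaitAsync(cancellationToken);
            running.Add(RunOneAsync(test, semaphore, cancellationToken));
        }
        await Task.WhenAll(running);

        static async Task RunOneAsync(
            AbstractExecutableTest test, SemaphoreSlim semaphore, CancellationToken ct)
        {
            try
            {
                await test.ExecuteAsync(ct); // assumed test-execution entry point
            }
            finally
            {
                semaphore.Release(); // always free the slot, even on failure
            }
        }
    }
}
```

On a typical machine the semaphore rarely contends at ProcessorCount * 4 slots, so the WaitAsync/Release overhead is noise — consistent with the reviewer's assessment that the trade-off is fine.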


Hook Caches — No Changes Needed

The BeforeHookTaskCache and AfterHookPairTracker changes from the first commit are correct and clean. The per-key Lazy pattern via ThreadSafeDictionary is the right approach and consistent with the assembly-level tasks already in the codebase.


Summary

| Area | Status |
| --- | --- |
| DOP cap inversion (unlimited < default) | ✅ Fixed |
| Incorrect ForEachAsync comment | ✅ Fixed |
| Code path unification | ✅ Cleaner than before |
| Hook caches (Before/After class) | ✅ No change needed |
| True unlimited escape hatch gone | ℹ️ Acknowledged tradeoff, not a blocker |

This is in good shape. The profiling-driven changes are well-motivated and the second commit addressed all the structural concerns cleanly.

intellitect-bot pushed a commit to IntelliTect/EssentialCSharp.Web that referenced this pull request Apr 25, 2026
Updated [TUnit](https://github.com/thomhurst/TUnit) from 1.37.10 to
1.39.0.

