perf(engine): reduce lock contention in scheduling and hook caches (#5686) #5693
Conversation
…5686)

Three hot-path contention points observed in CPU profiles of ~1000-test runs (`Monitor.Enter_Slowpath` at 4.33% exclusive, large `GetQueuedCompletionStatus` idle stalls):

1. `BeforeHookTaskCache.GetOrCreateBeforeClassTask` and `AfterHookPairTracker.GetOrCreateAfterClassTask` guarded all classes behind one shared `Lock`, serialising unrelated classes. Replaced with the existing `ThreadSafeDictionary<,>.GetOrAdd` pattern (already used for `GetOrCreateBeforeAssemblyTask`), which internally uses `Lazy<T>` with `ExecutionAndPublication` to guarantee single execution per key without a shared monitor. Keeps a lock-free `TryGetValue` fast path so the common cache-hit case doesn't allocate a closure.
2. The unlimited branch of `TestScheduler.ExecuteTestsAsync` called `Parallel.ForEachAsync` without `MaxDegreeOfParallelism`, which defaults to `ProcessorCount` but still queues near-simultaneous continuations on large test sets, saturating the IOCP thread. An explicit cap at `Environment.ProcessorCount * 2` converges with the already-bounded limited-parallelism path.

The `EventReceiverRegistry` single lock (item 3 in the issue) is left for a follow-up, as it's lower priority and the two changes above cover the bulk of the observed contention.
Up to standards ✅ 🟢 Issues

| Metric | Results |
|---|---|
| Complexity | -7 |
Code Review
Good performance-focused PR addressing real contention hotspots identified via CPU profiling. The hook cache change is solid. One behavioral concern worth discussing before merging.
TestScheduler.cs — Behavioral Regression: Unlimited < Default
Concern: UnlimitedPathMaxDop = ProcessorCount * 2 creates a counterintuitive inversion.
The default path (no --maximum-parallel-tests flag) returns ProcessorCount * 4 (line 492). But a user who explicitly opts for unlimited (--maximum-parallel-tests 0) now gets ProcessorCount * 2 — which is more restrictive than doing nothing. This is a silent regression for anyone who has opted in to unlimited parallelism.
```csharp
// Default path (no flag): ProcessorCount * 4
var defaultLimit = Environment.ProcessorCount * 4;

// Explicit "unlimited" (--maximum-parallel-tests 0): ProcessorCount * 2 ← now more restrictive
private static readonly int UnlimitedPathMaxDop = Environment.ProcessorCount * 2;
```

The intent is correct (prevent IOCP saturation), but the cap should be at least as high as the default (`ProcessorCount * 4`), or ideally configurable. A better approach would be to unify the concept: if the goal is a sensible upper bound for async I/O-bound workloads, apply it consistently in `GetMaxParallelism` rather than as a second cap in `ExecuteTestsAsync`. For instance, mapping `int.MaxValue` to `ProcessorCount * 4` in `GetMaxParallelism` itself would mean users see it in one place and `ExecuteTestsAsync` stays DOP-agnostic.
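The suggested unification could look roughly like this. This is a sketch only: `GetMaxParallelism` is the method named in the review, but the class shape, parameter, and constant names here are invented for illustration and are not TUnit's actual code.

```csharp
using System;

// Hypothetical sketch of folding the cap into GetMaxParallelism, so the
// "unlimited" opt-in can never be more restrictive than the default path.
internal static class ParallelismPolicy
{
    // Single upper bound for async I/O-bound workloads (assumption: * 4
    // matches the default path quoted in the review).
    private static readonly int DefaultLimit = Environment.ProcessorCount * 4;

    public static int GetMaxParallelism(int? requested)
    {
        // No flag (null), explicit 0 ("unlimited"), and int.MaxValue all map
        // to the same cap in one place; ExecuteTestsAsync stays DOP-agnostic.
        if (requested is null or 0 or int.MaxValue)
        {
            return DefaultLimit;
        }

        return requested.Value;
    }
}
```

With this shape, `--maximum-parallel-tests 0` and the no-flag default resolve identically, removing the inversion.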
Incorrect comment: The current comment says "Parallel.ForEachAsync defaults to ProcessorCount" — this is wrong. When MaxDegreeOfParallelism is -1 (the .NET default), Parallel.ForEachAsync is truly unbounded, not capped at ProcessorCount. The actual problem is the unbounded depth, which the fix correctly addresses, but the comment should say "truly unlimited" not "defaults to ProcessorCount".
BeforeHookTaskCache / AfterHookPairTracker — Solid Improvement
Removing the shared _classLock and delegating to ThreadSafeDictionary.GetOrAdd is the right call. LazyThreadSafetyMode.ExecutionAndPublication inside ThreadSafeDictionary guarantees single-execution per key without serializing unrelated classes. The pattern is already established for the assembly-level tasks, so this is a welcome consistency fix.
The outer TryGetValue fast path before calling GetOrAdd is justified: ThreadSafeDictionary.GetOrAdd allocates a Lazy<T> instance before calling ConcurrentDictionary.GetOrAdd, so the outer check avoids that allocation plus the closure on the common cache-hit path. The comment accurately explains this.
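For readers unfamiliar with the pattern, here is a minimal sketch of the fast-path-plus-`Lazy` shape described above, built directly on `ConcurrentDictionary`. TUnit's `ThreadSafeDictionary` wraps the same idea; the class and method names below are illustrative, not the library's.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch of the per-key Lazy pattern: single execution per key,
// no shared monitor, no allocation on the cache-hit path.
internal sealed class ClassTaskCache
{
    private readonly ConcurrentDictionary<Type, Lazy<Task>> _tasks = new();

    public Task GetOrCreateClassTask(Type testClass, Func<Type, Task> factory)
    {
        // Lock-free fast path: the common cache hit allocates neither a
        // Lazy<T> nor a closure.
        if (_tasks.TryGetValue(testClass, out var existing))
        {
            return existing.Value;
        }

        // ExecutionAndPublication guarantees the factory runs at most once per
        // key; losers of the GetOrAdd race discard their Lazy unexecuted, so
        // unrelated classes never serialise on a shared lock.
        var lazy = _tasks.GetOrAdd(
            testClass,
            key => new Lazy<Task>(() => factory(key), LazyThreadSafetyMode.ExecutionAndPublication));

        return lazy.Value;
    }
}
```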
One minor note: the closure passed to GetOrAdd captures cancellationToken from the calling thread. In a race where two threads both miss the TryGetValue fast path, the winning factory uses whichever thread's token got there first. This mirrors the old lock-based behaviour and is fine — just worth knowing if debugging a cancellation edge case.
The [UnconditionalSuppressMessage("Trimming", "IL2077")] suppressions are valid: the [DynamicallyAccessedMembers] annotation is present at the public method boundary, so the trimmer has the information it needs even though it doesn't flow into the lambda. The justification comments make this clear enough for future maintainers.
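A minimal sketch of the annotation arrangement being described, with invented method and class names: the requirement is declared at the public boundary via `[DynamicallyAccessedMembers]`, and the IL2077 suppression documents why losing that annotation inside the method body is safe.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;
using System.Threading.Tasks;

// Hypothetical illustration, not TUnit's code: the trimmer is told at the
// public entry point which members to preserve, so the suppression is
// justified even though the annotation doesn't flow into the lambda.
internal static class HookReflection
{
    [UnconditionalSuppressMessage("Trimming", "IL2077",
        Justification = "classType is annotated with DynamicallyAccessedMembers at this public boundary; the required members are preserved even though the annotation does not flow into the captured closure.")]
    public static Task RunClassHooksAsync(
        [DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicMethods)] Type classType)
    {
        // The closure captures classType without the annotation attached,
        // which is what would otherwise trigger the trim warning.
        Func<Task> invoke = () =>
        {
            foreach (var method in classType.GetMethods())
            {
                _ = method.Name; // stand-in for hook discovery/invocation
            }
            return Task.CompletedTask;
        };

        return invoke();
    }
}
```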
Summary
| Area | Assessment |
|---|---|
| Hook caches (Before/After class) | ✅ Correct, clear, consistent with existing patterns |
| TestScheduler `UnlimitedPathMaxDop` | ⚠️ Cap value worth revisiting: unlimited is now more restrictive than the default |
| AOT / trimming annotations | ✅ Suppressions are valid and well-justified |
The hook cache change can merge as-is. I'd suggest revisiting the DOP cap value (or moving the cap into GetMaxParallelism) before this lands to avoid surprising users who opted into unlimited.
- Align unlimited path with default path: both now cap at `ProcessorCount * 4`, so opting into unlimited isn't a regression.
- Correct comment about `Parallel.ForEachAsync` default behaviour.
Follow-up Review
Previous feedback has been fully addressed in the second commit. Here's the updated assessment.
Previous Concerns — All Resolved ✅
"Unlimited < Default" inversion: Fixed. The 0-means-unlimited case now maps to defaultLimit = ProcessorCount * 4 (same as the default path) via the refactored GetMaxParallelism. The separate UnlimitedPathMaxDop = ProcessorCount * 2 constant is gone.
Incorrect Parallel.ForEachAsync comment: Fixed. The comment now correctly states "truly unbounded" rather than "defaults to ProcessorCount".
Architecture After Second Commit
The simplification is good. ExecuteTestsAsync is now:
```csharp
private Task ExecuteTestsAsync(AbstractExecutableTest[] tests, CancellationToken cancellationToken)
    => ExecuteWithGlobalLimitAsync(tests, _maxParallelismSemaphore.Value, cancellationToken);
```

Single code path, no conditional branching, no dead netstandard2.0 fallback path lingering. The `async` keyword removal is also correct — no need for a state machine when you're just forwarding a task.
One Remaining Consideration (Non-blocking)
True "unlimited" is no longer possible. Users who set --maximum-parallel-tests 0 or TUNIT_MAX_PARALLEL_TESTS=0 explicitly to opt into unlimited parallelism (e.g., network-bound test suites with high latency where more concurrency is beneficial) now silently get ProcessorCount * 4. The log message tells them this happened, which is good. But if any user genuinely needs beyond ProcessorCount * 4, the escape hatch is now to pass an explicit large number (e.g., --maximum-parallel-tests 64). Worth a brief mention in release notes or docs, but not a blocker.
The SemaphoreSlim is now always allocated (previously null for the unlimited case). The overhead of WaitAsync/Release at ProcessorCount * 4 slots is negligible in practice — it will rarely block on typical machines — so this trade-off is fine.
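As a sketch of how such a single semaphore-bounded path can work (the names and signature below are invented for illustration, not TUnit's actual `ExecuteWithGlobalLimitAsync`):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: one always-allocated SemaphoreSlim bounds global concurrency.
internal static class BoundedRunner
{
    public static async Task ExecuteWithGlobalLimitAsync<T>(
        IReadOnlyList<T> items,
        SemaphoreSlim limiter,
        Func<T, CancellationToken, Task> runAsync,
        CancellationToken cancellationToken)
    {
        var running = items.Select(async item =>
        {
            // Every task waits for a slot; with ProcessorCount * 4 slots this
            // rarely blocks, so the WaitAsync/Release overhead is negligible.
            await limiter.WaitAsync(cancellationToken);
            try
            {
                await runAsync(item, cancellationToken);
            }
            finally
            {
                limiter.Release();
            }
        });

        await Task.WhenAll(running);
    }
}
```

All per-item tasks are created up front, but at most the semaphore's slot count run the body concurrently, which is the trade-off the paragraph above calls fine in practice.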
Hook Caches — No Changes Needed
The BeforeHookTaskCache and AfterHookPairTracker changes from the first commit are correct and clean. The per-key Lazy pattern via ThreadSafeDictionary is the right approach and consistent with the assembly-level tasks already in the codebase.
Summary
| Area | Status |
|---|---|
| DOP cap inversion (unlimited < default) | ✅ Fixed |
| Incorrect `ForEachAsync` comment | ✅ Fixed |
| Code path unification | ✅ Cleaner than before |
| Hook caches (Before/After class) | ✅ No change needed |
| True unlimited escape hatch gone | ℹ️ Acknowledged tradeoff, not a blocker |
This is in good shape. The profiling-driven changes are well-motivated and the second commit addressed all the structural concerns cleanly.
Updated [TUnit](https://github.com/thomhurst/TUnit) from 1.37.10 to 1.39.0.

<details>
<summary>Release notes</summary>

_Sourced from [TUnit's releases](https://github.com/thomhurst/TUnit/releases)._

## 1.39.0

### Other Changes
* perf(mocks): shrink MethodSetup + cache stateless matchers by @thomhurst in thomhurst/TUnit#5669
* fix(mocks): handle base classes with explicit interface impls (#5673) by @thomhurst in thomhurst/TUnit#5674
* fix(mocks): implement indexer in generated mock (#5676) by @thomhurst in thomhurst/TUnit#5683
* fix(mocks): disambiguate IEquatable<T>.Equals from object.Equals (#5675) by @thomhurst in thomhurst/TUnit#5680
* fix(mocks): escape C# keyword identifiers at all emit sites (#5679) by @thomhurst in thomhurst/TUnit#5684
* fix(mocks): emit [SetsRequiredMembers] on generated mock ctor (#5678) by @thomhurst in thomhurst/TUnit#5682
* fix(mocks): skip MockBridge for class targets with static-abstract interfaces (#5677) by @thomhurst in thomhurst/TUnit#5681
* chore(mocks): regenerate source generator snapshots by @thomhurst in thomhurst/TUnit#5691
* perf(engine): collapse async state-machine layers on hot test path (#5687) by @thomhurst in thomhurst/TUnit#5690
* perf(engine): reduce lock contention in scheduling and hook caches (#5686) by @thomhurst in thomhurst/TUnit#5693
* fix(assertions): prevent implicit-to-string op from NREing on null (#5692) by @thomhurst in thomhurst/TUnit#5696
* perf(engine/core): reduce per-test allocations (#5688) by @thomhurst in thomhurst/TUnit#5694
* perf(engine): reduce message-bus contention on test start (#5685) by @thomhurst in thomhurst/TUnit#5695

### Dependencies
* chore(deps): update tunit to 1.37.36 by @thomhurst in thomhurst/TUnit#5667
* chore(deps): update verify to 31.16.2 by @thomhurst in thomhurst/TUnit#5699

**Full Changelog**: thomhurst/TUnit@v1.37.36...v1.39.0

## 1.37.36

### Other Changes
* fix(telemetry): remove duplicate HTTP client spans by @thomhurst in thomhurst/TUnit#5668

**Full Changelog**: thomhurst/TUnit@v1.37.35...v1.37.36

## 1.37.35

### Other Changes
* Add TUnit.TestProject.Library to the TUnit.Dev.slnx solution file by @Zodt in thomhurst/TUnit#5655
* fix(aspire): preserve user-supplied OTLP endpoint (#4818) by @thomhurst in thomhurst/TUnit#5665
* feat(aspire): emit client spans for HTTP by @thomhurst in thomhurst/TUnit#5666

### Dependencies
* chore(deps): update dependency dotnet-sdk to v10.0.203 by @thomhurst in thomhurst/TUnit#5656
* chore(deps): update microsoft.aspnetcore to 10.0.7 by @thomhurst in thomhurst/TUnit#5657
* chore(deps): update tunit to 1.37.24 by @thomhurst in thomhurst/TUnit#5659
* chore(deps): update microsoft.extensions to 10.0.7 by @thomhurst in thomhurst/TUnit#5658
* chore(deps): update aspire to 13.2.3 by @thomhurst in thomhurst/TUnit#5661
* chore(deps): update dependency microsoft.net.test.sdk to 18.5.0 by @thomhurst in thomhurst/TUnit#5664

## New Contributors
* @Zodt made their first contribution in thomhurst/TUnit#5655

**Full Changelog**: thomhurst/TUnit@v1.37.24...v1.37.35

## 1.37.24

### Other Changes
* docs: add Tluma Ask AI widget to Docusaurus site by @thomhurst in thomhurst/TUnit#5638
* Revert "chore(deps): update dependency docusaurus-plugin-llms to ^0.4.0 (#5637)" by @thomhurst in thomhurst/TUnit#5640
* fix(asp-net): forward disposal in FlowSuppressingHostedService (#5651) by @JohnVerheij in thomhurst/TUnit#5652

### Dependencies
* chore(deps): update dependency docusaurus-plugin-llms to ^0.4.0 by @thomhurst in thomhurst/TUnit#5637
* chore(deps): update tunit to 1.37.10 by @thomhurst in thomhurst/TUnit#5639
* chore(deps): update opentelemetry to 1.15.3 by @thomhurst in thomhurst/TUnit#5645
* chore(deps): update opentelemetry by @thomhurst in thomhurst/TUnit#5647
* chore(deps): update dependency dompurify to v3.4.1 by @thomhurst in thomhurst/TUnit#5648
* chore(deps): update dependency system.commandline to 2.0.7 by @thomhurst in thomhurst/TUnit#5650
* chore(deps): update dependency microsoft.entityframeworkcore to 10.0.7 by @thomhurst in thomhurst/TUnit#5649
* chore(deps): update dependency microsoft.templateengine.authoring.cli to v10.0.203 by @thomhurst in thomhurst/TUnit#5653
* chore(deps): update dependency microsoft.templateengine.authoring.templateverifier to 10.0.203 by @thomhurst in thomhurst/TUnit#5654

**Full Changelog**: thomhurst/TUnit@v1.37.10...v1.37.24

Commits viewable in [compare view](thomhurst/TUnit@v1.37.10...v1.39.0).
</details>

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
Summary
Closes #5686.
CPU profile of ~1000-test runs showed `Monitor.Enter_Slowpath` at 4.33% exclusive and a 48.5% `GetQueuedCompletionStatus` idle cliff. Two structural issues addressed:

1. `BeforeHookTaskCache.GetOrCreateBeforeClassTask` and `AfterHookPairTracker.GetOrCreateAfterClassTask` guarded all classes behind a single `Lock`, serialising unrelated classes. Replaced with the existing `ThreadSafeDictionary<,>.GetOrAdd` pattern (already used for `GetOrCreateBeforeAssemblyTask`), which internally uses `Lazy<T>` with `ExecutionAndPublication` and guarantees single execution per key with no shared monitor. A lock-free `TryGetValue` fast path keeps cache-hit calls closure-allocation-free.
2. `TestScheduler.ExecuteTestsAsync` called `Parallel.ForEachAsync` without `MaxDegreeOfParallelism` on the `--maximum-parallel-tests`-unset branch. Default is `ProcessorCount`, but large test sets still queue near-simultaneous continuations and saturate IOCP. Added explicit `MaxDegreeOfParallelism = Environment.ProcessorCount * 2`, converging with the already-bounded limited path.

Item 3 in the issue (`EventReceiverRegistry` single lock) was left for a follow-up as it's marked lower priority.

Test plan

- `dotnet build TUnit.Engine/TUnit.Engine.csproj` across net8.0/net9.0/net10.0/netstandard2.0 — 0 warnings, 0 errors (no trimmer complaints on IL2077 annotation flow through the new closure).
- `TUnit.UnitTests` (net10.0): 180/180 pass.
- `TUnit.Engine.Tests` (net10.0): 245/247 pass; the 2 failures are pre-existing environmental (FSharp/VB integration tests require those sub-projects to be built locally and are unrelated to this change).
- `TUnit.TestProject/HookExecutorTests` (net10.0): 5/5 pass — verifies Before/After class hooks still execute correctly through the new pattern.