perf: parallelize test metadata collection for source-generated tests #5221
Conversation
When source generation is enabled with many test classes (1000+), the sequential GetTests() calls in CollectTests created a bottleneck. Each call JIT-compiles a per-class method and creates TestMetadata objects, taking ~560ms sequentially for 10,000 tests.

Parallelize GetTests() calls using Parallel.ForEach when the source count exceeds the parallel threshold. Each source's GetTests is independent and safe to call concurrently. Small source sets (<8) remain sequential to avoid task-scheduling overhead.

Benchmarked with 10,000 tests across 1,000 classes:
- Source gen (before): ~3.4s → Source gen (after): ~3.1s
- Reflection mode (unchanged): ~2.7s
Code Review
This is a clean, well-targeted performance improvement. The benchmark data, PR description, and implementation are all solid. A few things worth considering:
Non-deterministic test ordering
ConcurrentBag<T> does not preserve insertion order — its enumeration order is implementation-defined and non-deterministic under concurrent inserts (in practice, it follows thread-local LIFO). This means the order of TestMetadata objects in combined can differ between runs.
If anything downstream depends on collection order (e.g., for reproducible test IDs or stable display order before scheduling kicks in), this could cause subtle inconsistencies. A simpler approach that preserves original source ordering:
var resultArrays = new IReadOnlyList<TestMetadata>[testSourcesList.Count];
Parallel.For(0, testSourcesList.Count, i =>
{
resultArrays[i] = testSourcesList[i].GetTests(testSessionId);
});
var combined = new List<TestMetadata>();
foreach (var batch in resultArrays)
{
for (var i = 0; i < batch.Count; i++)
combined.Add(batch[i]);
}
return combined;

Parallel.For with index-based writes to a pre-allocated array is thread-safe (each index is written exactly once), eliminates ConcurrentBag overhead, and keeps the source ordering deterministic.
Missing ParallelOptions / cancellation
Parallel.ForEach without ParallelOptions uses all available threads, which might be aggressive on CI agents with constrained CPU. Consider capping MaxDegreeOfParallelism:
Parallel.ForEach(testSourcesList,
new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
testSource => allResults.Add(testSource.GetTests(testSessionId)));

Two-phase discovery path not parallelized
CollectTestsWithTwoPhaseDiscovery (the filter-hints path) is still sequential. For projects using filters (e.g., CI jobs running a subset of tests), this path won't benefit. Whether that matters depends on how common filter usage is in the large-project scenario, but it's worth noting the optimization is asymmetric.
Threshold value
The threshold of 8 matches TestBuilderPipeline.MinItemsForParallel, which is good for consistency. The semantics are clear: fewer than MinItemsForParallel sources run sequentially, 8 or more run in parallel.
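For readability, a minimal sketch of the guard pattern being discussed, assuming the names from the review (testSourcesList, allResults, GetTests, testSessionId) and that MinItemsForParallel is 8:

```csharp
// Sketch only: the threshold guard the review describes, with assumed names.
const int MinItemsForParallel = 8;

if (testSourcesList.Count < MinItemsForParallel)
{
    // Small source sets: stay sequential, no task-scheduling overhead.
    foreach (var testSource in testSourcesList)
        allResults.Add(testSource.GetTests(testSessionId));
}
else
{
    // 8+ sources: fan out, one independent GetTests call per source.
    Parallel.ForEach(testSourcesList,
        testSource => allResults.Add(testSource.GetTests(testSessionId)));
}
```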
Comment removal
The removed comments were substantive — they explained the two-path design rationale, single-pass vs. double-enumeration, etc. The remaining comment // Filtered: enumerate descriptors, apply filters, expand dependencies, materialize matches is much terser. The architecture is non-obvious enough that the original explanations added real value. Consider preserving at least the XML doc-level rationale on the method.
Overall: The change is correct and the parallelization approach is sound. The main actionable concern is the ConcurrentBag ordering non-determinism — switching to index-based Parallel.For with a pre-allocated array would be strictly better (deterministic, no concurrent collection overhead, same thread safety). Good work on the benchmark-driven approach and the small-project threshold guard.
Code Review (Follow-up)
Previous review raised two main concerns. Here is how they were addressed:
Previous concern: Non-deterministic ordering (ConcurrentBag) — ADDRESSED
The index-based Parallel.For with a pre-allocated IReadOnlyList<TestMetadata>[] array is exactly the right fix. Each index is written exactly once (thread-safe without locking), source order is fully deterministic, and there is no concurrent collection overhead. This is a better implementation than what was originally suggested — good simplification.
Previous concern: Missing ParallelOptions — Partially addressed
Parallel.For without ParallelOptions uses the default threadpool scheduler, which is fine in practice. Cancellation support is still absent but for a startup/discovery path this is low priority.
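For reference, a minimal sketch of what threading cancellation through would look like, assuming a CancellationToken named cancellationToken is available on this path (other names follow the review):

```csharp
// Sketch: cooperative cancellation via ParallelOptions (names assumed).
var options = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount,
    CancellationToken = cancellationToken
};

// Parallel.For throws OperationCanceledException if the token is
// signaled mid-loop, so callers can abandon discovery promptly.
Parallel.For(0, testSourcesList.Count, options, i =>
{
    resultArrays[i] = testSourcesList[i].GetTests(testSessionId);
});
```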
New observation: Threshold check removed, but PR description is stale
The second commit removed the threshold guard (< MinItemsForParallel → sequential), but the PR description still says:
Small test sets (<8 sources) use sequential path — no overhead for small projects
This is no longer true. Parallel.For(0, 1, ...) for a single-source project will now spin up TPL machinery for a single item. In practice this overhead is negligible and the code is simpler, so removing the threshold is a valid call — but the PR description should be updated to match actual behavior.
CollectTestsWithTwoPhaseDiscovery still sequential
The filter-hints path is still not parallelized. Flagged as informational last time and unchanged — acceptable if large-project workloads predominantly go through the unfiltered path.
Overall
The critical fix (deterministic ordering via pre-sized array) has been cleanly applied. The implementation is correct and the approach is sound. Only actionable item: update the PR description to remove the stale threshold language. The code itself is good to merge.
Code Review (Follow-up #2)
All previously raised concerns have been addressed. Here is the final assessment:
Previous concern: Non-deterministic ordering (ConcurrentBag) — RESOLVED
The implementation now uses index-based Parallel.For with a pre-allocated IReadOnlyList<TestMetadata>[] array. Each index is written exactly once, preserving deterministic source ordering with no concurrent collection overhead. Additionally, the total count is pre-computed to size the final List<TestMetadata> exactly, avoiding any reallocations. This is a clean, allocation-efficient implementation.
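A minimal sketch of the pre-sized combine step described above, with resultArrays and TestMetadata as assumed from the review:

```csharp
// Sketch: pre-compute the total to size the list exactly (no reallocations).
var total = 0;
foreach (var batch in resultArrays)
    total += batch.Count;

var combined = new List<TestMetadata>(total);
foreach (var batch in resultArrays)
    combined.AddRange(batch);
return combined;
```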
Previous concern: Stale PR description (threshold language) — RESOLVED
The PR description has been updated and no longer mentions the <8 sources sequential path. It now accurately describes the unconditional Parallel.For approach with the pre-sized array.
Previous concern: Missing ParallelOptions — Accepted as-is
Parallel.For without ParallelOptions uses the default threadpool scheduler, which is reasonable for a startup/discovery path. Cancellation is still absent but low priority here.
CollectTestsWithTwoPhaseDiscovery still sequential — Accepted as-is
The filter-hints path remains sequential. This asymmetry is acceptable if large-project workloads primarily go through the unfiltered branch. Not a blocker.
Overall
All critical issues have been resolved. The implementation is correct, deterministic, allocation-efficient, and well-structured. The code is ready to merge.
Summary
- Parallelizes GetTests() calls in AotTestDataCollector.CollectTests using Parallel.For with a pre-sized array, reducing startup time for projects with many source-generated test classes
- Renames CollectTestsTraditional → CollectTestsContext
Reported in #5043: with 10,000 tests across 1,000 classes, source-generated tests were significantly slower than reflection mode. The root cause was sequential GetTests() calls — each one JIT-compiles a per-class method and creates TestMetadata objects, taking ~560ms sequentially for 10,000 tests.

Parallel.For at the source level is the right granularity — each source batches ~10 tests, amortizing ClassMetadata lookups. Index-based writes to a pre-allocated array preserve deterministic ordering with no concurrent collection overhead.

Benchmark (10,000 tests, 1,000 classes)
Test plan