perf: parallelize test metadata collection for source-generated tests #5221
Conversation
When source generation is enabled with many test classes (1000+), the sequential GetTests() calls in CollectTests created a bottleneck. Each call JIT-compiles a per-class method and creates TestMetadata objects, taking ~560ms sequentially for 10,000 tests.

Parallelize GetTests() calls using Parallel.ForEach when the source count exceeds the parallel threshold. Each source's GetTests is independent and safe to call concurrently. Small source sets (<8) remain sequential to avoid task-scheduling overhead.

Benchmarked with 10,000 tests across 1,000 classes:
- Source gen (before): ~3.4s → Source gen (after): ~3.1s
- Reflection mode (unchanged): ~2.7s
Code Review
This is a clean, well-targeted performance improvement. The benchmark data, PR description, and implementation are all solid. A few things worth considering:
Non-deterministic test ordering
ConcurrentBag<T> does not preserve insertion order — its enumeration order is implementation-defined and non-deterministic under concurrent inserts (in practice, it follows thread-local LIFO). This means the order of TestMetadata objects in combined can differ between runs.
If anything downstream depends on collection order (e.g., for reproducible test IDs or stable display order before scheduling kicks in), this could cause subtle inconsistencies. A simpler approach that preserves original source ordering:
var resultArrays = new IReadOnlyList<TestMetadata>[testSourcesList.Count];
Parallel.For(0, testSourcesList.Count, i =>
{
resultArrays[i] = testSourcesList[i].GetTests(testSessionId);
});
var combined = new List<TestMetadata>();
foreach (var batch in resultArrays)
{
for (var i = 0; i < batch.Count; i++)
combined.Add(batch[i]);
}
return combined;

Parallel.For with index-based writes to a pre-allocated array is thread-safe (each index is written exactly once), eliminates ConcurrentBag overhead, and keeps the source ordering deterministic.
Missing ParallelOptions / cancellation
Parallel.ForEach without ParallelOptions uses all available threads, which might be aggressive on CI agents with constrained CPU. Consider capping MaxDegreeOfParallelism:
Parallel.ForEach(testSourcesList,
new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
testSource => allResults.Add(testSource.GetTests(testSessionId)));

Two-phase discovery path not parallelized
CollectTestsWithTwoPhaseDiscovery (the filter-hints path) is still sequential. For projects using filters (e.g., CI jobs running a subset of tests), this path won't benefit. Whether that matters depends on how common filter usage is in the large-project scenario, but it's worth noting the optimization is asymmetric.
Threshold value
The threshold of 8 matches TestBuilderPipeline.MinItemsForParallel, which is good for consistency. The semantics are clear: fewer than MinItemsForParallel sources run sequentially, 8 or more run in parallel.
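For readability, a minimal sketch of the guard pattern being discussed, assuming the names from the review (testSourcesList, allResults, GetTests, testSessionId) and that MinItemsForParallel is 8:

```csharp
// Sketch only: the threshold guard the review describes, with assumed names.
const int MinItemsForParallel = 8;

if (testSourcesList.Count < MinItemsForParallel)
{
    // Small source sets: stay sequential, no task-scheduling overhead.
    foreach (var testSource in testSourcesList)
        allResults.Add(testSource.GetTests(testSessionId));
}
else
{
    // 8+ sources: fan out, one independent GetTests call per source.
    Parallel.ForEach(testSourcesList,
        testSource => allResults.Add(testSource.GetTests(testSessionId)));
}
```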
Comment removal
The removed comments were substantive — they explained the two-path design rationale, single-pass vs. double-enumeration, etc. The remaining comment // Filtered: enumerate descriptors, apply filters, expand dependencies, materialize matches is much terser. The architecture is non-obvious enough that the original explanations added real value. Consider preserving at least the XML doc-level rationale on the method.
Overall: The change is correct and the parallelization approach is sound. The main actionable concern is the ConcurrentBag ordering non-determinism — switching to index-based Parallel.For with a pre-allocated array would be strictly better (deterministic, no concurrent collection overhead, same thread safety). Good work on the benchmark-driven approach and the small-project threshold guard.
Code Review (Follow-up)
Previous review raised two main concerns. Here is how they were addressed:
Previous concern: Non-deterministic ordering (ConcurrentBag) — ADDRESSED
The index-based Parallel.For with a pre-allocated IReadOnlyList<TestMetadata>[] array is exactly the right fix. Each index is written exactly once (thread-safe without locking), source order is fully deterministic, and there is no concurrent collection overhead. This is a better implementation than what was originally suggested — good simplification.
Previous concern: Missing ParallelOptions — Partially addressed
Parallel.For without ParallelOptions uses the default threadpool scheduler, which is fine in practice. Cancellation support is still absent but for a startup/discovery path this is low priority.
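For reference, a minimal sketch of what threading cancellation through would look like, assuming a CancellationToken named cancellationToken is available on this path (other names follow the review):

```csharp
// Sketch: cooperative cancellation via ParallelOptions (names assumed).
var options = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount,
    CancellationToken = cancellationToken
};

// Parallel.For throws OperationCanceledException if the token is
// signaled mid-loop, so callers can abandon discovery promptly.
Parallel.For(0, testSourcesList.Count, options, i =>
{
    resultArrays[i] = testSourcesList[i].GetTests(testSessionId);
});
```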
New observation: Threshold check removed, but PR description is stale
The second commit removed the threshold guard (< MinItemsForParallel → sequential), but the PR description still says:
Small test sets (<8 sources) use sequential path — no overhead for small projects
This is no longer true. Parallel.For(0, 1, ...) for a single-source project will now spin up TPL machinery for a single item. In practice this overhead is negligible and the code is simpler, so removing the threshold is a valid call — but the PR description should be updated to match actual behavior.
CollectTestsWithTwoPhaseDiscovery still sequential
The filter-hints path is still not parallelized. Flagged as informational last time and unchanged — acceptable if large-project workloads predominantly go through the unfiltered path.
Overall
The critical fix (deterministic ordering via pre-sized array) has been cleanly applied. The implementation is correct and the approach is sound. Only actionable item: update the PR description to remove the stale threshold language. The code itself is good to merge.
Code Review (Follow-up #2)
All previously raised concerns have been addressed. Here is the final assessment:
Previous concern: Non-deterministic ordering (ConcurrentBag) — RESOLVED
The implementation now uses index-based Parallel.For with a pre-allocated IReadOnlyList<TestMetadata>[] array. Each index is written exactly once, preserving deterministic source ordering with no concurrent collection overhead. Additionally, the total count is pre-computed to size the final List<TestMetadata> exactly, avoiding any reallocations. This is a clean, allocation-efficient implementation.
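A minimal sketch of the pre-sized combine step described above, with resultArrays and TestMetadata as assumed from the review:

```csharp
// Sketch: pre-compute the total to size the list exactly (no reallocations).
var total = 0;
foreach (var batch in resultArrays)
    total += batch.Count;

var combined = new List<TestMetadata>(total);
foreach (var batch in resultArrays)
    combined.AddRange(batch);
return combined;
```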
Previous concern: Stale PR description (threshold language) — RESOLVED
The PR description has been updated and no longer mentions the <8 sources sequential path. It now accurately describes the unconditional Parallel.For approach with the pre-sized array.
Previous concern: Missing ParallelOptions — Accepted as-is
Parallel.For without ParallelOptions uses the default threadpool scheduler, which is reasonable for a startup/discovery path. Cancellation is still absent but low priority here.
CollectTestsWithTwoPhaseDiscovery still sequential — Accepted as-is
The filter-hints path remains sequential. This asymmetry is acceptable if large-project workloads primarily go through the unfiltered branch. Not a blocker.
Overall
All critical issues have been resolved. The implementation is correct, deterministic, allocation-efficient, and well-structured. The code is ready to merge.
Summary
- Parallelizes GetTests() calls in AotTestDataCollector.CollectTests using Parallel.For with a pre-sized array, reducing startup time for projects with many source-generated test classes
- Renames CollectTestsTraditional → CollectTestsContext
Reported in #5043: with 10,000 tests across 1,000 classes, source-generated tests were significantly slower than reflection mode. The root cause was sequential GetTests() calls — each one JIT-compiles a per-class method and creates TestMetadata objects, taking ~560ms sequentially for 10,000 tests.

Parallel.For at the source level is the right granularity — each source batches ~10 tests, amortizing ClassMetadata lookups. Index-based writes to a pre-allocated array preserve deterministic ordering with no concurrent collection overhead.

Benchmark (10,000 tests, 1,000 classes)
Test plan