Skip to content

fix(aspnetcore): scope correlation processor per-factory to stop cross-factory tag leak#5891

Merged
thomhurst merged 2 commits into
mainfrom
fix/ci-aspnetcore-optout-isolation
May 14, 2026
Merged

fix(aspnetcore): scope correlation processor per-factory to stop cross-factory tag leak#5891
thomhurst merged 2 commits into
mainfrom
fix/ci-aspnetcore-optout-isolation

Conversation

@thomhurst
Copy link
Copy Markdown
Owner

Summary

OptOut_DoesNotTag_AspNetCoreSpans failed on macOS in run 25572055061 with a foreign GUID stamped into the opt-out exporter's activity.

Cause. Every parallel WebApplicationFactory that calls AddAspNetCoreInstrumentation() subscribes its TracerProvider to the process-global Microsoft.AspNetCore ActivitySource. A single Activity per request is observed by every subscribed processor — and any tag any processor writes is visible to every exporter. Factory A's TUnitTestCorrelationProcessor was therefore tagging factory B's request activity with factory A's TestId, polluting the opt-out exporter. [NotInParallel] on the OptOut class doesn't help because the contaminating tests are sibling classes without the constraint.

Approach: per-factory baggage filter

  • Each TestWebApplicationFactory now owns a CorrelationScope with a unique FactoryId.
  • A new FactoryScopeStartupFilter adds middleware that stamps every incoming request's Activity with a baggage entry carrying the owning factory's id.
  • TUnitTestCorrelationProcessor takes an optional CorrelationScope constructor argument. When bound, it tags only activities whose baggage matches its own factory id; activities tagged by a sibling factory's middleware are skipped.
  • The parameterless processor constructor and AutoStart's process-wide use are unchanged, so the TUnit.OpenTelemetry zero-config package keeps its current behavior.

Test plan

  • All 35 TUnit.AspNetCore.Tests pass locally including both AutoConfigureOpenTelemetryTests and AutoConfigureOpenTelemetryOptOutTests.
  • All 5 existing CorrelationProcessorTests still pass (parameterless constructor unchanged).
  • Two new regression tests added for the scoped path: a scoped processor refuses to tag an activity bearing a different factory's baggage (even when TraceRegistry would resolve a TestId), and still tags activities that carry its own factory's id.
  • .NET workflow runs across all three OS legs.

…s-factory tag leak

OptOut_DoesNotTag_AspNetCoreSpans failed on macOS in run 25572055061 with
a foreign GUID stamped into the opt-out exporter's activity. Cause: every
parallel WebApplicationFactory that calls AddAspNetCoreInstrumentation()
subscribes its TracerProvider to the process-global Microsoft.AspNetCore
ActivitySource. A single Activity per request is observed by every
subscribed processor, and any tag any processor writes is visible to every
exporter — so factory A's TUnitTestCorrelationProcessor was tagging
factory B's request activity, polluting the opt-out exporter.

[NotInParallel] on the OptOut class doesn't help because the contaminating
tests are sibling classes without the constraint.

Per-factory baggage approach:

  - Each TestWebApplicationFactory gets a CorrelationScope holding a unique
    FactoryId.
  - A new FactoryScopeStartupFilter adds middleware that stamps the
    Activity for every incoming request with a baggage entry carrying the
    owning factory's id.
  - TUnitTestCorrelationProcessor takes an optional CorrelationScope. When
    bound, it tags only activities whose baggage carries its own factory
    id; activities tagged by a sibling factory's middleware are skipped.

The parameterless processor constructor and AutoStart's process-wide use
are unchanged, so the TUnit.OpenTelemetry zero-config package and the
existing CorrelationProcessorTests keep their current behavior.

Two new regression tests cover the scoped path: a scoped processor refuses
to tag an activity bearing a different factory's baggage even when the
TraceRegistry fallback would happily resolve a TestId, and the same
processor still tags activities that carry its own factory's id.
@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented May 11, 2026

Not up to standards ⛔

🔴 Issues 1 critical · 1 minor

Alerts:
⚠ 2 issues (≤ 0 issues of at least minor severity)

Results:
2 new issues

Category Results
Security 1 critical
CodeStyle 1 minor

View in Codacy

🟢 Metrics 0 complexity

Metric Results
Complexity 0

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: fix(aspnetcore): scope correlation processor per-factory

The root-cause analysis is correct and the approach is sound overall. The per-factory baggage filter cleanly isolates activity tagging without breaking the zero-config TUnit.OpenTelemetry path. A few things worth considering:


Medium — tunit.factory.id propagates in W3C baggage headers to downstream services

Activity.AddBaggage(...) writes a value that W3C propagators (including the ActivityPropagationHandler already threaded through the factory's HttpClient) will include in outgoing baggage request headers. In a test setup that talks to real downstream services or records HTTP exchanges, the factory GUID leaks into those headers. It's test-only metadata that downstream services shouldn't see.

Better approach: Store the factory→trace-ID mapping in an in-memory dictionary (similar to TraceRegistry), keyed by the Activity's TraceId, populated by the middleware:

// In FactoryActivityRegistry (internal to TUnit.AspNetCore):
internal static class FactoryActivityRegistry
{
    private static readonly ConcurrentDictionary<ActivityTraceId, string> _map = new();
    internal static void Register(ActivityTraceId traceId, string factoryId) => _map[traceId] = factoryId;
    internal static string? GetFactoryId(ActivityTraceId traceId) => _map.TryGetValue(traceId, out var id) ? id : null;
}

The middleware calls FactoryActivityRegistry.Register(activity.TraceId, _scope.FactoryId) instead of activity.AddBaggage(...), and the processor checks the registry instead of reading baggage. This keeps factory identity entirely in-process and out of propagated headers.


Low — OnStart is completely skipped for scoped processors, even for child activities

The comment correctly explains why root-level server activities can't be tagged at OnStart (middleware runs after ActivitySource.StartActivity). But child activities created during request handling (e.g. database spans, outgoing HttpClient spans within the SUT) do have the factory baggage available at OnStart via the parent chain. Skipping OnStart entirely for scoped processors is conservative and causes these spans to only be tagged at OnEnd.

In practice this is fine for BatchExportProcessor and InMemoryExporter, and OnEnd covers the root-activity case. But it's a silent behavioural difference from the unscoped path that may surprise users registering SimpleExportProcessor before the correlation processor. Worth a clarifying note or a targeted early-return only for root activities:

public override void OnStart(Activity activity)
{
    // Only skip OnStart for root activities (no in-process parent) — factory baggage
    // is set by middleware that runs after StartActivity, so it can't be read here.
    if (_scope is not null && activity.Parent is null)
        return;

    TryTag(activity);
}

Low — Missing test: scoped processor with activity that carries no factory baggage

The two new tests cover alien-factory baggage and own-factory baggage. There is no test for a scoped processor observing an activity that has no tunit.factory.id baggage at all (e.g. a background timer or a span started outside any request pipeline, observed by the factory's TracerProvider via AddHttpClientInstrumentation). The expected outcome — skip, even if TraceRegistry has an entry — falls out naturally from null != _scope.FactoryId, but making it explicit would document the intentional behaviour change versus the unscoped processor and guard against future regressions.


Positive notes

  • The CorrelationScope + FactoryScopeStartupFilter split cleanly separates concerns; the scope object is a dumb value and the filter/processor share it by reference without coupling.
  • [EditorBrowsable(EditorBrowsableState.Never)] is appropriate for a type that must be public (because it appears in the processor's public constructor) but isn't part of the user-facing API.
  • Backward compatibility is fully preserved — the parameterless constructor and AutoStart's process-wide path are unchanged.
  • The regression tests are well-constructed, especially the ScopedProcessor_DoesNotTag_ActivitiesFromOtherFactories test that includes a TraceRegistry hit to prove it's the factory guard — not the registry miss — that blocks tagging.

The baggage-leakage point (item 1) is the only structural concern worth resolving before merge. Items 2 and 3 are optional improvements.

…Scope additions

The PR's per-factory scoping added a new public sealed CorrelationScope
type and a new public TUnitTestCorrelationProcessor(CorrelationScope?)
constructor to TUnit.OpenTelemetry. The OpenTelemetry_Library_Has_No_API_Changes
snapshot test was failing on all CI legs because the verified files
didn't reflect the new surface.

Regenerated locally from the received output; identical across the three
TFM-specific snapshots except for line 4 (framework display name).
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up Review

Checking against the previous review — the PR has not been updated since that review was posted (still at commit 97cd6f36). None of the three raised items have been addressed yet.

Status of prior findings

Medium — tunit.factory.id leaks into W3C baggage headers (unresolved)

The middleware still calls activity.AddBaggage(CorrelationScope.FactoryIdBaggageKey, _scope.FactoryId). W3C propagators will include this in outgoing baggage: request headers to any downstream service the SUT contacts during tests. This is test infrastructure leaking into observed traffic.

The suggested fix from the prior review remains valid: replace the baggage write with an in-process registry keyed by ActivityTraceId (similar to TraceRegistry), so factory identity never leaves the process. This is the only structural concern blocking merge.

Low — OnStart fully skipped for scoped processors (unresolved)

Child activities (database spans, outgoing HttpClient spans created inside the SUT) have factory baggage available at OnStart via the parent chain, but are silently skipped and only tagged at OnEnd. The suggested narrowing:

if (_scope is not null && activity.Parent is null)
    return;

…would restore OnStart tagging for child activities while keeping the correct skip for root-level server activities. Optional but worth considering.

Low — Missing test for no-factory-baggage activity on a scoped processor (unresolved)

A scoped processor observing an activity with no tunit.factory.id baggage at all (e.g. a background timer span) should skip it, even when TraceRegistry has an entry. This falls out naturally from the current code (null != _scope.FactoryId) but has no explicit test. Optional.


No regressions introduced, the architecture and test quality remain good. The baggage-leakage item is the only one I'd consider a blocker.

@thomhurst thomhurst enabled auto-merge (squash) May 11, 2026 22:29
@thomhurst thomhurst disabled auto-merge May 14, 2026 08:32
@thomhurst thomhurst merged commit daca904 into main May 14, 2026
11 of 14 checks passed
@thomhurst thomhurst deleted the fix/ci-aspnetcore-optout-isolation branch May 14, 2026 08:32
@claude claude Bot mentioned this pull request May 14, 2026
1 task
This was referenced May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant