[Startup] Bump PerfMap ahead of DiagnosticServer #123226

Merged
mdh1418 merged 2 commits into dotnet:main from mdh1418:move_perfmap_init_before_diagnostic_server
Jan 16, 2026

@mdh1418 mdh1418 commented Jan 15, 2026

Fixes #122472

From some inspection, it looks like PerfMap::Initialize does not depend on the methods right above it.

#ifdef FEATURE_PERFTRACING
        DiagnosticServerAdapter::Initialize();
        DiagnosticServerAdapter::PauseForDiagnosticsMonitor();
#endif // FEATURE_PERFTRACING

#ifdef FEATURE_GDBJIT
        // Initialize gdbjit
        NotifyGdb::Initialize();
#endif // FEATURE_GDBJIT

#ifdef FEATURE_EVENT_TRACE
        // Initialize event tracing early so we can trace CLR startup time events.
        InitializeEventTracing();

        // Fire the EE startup ETW event
        ETWFireEvent(EEStartupStart_V1);
#endif // FEATURE_EVENT_TRACE

        InitGSCookie();

#ifdef LOGGING
        InitializeLogging();
#endif

PerfMap::Initialize depends on SString::Startup(), which occurs earlier in startup, and on some PAL/OS services.

Instead of adding more logic to handle a DiagnosticServer IPC EnablePerfMap command (https://github.com/dotnet/diagnostics/blob/main/documentation/design-docs/ipc-protocol.md#enableperfmap) racing with PerfMap::Initialize (e.g. lazy-init or queued commands), bump the PerfMap initialization ahead of the DiagnosticServer initialization. The DiagnosticServer does not depend on PerfMap being initialized after it, and PerfMap's CrstStatic and PerfMap::Enable(type, sendExisting) suggest that the IPC EnablePerfMap command should only be handled after PerfMap::Initialize.

This way, DiagnosticServer is still initialized early in startup, and its call to PerfMap::Enable will not crash the runtime.
As for the IPC command to set environment variables, it is unlikely that it was used to set DOTNET_PerfMapEnabled, because that would either not have applied early enough or have crashed the runtime as in the issue. Instead, the EnablePerfMap/DisablePerfMap commands should be used to toggle PerfMap status.
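
To illustrate the ordering argument, here is a minimal, self-contained C++ sketch. It is not runtime code: `ToyPerfMap`, `StartToyDiagnosticServer`, `RunStartup`, and `CheckBothOrders` are hypothetical names, and the toy returns `false` at the point where the real runtime would have touched uninitialized state and crashed. It assumes the enable command is handled as soon as the server starts, which is the worst case described above.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for PerfMap: its lock exists only after Initialize().
struct ToyPerfMap {
    std::unique_ptr<std::mutex> lock;
    bool enabled = false;

    void Initialize() { lock.reset(new std::mutex()); }

    // Returns false when called before Initialize(). The toy refuses the
    // request at the point where it would otherwise use the missing lock.
    bool Enable() {
        if (!lock) {
            return false;
        }
        std::lock_guard<std::mutex> hold(*lock);
        enabled = true;
        return true;
    }
};

// Models a diagnostics client whose enable command is handled the moment the
// (hypothetical) server starts. Returns whether the command succeeded.
bool StartToyDiagnosticServer(ToyPerfMap& perfMap) {
    return perfMap.Enable();
}

// Runs the two startup steps in either order and reports whether the early
// enable command was handled successfully.
bool RunStartup(ToyPerfMap& perfMap, bool perfMapFirst) {
    if (perfMapFirst) {
        perfMap.Initialize();
        return StartToyDiagnosticServer(perfMap);
    }
    bool handled = StartToyDiagnosticServer(perfMap);
    perfMap.Initialize();
    return handled;
}

// Exercises both orders: server-first fails, PerfMap-first succeeds.
void CheckBothOrders() {
    ToyPerfMap serverFirst;
    assert(!RunStartup(serverFirst, false));
    assert(!serverFirst.enabled);

    ToyPerfMap perfMapFirst;
    assert(RunStartup(perfMapFirst, true));
    assert(perfMapFirst.enabled);
}
```

Calling `CheckBothOrders()` from a `main` function exercises both orders. The point of the sketch is only that reordering removes the window entirely, with no extra synchronization or command queueing in the enable path.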

@github-actions github-actions bot added the needs-area-label label Jan 15, 2026

Copilot AI left a comment

Pull request overview

This PR fixes a race condition where the DiagnosticServer's EnablePerfMap IPC command could crash the runtime by being invoked before PerfMap::Initialize(). The fix moves PerfMap initialization to occur before DiagnosticServer initialization in the startup sequence.

Changes:

  • Reordered PerfMap initialization to occur before DiagnosticServer initialization in EEStartupHelper()

@jkotas jkotas added the area-Diagnostics-coreclr label and removed the needs-area-label label Jan 16, 2026
@dotnet-policy-service

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

@mdh1418 mdh1418 merged commit 8ff3668 into dotnet:main Jan 16, 2026
104 checks passed
rosebyte pushed a commit that referenced this pull request Jan 19, 2026
@mdh1418 mdh1418 deleted the move_perfmap_init_before_diagnostic_server branch February 10, 2026 01:03
steveisok pushed a commit that referenced this pull request Feb 11, 2026
…y in startup (#124208)

Manual backport of #123226 and #124055 to release/10.0

## Customer Impact

- [ ] Customer reported
- [x] Found internally

The DiagnosticServer allows profilers to enable PerfMap early in
start-up, even before PerfMap is initialized and dependencies are ready,
inducing a crash. Enabling PerfMap this early in start-up was not tested
until user_events support was added in .NET 10, where One-Collect's
[record-trace](https://github.com/microsoft/one-collect) aggressively
enables PerfMaps as soon as it connects to a .NET process so it can
resolve JIT'd code. As .NET user_events + record-trace gains more users,
this crash will likely be observed more frequently.

With the changes in the two PRs, the runtime is resilient to early
PerfMap::Enable commands received by the DiagnosticServer.
DiagnosticServer initialization still happens early in start-up; PerfMap
initialization is just moved to right before it.

## Regression

- [ ] Yes
- [x] No

## Testing

I validated on a WSL2 instance, pausing the startup logic with
DOTNET_DefaultDiagnosticPortSuspend=1 and kicking off a separate app
invoking DiagnosticsClient.EnablePerfMap. Before the change, the runtime
would hit a SIGSEGV and crash.
After the changes, it is resilient and doesn't crash.

## Risk

Low. PerfMap initialization doesn't depend on the logic between its
former and proposed locations. The logic to handle early IPC PerfMap
Enable commands makes the problematic logic a no-op until the
dependencies are ready.
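
The "no-op until the dependencies are ready" idea can be sketched as a readiness guard. This is a generic illustration under assumptions, not the code from either PR: `GuardedToggle`, `MarkDependenciesReady`, and `CheckEarlyEnableIsNoOp` are hypothetical names, and how the runtime actually tracks readiness is not shown in this conversation.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical guard: Enable() does nothing until startup marks the
// dependencies as ready, so an early command cannot touch missing state.
class GuardedToggle {
public:
    // Called once by startup code after the dependencies exist.
    void MarkDependenciesReady() {
        ready_.store(true, std::memory_order_release);
    }

    // Returns true only if the enable request actually took effect.
    bool Enable() {
        if (!ready_.load(std::memory_order_acquire)) {
            ignored_.fetch_add(1, std::memory_order_relaxed);
            return false;  // early request: deliberate no-op
        }
        enabled_.store(true, std::memory_order_release);
        return true;
    }

    bool IsEnabled() const { return enabled_.load(std::memory_order_acquire); }
    int IgnoredCount() const { return ignored_.load(std::memory_order_relaxed); }

private:
    std::atomic<bool> ready_{false};
    std::atomic<bool> enabled_{false};
    std::atomic<int> ignored_{0};
};

// An early request is ignored; the same request after readiness succeeds.
void CheckEarlyEnableIsNoOp() {
    GuardedToggle toggle;
    assert(!toggle.Enable());
    assert(!toggle.IsEnabled());
    assert(toggle.IgnoredCount() == 1);

    toggle.MarkDependenciesReady();
    assert(toggle.Enable());
    assert(toggle.IsEnabled());
}
```

In this sketch an ignored request is simply dropped, so a client that sends the command too early would need to send it again. Whether the runtime drops, defers, or reports such a request is outside what this description states.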

Successfully merging this pull request may close these issues.

DiagnosticsServer ENABLE_PERFMAP command not resilient executing early-in-startup
