Skip to content

Fix ShutdownAllNodes to find AppHost-launched worker nodes#13511

Open
YuliiaKovalova wants to merge 7 commits intodotnet:mainfrom
YuliiaKovalova:fix/shutdown-apphost-nodes
Open

Fix ShutdownAllNodes to find AppHost-launched worker nodes#13511
YuliiaKovalova wants to merge 7 commits intodotnet:mainfrom
YuliiaKovalova:fix/shutdown-apphost-nodes

Conversation

@YuliiaKovalova
Copy link
Copy Markdown
Member

@YuliiaKovalova YuliiaKovalova commented Apr 9, 2026

Fixes #13508.

Problem

BuildManager.ShutdownAllNodes() (invoked by dotnet build-server shutdown) only searched for worker processes whose name matched the current host (dotnet on .NET Core, MSBuild on .NET Framework). Worker nodes that were launched as the MSBuild.exe AppHost from a dotnet-hosted build were therefore left running.

Fix

Derive the process names to search for from the path that would be launched right now, mirroring GetNodes's launcher logic, and add the alternate host as a defensive fallback.

Production change (NodeProviderOutOfProcBase.cs)

Single resolver ResolveProcessNamesToSearch(msbuildLocation):

  • Reuse path (msbuildLocation provided): returns the single process name derived from that path (AppHost path → MSBuild; managed DLL path → current host such as dotnet).
  • Shutdown path (msbuildLocation == null): derives BuildParameters.NodeExeLocation ?? BuildEnvironmentHelper.Instance.CurrentMSBuildExePath, applies the .NET Core AppHost → .dll remap when the current process is dotnet, then adds the alternate host name as defensive fallback so workers launched by the other host kind are still discovered.

The branching core is extracted as internal static ResolveProcessNamesToSearchCore for direct unit testing.

Allocation-aware: string[] is returned directly and the single-name case avoids the string.Join allocation in the diagnostic message.

Tests (NodeProcessNameResolution_Tests.cs)

Replaces the previous global-state E2E tests (which raced with concurrently running test projects' worker nodes and could not reliably observe shutdown across installations) with 5 deterministic unit tests of the resolver:

Scenario Expected
Reuse, AppHost path ["MSBuild"]
Reuse, MSBuild.dll path [currentHost] (e.g. dotnet)
Shutdown, no configured location (regression for #13508) contains MSBuild
Shutdown, configured AppHost location contains both MSBuild and current host
Shutdown, configured .dll location (.NET Core) contains MSBuild + current host

Why not a true E2E test?

Worker connect handshakes hash MSBuildToolsDirectory32, so a test process's Microsoft.Build.dll cannot connect to / shut down workers that were spawned by the bootstrapped MSBuild — the two installations have different tools directories. The bug lived at the resolver layer, and that is where it is now covered.

Verification

  • Microsoft.Build.csproj: 0 warnings, 0 errors.
  • New unit tests: 5/5 pass on net10.0.
  • Adjacent NodeProviderOutOfProc_Tests: 7/7 pass — no regression.

Manual end-to-end smoke test

Verified on Windows with the locally bootstrapped MSBuild:

  1. Set MSBUILDNOINPROCNODE=1 and ran a parallel-fan-out build with the bootstrap MSBuild.exe. → A worker MSBuild.exe (AppHost) appeared in the process list.
  2. Ran dotnet build-server shutdown (which goes through BuildManager.ShutdownAllNodes → the new resolver). → All worker MSBuild.exe processes were terminated.

This is the exact scenario from #13508 and now works as expected.

Copilot AI review requested due to automatic review settings April 9, 2026 08:55
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes BuildManager.ShutdownAllNodes() / dotnet build-server shutdown missing idle worker nodes when they were launched via the MSBuild AppHost (MSBuild(.exe)) by expanding shutdown-time process enumeration to consider both host process names.

Changes:

  • Split process-name resolution into node-reuse vs shutdown scenarios, with shutdown scanning both "dotnet" and "MSBuild" process names.
  • Updated candidate process enumeration to merge results across multiple process names and keep sorting/filtering behavior.
  • Added E2E unit tests intended to validate shutdown behavior for AppHost-launched nodes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs Enumerate candidate node processes using multiple possible host process names during shutdown.
src/Build.UnitTests/BackEnd/ShutdownAllNodes_Tests.cs Add end-to-end tests for ShutdownAllNodes behavior with AppHost-launched nodes.

Comment thread src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs Outdated
Comment thread src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs Outdated
Comment thread src/Build.UnitTests/BackEnd/ShutdownAllNodes_Tests.cs Outdated
Comment thread src/Build.UnitTests/BackEnd/ShutdownAllNodes_Tests.cs Outdated
Comment thread src/Build.UnitTests/BackEnd/ShutdownAllNodes_Tests.cs Outdated
Comment thread src/Build.UnitTests/BackEnd/ShutdownAllNodes_Tests.cs Outdated
@YuliiaKovalova YuliiaKovalova force-pushed the fix/shutdown-apphost-nodes branch from 8e0791e to 3bcc99f Compare April 9, 2026 09:05
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expert Review — PR #13511: ShutdownAllNodes finds AppHost nodes

Summary

This PR correctly fixes the issue where ShutdownAllNodes() failed to find MSBuild AppHost-launched worker nodes when the calling process uses the dotnet host. The root cause — a single-name process search when msbuildLocation is null — is addressed by searching for both the current host and the MSBuild AppHost name during shutdown.

Dimensions Evaluated (24/24)

Dimension Verdict Notes
1. Backwards Compatibility ✅ Pass Shutdown now finds more nodes, not fewer. No behavioral regression for callers.
2. ChangeWave Discipline ✅ N/A Not a user-facing behavioral change requiring opt-out.
3. Performance & Allocation ⚠️ Minor Extra List<string>, ToArray() copies on non-hot shutdown path. See inline.
4. Test Coverage ⚠️ Major Tests are E2E only; process-counting oracle may be incomplete. Flakiness risk in CI.
5. Error Message Quality ✅ Pass No new user-facing messages.
6. Logging & Diagnostics ⚠️ Moderate expectedProcessName in tuple may be misleading during shutdown.
7. String Comparison ✅ Pass OrdinalIgnoreCase used correctly for process name comparison.
8. API Surface ✅ Pass All changes are private. No public API impact.
9–10. Targets / Design ✅ N/A
11. Cross-Platform ✅ Pass GetFileNameWithoutExtension handles .exe on Windows, no-extension on Unix. GetProcessesByName works correctly with unextended names on both.
12. Concurrency ✅ Pass No shared mutable state introduced.
13. Naming ⚠️ Minor GetProcessNameForNodeReuse (singular) vs GetProcessNamesForShutdown (plural) naming is clear and intentional.
14. Code Simplification ⚠️ Minor Unused usings in test file. Could use collection expressions.
15. Documentation ✅ Pass Good XML doc comments on new methods.
16. Nullable ⚠️ Minor New test file uses #nullable disable.
17. Dependencies ✅ Pass No new dependencies.
18. Security ✅ Pass No credential/injection risks.
19–20. Build/SDK Integration ✅ N/A
21. Evaluation Model ✅ N/A
22. IPC Packet Stability ✅ Pass No packet format changes.
23. Binary Log Compat ✅ N/A
24. Correctness ✅ Pass Logic handles all platform combinations correctly. See inline positive note on GetProcessNamesForShutdown.

Key Observations

The fix is correct and well-scoped. The separation into GetProcessNameForNodeReuse (knows how nodes were launched) vs GetProcessNamesForShutdown (must search all possibilities) is a clean design.

No blocking issues found. All comments are MINOR or MODERATE. The test flakiness concern is worth monitoring after merge but isn't a merge blocker.

Note on the static overload at line 724: There is a second GetPossibleRunningNodes(NodeMode?) overload (static, used by CountActiveNodesWithMode) that only searches for "MSBuild" processes. This is correct for its use case (counting AppHost nodes from within an out-of-proc node), but it's worth noting that the two overloads have different search strategies. A comment cross-referencing them would help future maintainers.

Note

🔒 Integrity filter blocked 2 items

The following items were blocked because they don't meet the GitHub integrity level.

  • #13511 pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #13511 pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by Expert Code Review (on open) for issue #13511 · ● 4.1M

Comment thread src/Build.UnitTests/BackEnd/ShutdownAllNodes_Tests.cs Outdated
Comment thread src/Build.UnitTests/BackEnd/ShutdownAllNodes_Tests.cs Outdated
Comment thread src/Build.UnitTests/BackEnd/ShutdownAllNodes_Tests.cs Outdated
Comment thread src/Build.UnitTests/BackEnd/ShutdownAllNodes_Tests.cs Outdated
Comment thread src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs Outdated
Comment thread src/Build.UnitTests/BackEnd/ShutdownAllNodes_Tests.cs Outdated
Comment thread src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs Outdated
Comment thread src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs Outdated
Comment thread src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs Outdated
@YuliiaKovalova YuliiaKovalova force-pushed the fix/shutdown-apphost-nodes branch 2 times, most recently from b15f060 to 17f2af0 Compare April 9, 2026 09:32
@YuliiaKovalova YuliiaKovalova force-pushed the fix/shutdown-apphost-nodes branch from 17f2af0 to 2de2ed3 Compare April 9, 2026 10:25
When an SDK uses the MSBuild AppHost (MSBuild.exe), worker nodes run
with process name 'MSBuild'. However, ShutdownAllNodes resolved the
current host to 'dotnet' via CurrentHost.GetCurrentHost() and only
searched for 'dotnet' processes, missing all AppHost-launched nodes.

This change splits the process name resolution into two methods:
- GetProcessNameForNodeReuse: used during builds when msbuildLocation
  is known - returns a single process name (unchanged behavior).
- GetProcessNamesForShutdown: used during ShutdownAllNodes when we
  don't know how nodes were launched - returns both 'dotnet' and
  'MSBuild' so all idle nodes are found regardless of launch method.

Also adds an ErrorUtilities.VerifyThrow guard to ensure the process
name list is never empty.

Includes E2E tests verifying shutdown finds AppHost-launched nodes.

Fixes dotnet#13508
Comment thread src/Build.UnitTests/BackEnd/ShutdownAllNodes_Tests.cs Outdated
Comment thread src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs Outdated
YuliiaKovalova and others added 5 commits April 22, 2026 11:28
Refactor process-name resolution in NodeProviderOutOfProcBase to mirror
the launcher logic instead of relying purely on a defensive dual search,
as suggested in PR review:

- Introduce ResolveProcessNamesToSearch(msbuildLocation) which derives
  the path that would be launched (BuildParameters.NodeExeLocation ??
  CurrentMSBuildExePath, with .NET Core AppHost->.dll remap when running
  under dotnet) and only adds the alternate host as a defensive fallback.
- Extract a pure internal static ResolveProcessNamesToSearchCore so the
  branching logic is unit-testable without spinning real worker nodes.
- Avoid the string.Join allocation when only one process name applies.

Replace the previous global-state E2E tests (which raced with other test
projects' worker nodes and could not reliably observe shutdown) with
focused, deterministic unit tests covering the reuse and shutdown
branches for both AppHost and managed-DLL launch paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolves conflict in NodeProviderOutOfProcBase.cs: keep the new
ResolveProcessNamesToSearch resolver (which on .NET Framework already
includes Constants.MSBuildAppName via the alternate-host fallback),
superseding main's narrower netfx-only fallback for expectedProcessName.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Deduplicates the AppHost->managed-DLL remap that was inlined in both
GetNodes (launch path) and ResolveProcessNamesToSearchCore (shutdown path).
Using a single helper guarantees the two stay in sync by construction,
addressing review feedback to reuse instead of mirroring.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous instance wrapper (ResolveProcessNamesToSearch) and static
core (ResolveProcessNamesToSearchCore) only existed to bridge instance
state into a unit-testable method. Replace both with a single
internal static ResolveProcessNamesToSearch(msbuildLocation,
configuredNodeExeLocation) and have GetPossibleRunningNodes pass the
instance state directly. The tiny GetProcessNameFromExecutablePath
helper is inlined as a local function since it has no other callers.

No behavior change; tests updated for the rename.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ShutdownAllNodes / dotnet build-server shutdown broken when AppHost is used

3 participants