Fix ShutdownAllNodes to find AppHost-launched worker nodes#13511
Fix ShutdownAllNodes to find AppHost-launched worker nodes#13511YuliiaKovalova wants to merge 7 commits intodotnet:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes BuildManager.ShutdownAllNodes() / dotnet build-server shutdown missing idle worker nodes when they were launched via the MSBuild AppHost (MSBuild(.exe)) by expanding shutdown-time process enumeration to consider both host process names.
Changes:
- Split process-name resolution into node-reuse vs shutdown scenarios, with shutdown scanning both
"dotnet"and"MSBuild"process names. - Updated candidate process enumeration to merge results across multiple process names and keep sorting/filtering behavior.
- Added E2E unit tests intended to validate shutdown behavior for AppHost-launched nodes.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs |
Enumerate candidate node processes using multiple possible host process names during shutdown. |
src/Build.UnitTests/BackEnd/ShutdownAllNodes_Tests.cs |
Add end-to-end tests for ShutdownAllNodes behavior with AppHost-launched nodes. |
8e0791e to
3bcc99f
Compare
There was a problem hiding this comment.
Expert Review — PR #13511: ShutdownAllNodes finds AppHost nodes
Summary
This PR correctly fixes the issue where ShutdownAllNodes() failed to find MSBuild AppHost-launched worker nodes when the calling process uses the dotnet host. The root cause — a single-name process search when msbuildLocation is null — is addressed by searching for both the current host and the MSBuild AppHost name during shutdown.
Dimensions Evaluated (24/24)
| Dimension | Verdict | Notes |
|---|---|---|
| 1. Backwards Compatibility | ✅ Pass | Shutdown now finds more nodes, not fewer. No behavioral regression for callers. |
| 2. ChangeWave Discipline | ✅ N/A | Not a user-facing behavioral change requiring opt-out. |
| 3. Performance & Allocation | Extra List<string>, ToArray() copies on non-hot shutdown path. See inline. |
|
| 4. Test Coverage | Tests are E2E only; process-counting oracle may be incomplete. Flakiness risk in CI. | |
| 5. Error Message Quality | ✅ Pass | No new user-facing messages. |
| 6. Logging & Diagnostics | expectedProcessName in tuple may be misleading during shutdown. |
|
| 7. String Comparison | ✅ Pass | OrdinalIgnoreCase used correctly for process name comparison. |
| 8. API Surface | ✅ Pass | All changes are private. No public API impact. |
| 9–10. Targets / Design | ✅ N/A | |
| 11. Cross-Platform | ✅ Pass | GetFileNameWithoutExtension handles .exe on Windows, no-extension on Unix. GetProcessesByName works correctly with unextended names on both. |
| 12. Concurrency | ✅ Pass | No shared mutable state introduced. |
| 13. Naming | GetProcessNameForNodeReuse (singular) vs GetProcessNamesForShutdown (plural) naming is clear and intentional. |
|
| 14. Code Simplification | Unused usings in test file. Could use collection expressions. | |
| 15. Documentation | ✅ Pass | Good XML doc comments on new methods. |
| 16. Nullable | New test file uses #nullable disable. |
|
| 17. Dependencies | ✅ Pass | No new dependencies. |
| 18. Security | ✅ Pass | No credential/injection risks. |
| 19–20. Build/SDK Integration | ✅ N/A | |
| 21. Evaluation Model | ✅ N/A | |
| 22. IPC Packet Stability | ✅ Pass | No packet format changes. |
| 23. Binary Log Compat | ✅ N/A | |
| 24. Correctness | ✅ Pass | Logic handles all platform combinations correctly. See inline positive note on GetProcessNamesForShutdown. |
Key Observations
The fix is correct and well-scoped. The separation into GetProcessNameForNodeReuse (knows how nodes were launched) vs GetProcessNamesForShutdown (must search all possibilities) is a clean design.
No blocking issues found. All comments are MINOR or MODERATE. The test flakiness concern is worth monitoring after merge but isn't a merge blocker.
Note on the static overload at line 724: There is a second GetPossibleRunningNodes(NodeMode?) overload (static, used by CountActiveNodesWithMode) that only searches for "MSBuild" processes. This is correct for its use case (counting AppHost nodes from within an out-of-proc node), but it's worth noting that the two overloads have different search strategies. A comment cross-referencing them would help future maintainers.
Note
🔒 Integrity filter blocked 2 items
The following items were blocked because they don't meet the GitHub integrity level.
- #13511
pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved". - #13511
pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
To allow these resources, lower min-integrity in your GitHub frontmatter:
tools:
github:
min-integrity: approved # merged | approved | unapproved | noneGenerated by Expert Code Review (on open) for issue #13511 · ● 4.1M
b15f060 to
17f2af0
Compare
17f2af0 to
2de2ed3
Compare
When an SDK uses the MSBuild AppHost (MSBuild.exe), worker nodes run with process name 'MSBuild'. However, ShutdownAllNodes resolved the current host to 'dotnet' via CurrentHost.GetCurrentHost() and only searched for 'dotnet' processes, missing all AppHost-launched nodes. This change splits the process name resolution into two methods: - GetProcessNameForNodeReuse: used during builds when msbuildLocation is known - returns a single process name (unchanged behavior). - GetProcessNamesForShutdown: used during ShutdownAllNodes when we don't know how nodes were launched - returns both 'dotnet' and 'MSBuild' so all idle nodes are found regardless of launch method. Also adds an ErrorUtilities.VerifyThrow guard to ensure the process name list is never empty. Includes E2E tests verifying shutdown finds AppHost-launched nodes. Fixes dotnet#13508
2de2ed3 to
2e696dc
Compare
Refactor process-name resolution in NodeProviderOutOfProcBase to mirror the launcher logic instead of relying purely on a defensive dual search, as suggested in PR review: - Introduce ResolveProcessNamesToSearch(msbuildLocation) which derives the path that would be launched (BuildParameters.NodeExeLocation ?? CurrentMSBuildExePath, with .NET Core AppHost->.dll remap when running under dotnet) and only adds the alternate host as a defensive fallback. - Extract a pure internal static ResolveProcessNamesToSearchCore so the branching logic is unit-testable without spinning real worker nodes. - Avoid the string.Join allocation when only one process name applies. Replace the previous global-state E2E tests (which raced with other test projects' worker nodes and could not reliably observe shutdown) with focused, deterministic unit tests covering the reuse and shutdown branches for both AppHost and managed-DLL launch paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolves conflict in NodeProviderOutOfProcBase.cs: keep the new ResolveProcessNamesToSearch resolver (which on .NET Framework already includes Constants.MSBuildAppName via the alternate-host fallback), superseding main's narrower netfx-only fallback for expectedProcessName. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Deduplicates the AppHost->managed-DLL remap that was inlined in both GetNodes (launch path) and ResolveProcessNamesToSearchCore (shutdown path). Using a single helper guarantees the two stay in sync by construction, addressing review feedback to reuse instead of mirroring. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous instance wrapper (ResolveProcessNamesToSearch) and static core (ResolveProcessNamesToSearchCore) only existed to bridge instance state into a unit-testable method. Replace both with a single internal static ResolveProcessNamesToSearch(msbuildLocation, configuredNodeExeLocation) and have GetPossibleRunningNodes pass the instance state directly. The tiny GetProcessNameFromExecutablePath helper is inlined as a local function since it has no other callers. No behavior change; tests updated for the rename. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fixes #13508.
Problem
BuildManager.ShutdownAllNodes()(invoked bydotnet build-server shutdown) only searched for worker processes whose name matched the current host (dotneton .NET Core,MSBuildon .NET Framework). Worker nodes that were launched as theMSBuild.exeAppHost from adotnet-hosted build were therefore left running.Fix
Derive the process names to search for from the path that would be launched right now, mirroring
GetNodes's launcher logic, and add the alternate host as a defensive fallback.Production change (
NodeProviderOutOfProcBase.cs)Single resolver
ResolveProcessNamesToSearch(msbuildLocation):msbuildLocationprovided): returns the single process name derived from that path (AppHost path →MSBuild; managed DLL path → current host such asdotnet).msbuildLocation == null): derivesBuildParameters.NodeExeLocation ?? BuildEnvironmentHelper.Instance.CurrentMSBuildExePath, applies the .NET CoreAppHost → .dllremap when the current process isdotnet, then adds the alternate host name as defensive fallback so workers launched by the other host kind are still discovered.The branching core is extracted as
internal static ResolveProcessNamesToSearchCorefor direct unit testing.Allocation-aware:
string[]is returned directly and the single-name case avoids thestring.Joinallocation in the diagnostic message.Tests (
NodeProcessNameResolution_Tests.cs)Replaces the previous global-state E2E tests (which raced with concurrently running test projects' worker nodes and could not reliably observe shutdown across installations) with 5 deterministic unit tests of the resolver:
["MSBuild"]MSBuild.dllpath[currentHost](e.g.dotnet)MSBuildMSBuildand current host.dlllocation (.NET Core)MSBuild+ current hostWhy not a true E2E test?
Worker connect handshakes hash
MSBuildToolsDirectory32, so a test process'sMicrosoft.Build.dllcannot connect to / shut down workers that were spawned by the bootstrapped MSBuild — the two installations have different tools directories. The bug lived at the resolver layer, and that is where it is now covered.Verification
Microsoft.Build.csproj: 0 warnings, 0 errors.net10.0.NodeProviderOutOfProc_Tests: 7/7 pass — no regression.Manual end-to-end smoke test
Verified on Windows with the locally bootstrapped MSBuild:
MSBUILDNOINPROCNODE=1and ran a parallel-fan-out build with the bootstrapMSBuild.exe. → A workerMSBuild.exe(AppHost) appeared in the process list.dotnet build-server shutdown(which goes throughBuildManager.ShutdownAllNodes→ the new resolver). → All workerMSBuild.exeprocesses were terminated.This is the exact scenario from #13508 and now works as expected.