[dotnet] Fix hang after packing .NET NuGets. Fixes #13355. #15407
rolfbjarne merged 2 commits into dotnet:main
This has been bothering me for a while... the symptom is that the build just hangs at the end. Curiously it's never happened on the bots, only locally.

1. It only happens when using parallel make. When using parallel make, make is in a jobserver mode, where sub-makes are controlled using a pair of file descriptors inherited by the sub-makes. A consequence of this algorithm is that the controlling make process will wait until all inherited file descriptors have been closed before it will realize that all its sub-makes have finished.
2. 'dotnet pack' will build the corresponding project, and that might start a background compiler server.
3. This background compiler server does not seem to close any file descriptors it inherits.
4. The background compiler server does not necessarily exit by the time `make` is done.
5. The result is that `make` thinks there are still sub-makes doing stuff, because there are inherited file descriptors still open.
6. Killing the compiler server (in another terminal for instance) will make `make` realize it's done (and the hang is resolved).

So I'm applying the last point: shutting down the compiler server after packing all the .NET NuGets.
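The mechanism in points 1 to 5 can be reproduced in miniature. The following is only a sketch under stated assumptions, not GNU Make's jobserver code: a plain pipe stands in for the jobserver descriptors, a sleeping Python child stands in for the background compiler server, and the helper name `wait_for_eof` is made up for this illustration. It is POSIX-only.

```python
import os
import subprocess
import sys
import time


def wait_for_eof(daemon_lifetime: float, daemon_inherits_fd: bool) -> float:
    """Return how many seconds the parent waits for EOF on a pipe.

    The parent plays the role of the controlling make process, the child
    plays the role of a background server that outlives the real work.
    """
    read_fd, write_fd = os.pipe()

    # Stand-in for the background server: it does nothing but stay alive.
    child = subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({daemon_lifetime})"],
        # pass_fds decides whether the child gets a copy of the write end.
        pass_fds=(write_fd,) if daemon_inherits_fd else (),
    )

    # The parent is done writing, so it closes its own copy.
    os.close(write_fd)

    start = time.monotonic()
    # read() only returns EOF once *every* copy of the write end is closed,
    # including the copy the child inherited.
    os.read(read_fd, 1)
    elapsed = time.monotonic() - start

    os.close(read_fd)
    child.wait()
    return elapsed
```

With `daemon_inherits_fd=True` the parent stays blocked for roughly the child's lifetime, and with `False` it sees EOF almost immediately, which mirrors why killing the compiler server ends the hang.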
chamons
left a comment
Do we have a bug for build-server to clean up after itself?
The bug is that the build server doesn't close inherited file descriptors when it daemonizes itself (at least I think so, but I wasn't able to find the code where that happens). In any case - no, I didn't file a bug; it didn't seem like something that would ever get fixed.
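For illustration, this is roughly what "closing inherited file descriptors when daemonizing" could look like from the server's side. It is a hedged sketch and not the build server's code (which was not located, as noted above): the daemon is a small Python stand-in, the upper bound of 256 in `os.closerange` is an arbitrary assumption, and `eof_delay_with_tidy_daemon` is a hypothetical helper name. It is POSIX-only.

```python
import os
import subprocess
import sys
import time

# Stand-in daemon: drop every inherited descriptor above stderr, then linger.
# The bound 256 is an assumption chosen for this sketch.
TIDY_DAEMON_SOURCE = """
import os, time
os.closerange(3, 256)
time.sleep({lifetime})
"""


def eof_delay_with_tidy_daemon(lifetime: float) -> float:
    """Return how long the parent waits for EOF when the daemon cleans up."""
    read_fd, write_fd = os.pipe()

    # The daemon deliberately inherits the write end, as in the buggy case.
    child = subprocess.Popen(
        [sys.executable, "-c", TIDY_DAEMON_SOURCE.format(lifetime=lifetime)],
        pass_fds=(write_fd,),
    )
    os.close(write_fd)

    start = time.monotonic()
    # Returns as soon as the daemon has closed its inherited copy,
    # even though the daemon itself keeps running.
    os.read(read_fd, 1)
    elapsed = time.monotonic() - start

    os.close(read_fd)
    child.wait()
    return elapsed
```

The parent is released once the daemon has started up and closed its copy, well before the daemon exits.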
💻 [PR Build] Tests on macOS Mac Catalina (10.15) passed 💻

✅ All tests on macOS Mac Catalina (10.15) passed.

✅ API diff for current PR / commit

- Legacy Xamarin (No breaking changes)
- .NET (empty diffs)

✅ API diff vs stable

- Legacy Xamarin (No breaking changes)
- .NET (No breaking changes)

✅ Generator diff

Generator diff is empty

❌ [PR Build] Tests on macOS M1 - Mac Big Sur (11.5) failed ❌

Failed tests are:

🔥 [CI Build] Test results 🔥

❌ Tests failed on VSTS: simulator tests

0 tests crashed, 6 tests failed, 217 tests passed.

Failures

- ❌ dotnettests tests
- ❌ linker tests
- ❌ monotouch tests
- ❌ mtouch tests

Successes

- ✅ bcl: All 69 tests passed.

Test failures are unrelated:
Parallel make (e.g. 'make all -j8', 'make world') has been hanging for a while at the end of the build. This is a long-standing issue (#13355) that has been patched three times (#15407, #21315, #22300) without fully fixing the root cause.

The problem: when using parallel make, GNU Make uses a jobserver with pipe-based file descriptors to coordinate sub-makes. The dotnet CLI can start background build servers (MSBuild server, Roslyn/VBCSCompiler) that inherit these file descriptors but never close them. Make then waits for those file descriptors to close, thinking there are still active jobs. That won't happen until the servers exit, which they typically do after about 10 minutes without activity.

The previous workaround attempted to shut down and force-kill dotnet processes after the build via a 'shutdown-build-server' target. This approach was unreliable because:

- The shutdown ran from a double-colon all-hook:: rule with no prerequisites, so with -j it could execute in parallel with (or before) the actual build, killing nothing.
- Build servers started by later subdirectories (e.g. tests/) after the dotnet/ shutdown were never killed.
- The process-matching regex pattern might not match all server processes.

Ideally this would be fixed when launching the build servers, by making them not inherit handles. Unfortunately this is currently not possible: dotnet/runtime#13943 (although this might change in the not-so-distant future: dotnet/runtime#123959).

The workaround: disable build servers entirely via environment variables in Make.config:

- DOTNET_CLI_USE_MSBUILD_SERVER=0: prevents the MSBuild server
  https://learn.microsoft.com/en-us/visualstudio/msbuild/msbuild-server
  https://github.com/dotnet/msbuild/blob/main/documentation/MSBuild-Server.md
- UseSharedCompilation=false: prevents the Roslyn compiler server (VBCSCompiler)
  dotnet/roslyn#27975
- MSBUILDDISABLENODEREUSE=1: prevents MSBuild node reuse
  https://github.com/dotnet/msbuild/wiki/MSBuild-Tips-&-Tricks

This eliminates the root cause - no background servers means no inherited file descriptors means no hang.

The shutdown-build-server target and its invocations are removed as they are no longer needed.

Additionally, 'make world' now prints the installed workloads at the end of the build for visibility.

Build without changes:

> make world 2149.57s user 258.32s system 107% cpu 37:30.19 total

Build with changes:

> make world 2242.74s user 286.38s system 354% cpu 11:52.55 total

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
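The commit applies these settings through Make.config, whose exact contents are not shown here. As a per-process illustration of the same idea, the sketch below builds an environment with the three variables named above and hands it to a child process. The helper name `env_without_build_servers` and the project path in the usage comment are hypothetical.

```python
import os

# The three settings named in the commit message, as environment variables.
BUILD_SERVER_OPT_OUT = {
    "DOTNET_CLI_USE_MSBUILD_SERVER": "0",  # no MSBuild server
    "UseSharedCompilation": "false",       # no Roslyn compiler server
    "MSBUILDDISABLENODEREUSE": "1",        # no MSBuild node reuse
}


def env_without_build_servers(base=None):
    """Return a copy of an environment with the build servers disabled.

    `base` defaults to the current process environment; the input mapping
    is never modified.
    """
    env = dict(os.environ if base is None else base)
    env.update(BUILD_SERVER_OPT_OUT)
    return env


# Hypothetical usage (requires the dotnet CLI; the path is a placeholder):
#   import subprocess
#   subprocess.run(["dotnet", "pack", "path/to/project.csproj"],
#                  env=env_without_build_servers(), check=True)
```

Setting the variables once in a shared config file, as the commit does, has the advantage that every sub-make and every `dotnet` invocation picks them up without per-call plumbing.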
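This has been bothering me for a while... the symptom is that the build just
hangs at the end. Curiously it's never happened on the bots, only locally.

1. It only happens when using parallel make. When using parallel make, make is
   in a jobserver mode, where sub-makes are controlled using a pair of file
   descriptors inherited by the sub-makes. A consequence of this algorithm is
   that the controlling make process will wait until all inherited file
   descriptors have been closed before it will realize that all its sub-makes
   have finished.
2. 'dotnet pack' will build the corresponding project, and that might start a
   background compiler server.
3. This background compiler server does not seem to close any file descriptors
   it inherits.
4. The background compiler server does not necessarily exit by the time
   `make` is done.
5. The result is that `make` thinks there are still sub-makes doing stuff,
   because there are inherited file descriptors still open.
6. Killing the compiler server (in another terminal for instance) will make
   `make` realize it's done (and the hang is resolved).

So I'm applying the last point: shutting down the compiler server after
packing all the .NET NuGets.

Fixes #13355.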