[dotnet] Fix hang after packing .NET NuGets. Fixes #13355. #15407

Merged
rolfbjarne merged 2 commits into dotnet:main from rolfbjarne:dotnet-fix-hang
Jul 8, 2022

@rolfbjarne rolfbjarne commented Jul 6, 2022

This has been bothering me for a while... the symptom is that the build just
hangs at the end. Curiously, it has never happened on the bots, only locally.

  1. It only happens when using parallel make. When using parallel make, make is
    in a jobserver mode, where sub-makes are controlled using a pair of file
    descriptors inherited by the sub-makes. A consequence of this algorithm is
    that the controlling make process will wait until all inherited file
    descriptors have been closed before it will realize that all its sub-makes
    have finished.
  2. 'dotnet pack' will build the corresponding project, and that might start a
    background compiler server.
  3. This background compiler server does not seem to close any file descriptors
    it inherits.
  4. The background compiler server does not necessarily exit by the time `make`
    is done.
  5. The result is that `make` thinks there are still sub-makes doing stuff,
    because there are inherited file descriptors still open.
  6. Killing the compiler server (in another terminal, for instance) makes
    `make` realize it's done (and the hang is resolved).
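The mechanism in points 1 and 4-6 can be reproduced without make or dotnet. Below is a minimal Python sketch (POSIX only, Python 3.9+). A plain `os.pipe()` stands in for the jobserver descriptors, and a sleeping child process stands in for the compiler server. `wait_for_eof` is a hypothetical helper written only for this illustration.

```python
import os
import subprocess
import sys
import threading
import time


def wait_for_eof(kill_after: float) -> tuple[float, int]:
    """Block on a pipe whose write end a long-lived child has inherited.

    Returns (seconds until read() saw EOF, the child's return code).
    """
    r, w = os.pipe()
    # Stand-in for the compiler server: it would otherwise idle for a minute.
    server = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        pass_fds=(w,),  # inherits the write end, like the jobserver fds
    )
    os.close(w)  # the parent itself no longer holds a write end...
    # ...yet read() keeps blocking, because the server still has a copy.
    # Killing the server (point 6) closes that copy and unblocks the read.
    killer = threading.Timer(kill_after, server.kill)
    start = time.monotonic()
    killer.start()
    data = os.read(r, 1)
    elapsed = time.monotonic() - start
    os.close(r)
    assert data == b""  # EOF, not data: nobody ever wrote to the pipe
    return elapsed, server.wait()


if __name__ == "__main__":
    elapsed, rc = wait_for_eof(kill_after=0.5)
    print(f"EOF after {elapsed:.2f}s; server return code {rc}")
```

The read returns only after the timer kills the server, so the elapsed time tracks `kill_after` rather than the server's 60-second sleep.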

So I'm applying the last point: shutting down the compiler server after
packing all the .NET NuGets.
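As a rough illustration of that ordering (not this PR's diff, which changes the makefiles), a hypothetical Python driver would pack every project and then always run `dotnet build-server shutdown`, the CLI command that stops build servers such as the Roslyn compiler server. `pack_then_shutdown` and its injectable `run` parameter are assumptions made so the sketch can be exercised without dotnet installed.

```python
import subprocess
from typing import Callable, Sequence


def pack_then_shutdown(
    projects: Sequence[str],
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Pack each project, then stop any build servers the packs started."""
    try:
        for project in projects:
            # 'dotnet pack' builds the project, which may start a compiler server.
            run(["dotnet", "pack", project], check=True)
    finally:
        # Shut the servers down even if a pack failed, so none of them keeps
        # the inherited jobserver descriptors open after make finishes.
        run(["dotnet", "build-server", "shutdown"], check=False)
```

Putting the shutdown in `finally` is a deliberate choice: a failed pack can still leave a server running.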

Fixes #13355.

@rolfbjarne rolfbjarne added the not-notes-worthy Ignore for release notes label Jul 6, 2022
@rolfbjarne rolfbjarne changed the title [dotnet] Fix hang after packing .NET NuGets. [dotnet] Fix hang after packing .NET NuGets. Fixes #13355. Jul 6, 2022

@chamons chamons left a comment

Do we have a bug for build-server to clean up after itself?

@rolfbjarne
Member Author

> Do we have a bug for build-server to clean up after itself?

The bug is that the build server doesn't close inherited file descriptors when it daemonizes itself (at least I think so, but I wasn't able to find the code where that happens).
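Closing inherited descriptors when a server daemonizes is the usual POSIX convention. Here is a minimal Python sketch (POSIX, Python 3.9+) of what that looks like from the pipe reader's side. It assumes a hypothetical well-behaved server and is not the build server's code. The server still inherits the write end, but closes every descriptor above stderr on startup, so the reader sees EOF while the server keeps running.

```python
import os
import subprocess
import sys
import time

# Hypothetical well-behaved server: first close every inherited descriptor
# above stdin/stdout/stderr, then idle like a long-lived server would.
WELL_BEHAVED_SERVER = """
import os, time
limit = os.sysconf("SC_OPEN_MAX")
# Clamp the range so the loop stays cheap even with a huge fd limit.
os.closerange(3, limit if 0 < limit <= 65536 else 65536)
time.sleep(5)
"""


def eof_delay_with_well_behaved_server() -> tuple[float, bool]:
    """Return (seconds until EOF, whether the server was still running then)."""
    r, w = os.pipe()
    server = subprocess.Popen(
        [sys.executable, "-c", WELL_BEHAVED_SERVER],
        pass_fds=(w,),  # the server still inherits the write end...
    )
    os.close(w)
    start = time.monotonic()
    os.read(r, 1)  # ...but closes it on startup, so EOF arrives early
    elapsed = time.monotonic() - start
    still_running = server.poll() is None
    os.close(r)
    server.kill()
    server.wait()
    return elapsed, still_running
```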

In any case - no, I didn't file a bug, it didn't seem like something that would ever get fixed.

@vs-mobiletools-engineering-service2
Collaborator

💻 [PR Build] Tests on macOS Mac Catalina (10.15) passed 💻

All tests on macOS Mac Catalina (10.15) passed.

Pipeline on Agent
Hash: 87d137e6394282c875a849778cf8ba1918411f8c [PR build]

@vs-mobiletools-engineering-service2
Collaborator

📚 [PR Build] Artifacts 📚

Packages generated

View packages

Pipeline on Agent XAMBOT-1017.Monterey
Hash: 87d137e6394282c875a849778cf8ba1918411f8c [PR build]

@vs-mobiletools-engineering-service2
Collaborator

✅ API diff for current PR / commit

Legacy Xamarin (No breaking changes)
  • iOS (no change detected)
  • tvOS (no change detected)
  • watchOS (no change detected)
  • macOS (no change detected)
.NET (empty diffs)
  • iOS: (empty diff detected)
  • tvOS: (empty diff detected)
  • MacCatalyst: (empty diff detected)
  • macOS: (empty diff detected)

✅ API diff vs stable

Legacy Xamarin (No breaking changes)
.NET (No breaking changes)
Legacy Xamarin (stable) vs .NET

✅ Generator diff

Generator diff is empty

Pipeline on Agent
Hash: 87d137e6394282c875a849778cf8ba1918411f8c [PR build]

@vs-mobiletools-engineering-service2
Collaborator

❌ [PR Build] Tests on macOS M1 - Mac Big Sur (11.5) failed ❌

Failed tests are:

  • xammac_tests
  • monotouch-test

Pipeline on Agent
Hash: 87d137e6394282c875a849778cf8ba1918411f8c [PR build]

@vs-mobiletools-engineering-service2
Collaborator

🔥 [CI Build] Test results 🔥

Test results

❌ Tests failed on VSTS: simulator tests

0 tests crashed, 6 tests failed, 217 tests passed.

Failures

❌ dotnettests tests

1 tests failed, 0 tests passed.
Details
  • DotNet tests: Failed (Execution failed with exit code 1)

Html Report (VSDrops) Download

❌ linker tests

1 tests failed, 64 tests passed.
Details
  • dont link/watchOS 32-bits - simulator/Debug: TimedOut

Html Report (VSDrops) Download

❌ monotouch tests

3 tests failed, 20 tests passed.
Details
  • monotouch-test/iOS Unified 64-bits - simulator/Debug (LinkSdk): Failed
  • monotouch-test/iOS Unified 64-bits - simulator/Debug (static registrar): Failed
  • monotouch-test/iOS Unified 64-bits - simulator/Release (all optimizations): Failed

Html Report (VSDrops) Download

❌ mtouch tests

1 tests failed, 0 tests passed.
Details
  • MTouch tests/NUnit: Failed (Execution failed with exit code 19)

Html Report (VSDrops) Download

Successes

✅ bcl: All 69 tests passed. Html Report (VSDrops) Download
✅ cecil: All 1 tests passed. Html Report (VSDrops) Download
✅ fsharp: All 7 tests passed. Html Report (VSDrops) Download
✅ framework: All 8 tests passed. Html Report (VSDrops) Download
✅ generator: All 2 tests passed. Html Report (VSDrops) Download
✅ interdependent_binding_projects: All 7 tests passed. Html Report (VSDrops) Download
✅ install_source: All 1 tests passed. Html Report (VSDrops) Download
✅ introspection: All 8 tests passed. Html Report (VSDrops) Download
✅ mac_binding_project: All 1 tests passed. Html Report (VSDrops) Download
✅ mmp: All 2 tests passed. Html Report (VSDrops) Download
✅ mononative: All 12 tests passed. Html Report (VSDrops) Download
✅ msbuild: All 2 tests passed. Html Report (VSDrops) Download
✅ xammac: All 3 tests passed. Html Report (VSDrops) Download
✅ xcframework: All 8 tests passed. Html Report (VSDrops) Download
✅ xtro: All 2 tests passed. Html Report (VSDrops) Download

Pipeline on Agent
Hash: [PR build]

@rolfbjarne rolfbjarne merged commit 1ecd843 into dotnet:main Jul 8, 2022
@rolfbjarne rolfbjarne deleted the dotnet-fix-hang branch July 8, 2022 14:50
dalexsoto added a commit that referenced this pull request Mar 17, 2026
Parallel make (e.g. 'make all -j8', 'make world') has been hanging
indefinitely at the end of the build. This is a long-standing issue
(#13355) that has been patched three times
(#15407, #21315, #22300) without fully fixing the root cause.

The problem: when using parallel make, GNU Make uses a jobserver with
pipe-based file descriptors to coordinate sub-makes. The dotnet CLI
can start background build servers (MSBuild server, Roslyn/VBCSCompiler)
that inherit these file descriptors but never close them. Make then
waits indefinitely for those file descriptors to close, thinking there
are still active jobs.

The previous workaround attempted to shut down and force-kill dotnet
processes after the build via a 'shutdown-build-server' target. This
approach was unreliable because:
- The shutdown ran from a double-colon all-hook:: rule with no
  prerequisites, so with -j it could execute in parallel with (or
  before) the actual build, killing nothing.
- Build servers started by later subdirectories (e.g. tests/) after
  the dotnet/ shutdown were never killed.
- The process-matching regex pattern might not match all server processes.

The fix: disable build servers entirely via environment variables in
Make.config:
- DOTNET_CLI_USE_MSBUILD_SERVER=0: prevents the MSBuild server
- UseSharedCompilation=false: prevents the Roslyn compiler server
- MSBUILDDISABLENODEREUSE=1: prevents MSBuild node reuse

This eliminates the root cause - no background servers means no
inherited file descriptors means no hang. The shutdown-build-server
target and its invocations are removed as they are no longer needed.

Additionally, 'make world' now prints the installed workloads at the
end of the build for visibility.

Build without changes:
	make world  2149.57s user 258.32s system 107% cpu 37:30.19 total

Build with changes:
	make world  2242.74s user 286.38s system 354% cpu 11:52.55 total

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
dalexsoto added a commit that referenced this pull request Mar 17, 2026
Parallel make (e.g. 'make all -j8', 'make world') has been hanging
indefinitely at the end of the build. This is a long-standing issue
(#13355) that has been patched three times
(#15407, #21315, #22300) without fully fixing the root cause.

The problem: when using parallel make, GNU Make uses a jobserver with
pipe-based file descriptors to coordinate sub-makes. The dotnet CLI
can start background build servers (MSBuild server, Roslyn/VBCSCompiler)
that inherit these file descriptors but never close them. Make then
waits indefinitely for those file descriptors to close, thinking there
are still active jobs.

The previous workaround attempted to shut down and force-kill dotnet
processes after the build via a 'shutdown-build-server' target. This
approach was unreliable because:
- The shutdown ran from a double-colon all-hook:: rule with no
  prerequisites, so with -j it could execute in parallel with (or
  before) the actual build, killing nothing.
- Build servers started by later subdirectories (e.g. tests/) after
  the dotnet/ shutdown were never killed.
- The process-matching regex pattern might not match all server processes.

The fix: disable build servers entirely via environment variables in
Make.config:
- DOTNET_CLI_USE_MSBUILD_SERVER=0: prevents the MSBuild server
  https://learn.microsoft.com/en-us/visualstudio/msbuild/msbuild-server
  https://github.com/dotnet/msbuild/blob/main/documentation/MSBuild-Server.md
- UseSharedCompilation=false: prevents the Roslyn compiler server (VBCSCompiler)
  dotnet/roslyn#27975
- MSBUILDDISABLENODEREUSE=1: prevents MSBuild node reuse
  https://github.com/dotnet/msbuild/wiki/MSBuild-Tips-&-Tricks

This eliminates the root cause - no background servers means no
inherited file descriptors means no hang. The shutdown-build-server
target and its invocations are removed as they are no longer needed.

Additionally, 'make world' now prints the installed workloads at the
end of the build for visibility.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
dalexsoto added a commit that referenced this pull request Mar 17, 2026
Parallel make (e.g. 'make all -j8', 'make world') has been hanging
for a while at the end of the build. This is a long-standing issue
(#13355) that has been patched three times (#15407, #21315, #22300)
without fully fixing the root cause.

The problem: when using parallel make, GNU Make uses a jobserver with
pipe-based file descriptors to coordinate sub-makes. The dotnet CLI
can start background build servers (MSBuild server, Roslyn/VBCSCompiler)
that inherit these file descriptors but never close them. Make then
waits for those file descriptors to close (which won't happen until
the servers exit, typically after about 10 minutes of inactivity),
thinking there are still active jobs.

The previous workaround attempted to shut down and force-kill dotnet
processes after the build via a 'shutdown-build-server' target. This
approach was unreliable because:
- The shutdown ran from a double-colon all-hook:: rule with no
  prerequisites, so with -j it could execute in parallel with (or
  before) the actual build, killing nothing.
- Build servers started by later subdirectories (e.g. tests/) after
  the dotnet/ shutdown were never killed.
- The process-matching regex pattern might not match all server
  processes.

Ideally this would be fixed when launching the build servers, by
making them not inherit handles. Unfortunately this is currently not
possible: dotnet/runtime#13943 (although this might change in the
not-so-distant future: dotnet/runtime#123959).
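For contrast, some process APIs do let the launcher decide what a child inherits. Below is a minimal Python sketch (POSIX, Python 3.9+) using `subprocess.Popen`'s `close_fds` flag. It illustrates the idea only and says nothing about the .NET API. The write end is deliberately made inheritable, as make does for the jobserver descriptors, and `close_fds=True` keeps it out of the server anyway. `eof_delay` is a hypothetical helper.

```python
import os
import subprocess
import sys
import time


def eof_delay(close_fds: bool, server_lifetime: float = 1.5) -> float:
    """Seconds until a pipe reader sees EOF after launching a 'server'."""
    r, w = os.pipe()
    # Mark w inheritable, the way make leaves the jobserver fds open for
    # its children.
    os.set_inheritable(w, True)
    server = subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({server_lifetime})"],
        # True: the child keeps only stdin/stdout/stderr, even though w is
        # inheritable. False: the child inherits w as well.
        close_fds=close_fds,
    )
    os.close(w)
    start = time.monotonic()
    os.read(r, 1)  # EOF once no process holds a copy of w
    elapsed = time.monotonic() - start
    os.close(r)
    server.wait()
    return elapsed
```

With `close_fds=True` (Python's default on POSIX) the read returns almost immediately. With `close_fds=False` it waits for the server to exit.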

The workaround: disable build servers entirely via environment variables
in Make.config:
- DOTNET_CLI_USE_MSBUILD_SERVER=0: prevents the MSBuild server
  https://learn.microsoft.com/en-us/visualstudio/msbuild/msbuild-server
  https://github.com/dotnet/msbuild/blob/main/documentation/MSBuild-Server.md
- UseSharedCompilation=false: prevents the Roslyn compiler server (VBCSCompiler)
  dotnet/roslyn#27975
- MSBUILDDISABLENODEREUSE=1: prevents MSBuild node reuse
  https://github.com/dotnet/msbuild/wiki/MSBuild-Tips-&-Tricks
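The actual change sets these variables in Make.config. A hypothetical Python script that drives `dotnet` itself could get a rough equivalent by overlaying the same three settings on the inherited environment before launching each command. `dotnet_env` is an illustrative helper, not part of the repository.

```python
import os
from typing import Mapping

# The three settings listed above; each keeps dotnet from leaving a
# long-lived build process behind.
NO_BUILD_SERVERS = {
    "DOTNET_CLI_USE_MSBUILD_SERVER": "0",  # no MSBuild server
    "UseSharedCompilation": "false",       # no Roslyn compiler server (VBCSCompiler)
    "MSBUILDDISABLENODEREUSE": "1",        # no reusable MSBuild worker nodes
}


def dotnet_env(base: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Copy `base` and force the no-build-server settings on top of it."""
    env = dict(base)
    env.update(NO_BUILD_SERVERS)
    return env
```

The result would then be passed as `env=dotnet_env()` to `subprocess.run` for each `dotnet` invocation.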

This eliminates the root cause - no background servers means no
inherited file descriptors means no hang. The shutdown-build-server
target and its invocations are removed as they are no longer needed.

Additionally, 'make world' now prints the installed workloads at the
end of the build for visibility.

Build without changes:
> make world  2149.57s user 258.32s system 107% cpu 37:30.19 total

Build with changes:
> make world  2242.74s user 286.38s system 354% cpu 11:52.55 total
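The two `time` lines above can be compared directly. The `total` column is wall-clock time, and the `cpu` column is (user + system) divided by wall-clock time:

$$
\begin{aligned}
t_\text{before} &= 37 \times 60 + 30.19 = 2250.19\ \text{s} \\
t_\text{after} &= 11 \times 60 + 52.55 = 712.55\ \text{s} \\
t_\text{before} / t_\text{after} &= 2250.19 / 712.55 \approx 3.16 \\
\text{CPU}_\text{before} &= (2149.57 + 258.32) / 2250.19 = 2407.89 / 2250.19 \approx 1.07 \\
\text{CPU}_\text{after} &= (2242.74 + 286.38) / 712.55 = 2529.12 / 712.55 \approx 3.55
\end{aligned}
$$

So a build using about 5% more CPU time (2529.12 s vs. 2407.89 s) finished roughly 3.2 times sooner. This is consistent with the reported jump in CPU utilization from 107% to 354%.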

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Labels

not-notes-worthy Ignore for release notes

Development

Successfully merging this pull request may close these issues.

Build hang after 'cp -c ../../global6.json global.json'

4 participants