[NativeAOT] Print OOM message before Abort() on Linux#125311

Open
Copilot wants to merge 11 commits into main from copilot/fix-out-of-memory-reporting

Contributor

Copilot AI commented Mar 8, 2026

On Linux, NativeAOT processes terminating due to OutOfMemoryException (e.g. with DOTNET_GCHeapHardLimit set) printed only Aborted with no diagnostic context.

Root cause

RuntimeExceptionHelpers.FailFast detects the preallocated OOM exception via minimalFailFast = (exception == PreallocatedOutOfMemoryException.Instance) and skips all stderr output to avoid heap allocations — leaving the user with no indication of why the process died.

Fix

In the minimalFailFast path, print a hardcoded OOM message to stderr before calling Abort(). The write is wrapped in try/catch {} so a secondary allocation failure silently falls through to the existing abort path.
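
The shape of that change can be sketched roughly as follows. This is a minimal illustration, not the runtime's actual code: `FailFastSketch`, `TryReportOom`, and the `stderr`/`abort` parameters are hypothetical stand-ins so the control flow can be run outside the runtime, and the non-minimal branch is only a placeholder for the existing, fuller report.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the runtime's fail-fast helper (assumption, not the real code).
public static class FailFastSketch
{
    // Compile-time constant, so building the message itself allocates nothing.
    public const string OomMessage = "Process terminated. System.OutOfMemoryException";

    // Best-effort write: returns false instead of throwing if the write fails.
    public static bool TryReportOom(TextWriter stderr)
    {
        try
        {
            stderr.WriteLine(OomMessage);
            return true;
        }
        catch
        {
            // A secondary failure (for example another allocation failure)
            // is swallowed so the caller still reaches the abort.
            return false;
        }
    }

    public static void FailFast(Exception exception, Exception preallocatedOom,
                                TextWriter stderr, Action abort)
    {
        // Mirrors the reference comparison described in the root cause.
        bool minimalFailFast = ReferenceEquals(exception, preallocatedOom);

        if (minimalFailFast)
        {
            TryReportOom(stderr);
        }
        else
        {
            // Placeholder for the existing, more detailed report.
            stderr.WriteLine("Process terminated. " + exception);
        }

        // Reached on both paths, whether or not the write succeeded.
        abort();
    }
}
```

The point of the sketch is that the abort is unconditional: the write can only add information, never prevent termination.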

Before: Aborted
After: Process terminated. System.OutOfMemoryException

Test

Added a new OomHandling smoke test in src/tests/nativeaot/SmokeTests/OomHandling/. The test spawns itself as a subprocess with DOTNET_GCHeapHardLimit=20000000 (20 MB) set, waits for the subprocess to run out of memory, and verifies that OutOfMemoryException appears in stderr. This covers both the preallocated OOM path (the fix) and the existing unhandled-exception path. The test is skipped on mobile and browser platforms that do not support process spawning.
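
For readers unfamiliar with the pattern, a subprocess check of this kind might look roughly like the sketch below. It is not the test added in this PR: `OomSubprocessSketch`, `RunChildAndCaptureStderr`, and its parameters are hypothetical names, and the limit value is passed through unchanged from whatever the caller supplies.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Illustrative helper only; names and structure are assumptions.
public static class OomSubprocessSketch
{
    public const string ExpectedToken = "OutOfMemoryException";

    // True when the captured stderr mentions the OOM exception type.
    public static bool StderrIndicatesOom(string stderr) =>
        stderr.Contains(ExpectedToken, StringComparison.Ordinal);

    public static string RunChildAndCaptureStderr(
        string fileName, string arguments, string heapHardLimit, int timeoutMs)
    {
        var psi = new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,      // required for redirection and env vars
            RedirectStandardError = true,
        };
        psi.Environment["DOTNET_GCHeapHardLimit"] = heapHardLimit;

        using Process child = Process.Start(psi)
            ?? throw new InvalidOperationException("Failed to start child process.");

        // Start reading before waiting so a full pipe cannot block the child.
        Task<string> stderrTask = child.StandardError.ReadToEndAsync();

        if (!child.WaitForExit(timeoutMs))
        {
            // Timed out: terminate the child (and its descendants), then wait
            // so the stderr pipe closes and the read can complete.
            child.Kill(entireProcessTree: true);
            child.WaitForExit();
        }

        return stderrTask.GetAwaiter().GetResult();
    }
}
```

A caller would typically re-launch its own executable, for example `RunChildAndCaptureStderr(Environment.ProcessPath!, "--child", "20000000", 60_000)`, where the `--child` argument and the timeout are placeholders, and then assert `StderrIndicatesOom(...)` on the result.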

Original prompt

This section details the original issue you should resolve

<issue_title>[NativeAOT] Out of memory reporting on Linux</issue_title>
<issue_description>### Repro

var l = new List<object>();
for (; ; ) l.Add(new object());

Run the native aot compiled binary with export DOTNET_GCHeapHardLimit=2000000 set

Actual result

Aborted

Expected result

Process is terminating due to OutOfMemoryException

(Reported by partner team.)</issue_description>

Comments on the Issue (you are @copilot in this section)

@ Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas See info in [area-owners.md](https://github.com/dotnet/runtime/blob/main/docs/area-owners.md) if you want to be subscribed.
Issue Details

Author: jkotas
Assignees: -
Labels:

area-NativeAOT-coreclr

Milestone: 8.0.0

Co-authored-by: agocke <515774+agocke@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix out of memory reporting on Linux for NativeAOT [NativeAOT] Print OOM message before Abort() on Linux Mar 8, 2026
@agocke
Member

agocke commented Mar 9, 2026

Please add a unit test to src/tests

@agocke
Member

agocke commented Mar 9, 2026

@copilot

Co-authored-by: agocke <515774+agocke@users.noreply.github.com>
Contributor Author

Copilot AI commented Mar 9, 2026

Please add a unit test to src/tests

Added a new OomHandling smoke test in src/tests/nativeaot/SmokeTests/OomHandling/ (commit 3ef2a6d). The test spawns itself as a subprocess with DOTNET_GCHeapHardLimit=20000000 (20 MB) set, waits for it to OOM, and verifies that OutOfMemoryException appears in stderr — covering both the preallocated OOM path (my fix) and the existing unhandled-exception path. The test is skipped on mobile and browser platforms that do not support process spawning.

Copilot AI review requested due to automatic review settings April 15, 2026 01:22
@eduardo-vp eduardo-vp marked this pull request as ready for review April 15, 2026 01:22

Copilot AI review requested due to automatic review settings April 16, 2026 06:55
@eduardo-vp eduardo-vp marked this pull request as ready for review April 16, 2026 06:56
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread src/tests/nativeaot/SmokeTests/OomHandling/OomHandling.cs
Copilot AI review requested due to automatic review settings April 17, 2026 02:04
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment thread src/tests/nativeaot/SmokeTests/OomHandling/OomHandling.cs Outdated
Comment thread src/tests/nativeaot/SmokeTests/OomHandling/OomHandling.cs
@github-actions
Contributor

🤖 Copilot Code Review — PR #125311

Note

This review was generated by GitHub Copilot using claude-opus-4.6, with additional analysis from claude-haiku-4.5 and gpt-5.3-codex.

Holistic Assessment

Motivation: The fix is clearly justified. Before this PR, NativeAOT processes that ran out of memory and hit the minimalFailFast path produced zero stderr output—on Linux, users only saw "Aborted" with no indication that OOM was the cause. This is a real diagnosability gap.

Approach: The approach is sound—adding a best-effort stderr write in the minimal fail-fast path, wrapped in try/catch to handle the case where even that small allocation fails. The additional hardening of SerializeCrashInfo and the exception.ToString() fallback chain are appropriate defensive measures.

Summary: ✅ LGTM. The code changes are correct, well-scoped, and follow the defensive patterns appropriate for a fatal error path. The test is well-designed with proper subprocess isolation and covers both allocation strategies. Two minor observations are noted below but neither is merge-blocking.


Detailed Findings

✅ Correctness — minimalFailFast path restructuring

The control flow change from if (!minimalFailFast) { ... } to if (minimalFailFast) { ... } else { ... } is correct. The old code skipped all output for OOM—the new code provides a best-effort diagnostic message.

The message "Process terminated. System.OutOfMemoryException" is a compile-time constant, so the only runtime allocation is the small byte[] from Encoding.UTF8.GetBytes() inside Internal.Console.Error.Write(). The try/catch correctly handles the case where even this allocation fails. This is the right trade-off: any diagnostic output is better than silent termination.

✅ Thread Safety — s_crashInfoPresent placement

Setting s_crashInfoPresent = 1 outside the try/catch block (line 222) is correct and necessary. If it were inside the try block and an exception occurred, s_crashInfoPresent would remain at -1, causing other threads in the spin-wait loop (line 226: while (s_crashInfoPresent != 1)) to spin forever. When the catch fires, s_triageBufferAddress and s_triageBufferSize remain at their default 0/IntPtr.Zero values, which is safe—the crash dump infrastructure handles null/zero triage buffers.
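
To make the hang scenario concrete, here is a minimal sketch of the pattern. It assumes a simple state machine (0 = not started, -1 = in progress, 1 = finished); the `CompareExchange` claim step, the `CrashInfoSketch` class, and the `serialize` delegate are illustrative assumptions rather than the runtime's actual code.

```csharp
using System;
using System.Threading;

// Illustrative only: models the "set the flag outside try/catch" pattern.
public static class CrashInfoSketch
{
    // 0 = not started, -1 = serialization in progress, 1 = finished.
    private static int s_crashInfoPresent;

    public static int State => Volatile.Read(ref s_crashInfoPresent);

    public static void SerializeOnce(Action serialize)
    {
        // Assumption: the first thread claims the work by moving 0 -> -1.
        if (Interlocked.CompareExchange(ref s_crashInfoPresent, -1, 0) == 0)
        {
            try
            {
                serialize();
            }
            catch
            {
                // Serialization failed (e.g. under memory pressure);
                // any buffers simply keep their default values.
            }

            // Outside the try/catch: published whether or not serialize() threw,
            // so waiting threads are always released.
            Volatile.Write(ref s_crashInfoPresent, 1);
        }
        else
        {
            // Other threads wait until the first thread has finished.
            var spinner = new SpinWait();
            while (Volatile.Read(ref s_crashInfoPresent) != 1)
            {
                spinner.SpinOnce();
            }
        }
    }
}
```

If the final write were moved inside the `try` block, a throwing `serialize` would leave the state at -1 and every thread in the `else` branch would spin indefinitely.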

✅ Exception.ToString() fallback chain

The nested try/catch for exception.ToString() (lines 287-301) with a fallback to exception.GetType().FullName is a good hardening measure. ToString() can perform significant allocations (building stack trace strings), so protecting it under low-memory conditions is appropriate. This was flagged independently by all three reviewing models.
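
A fallback chain of this kind can be sketched as below. The helper name `Describe`, the final empty-string fallback, and the `ThrowingToStringException` type (used only to simulate a failing `ToString()`) are assumptions for illustration; the PR's exact structure may differ.

```csharp
using System;

// Simulates an exception whose ToString() fails, as it might under memory pressure.
public sealed class ThrowingToStringException : Exception
{
    public override string ToString() => throw new OutOfMemoryException();
}

public static class ExceptionTextSketch
{
    public static string Describe(Exception exception)
    {
        try
        {
            // Preferred: full text, which may allocate heavily (message, stack trace).
            return exception.ToString();
        }
        catch
        {
            try
            {
                // Cheaper fallback: just the type name.
                return exception.GetType().FullName ?? string.Empty;
            }
            catch
            {
                // Last resort (assumption): report nothing rather than fail again.
                return string.Empty;
            }
        }
    }
}
```

With this shape, `Describe(new ThrowingToStringException())` yields the type name instead of propagating the secondary failure.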

✅ Test quality — OomHandling smoke test

The test is well-designed:

  • Proper subprocess isolation via Process.Start with DOTNET_GCHeapHardLimit
  • Two allocation strategies: small blocks (16KB + objects, exhausting nearly all memory) and large blocks (128KB, leaving some free memory)—these exercise both the preallocated-exception and fresh-exception code paths
  • Async stderr reading prevents deadlock with the timeout
  • Timeout path correctly calls p.WaitForExit() and drains stderrTask after Kill(true) to avoid resource leaks
  • ExpectedToken = "OutOfMemoryException" works for both the minimal path ("Process terminated. System.OutOfMemoryException") and the standard unhandled-exception path
  • .csproj correctly excludes mobile/browser platforms where subprocess spawning isn't supported
  • Follows sibling test conventions (return 100 pattern, RequiresProcessIsolation, ReferenceXUnitWrapperGenerator=false)
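
As a rough illustration of the two allocation strategies mentioned in the list above, the sketch below allocates fixed-size blocks in a loop. It is not the test's code: `AllocateUpTo` and the `maxBytes` cap are hypothetical, with the cap added only so the sketch can run safely without a heap limit. In a child process with a hard limit and no effective cap, the loop would instead end with an unhandled OutOfMemoryException.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only; block sizes are taken from the review text.
public static class AllocationStrategiesSketch
{
    public const int SmallBlockSize = 16 * 1024;   // "small blocks" strategy
    public const int LargeBlockSize = 128 * 1024;  // "large blocks" strategy

    // Allocates and retains blocks until the (hypothetical) cap is reached.
    // An OutOfMemoryException thrown here is deliberately left unhandled.
    public static long AllocateUpTo(int blockSize, long maxBytes)
    {
        var blocks = new List<byte[]>();
        long total = 0;

        while (total + blockSize <= maxBytes)
        {
            blocks.Add(new byte[blockSize]);
            total += blockSize;
        }

        // Keep the blocks reachable until the loop has finished.
        GC.KeepAlive(blocks);
        return total;
    }
}
```

Smaller blocks let the loop consume almost all remaining memory before failing, while larger blocks fail earlier and leave some memory free, which is why, per the review, the two strategies reach different reporting paths.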

💡 Minor — Console.Error.Write allocates under the hood

The comment at lines 248-249 says "avoid heap allocations as much as possible" but Internal.Console.Error.Write(string) on Unix calls Encoding.UTF8.GetBytes(s) which allocates a byte[]. The try/catch handles this correctly, but a brief inline note acknowledging the allocation trade-off could aid future maintainers. This is purely a readability suggestion—not blocking.

💡 Minor — Non-minimal path's other Write calls are unprotected

In the else branch (lines 259-283), several Internal.Console.Error.Write() calls (for the prefix, errorSource, message, reason, stack trace) are not wrapped in try/catch. This is a pre-existing pattern that the PR doesn't need to address, and the outer catch in RuntimeFailFast (line 162-169) provides a safety net for most call paths. Noting for completeness, not a concern for this PR.

Generated by Code Review for issue #125311

@@ -0,0 +1,118 @@
// Licensed to the .NET Foundation under one or more agreements.
Member

@jkotas jkotas Apr 28, 2026

This should not be a NativeAOT smoke test. There is nothing NativeAOT-specific about the desired behavior here. We want to see the same or similar behavior without NativeAOT too.

Development

Successfully merging this pull request may close these issues.

[NativeAOT] Out of memory reporting on Linux
