
[CoreCLR][Signal] Bump shutdown notif and crashdump before prev handler #123735

Merged
mdh1418 merged 6 commits into dotnet:main from mdh1418:reorder_previous_signal_handler
Feb 13, 2026

@mdh1418 mdh1418 commented Jan 28, 2026

From discussion, opting into enabling the crash chaining is more correct.

A previously registered signal action/handler isn't guaranteed to return, so in those cases we lose the chance to notify shutdown and create a dump. Specifically, PROCCreateCrashDumpIfEnabled would be the last chance to provide the managed context for the thread that crashed.

For example, on Android CoreCLR, signal handlers appear to be registered by default by Android's runtime (/apex/com.android.runtime/bin/linker64 + /system/lib64/libandroid_runtime.so). Whenever an unhandled synchronous fault occurs, the previously registered handler does not return to invoke_previous_action and instead aborts the thread itself, so PROCCreateCrashDumpIfEnabled is never reached.

SIGSEGV behavior: Android CoreCLR vs. other platforms

Android CoreCLR

When intentionally writing to NULL (SIGSEGV) on Android CoreCLR, execution goes down this path, which calls the previously registered signal handler:

action->sa_sigaction(code, siginfo, context);

The thread then aborts before hitting PROCNotifyProcessShutdown and PROCCreateCrashDumpIfEnabled.

macOS/Linux/NativeAOT (Linux)

On macOS, Linux, and NativeAOT (only Linux was checked at the time of writing), the same intentional SIGSEGV will hit

if (signalRestarts)
{
    // Shutdown and create the core dump before we restore the signal to the default handler.
    PROCNotifyProcessShutdown(IsRunningOnAlternateStack(context));
    PROCCreateCrashDumpIfEnabled(code, siginfo, context, true);
    // Restore the original and restart h/w exception.
    restore_signal(code, action);
    return;
}
else
{
    // We can't invoke the original handler because returning from the
    // handler doesn't restart the exception.
    PROCAbort(code, siginfo, context);
}
instead because there is no previously registered signal handler. In those cases, PROCCreateCrashDumpIfEnabled is hit and managed callstacks are captured in the dump.
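The difference between the two paths comes down to whether reporting happens before control is handed to a handler that may never return. The following is a minimal, self-contained sketch of that effect, not CoreCLR code: `previous_handler`, `runtime_handler`, `write_report`, and `report_survives` are hypothetical names, SIGUSR1 stands in for a real fault, and a one-byte pipe write stands in for shutdown notification plus dump creation.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static struct sigaction g_previous;  /* action that was installed before ours */
static int g_report_fd = -1;         /* hypothetical crash-report sink (a pipe) */
static int g_report_first = 0;       /* 1 = report before chaining */

/* Stand-in for a platform handler that terminates and never returns. */
static void previous_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)info; (void)ctx;
    _exit(42);
}

/* Stand-in for shutdown notification and crash dump creation. */
static void write_report(void)
{
    const char marker = 'R';
    ssize_t written = write(g_report_fd, &marker, 1);
    (void)written;
}

/* Stand-in for the runtime's handler that chains to the previous one. */
static void runtime_handler(int sig, siginfo_t *info, void *ctx)
{
    if (g_report_first)
        write_report();
    g_previous.sa_sigaction(sig, info, ctx);
    /* Reached only if the previous handler returns. */
    if (!g_report_first)
        write_report();
    _exit(0);
}

/* Runs the scenario in a child process; returns 1 if the report was written,
 * 0 if it was lost, -1 on setup failure. */
int report_survives(int report_first)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        close(fds[0]);
        g_report_fd = fds[1];
        g_report_first = report_first;
        memset(&sa, 0, sizeof(sa));
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = previous_handler;
        sigaction(SIGUSR1, &sa, NULL);        /* "platform" installs first */
        sa.sa_sigaction = runtime_handler;
        sigaction(SIGUSR1, &sa, &g_previous); /* "runtime" saves and replaces it */
        raise(SIGUSR1);
        _exit(1);
    }
    close(fds[1]);
    char byte;
    ssize_t got = read(fds[0], &byte, 1);     /* 0 means EOF: nothing was written */
    close(fds[0]);
    int status;
    waitpid(pid, &status, 0);
    return got == 1 ? 1 : 0;
}
```

Calling `report_survives(1)` models the reordered behavior and `report_survives(0)` models the original ordering; with a previous handler that exits, only the first variant gets its report out.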

History investigation

From a dive through the GitHub history, I didn't spot anything in particular requiring the previous signal handler to be invoked before PROCNotifyProcessShutdown + PROCCreateCrashDumpIfEnabled.

PROCNotifyProcessShutdown was first introduced in 1433c3f. It doesn't seem to state a particular reason for invoking it after the previous signal handler.

PROCCreateCrashDumpIfEnabled was added to signal.cpp in 7f9bd2c because PROCNotifyProcessShutdown didn't create a crash dump. The commit doesn't state any particular reason for invoking it after the previously registered signal handler, and it was probably just placed next to PROCNotifyProcessShutdown.

invoke_previous_action was introduced in a740f65 as a refactoring that maintained the existing order.

Android CoreCLR behavior after swapping order

Locally, I have POC changes to emit managed callstacks in Android's PROCCreateCrashDumpIfEnabled.

01-28 17:26:40.951  2416  2440 F DOTNET  : Native crash detected; attempting managed stack trace.
01-28 17:26:40.951  2416  2440 F DOTNET  : {"stack":[
01-28 17:26:40.951  2416  2440 F DOTNET  : {"ip":"0x0","module":"0x0","offset":"0x0","name":"Program.MemSet(Void*, Int32, UIntPtr)"},
01-28 17:26:40.951  2416  2440 F DOTNET  : {"ip":"0x78d981145973","module":"0x0","offset":"0x0","name":"Program.MemSet(Void*, Int32, UIntPtr)"},
01-28 17:26:40.951  2416  2440 F DOTNET  : {"ip":"0x78d981145973","module":"0x0","offset":"0x73","name":"Program.ForceNativeSegv()"},
01-28 17:26:40.951  2416  2440 F DOTNET  : {"ip":"0x78d981141b60","module":"0x0","offset":"0x70","name":"Program.Main(System.String[])"}
01-28 17:26:40.951  2416  2440 F DOTNET  : ]}
01-28 17:26:40.952  2416  2440 F DOTNET  : Crash dump hook completed.
--------- beginning of crash
01-28 17:26:40.952  2416  2440 F libc    : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 in tid 2440 (.dot.MonoRunner), pid 2416 (ulator.JIT.Test)
.....
01-28 17:26:46.882  2921  2921 F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
01-28 17:26:46.882  2921  2921 F DEBUG   : Build fingerprint: 'google/sdk_gphone64_x86_64/emu64xa:16/BE2A.250530.026.D1/13818094:user/release-keys'
01-28 17:26:46.882  2921  2921 F DEBUG   : Revision: '0'
01-28 17:26:46.882  2921  2921 F DEBUG   : ABI: 'x86_64'
01-28 17:26:46.882  2921  2921 F DEBUG   : Timestamp: 2026-01-28 17:26:41.492831700-0500
01-28 17:26:46.882  2921  2921 F DEBUG   : Process uptime: 20s
01-28 17:26:46.883  2921  2921 F DEBUG   : Cmdline: net.dot.Android.Device_Emulator.JIT.Test
01-28 17:26:46.883  2921  2921 F DEBUG   : pid: 2416, tid: 2440, name: .dot.MonoRunner  >>> net.dot.Android.Device_Emulator.JIT.Test <<<
01-28 17:26:46.883  2921  2921 F DEBUG   : uid: 10219
01-28 17:26:46.883  2921  2921 F DEBUG   : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0000000000000000
01-28 17:26:46.883  2921  2921 F DEBUG   : Cause: null pointer dereference
01-28 17:26:46.883  2921  2921 F DEBUG   : Abort message: 'CoreCLR: previous handler for '
01-28 17:26:46.883  2921  2921 F DEBUG   :     rax 0000000000000000  rbx 000078da87ffade0  rcx 0000000000000000  rdx 0000000000000001
01-28 17:26:46.884  1237  1297 I s.nexuslauncher: AssetManager2(0x78dd08cd9178) locale list changing from [] to [en-US]
01-28 17:26:46.903  2447  2594 I BugleNotifications: Creating notification input ids [CONTEXT im_entry_input="" im_notification_input="" im_settings_store_input="" im_final_input="" ]
01-28 17:26:46.905  2921  2921 F DEBUG   :     r8  00007ffcde5a8080  r9  34d9bb0e67871eb0  r10 000078ddb4111870  r11 0000000000000293
01-28 17:26:46.906  2921  2921 F DEBUG   :     r12 0000000000000001  r13 000078da87ffafa0  r14 0000000000000000  r15 000078da87ffaf18
01-28 17:26:46.906  2921  2921 F DEBUG   :     rdi 0000000000000000  rsi 0000000000000000
01-28 17:26:46.906  2921  2921 F DEBUG   :     rbp 000078da87ffac40  rsp 000078da87ffabc8  rip 000078ddb41118a2
01-28 17:26:46.906  2921  2921 F DEBUG   : 2 total frames
01-28 17:26:46.906  2921  2921 F DEBUG   : backtrace:
01-28 17:26:46.906  2921  2921 F DEBUG   :       #00 pc 000000000008f8a2  /apex/com.android.runtime/lib64/bionic/libc.so (memset_avx2+50) (BuildId: fcb82240218d1473de1e3d2137c0be35)
01-28 17:26:46.906  2921  2921 F DEBUG   :       #01 pc 0000000000049972  /memfd:doublemapper (deleted) (offset 0x111000)

Now there's a window to log managed callstacks before the original signal handler aborts and triggers a tombstone.
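The POC code itself is not shown here. As a rough illustration of the per-frame JSON lines visible in the log, here is a sketch that formats one frame into a caller-provided buffer without allocating. `format_frame` and its parameters are hypothetical, names are not escaped, and `snprintf` is not on the POSIX list of async-signal-safe functions, so a real handler would likely need a signal-safe formatter instead.

```c
#include <stdio.h>

/* Hypothetical helper: writes one frame as a JSON object into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_frame(char *buf, size_t cap,
                 unsigned long long ip,
                 unsigned long long module,
                 unsigned long long offset,
                 const char *name)
{
    int n = snprintf(buf, cap,
                     "{\"ip\":\"0x%llx\",\"module\":\"0x%llx\","
                     "\"offset\":\"0x%llx\",\"name\":\"%s\"}",
                     ip, module, offset, name);
    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n >= 0 && (size_t)n < cap) ? n : -1;
}
```

A fixed-size stack buffer per frame keeps the sketch free of heap allocation, which matters because the allocator may be in an inconsistent state when a fault is being handled.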

Android Mono behavior

Mono provides two embedding APIs to configure signal and crash chaining:

/**
 * mono_set_signal_chaining:
 *
 * Enable/disable signal chaining. This should be called before \c mono_jit_init.
 * If signal chaining is enabled, the runtime saves the original signal handlers before
 * installing its own handlers, and calls the original ones in the following cases:
 * - a \c SIGSEGV / \c SIGABRT signal received while executing native (i.e. not JITted) code.
 * - \c SIGPROF
 * - \c SIGFPE
 * - \c SIGQUIT
 * - \c SIGUSR2
 * Signal chaining only works on POSIX platforms.
 */
void
mono_set_signal_chaining (gboolean chain_signals)
{
    mono_do_signal_chaining = chain_signals;
}

/**
 * mono_set_crash_chaining:
 *
 * Enable/disable crash chaining due to signals. When a fatal signal is delivered and
 * Mono doesn't know how to handle it, it will invoke the crash handler. If chrash chaining
 * is enabled, it will first print its crash information and then try to chain with the native handler.
 */
void
mono_set_crash_chaining (gboolean chain_crashes)
{
    mono_do_crash_chaining = chain_crashes;
}
These settings determine whether synchronous faults chain:

if (!ji) {
    if (!mono_do_crash_chaining && mono_chain_signal (MONO_SIG_HANDLER_PARAMS))
        return;
    mono_handle_native_crash (mono_get_signame (SIGSEGV), &mctx, (MONO_SIG_HANDLER_INFO_TYPE*)info);
    if (mono_do_crash_chaining) {
        if (!mono_chain_signal (MONO_SIG_HANDLER_PARAMS))
            mono_chain_signal_to_default_sigsegv_handler ();
        return;
    }
}
When chaining, Mono calls only the previously saved signal handler:

gboolean
MONO_SIG_HANDLER_SIGNATURE (mono_chain_signal)
{
    int signal = MONO_SIG_HANDLER_GET_SIGNO ();
    struct sigaction *saved_handler = (struct sigaction *)get_saved_signal_handler (signal);
    // Ignores chaining to default signal handlers i.e. when saved_handler->sa_handler == SIG_DFL
    if (saved_handler && saved_handler->sa_handler) {
        if (!(saved_handler->sa_flags & SA_SIGINFO)) {
            saved_handler->sa_handler (signal);
        } else {
#ifdef MONO_ARCH_USE_SIGACTION
            saved_handler->sa_sigaction (MONO_SIG_HANDLER_PARAMS);
#endif /* MONO_ARCH_USE_SIGACTION */
        }
        return TRUE;
    }
    return FALSE;
}
Chaining happens only after attempting to walk the native and managed stacks:

g_async_safe_printf("\n=================================================================\n");
g_async_safe_printf("\tNative Crash Reporting\n");
g_async_safe_printf("=================================================================\n");
g_async_safe_printf("Got a %s while executing native code. This usually indicates\n", signal);
g_async_safe_printf("a fatal error in the mono runtime or one of the native libraries \n");
g_async_safe_printf("used by your application.\n");
g_async_safe_printf("=================================================================\n");
mono_dump_native_crash_info (signal, mctx, info);

/* !jit_tls means the thread was not registered with the runtime */
// This must be below the native crash dump, because we can't safely
// do runtime state probing after we have walked the managed stack here.
if (jit_tls && mono_thread_internal_current () && mctx) {
    g_async_safe_printf ("\n=================================================================\n");
    g_async_safe_printf ("\tManaged Stacktrace:\n");
    g_async_safe_printf ("=================================================================\n");
    mono_walk_stack_full (print_stack_frame_signal_safe, mctx, jit_tls, mono_get_lmf (), MONO_UNWIND_LOOKUP_IL_OFFSET | MONO_UNWIND_SIGNAL_SAFE, NULL);
    g_async_safe_printf ("=================================================================\n");
}
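The `if (!ji)` branch quoted above can be read as a small decision table over two inputs: whether crash chaining is enabled, and whether `mono_chain_signal` finds a saved handler to call. The sketch below models only that table. `segv_plan` and `native_segv_plan` are hypothetical names, and the model assumes a chained handler that returns.

```c
#include <stdbool.h>

/* Hypothetical labels for what happens on a SIGSEGV in native code. */
typedef enum {
    PLAN_CHAIN_ONLY,          /* saved handler runs; no Mono crash report */
    PLAN_REPORT_ONLY,         /* Mono crash report; nothing to chain to */
    PLAN_REPORT_THEN_CHAIN,   /* Mono crash report, then saved handler */
    PLAN_REPORT_THEN_DEFAULT  /* Mono crash report, then default SIGSEGV handler */
} segv_plan;

/* has_saved_handler models mono_chain_signal returning TRUE. */
segv_plan native_segv_plan(bool crash_chaining, bool has_saved_handler)
{
    /* Signal chaining without crash chaining: the saved handler short-circuits
     * the crash report entirely. */
    if (!crash_chaining && has_saved_handler)
        return PLAN_CHAIN_ONLY;

    /* From here on, mono_handle_native_crash runs first. */
    if (!crash_chaining)
        return PLAN_REPORT_ONLY;

    return has_saved_handler ? PLAN_REPORT_THEN_CHAIN
                             : PLAN_REPORT_THEN_DEFAULT;
}
```

The row for `crash_chaining == true` is the ordering this PR moves CoreCLR toward: report first, then hand off to whatever handler was there before.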

Alternatives

If there is any particular reason to preserve the order of sa_sigaction/sa_handler with respect to PROCNotifyProcessShutdown and PROCCreateCrashDumpIfEnabled for CoreCLR, a config knob can be added to allow Android CoreCLR to opt into the swapped ordering behavior. This may be in the form of config property key/values

const char** propertyKeys,
const char** propertyValues,
or clrconfigvalues. That way, AndroidSDK/AndroidAppBuilder may opt in at build time.
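As a sketch of the property-based variant, a boolean knob could be read from the parallel `propertyKeys`/`propertyValues` arrays roughly as follows. `lookup_bool_property` and the key name used in the usage note are hypothetical and are not the names used by the runtime.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: finds `name` in parallel key/value arrays and parses
 * the value as a boolean. Missing or unparsable values yield `fallback`. */
bool lookup_bool_property(const char **keys, const char **values, int count,
                          const char *name, bool fallback)
{
    for (int i = 0; i < count; i++) {
        if (strcmp(keys[i], name) != 0)
            continue;
        if (strcmp(values[i], "true") == 0 || strcmp(values[i], "1") == 0)
            return true;
        if (strcmp(values[i], "false") == 0 || strcmp(values[i], "0") == 0)
            return false;
        return fallback;  /* key present but value not recognized */
    }
    return fallback;      /* key absent */
}
```

A host could then call something like `lookup_bool_property(propertyKeys, propertyValues, propertyCount, "EXAMPLE_CRASH_REPORT_BEFORE_SIGNAL_CHAINING", false)` during startup, so the existing ordering stays the default unless a platform opts in.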

Given that the history of the ordering didn't reveal any problems with swapping the order, we can fall back to this config knob approach if the order swap causes problems down the line.

The other way around is more restrictive. Should we first introduce all the overhead to enable an opt-in/opt-out config knob, and later discover that no platforms need to invoke their previous handlers before PROCNotifyProcessShutdown/PROCCreateCrashDumpIfEnabled, it seems harder to justify removing the knob.

Copilot AI left a comment

Pull request overview

Adjusts CoreCLR’s signal chaining so shutdown notification and crash dump creation happen before invoking a previously-registered signal handler, ensuring these steps still run when the prior handler doesn’t return (notably on Android).

Changes:

  • Reorders PROCNotifyProcessShutdown and PROCCreateCrashDumpIfEnabled to run before chaining to the prior sigaction handler.
  • Adds an assertion to document/enforce that the “custom handler” path isn’t reached for SIG_DFL/SIG_IGN.
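To illustrate the second bullet, here is a hedged sketch of how a saved `struct sigaction` can be classified so that the "custom handler" call path asserts it never sees SIG_DFL or SIG_IGN. All names (`handler_kind`, `classify_action`, `invoke_custom_action`, `counting_handler`) are hypothetical and do not mirror the actual signal.cpp code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

typedef enum { KIND_DEFAULT, KIND_IGNORE, KIND_CUSTOM } handler_kind;
typedef void (*siginfo_fn)(int, siginfo_t *, void *);

/* Decide which disposition a saved action represents. The field to inspect
 * depends on whether SA_SIGINFO was set when the action was installed. */
handler_kind classify_action(const struct sigaction *action)
{
    if (action->sa_flags & SA_SIGINFO) {
        if (action->sa_sigaction == (siginfo_fn)SIG_DFL) return KIND_DEFAULT;
        if (action->sa_sigaction == (siginfo_fn)SIG_IGN) return KIND_IGNORE;
        return KIND_CUSTOM;
    }
    if (action->sa_handler == SIG_DFL) return KIND_DEFAULT;
    if (action->sa_handler == SIG_IGN) return KIND_IGNORE;
    return KIND_CUSTOM;
}

/* Calls a user-installed handler with the signature it was registered with.
 * SIG_DFL and SIG_IGN are not callable, so they must be handled elsewhere. */
void invoke_custom_action(const struct sigaction *action, int sig,
                          siginfo_t *info, void *ctx)
{
    assert(classify_action(action) == KIND_CUSTOM);
    if (action->sa_flags & SA_SIGINFO)
        action->sa_sigaction(sig, info, ctx);
    else
        action->sa_handler(sig);
}

/* Example custom handler used to observe that chaining happened. */
static volatile sig_atomic_t g_calls = 0;
void counting_handler(int sig) { (void)sig; g_calls++; }
```

Keeping the classification separate from the call makes the precondition explicit: the caller deals with the default and ignore dispositions first, and only a real function pointer ever reaches the invocation.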

@grendello

@mdh1418 it doesn't really matter what handlers Android installs, as long as you chain up to any you've captured.

…arsing

- Switch from parsing CRASH_CHAINING property before PAL init to using
  Configuration::GetKnobBooleanValue after InitializeConfigurationKnobs
- Add INTERNAL_CrashReportBeforeSignalChaining to clrconfigvalues.h
- Rename API to PAL_EnableCrashReportBeforeSignalChaining for clarity
- Remove HOST_PROPERTY_CRASH_CHAINING (no longer needed)
- Remove CLRConfigNoCache::Get from signal.cpp (use standard config system)
Copilot AI review requested due to automatic review settings February 11, 2026 19:50
Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

- Rename configuration key for standardization
- Allow PROCNotifyProcessShutdown when crash reporting before signal
  chaining
- Conform to PAL formatting

@janvorli janvorli left a comment

LGTM, thank you!

@lateralusX lateralusX left a comment

LGTM, we can do the NAOT work in follow up PR.

@mdh1418 mdh1418 merged commit 6e94069 into dotnet:main Feb 13, 2026
165 of 168 checks passed
@mdh1418 mdh1418 deleted the reorder_previous_signal_handler branch February 13, 2026 17:31
richlander pushed a commit to richlander/runtime that referenced this pull request Feb 14, 2026
…er (dotnet#123735)


6 participants