Skip to content

[Android] In-Proc Crash Reporter#126916

Open
mdh1418 wants to merge 50 commits intodotnet:mainfrom
mdh1418:inproc_crashreport_minimal
Open

[Android] In-Proc Crash Reporter#126916
mdh1418 wants to merge 50 commits intodotnet:mainfrom
mdh1418:inproc_crashreport_minimal

Conversation

@mdh1418
Copy link
Copy Markdown
Member

@mdh1418 mdh1418 commented Apr 14, 2026

[Note] The first commit adds the skeleton with stubbed fields, the following commits resolves fields, followed by another commit that emits the json file.


Android CoreCLR does not currently have the same out-of-proc createdump experience available on other platforms. That makes native crashes, aborts, and mixed managed/native failures significantly harder to diagnose in real applications.

This PR adds an opt-in in-proc crash reporter for Android CoreCLR that emits a JSON-formatted crash report modeled after createdump's CrashReportWriter.

When enabled, the runtime writes the crash report directly from the crash path to:

1. logcat / console output
2. an optional *.crashreport.json file when DbgMiniDumpName is set.

The long-term intent is for this reporting path to be async-signal-safe. This PR starts that work by making the lower-risk / low-hanging pieces fit that model where practical, while leaving the more complex runtime-state publication and hardening work to follow-up PRs.

Enabling the crash reporter

The crash reporter is opt-in.

Enable JSON crash reporting

DOTNET_EnableCrashReport=1

Example: write the report to a file as well

DOTNET_EnableCrashReport=1
DOTNET_DbgMiniDumpName=/data/data/<package>/files/dotnet_crash_%p

When configured this way, the runtime will also attempt to write:

/data/data/<package>/files/dotnet_crash_<pid>.crashreport.json

Current design note

This PR is intentionally the first step, not the complete hardening story.

The payload shape and reporting path are designed with async-signal-safety as the target, but today the implementation is a mix of:

  • crash-time logic that is already close to signal-safe
  • best-effort runtime inspection for richer managed diagnostics

Follow-up PRs will continue hardening the remaining pieces so the implementation better matches the intended strict crash-path constraints.

Example crash report shapes

https://gist.github.com/mdh1418/a893c7dd953bf725569b22abc10e59c1

Follow-up work

Cross-platform support (iOS/osx)
Emitting the crash report to logcat/console

mdh1418 and others added 9 commits April 14, 2026 10:03
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

@mdh1418 mdh1418 changed the title Inproc crashreport minimal [Android] In-Proc Crash Reporter Apr 14, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in, in-process crash reporter for Android CoreCLR that emits a createdump-shaped JSON crash report to logcat/stderr and optionally to a *.crashreport.json file, with VM-provided managed thread/stack/exception callbacks.

Changes:

  • Introduces PAL-side crash report generation (JSON writer, module/process name helpers, crash report generator).
  • Wires VM callbacks for managed thread enumeration/stack walking/exception extraction into the PAL crash reporter on Android startup.
  • Integrates crash report triggering into the Android crash/abort signal path and adds Android-only build plumbing.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/coreclr/vm/crashreportstackwalker.h Declares Android-only VM registration hook for crash report stack walking.
src/coreclr/vm/crashreportstackwalker.cpp Implements VM callbacks for managed stack walking, thread enumeration, and exception extraction.
src/coreclr/vm/ceemain.cpp Registers VM callbacks during Android EE startup.
src/coreclr/vm/CMakeLists.txt Adds the new VM crashreport stack walker source to the build.
src/coreclr/pal/src/thread/process.cpp Adds Android crash report initialization, enablement state, and crash-time generation call.
src/coreclr/pal/src/include/pal/process.h Exposes PROCIsCrashReportEnabled() for signal-path gating.
src/coreclr/pal/src/exception/signal.cpp Avoids duplicate managed stack logging when crash reporting is enabled.
src/coreclr/pal/src/crashreport/moduleenumerator.h Declares helpers to resolve process/module names via /proc.
src/coreclr/pal/src/crashreport/moduleenumerator.cpp Implements /proc/self/cmdline and /proc/self/maps parsing for crash-time module/process lookup.
src/coreclr/pal/src/crashreport/inproccrashreporter.h Declares crash report generator API and VM callback hooks.
src/coreclr/pal/src/crashreport/inproccrashreporter.cpp Implements JSON crash report generation, logcat/stderr output, optional file output, and callback integration.
src/coreclr/pal/src/crashreport/crashjsonwriter.h Declares fixed-buffer JSON writer intended for signal-safe usage.
src/coreclr/pal/src/crashreport/crashjsonwriter.cpp Implements the fixed-buffer JSON writer.
src/coreclr/pal/src/CMakeLists.txt Adds Android-only PAL crashreport sources to the build.

Comment thread src/coreclr/pal/src/thread/process.cpp Outdated
Comment thread src/coreclr/pal/src/thread/process.cpp Outdated
Comment thread src/coreclr/pal/src/crashreport/moduleenumerator.cpp Outdated
Comment thread src/coreclr/vm/crashreportstackwalker.cpp Outdated
Comment thread src/coreclr/debug/crashreport/crashjsonwriter.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreporter.cpp Outdated
Comment thread src/coreclr/vm/crashreportstackwalker.cpp Outdated
}
else if (exceptionType != NULL && exceptionType[0] != '\0')
{
char hresultBuffer[32];
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to condition this if a native code fault occurs in the middle of managed exception processing? It seems like the managed throwable would still be on the thread. Would we want to keep the managed throwable for context?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this comment have ended up in wrong place ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might no longer be in that spot, but I think the scenario is a native fault while a managed throwable is pinned on the thread, leading to the crash report attributing the crash to that throwable.

Comment thread src/coreclr/debug/crashreport/signalsafejsonwriter.cpp
Comment thread src/coreclr/vm/crashreportstackwalker.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreporter.h Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreporter.cpp Outdated
Comment thread src/coreclr/pal/src/crashreport/moduleenumerator.cpp Outdated
Comment thread src/coreclr/pal/src/exception/signal.cpp
Comment thread src/coreclr/debug/crashreport/crashjsonwriter.h Outdated
Comment thread src/coreclr/debug/crashreport/signalsafejsonwriter.h Outdated
Comment thread src/coreclr/debug/crashreport/signalsafejsonwriter.cpp Outdated
Comment thread src/coreclr/debug/crashreport/signalsafejsonwriter.h Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreporter.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreporter.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreporter.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreporter.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreporter.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreporter.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreporter.cpp
const char* last = path;
for (const char* p = path; *p != '\0'; p++)
{
if (*p == '/')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not make it platform specific, at least put it in a define that can be more easily identified.

// |bufferSize| > 0. This helper is async-signal-safe: it performs no
// allocation, locking, or TLS access.
void
FormatHexValue(
Copy link
Copy Markdown
Member

@lateralusX lateralusX Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe these methods fit better in the json writer, so instead of writing strings, add methods to WriteAsHex(uint64_t), WriteDecimal(uint64_t), WriteDecimal(int64_t) etc.

If we emit them as the real data type in JSON, then it should be written as is, but if we emit them as strings surrounded by "", then we should probably also have WriteDecimalAsString methods.

Comment thread src/coreclr/pal/src/thread/process.cpp Outdated
@@ -35,6 +35,10 @@ SET_DEFAULT_DEBUG_CHANNEL(PROCESS); // some headers have code with asserts, so d
#include <generatedumpflags.h>
#include <clrconfignocache.h>

#ifdef FEATURE_INPROC_CRASHREPORT
#include "debug/crashreport/inproccrashreporter.h"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas, ok to add a dependency like this between corelcr pal and the in-proc crash reporter library? Alternative is to do what we did with g_logManagedCallstackForSignalCallback and use registered callbacks and remove the hard dependecy.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PAL is linked into multiple binaries. Are all those binaries going to need to statically link the crash reporter as well?

You may be getting away with this on Android, but I doubt that the build will work if the crashreporter is enabled on other platforms.

Alternative is to do what we did with g_logManagedCallstackForSignalCallback

I think the direct linking without pointer indirections would be more ok in this case than it is for PAL.

Comment thread src/coreclr/pal/src/CMakeLists.txt Outdated
@@ -37,6 +37,9 @@ endif(CORECLR_SET_RPATH)
# Include directories

include_directories(include)
if(CLR_CMAKE_TARGET_ANDROID)
include_directories(${CLR_DIR})
Copy link
Copy Markdown
Member

@lateralusX lateralusX Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above, ok to setup a dependency between in-crash reporter and coreclr pal? If we are ok with that, maybe we should at least reduce the scope of the include dir to only point to the in-proc crash reporter.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If its OK we should probably use the in-proc crash reporter feature flag.

Comment thread src/coreclr/vm/CMakeLists.txt Outdated
@@ -4,6 +4,9 @@ set(CMAKE_INCLUDE_CURRENT_DIR ON)
include_directories(BEFORE ${CMAKE_CURRENT_SOURCE_DIR})
include_directories(${ARCH_SOURCES_DIR})
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/../interop/inc)
if(CLR_CMAKE_TARGET_ANDROID)
include_directories(${CLR_DIR})
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe scope it down to the crash reporter and use feature flag to trigger adding the include directory.

Comment thread src/coreclr/CMakeLists.txt Outdated
@@ -109,6 +109,13 @@ if(CLR_CMAKE_HOST_UNIX)
add_linker_flag(-Wl,-z,notext)
endif()

# PAL consumes FEATURE_INPROC_CRASHREPORT, so the feature variable must be
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should set this like other features add their compile definitions, clrdefinitions.cmake.

FrameCallbackAdapter(
CrawlFrame* pCF,
VOID* pData)
{
Copy link
Copy Markdown
Member

@lateralusX lateralusX Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have some contract in this method making sure we are not calling methods that could cause issues when called from signal handler? Like:

 CONTRACTL
    {
        NOTHROW;
        GC_NOTRIGGER;
        CANNOT_TAKE_LOCK;
        MODE_ANY;
    }
    CONTRACTL_END;

You need to run on debug/checked build to make sure the contract holds.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime contract checks are enabled on Windows only. (It is still a good idea to add contracts even in non-Windows to document the expectations.)

Comment thread src/coreclr/vm/crashreportstackwalker.cpp Outdated
Comment thread src/coreclr/vm/crashreportstackwalker.cpp Outdated
Comment thread src/coreclr/vm/crashreportstackwalker.cpp Outdated
if (g_pDebugInterface != nullptr && pMD != nullptr)
{
DWORD resolvedILOffset = 0;
if (g_pDebugInterface->GetILOffsetFromNative(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method will probably violate a stricter contract since it can throw, we should at least handle the scenario when it throws.

Comment thread src/coreclr/vm/crashreportstackwalker.cpp Outdated
Comment thread src/coreclr/vm/crashreportstackwalker.cpp
Comment thread src/coreclr/vm/crashreportstackwalker.cpp Outdated
Comment thread src/coreclr/vm/crashreportstackwalker.cpp Outdated
// If DbgMiniDumpName is just a filename (no directory component), write
// the crash report under TMPDIR / /tmp so it lands somewhere writable.
char dumpPathBuf[CRASHREPORT_STRING_BUFFER_SIZE];
if (strchr(dumpName, '/') == nullptr)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cross platform deliminiator?

char dumpPathBuf[CRASHREPORT_STRING_BUFFER_SIZE];
if (strchr(dumpName, '/') == nullptr)
{
const char* tmpDir = getenv("TMPDIR");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we have this logic spread out a little through the codebase, g_get_tmp_dir, ep_rt_temp_path_get etc. Maybe it should end up in shared code at some point (cross platform async signal safe).

mdh1418 and others added 2 commits April 28, 2026 01:48
Bundle of changes addressing reviewer feedback on the in-proc crash
reporter (PR dotnet#126916). Highlights:

- Group file-scope helpers in inproccrashreporter.cpp under a single
  CrashReportHelpers struct; drop static forward-declaration block.
- Fold FormatHexValue / FormatUnsignedDecimal / FormatSignedDecimal
  helpers onto SignalSafeJsonWriter and add WriteHex / WriteDecimal /
  WriteSignedDecimal members; eliminate per-call scratch buffers across
  inproccrashreporter.cpp.
- crashreportstackwalker.cpp: rename CrashReportRegisterStackWalker to
  CrashReportConfigure; have CrashReportSuspendThreads return the
  suspended state and pass it to CrashReportResumeThreads (drop the
  static s_runtimeSuspendedForCrashReport flag); wrap GetILOffset call
  in EX_TRY / EX_CATCH; add CONTRACTL annotations on FrameCallbackAdapter
  and CrashReportGetExceptionForThread.
- PAL callback inversion: register the in-proc crash reporter as a PAL
  signal callback instead of having PAL know about the reporter type
  directly; gate the wiring with FEATURE_INPROC_CRASH_REPORT cmake
  define.
- Various smaller review-feedback cleanups consistent with reviewer
  preferences (e.g. (void)param; for unused parameters).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three places in the runtime independently resolve the system temporary
directory: g_get_tmp_dir (eglib), ep_rt_temp_path_get (EventPipe), and
the new VM crash reporter added in this PR. lateralusX asked for these
to converge on a shared signal-safe helper so cross-platform divergence
(Unix TMPDIR/tmp vs Windows TMP/TEMP) and any future hardening live in
one place.

Add minipal_get_tempdir(buffer, size) under src/native/minipal:

  - Unix: probes \, falls back to "/tmp/".
  - Windows: probes %TMP%, then %TEMP%, falls back to "C:\\Temp\\".
  - Always writes a NUL-terminated path that ends with the platform
    separator, so callers can append a filename directly.
  - Uses only getenv / strlen / memcpy, matching the existing in-proc
    crash reporter contract (no allocation, no syscalls beyond getenv).

Migrate the VM crash reporter caller in CrashReportConfigure to use
the new helper. The other two pre-existing callers (g_get_tmp_dir,
ep_rt_temp_path_get) are intentionally left for follow-up so this
change stays scoped and low-risk for the in-proc crash reporter PR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temp directory location is security sensitive. You do not want to open-code it like this, in particular on Windows. Implementations like this led to security incidents in the past.

If you would like to add a general-purpose temp directory method to minipal, it has to do exactly what the reviewed managed implementation does. I am not sure whether it is worth the pain.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API to get tmpdir is a Java one and is not available to you within the crash handler. Should we just use the API and cache its value at startup if the right env variables are set?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants