[release/10.0] Fix lookup for current Thread in async signal handler #124308

Merged
steveisok merged 1 commit into dotnet:release/10.0 from janvorli:backport-redo-fix-async-thread-by-threadid-lookup
Feb 16, 2026

@janvorli janvorli commented Feb 12, 2026

Backport of #122513 to release/10.0

Customer Impact

  • Customer reported
  • Found internally

CheckActivationSafePoint currently uses thread-local storage (TLS) to
get the current Thread instance. However, this function is called from
an async signal handler (the activation signal handler), where accessing
TLS variables is not allowed: the access can allocate, and if the
interrupted code was itself in the middle of an allocation, the process
can crash.
This had not caused problems since .NET 1.0, but a change in a recent
glibc version broke it. We have received reports of crashes in this
code, e.g., on Ubuntu 25.04, due to this issue.

Regression

  • Yes
  • No

Testing

CI tests, local manual directed tests

Risk

Low. The change has been in main for the past two months and we have not seen any issues related to it. The newly added code is executed very frequently.

…2513)

This is a new version of the fix. The code in HijackCallback used
whether pThreadToHijack is NULL as an indicator of whether it should
suspend inline or redirect the thread. Passing in pThreadToHijack
without other changes therefore caused it to never suspend inline on
Unix, which caused hangs in the System.Collections.Concurrent tests.
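
The pitfall described above can be illustrated with a minimal sketch. Everything here is hypothetical (the `Thread` stand-in, `HijackAction`, and both function names); it is not the runtime's HijackCallback, only a model of deciding by NULL check versus deciding by an explicit flag such as the doInlineSuspend parameter mentioned in this PR.

```cpp
// Hypothetical stand-in for the runtime's Thread type.
struct Thread {};

enum class HijackAction { SuspendInline, Redirect };

// Old convention (sketch): the pointer doubles as the mode switch.
// NULL means "suspend inline", non-NULL means "redirect this thread".
HijackAction DecideByNullCheck(const Thread* pThreadToHijack) {
    return pThreadToHijack == nullptr ? HijackAction::SuspendInline
                                      : HijackAction::Redirect;
}

// New convention (sketch): the mode is passed explicitly, so the caller
// can supply a valid thread pointer and still request inline suspension.
HijackAction DecideByFlag(const Thread* pThreadToHijack, bool doInlineSuspend) {
    (void)pThreadToHijack;  // still available to the callee, but no longer a flag
    return doInlineSuspend ? HijackAction::SuspendInline
                           : HijackAction::Redirect;
}
```

With the NULL-check convention, a signal handler that starts passing a non-NULL thread pointer silently switches to the redirect path, which matches the "never suspend inline on Unix" symptom described above.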

CheckActivationSafePoint currently uses thread-local storage (TLS) to
get the current Thread instance. However, this function is called from
an async signal handler (the activation signal handler), where accessing
TLS variables is not allowed: the access can allocate, and if the
interrupted code was itself in the middle of an allocation, the process
can crash.
This had not caused problems since .NET 1.0, but a change in a recent
glibc version broke it. We have received reports of crashes in this
code for that reason.

This change introduces an async-safe mechanism for accessing the
current Thread instance from async signal handlers. It uses a
segmented array that can grow but never shrinks. Entries for
threads are added when the runtime creates a thread or attaches to an
external thread, and removed when the thread dies.
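
A minimal sketch of such a grow-only segmented map follows, to make the idea concrete. All names (`Slot`, `Segment`, `MapInsert`, `MapFind`, `MapRemove`, `kSlotsPerSegment`) and the slot layout are assumptions for illustration and do not reflect the actual asyncsafethreadmap code. The key property shown is that the lookup path only performs atomic loads: no locks, no allocation, no TLS. The sketch assumes `std::atomic` of pointer and `uintptr_t` is lock-free on the target, and that thread id 0 is never used.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical names throughout; not the runtime's implementation.
constexpr std::size_t kSlotsPerSegment = 64;

struct Slot {
    std::atomic<std::uintptr_t> threadId{0};  // 0 means "no id published"
    std::atomic<void*> thread{nullptr};       // nullptr means "slot is free"
};

struct Segment {
    Slot slots[kSlotsPerSegment];
    std::atomic<Segment*> next{nullptr};      // segments are appended, never freed
};

static std::atomic<Segment*> g_head{nullptr};

// Called on thread creation/attach. May allocate, so not for signal handlers.
bool MapInsert(std::uintptr_t tid, void* thread) {
    std::atomic<Segment*>* link = &g_head;
    for (;;) {
        Segment* seg = link->load(std::memory_order_acquire);
        if (seg == nullptr) {
            Segment* fresh = new (std::nothrow) Segment();
            if (fresh == nullptr) return false;
            if (link->compare_exchange_strong(seg, fresh)) seg = fresh;
            else delete fresh;  // lost the race; seg now points at the winner
        }
        const std::size_t start = tid % kSlotsPerSegment;
        for (std::size_t i = 0; i < kSlotsPerSegment; ++i) {
            Slot& s = seg->slots[(start + i) % kSlotsPerSegment];
            void* expected = nullptr;
            // Claim the slot first, then publish the id that makes it findable.
            if (s.thread.compare_exchange_strong(expected, thread)) {
                s.threadId.store(tid, std::memory_order_release);
                return true;
            }
        }
        link = &seg->next;  // segment full: move on, growing the chain if needed
    }
}

// Lookup path intended for signal handlers: atomic loads only.
void* MapFind(std::uintptr_t tid) {
    for (Segment* seg = g_head.load(std::memory_order_acquire); seg != nullptr;
         seg = seg->next.load(std::memory_order_acquire)) {
        const std::size_t start = tid % kSlotsPerSegment;
        for (std::size_t i = 0; i < kSlotsPerSegment; ++i) {
            const Slot& s = seg->slots[(start + i) % kSlotsPerSegment];
            if (s.threadId.load(std::memory_order_acquire) == tid)
                return s.thread.load(std::memory_order_acquire);
        }
    }
    return nullptr;
}

// Called when the thread dies. Slots are recycled; segments are kept.
void MapRemove(std::uintptr_t tid, void* thread) {
    for (Segment* seg = g_head.load(std::memory_order_acquire); seg != nullptr;
         seg = seg->next.load(std::memory_order_acquire)) {
        for (std::size_t i = 0; i < kSlotsPerSegment; ++i) {
            Slot& s = seg->slots[i];
            if (s.threadId.load(std::memory_order_acquire) == tid &&
                s.thread.load(std::memory_order_acquire) == thread) {
                s.threadId.store(0, std::memory_order_release);     // hide it first
                s.thread.store(nullptr, std::memory_order_release); // then free the slot
                return;
            }
        }
    }
}
```

Never freeing segments is what keeps the reader simple in this sketch: a signal handler walking the chain can never observe a dangling segment pointer, so no reclamation scheme is needed.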

The check for safety of the activation injection was further enhanced
to make sure that the ScanReaderLock is not taken. In cases where it
would need to be taken, we just reject the location.

Since NativeAOT is subject to the same issue, the code to maintain the
thread id to thread instance map is placed in the minipal and shared
between CoreCLR and NativeAOT.

Closes dotnet#121581

---------

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@janvorli janvorli added this to the 10.0.x milestone Feb 12, 2026
@janvorli janvorli self-assigned this Feb 12, 2026
Copilot AI review requested due to automatic review settings February 12, 2026 01:41
@dotnet-policy-service

Tagging subscribers to this area: @agocke
See info in area-owners.md if you want to be subscribed.

@janvorli janvorli added the Servicing-consider Issue for next servicing release review label Feb 12, 2026
Copilot AI left a comment

Pull request overview

This is a backport of #122513 to release/10.0 that fixes a crash caused by accessing Thread Local Storage (TLS) from async signal handlers. The change introduces an async-safe, lock-free thread map that can be safely accessed from signal handlers, replacing the previous TLS-based thread lookup that could crash when interrupted during memory allocation.

Changes:

  • Introduces a lock-free, async-safe thread map implementation shared between CoreCLR and NativeAOT for Unix platforms (excluding WASM)
  • Adds minipal_get_current_thread_id_no_cache() to retrieve thread IDs without TLS access
  • Updates signal handlers and thread suspension logic to use the new async-safe lookup mechanism
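
As a rough illustration of the second bullet, one plausible way to obtain a thread ID on Linux without reading a cached thread-local value is to ask the kernel directly. The helper name `CurrentThreadIdNoCache` is hypothetical, the sketch is Linux-only, and it is not claimed to match how minipal_get_current_thread_id_no_cache() is implemented.

```cpp
#include <cstdint>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper (Linux-only sketch): query the kernel thread id on
// every call instead of caching it in a thread_local variable.
inline std::uint64_t CurrentThreadIdNoCache() {
    return static_cast<std::uint64_t>(syscall(SYS_gettid));
}
```

The trade-off in this sketch is a system call per lookup in exchange for not touching any lazily allocated per-thread state.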

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/native/minipal/thread.h | Adds minipal_get_current_thread_id_no_cache() function that retrieves thread IDs without using TLS, safe for async signal handlers |
| src/coreclr/runtime/asyncsafethreadmap.h | Header for new async-safe thread map API (shared between CoreCLR and NativeAOT) |
| src/coreclr/runtime/asyncsafethreadmap.cpp | Implements lock-free segmented hash table for thread lookup in signal handlers |
| src/coreclr/vm/threads.h | Declares GetThreadAsyncSafe() function for async-safe thread lookup |
| src/coreclr/vm/threads.cpp | Implements GetThreadAsyncSafe() and integrates thread map management into SetThread() lifecycle |
| src/coreclr/vm/threadsuspend.cpp | Updates CheckActivationSafePoint() to use async-safe thread lookup and avoid taking reader locks |
| src/coreclr/vm/codeman.h | Adds IsManagedCodeNoLock() and updates GetScanFlags() signature |
| src/coreclr/vm/codeman.cpp | Implements IsManagedCodeNoLock() for use when reader lock cannot be acquired |
| src/coreclr/vm/CMakeLists.txt | Adds asyncsafethreadmap files to build for Unix (non-WASM) targets |
| src/coreclr/pal/src/exception/signal.cpp | Refactors signal handler to always call original handler regardless of activation success |
| src/coreclr/nativeaot/Runtime/threadstore.h | Declares GetCurrentThreadIfAvailableAsyncSafe() for NativeAOT |
| src/coreclr/nativeaot/Runtime/threadstore.cpp | Integrates thread map management into attach/detach and implements async-safe lookup |
| src/coreclr/nativeaot/Runtime/thread.h | Updates HijackCallback() signature to include doInlineSuspend parameter |
| src/coreclr/nativeaot/Runtime/thread.cpp | Updates HijackCallback() to use explicit parameter instead of NULL check for inline suspend decision |
| src/coreclr/nativeaot/Runtime/unix/PalUnix.cpp | Updates signal handler to use async-safe thread lookup and pass explicit doInlineSuspend parameter |
| src/coreclr/nativeaot/Runtime/windows/PalMinWin.cpp | Updates APC handlers to pass explicit doInlineSuspend parameter |
| src/coreclr/nativeaot/Runtime/CMakeLists.txt | Adds asyncsafethreadmap files to build for Unix (non-WASM) targets |

@rbhanda rbhanda modified the milestones: 10.0.x, 10.0.4 Feb 13, 2026
@rbhanda rbhanda added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Feb 13, 2026
@steveisok steveisok enabled auto-merge (squash) February 16, 2026 13:29
@steveisok

/ba-g Known issues #103347 #103347

@steveisok steveisok merged commit 0231e42 into dotnet:release/10.0 Feb 16, 2026
176 of 182 checks passed
Labels

area-VM-coreclr Servicing-approved Approved for servicing release
