-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Redo fix lookup for current Thread in async signal handler #122513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redo fix lookup for current Thread in async signal handler #122513
Conversation
The code in HijackCallback was using the fact whether the pThreadToHijack is NULL or not as an indicator of whether it should suspend inline or redirect the thread. So passing in the pThreadToHijack without other changes caused it to never suspend inline on Unix.
|
cc: @agocke |
|
Tagging subscribers to this area: @mangod9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a critical issue where accessing Thread-Local Storage (TLS) in async signal handlers could crash on recent glibc versions. The fix introduces an async-safe mechanism using a lock-free segmented array to map OS thread IDs to Thread instances, avoiding TLS access in signal handlers.
Key changes:
- Added async-safe thread map infrastructure using lock-free atomics
- Enhanced CheckActivationSafePoint to use async-safe thread lookup and verify reader lock safety
- Updated HijackCallback signature to explicitly control inline vs redirect suspension behavior
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/native/minipal/thread.h | Added minipal_get_current_thread_id_no_cache() function that retrieves thread ID without TLS caching for async-safe usage |
| src/coreclr/runtime/asyncsafethreadmap.h | New header defining async-safe thread map API for coreclr |
| src/coreclr/runtime/asyncsafethreadmap.cpp | New implementation of lock-free segmented thread map using atomic operations |
| src/coreclr/vm/threads.h | Added GetThreadAsyncSafe() declaration for async-safe thread retrieval |
| src/coreclr/vm/threads.cpp | Implemented GetThreadAsyncSafe() and integrated thread map operations into SetThread() |
| src/coreclr/vm/threadsuspend.cpp | Updated CheckActivationSafePoint to use async-safe thread lookup and verify scan lock safety |
| src/coreclr/vm/codeman.h | Added GetScanFlags() thread parameter and IsManagedCodeNoLock() declaration |
| src/coreclr/vm/codeman.cpp | Implemented IsManagedCodeNoLock() and updated GetScanFlags() to accept optional thread parameter |
| src/coreclr/vm/CMakeLists.txt | Added asyncsafethreadmap source files to coreclr build for Unix non-WASM targets |
| src/coreclr/pal/src/exception/signal.cpp | Restructured signal handler control flow (cosmetic change) |
| src/coreclr/nativeaot/Runtime/threadstore.h | Added GetCurrentThreadIfAvailableAsyncSafe() declaration |
| src/coreclr/nativeaot/Runtime/threadstore.cpp | Implemented GetCurrentThreadIfAvailableAsyncSafe() and integrated thread map into attach/detach |
| src/coreclr/nativeaot/Runtime/thread.h | Updated HijackCallback signature to include doInlineSuspend parameter |
| src/coreclr/nativeaot/Runtime/thread.cpp | Updated HijackCallback implementation to use doInlineSuspend parameter instead of null check |
| src/coreclr/nativeaot/Runtime/unix/PalUnix.cpp | Updated ActivationHandler to use async-safe thread lookup and pass doInlineSuspend flag |
| src/coreclr/nativeaot/Runtime/windows/PalMinWin.cpp | Updated Windows activation and hijack handlers to pass doInlineSuspend flag |
| src/coreclr/nativeaot/Runtime/CMakeLists.txt | Added asyncsafethreadmap source files to NativeAOT build for Unix non-WASM targets |
|
/azp run runtime-nativeaot-outerloop |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
/ba-g generic wasm failure - unrelated to the changes in the PR |
This is a new version of the fix. The code in HijackCallback was using the fact whether the
pThreadToHijack is NULL or not as an indicator of whether it should
suspend inline or redirect the thread. So passing in the pThreadToHijack
without other changes caused it to never suspend inline on Unix and it
was causing hangs in System.Collections.Concurrent tests.
There are two commits - one is the original change and the second it
the fix.
The current CheckActivationSafePoint uses thread local storage to
get the current Thread instance. But this function is called from
async signal handler (the activation signal handler) and it is not
allowed to access TLS variables there because the access can allocate
and if the interrupted code was running in an allocation code, it
could crash.
There was no problem with this since .NET 1.0, but a change in the
recent glibc version has broken this. We've got reports of crashes
in this code due to the reason mentioned above.
This change introduces an async safe mechanism for accessing the
current Thread instance from async signal handlers. It uses a
segmented array that can grow, but never shrink. Entries for
threads are added when runtime creates a thread / attaches to an
external thread and removed when the thread dies.
The check for safety of the activation injection was further enhanced
to make sure that the ScanReaderLock is not taken. In cases it would
need to be taken, we just reject the location.
Since NativeAOT is subject to the same issue, the code to maintain the
thread id to thread instance map is placed to the minipal and shared
between coreclr and NativeAOT.
Closes #121581