macrogreg pushed a commit that referenced this pull request on Aug 20, 2021

andrewlock pushed a commit that referenced this pull request on Jul 26, 2024

…5808)

## Summary of changes

Prevent a deadlock between signal-based profilers (walltime/manual CPU profilers) and non-signal-based profilers (exception, contention, ...).

## Reason for change

When an exception is thrown, the exception profiler collects the current thread's stack and first refreshes the `dl_iterate_phdr` cache under its write lock (frame `#28`, `LibrariesInfoCache::UpdateCache`). A signal-based profiler (walltime/manual CPU) can interrupt the thread while it still holds that lock, here in the middle of releasing it (frames `#21` and `#22`). The signal handler then unwinds the stack, and libunwind calls back into the cache, which tries to take the read lock on the same `shared_mutex` (frames `#9` and `#10`). The thread ends up waiting on a lock that it holds itself, and never wakes up.

```
Thread 18 (LWP 995):
#0 __syscall_cp_c (nr=202, u=140244538814536, v=128, w=-1, x=0, y=0, z=0) at ./arch/x86_64/syscall_arch.h:61
#1 0x00007f8dba343ccd in __futex4_cp (to=0x0, val=-1, op=128, addr=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at src/thread/__timedwait.c:24
#2 __timedwait_cp (addr=addr@entry=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>, val=val@entry=-1, clk=clk@entry=0, at=at@entry=0x0, priv=priv@entry=128) at src/thread/__timedwait.c:52
#3 0x00007f8dba343d74 in __timedwait (addr=addr@entry=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>, val=-1, clk=clk@entry=0, at=at@entry=0x0, priv=128) at src/thread/__timedwait.c:68
#4 0x00007f8dba3463e6 in __pthread_rwlock_timedrdlock (at=<optimized out>, rw=<optimized out>) at src/thread/pthread_rwlock_timedrdlock.c:18
#5 __pthread_rwlock_timedrdlock (rw=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>, at=0x0) at src/thread/pthread_rwlock_timedrdlock.c:3
#6 0x00007f8d398f3ca8 in std::__glibcxx_rwlock_rdlock (__rwlock=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/shared_mutex:73
#7 std::__shared_mutex_pthread::lock_shared (this=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/shared_mutex:224
#8 std::shared_mutex::lock_shared (this=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/shared_mutex:421
#9 std::shared_lock<std::shared_mutex>::shared_lock (this=0x7f4ca05a2ac0, __m=...) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/shared_mutex:722
#10 LibrariesInfoCache::DlIteratePhdrImpl (this=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>, callback=0x7f8d3997d900 <_Ux86_64_dwarf_callback>, data=0x7f4ca05a2b20) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/LibrariesInfoCache.cpp:104
#11 0x00007f8d3997e4ee in _Ux86_64_dwarf_find_proc_info (as=0x7f8d39eb2a00 <local_addr_space>, ip=140246691112115, pi=0x7f4ca05a3170, need_unwind_info=1, arg=0x7f4ca05a3411) at /project/obj/libunwind-prefix/src/libunwind/src/dwarf/Gfind_proc_info-lsb.c:807
#12 0x00007f8d3997e690 in fetch_proc_info (c=0x7f4ca05a3018, ip=140246691112115) at /project/obj/libunwind-prefix/src/libunwind/src/dwarf/Gparser.c:473
#13 0x00007f8d3998113d in find_reg_state (sr=0x7f4ca05a2dc0, c=0x7f4ca05a3018) at /project/obj/libunwind-prefix/src/libunwind/src/dwarf/Gparser.c:1024
#14 _Ux86_64_dwarf_step (c=c@entry=0x7f4ca05a3018) at /project/obj/libunwind-prefix/src/libunwind/src/dwarf/Gparser.c:1069
#15 0x00007f8d3997d13a in _Ux86_64_step (cursor=0x7f4ca05a3018) at /project/obj/libunwind-prefix/src/libunwind/src/x86_64/Gstep.c:75
#16 0x00007f8d398f55c8 in LinuxStackFramesCollector::CollectStackManually (this=this@entry=0x7f8d392dc6d0, ctx=ctx@entry=0x7f4ca05a3880) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/LinuxStackFramesCollector.cpp:288
#17 0x00007f8d398f53dc in LinuxStackFramesCollector::CollectCallStackCurrentThread (this=this@entry=0x7f8d392dc6d0, ctx=ctx@entry=0x7f4ca05a3880) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/LinuxStackFramesCollector.cpp:227
#18 0x00007f8d398f4672 in LinuxStackFramesCollector::CollectStackSampleSignalHandler (signal=<optimized out>, info=<optimized out>, context=0x7f4ca05a3880) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/LinuxStackFramesCollector.cpp:373
#19 0x00007f8d398fb871 in ProfilerSignalManager::CallCustomHandler (this=0x7f8d39eaf928 <ProfilerSignalManager::Get(int)::signalManagers+1944>, signal=10, info=0x7f4ca05a39b0, context=0x7f4ca05a3880) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/ProfilerSignalManager.cpp:197
#20 ProfilerSignalManager::SignalHandler (signal=10, info=0x7f4ca05a39b0, context=0x7f4ca05a3880) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/ProfilerSignalManager.cpp:188
#21 <signal handler called>
#22 __pthread_rwlock_unlock (rw=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at src/thread/pthread_rwlock_unlock.c:5
#23 0x00007f8d398f3bf9 in std::__glibcxx_rwlock_unlock (__rwlock=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/shared_mutex:77
#24 std::__shared_mutex_pthread::unlock (this=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/shared_mutex:208
#25 std::shared_mutex::unlock (this=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/shared_mutex:417
#26 std::unique_lock<std::shared_mutex>::unlock (this=0x7f4ca05a3e20) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/bits/unique_lock.h:194
#27 std::unique_lock<std::shared_mutex>::~unique_lock (this=0x7f4ca05a3e20) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/bits/unique_lock.h:103
#28 LibrariesInfoCache::UpdateCache (this=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/LibrariesInfoCache.cpp:88
#29 0x00007f8d398f4e59 in LinuxStackFramesCollector::CollectStackSampleImplementation (this=0x7f8d3b91bc90, pThreadInfo=0x7f4ca06b9900, pHR=0x7f8d3a63c510, selfCollect=true) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/LinuxStackFramesCollector.cpp:100
#30 0x00007f8d399637ba in StackFramesCollectorBase::CollectStackSample (this=0x7f8d3b91bc90, pThreadInfo=0x7f4ca06b9900, pHR=0x7f4ca05a3fdc) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native/StackFramesCollectorBase.cpp:185
#31 0x00007f8d3992acb9 in ExceptionsProvider::OnExceptionThrown (this=0x7f8d392a7160, thrownObjectId=139969739182080) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native/ExceptionsProvider.cpp:149
#32 0x00007f8d39917045 in CorProfilerCallback::ExceptionThrown (this=0x7f8d392c0d20, thrownObjectId=139969739182080) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native/CorProfilerCallback.cpp:1734
```

## Implementation details

- Move the call that updates the cache so that it runs after acquiring the thread lock.
- Update the cache before sending the signal.

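The ordering from the implementation details can be sketched as a small self-contained model. This is not the profiler's code: `LibrariesCacheSketch`, `g_stackWalkMutex`, `CollectSelf`, `CollectOther`, and `RunSampling` are hypothetical names, and the sketch assumes the "thread lock" is a single mutex taken both by the self-collecting path (exceptions) and by the signal-sending path (walltime/CPU). Because every cache writer holds that mutex, and the sender keeps it until the handler has finished, the handler's read lock can never run into a write lock held by its own thread.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <pthread.h>
#include <shared_mutex>
#include <signal.h>
#include <thread>

// Hypothetical stand-in for LibrariesInfoCache: writers rebuild the cache under an
// exclusive lock, readers (the unwinder's callback) take a shared lock.
class LibrariesCacheSketch {
public:
    void Update() {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        ++generation_;
    }
    int Read() {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return generation_;
    }
private:
    std::shared_mutex mutex_;
    int generation_ = 0;
};

std::mutex g_stackWalkMutex;            // hypothetical: serializes every stack collection
LibrariesCacheSketch g_cache;
std::atomic<bool> g_handlerDone{false};
std::atomic<int> g_handled{0};

// Signal handler: only reads the cache, like libunwind's callback while unwinding.
// std::shared_mutex is not formally async-signal-safe; the sketch relies on ordering.
void SampleHandler(int) {
    (void)g_cache.Read();
    g_handled.fetch_add(1);
    g_handlerDone.store(true);
}

// Exception-profiler-like path: take the collection lock first, then update the cache,
// so no signal-based sample can be in flight while the write lock is held.
void CollectSelf() {
    std::lock_guard<std::mutex> guard(g_stackWalkMutex);
    g_cache.Update();
    // ... unwind the current thread here ...
}

// Walltime-profiler-like path: update the cache before sending the signal,
// and keep the collection lock until the handler has finished.
void CollectOther(pthread_t target) {
    std::lock_guard<std::mutex> guard(g_stackWalkMutex);
    g_cache.Update();
    g_handlerDone.store(false);
    pthread_kill(target, SIGUSR1);
    while (!g_handlerDone.load()) std::this_thread::yield();
}

// Sends `samples` signals to a thread that keeps self-collecting in a loop and
// returns how many handler runs completed.
int RunSampling(int samples) {
    struct sigaction sa {};
    sa.sa_handler = SampleHandler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGUSR1, &sa, nullptr);

    g_handled.store(0);
    std::atomic<bool> stop{false};
    std::thread worker([&stop] {
        while (!stop.load()) {
            CollectSelf();               // e.g. exceptions thrown in a loop
            std::this_thread::yield();   // give the sampler a chance to take the lock
        }
    });
    for (int i = 0; i < samples; ++i) CollectOther(worker.native_handle());
    stop.store(true);
    worker.join();
    return g_handled.load();
}
```

Moving `g_cache.Update()` in `CollectSelf` above the `lock_guard` would reopen the original window: the signal could land while the worker still holds the write lock, and `Read()` in the handler would wait forever on that same thread.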
gleocadie added a commit that referenced this pull request on Jan 7, 2026

## Summary of changes

Prevent calls into `dl_iterate_phdr` while unwinding.

## Reason for change

libunwind calls `dl_iterate_phdr` to get the unwinding information of each loaded library. Calling into `dl_iterate_phdr` while unwinding may lead to a deadlock:

```
#0 __syscall_cp_c (nr=202, u=125628101208320, v=128, w=-1, x=0, y=0, z=0) at ./arch/x86_64/syscall_arch.h:61
#1 0x0000724212545c3b in __futex4_cp (to=0x0, val=-1, op=128, addr=0x72421258a900 <lock>) at src/thread/__timedwait.c:24
#2 __timedwait_cp (addr=addr@entry=0x72421258a900 <lock>, val=val@entry=-1, clk=clk@entry=0, at=at@entry=0x0, priv=priv@entry=128) at src/thread/__timedwait.c:52
#3 0x0000724212545ce0 in __timedwait (addr=addr@entry=0x72421258a900 <lock>, val=-1, clk=clk@entry=0, at=at@entry=0x0, priv=128) at src/thread/__timedwait.c:68
#4 0x000072421254849e in __pthread_rwlock_timedrdlock (at=<optimized out>, rw=<optimized out>) at src/thread/pthread_rwlock_timedrdlock.c:18
#5 __pthread_rwlock_timedrdlock (rw=rw@entry=0x72421258a900 <lock>, at=at@entry=0x0) at src/thread/pthread_rwlock_timedrdlock.c:3
#6 0x0000724212548427 in __pthread_rwlock_rdlock (rw=rw@entry=0x72421258a900 <lock>) at src/thread/pthread_rwlock_rdlock.c:5
#7 0x00007242125507ab in dl_iterate_phdr (callback=0x724191d60b00 <_ULx86_64_dwarf_callback>, data=0x7200f66ba220) at ldso/dynlink.c:2362
#8 0x00007242124e6724 in dl_iterate_phdr (callback=0x724191d60b00 <_ULx86_64_dwarf_callback>, data=0x7200f66ba220) at /project/profiler/src/ProfilerEngine/Datadog.Linux.ApiWrapper/functions_to_wrap.c:383
#9 0x0000724191d61212 in _ULx86_64_dwarf_find_proc_info (as=0x724191edf760 <local_addr_space>, ip=ip@entry=125628100813076, pi=pi@entry=0x7200f66bad88, need_unwind_info=need_unwind_info@entry=1, arg=0x7200f66ba881) at /project/obj/libunwind-prefix/src/libunwind/src/dwarf/Gfind_proc_info-lsb.c:807
#10 0x0000724191d5cb27 in fetch_proc_info (c=0x7200f66bac30, ip=125628100813076) at /project/obj/libunwind-prefix/src/libunwind/src/dwarf/Gparser.c:473
#11 0x0000724191d5e1ea in find_reg_state (sr=0x7200f66ba4c0, c=0x7200f66bac30) at /project/obj/libunwind-prefix/src/libunwind/src/dwarf/Gparser.c:1024
#12 _ULx86_64_dwarf_step (c=c@entry=0x7200f66bac30) at /project/obj/libunwind-prefix/src/libunwind/src/dwarf/Gparser.c:1069
#13 0x0000724191d5b7d7 in _ULx86_64_step (cursor=cursor@entry=0x7200f66bac30) at /project/obj/libunwind-prefix/src/libunwind/src/x86_64/Gstep.c:75
#14 0x0000724191d5c58a in trace_init_addr (rsp=<optimized out>, rbp=<optimized out>, rip=<optimized out>, cfa=<optimized out>, cursor=0x7200f66bac30, f=0x7200f650dde0) at /project/obj/libunwind-prefix/src/libunwind/src/x86_64/Gtrace.c:249
#15 trace_lookup (rsp=<optimized out>, rbp=<optimized out>, rip=<optimized out>, cfa=<optimized out>, cache=0x724191169f40, cursor=0x7200f66bac30) at /project/obj/libunwind-prefix/src/libunwind/src/x86_64/Gtrace.c:331
#16 _ULx86_64_tdep_trace (cursor=cursor@entry=0x7200f66bac30, buffer=buffer@entry=0x724191f2a4c0, size=size@entry=0x7200f66ba86c) at /project/obj/libunwind-prefix/src/libunwind/src/x86_64/Gtrace.c:449
#17 0x0000724191d5ab29 in unw_backtrace2 (buffer=0x724191f2a4c0, size=1024, uc2=0x7200f66bb540, flag=1) at /project/obj/libunwind-prefix/src/libunwind/src/mi/backtrace.c:111
#18 0x0000724191caecf6 in TimerCreateCpuProfiler::Collect (this=0x724191f99dd0, ctx=0x7200f66bb540) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/TimerCreateCpuProfiler.cpp:257
#19 0x0000724191cab041 in ProfilerSignalManager::CallCustomHandler (this=0x724191ed2810 <ProfilerSignalManager::Get(int)::signalManagers+5616>, signal=27, info=0x7200f66bb670, context=0x7200f66bb540) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/ProfilerSignalManager.cpp:208
#20 ProfilerSignalManager::SignalHandler (signal=27, info=0x7200f66bb670, context=0x7200f66bb540) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/ProfilerSignalManager.cpp:199
#21 <signal handler called>
#22 0x000072421252a114 in __mmap (start=start@entry=0x0, len=len@entry=16384, prot=prot@entry=3, flags=flags@entry=34, fd=fd@entry=-1, off=off@entry=0) at src/mman/mmap.c:17
#23 0x000072421251845c in alloc_group (req=<optimized out>, sc=25) at src/malloc/mallocng/malloc.c:249
#24 alloc_slot (sc=sc@entry=25, req=req@entry=2496) at src/malloc/mallocng/malloc.c:291
#25 0x0000724212518783 in __libc_malloc_impl (n=2496) at src/malloc/mallocng/malloc.c:369
#26 0x0000724212355c2c in operator new(unsigned long) () from /usr/lib/libstdc++.so.6
```

At frame `#26`, the thread called into `malloc` (through `operator new`) and took the malloc lock. The profiler then interrupted the thread (frame `#21`). libunwind calls `dl_iterate_phdr` (frame `#8`), but the thread blocks trying to acquire a lock. This lock is held by another thread that called `dlopen` and is itself blocked in `malloc`, waiting for the malloc lock held by the interrupted thread, so neither thread can make progress.

`LibrariesInfoCache` was designed to prevent exactly this situation, but it did not take effect. When the profiler starts, the `LibrariesInfoCache` instance registers a custom `dl_iterate_phdr` with libunwind, and libunwind should call that function instead of the one from the C runtime. Why is it not called, then?

libunwind is designed for both remote and local unwinding, and every libunwind symbol carries a prefix of the form `_U[L]<arch>_<symbol_name>`, where `L` marks the local variant:

- Local: `_ULx86_64_xxxxxx`
- Remote: `_Ux86_64_xxxxxx`

If a compilation unit includes `libunwind.h` without any extra define, it uses the remote symbols by default. So when we registered our custom `dl_iterate_phdr`, we registered it through the remote symbol, and only the remote variant knew which function to call. `unw_backtrace2` uses the local symbols (see the `_ULx86_64_` frames in the trace above), and since the local variant never received a pointer to our implementation, it calls the default one: the C runtime `dl_iterate_phdr`.

To fix this, we add `#define UNW_LOCAL_ONLY` before `#include <libunwind.h>`. This makes sure the compilation unit uses the local symbols instead.

## Implementation details

- Add `#define UNW_LOCAL_ONLY` before including `libunwind.h`.

## Test coverage

- Add a test to check that the C runtime `dl_iterate_phdr` is not called.
- Add a test to check that `LibrariesInfoCache` serves the same information as `dl_iterate_phdr`.

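The prefix mix-up can be illustrated with a self-contained toy model. It is not libunwind's code: `ULtoy_`/`Utoy_`, `TOY_PREFIX`, `TOY_OBJ`, and the registration functions are hypothetical, and the leading underscore is dropped because identifiers starting with `_U` are reserved in user C++ code. The model only reproduces the shape of the bug: two copies of the same API with separate callback slots, a header-style macro that pastes a generic name onto a prefix chosen per compilation unit (libunwind's public headers use a comparable token-pasting scheme), and a backtrace entry point that always goes through the local copy.

```cpp
#include <cassert>
#include <string>

// Callback type standing in for a dl_iterate_phdr implementation.
using IteratePhdrFn = const char* (*)();

const char* RuntimeIteratePhdr() { return "C runtime dl_iterate_phdr"; }
const char* CachedIteratePhdr() { return "LibrariesInfoCache"; }

// "Local" copy (plays the role of _ULx86_64_*): its own slot, defaulting to the runtime.
static IteratePhdrFn g_localSlot = RuntimeIteratePhdr;
void ULtoy_set_iterate_phdr(IteratePhdrFn fn) { g_localSlot = fn; }
const char* ULtoy_backtrace() { return g_localSlot(); }   // stands in for unw_backtrace2

// "Remote" copy (plays the role of _Ux86_64_*): same API, separate state.
static IteratePhdrFn g_remoteSlot = RuntimeIteratePhdr;
void Utoy_set_iterate_phdr(IteratePhdrFn fn) { g_remoteSlot = fn; }
const char* Utoy_backtrace() { return g_remoteSlot(); }

// Header-style name mapping: a generic name is pasted onto TOY_PREFIX,
// which each "compilation unit" chooses before using the API.
#define TOY_PASTE2(a, b) a##b
#define TOY_PASTE(a, b) TOY_PASTE2(a, b)
#define TOY_OBJ(fn) TOY_PASTE(TOY_PREFIX, fn)

// Unit without the local-only switch: registration lands in the remote copy.
#define TOY_PREFIX Utoy_
void RegisterWithoutLocalOnly() { TOY_OBJ(set_iterate_phdr)(CachedIteratePhdr); }
#undef TOY_PREFIX

// Unit with the switch (the role of #define UNW_LOCAL_ONLY): lands in the local copy.
#define TOY_PREFIX ULtoy_
void RegisterWithLocalOnly() { TOY_OBJ(set_iterate_phdr)(CachedIteratePhdr); }
#undef TOY_PREFIX
```

After `RegisterWithoutLocalOnly()`, only `Utoy_backtrace()` reaches the cache while `ULtoy_backtrace()` still reaches the C runtime stand-in, which matches the trace above. `RegisterWithLocalOnly()` makes the registration and the backtrace path agree.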
This PR bumps the version number and removes the style checker from the package dependencies.