Fix pthread_mutex_trylock deadlock in jemalloc #2727
Merged
What problem does this PR solve?
Issue Number: resolve #2726
Problem Summary:
#2692 did not use __dl_sym because the UT could not run with it, failing with: symbol lookup error: ./libbrpc.so: undefined symbol: pthread_mutex_trylock. Related issues: #2266 #1086. Summary of the cause:
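libpthread.so is loaded before libbrpc.dbg.so, so a __dl_sym lookup with RTLD_NEXT cannot find the pthread_mutex_trylock symbol in the dynamic libraries loaded afterwards. There are two solutions:
Load libbrpc.dbg.so before libpthread.so.
Detailed analysis: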
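The man page describes what RTLD_NEXT does. In this scenario it roughly means: look up the pthread_mutex_* symbols in the dynamic libraries that come after libbrpc.dbg.so in load order. libbrpc.dbg.so therefore has to be loaded before libpthread.so for the pthread_mutex_* symbols to be found.
The brpc_channel_unittest program was built from the master branch for debugging. For readability, the output is lightly processed (filtered and trimmed).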
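The RTLD_NEXT lookup described above can be sketched as follows. This is a minimal illustration and not the code from this PR: it uses the public dlsym rather than the internal __dl_sym, and lookup_next is a hypothetical helper name. It assumes Linux with glibc and a dynamically linked program.

```cpp
#include <dlfcn.h>   // dlsym, dlerror, RTLD_NEXT (g++ defines _GNU_SOURCE by default)
#include <cassert>
#include <cstdio>

// Hypothetical helper, not part of brpc: search for `name` only in the
// objects that come after the calling object in the load order.
// Returns nullptr when no later object defines the symbol.
static void* lookup_next(const char* name) {
    assert(name != nullptr);
    dlerror();                            // clear any stale error state
    void* addr = dlsym(RTLD_NEXT, name);
    const char* err = dlerror();          // non-null only if the lookup failed
    if (err != nullptr) {
        std::fprintf(stderr, "lookup of %s failed: %s\n", name, err);
        return nullptr;
    }
    return addr;
}
```

Whether a call such as lookup_next("pthread_mutex_trylock") succeeds depends on which libraries follow the caller in the load order, which is exactly the ordering constraint analysed here. On older glibc versions the program may need to be linked with -ldl.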
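LD_DEBUG=libs shows the load order of the dynamic libraries: libpthread.so is loaded before libbrpc.dbg.so.
The same error also appears when using dlsym, but dlsym does not make the process exit; it returns the error message through dlerror (the deadlock in #2726 is caused by the memory allocation on this path). So at this point sys_pthread_mutex_trylock is NULL. The UT most likely does not crash because none of the UTs or the libraries they depend on call pthread_mutex_trylock.
On the other hand, there is no error for pthread_mutex_lock or pthread_mutex_unlock; in other words, their symbols can be found. So where do these two symbols come from?
One line of code was added to make the binding information for the pthread_mutex_lock and pthread_mutex_unlock symbols easier to identify.
LD_DEBUG=bindings,libs shows that the pthread_mutex_lock and pthread_mutex_unlock symbols come from libc.so.6 (in the output between the two pthread_mutex_trylock errors).
Searching libc.so.6 for pthread_mutex_* symbols confirms that it has no pthread_mutex_trylock symbol.
nm -D /usr/lib/x86_64-linux-gnu/libc.so.6 | grep pthread_mutex
0000000000094480 T pthread_mutex_destroy
00000000000944b0 T pthread_mutex_init
00000000000944e0 T pthread_mutex_lock
0000000000094510 T pthread_mutex_unlock
The pthread_mutex_* functions in libc.so should be stub functions; see [1] [2] [3].
In this scenario, even though pthread_mutex_lock and pthread_mutex_unlock resolve to the wrong functions and pthread_mutex_trylock is NULL, the process is not affected. Because libpthread.so was loaded first, the pthread_mutex_* symbols the process uses all come from libpthread.so, i.e. the hooks in libbrpc.dbg.so have no effect.
What is changed and the side effects?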
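Changed:
Use __dl_sym to load pthread_mutex_try, which avoids the deadlock in the malloc library. One of the following must hold when using it:
1. libbrpc.dbg.so is loaded before libpthread.so. (The UT uses this approach.)
2. The NO_PTHREAD_MUTEX_HOOK macro is defined to disable the pthread_mutex_* hooks. With the hooks disabled, the only effect is that the contention profiler cannot collect pthread_mutex contention, which is acceptable.
Side effects: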
Performance effects:
Breaking backward compatibility:
Check List: