Description
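Describe the bug
A child process is created and calls the start_brpc_server interface inside the child. A deadlock then occurs between brpc::Acceptor::StartAccept and brpc::Acceptor::BeforeRecycle, and a curl request to the listening port hangs. See the stack trace below for details.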
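To Reproduce
1. Create a child process, then call the start_brpc_server interface in the child process.
2. Kill the child process. The parent process has a monitoring thread; once it detects that the child has died, it spawns the child again (which repeats step 1).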
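The kill-and-respawn cycle in the steps above can be sketched roughly as follows. This is only an illustration under assumptions: `child_main` is a hypothetical placeholder for the reporter's start_brpc_server (it just blocks instead of starting a brpc::Server), and `respawn_rounds` is an invented helper name. The reporter's parent uses a monitoring thread, which this single-threaded sketch does not reproduce.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical placeholder for start_brpc_server(); the real function would
// start a brpc::Server. Here the child simply blocks until a signal arrives.
static void child_main() {
    pause();
}

// Forks a child `rounds` times. In each round the parent kills the child and
// waits for it, mimicking "kill the child, then spawn it again".
// Returns how many children were observed to die from SIGKILL, or -1 if
// fork() fails.
int respawn_rounds(int rounds) {
    int killed = 0;
    for (int i = 0; i < rounds; ++i) {
        pid_t pid = fork();
        if (pid < 0) {
            return -1;
        }
        if (pid == 0) {
            child_main();
            _exit(0);  // leave the child without running parent cleanup
        }
        kill(pid, SIGKILL);
        int status = 0;
        if (waitpid(pid, &status, 0) == pid &&
            WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) {
            ++killed;
        }
    }
    return killed;
}
```

Calling `respawn_rounds(3)` runs three kill-and-respawn cycles. In the real setup the child would start the server on every respawn, which is where the hang was observed.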
Expected behavior
After the child process starts, the port should be listened on normally.
Versions
OS: a custom system based on Linux kernel 3.10.0
Compiler: gcc 4.7
brpc: a version forked in 2019
protobuf:
Additional context/screenshots
(gdb) bt
#0 0x00007fdcb176042d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fdcb175bdcb in _L_lock_812 () from /lib64/libpthread.so.0
#2 0x00007fdcb175bc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fdcafe9650f in lock (this=0x55920555bc90) at /test/src/brpc/src/butil/synchronization/lock.h:55
#4 lock_guard (__m=..., this=) at /usr/include/c++/4.8.2/mutex:414
#5 brpc::Acceptor::BeforeRecycle (this=0x55920555bc20, sock=0x55920ac626c0) at /test/src/brpc/src/brpc/acceptor.cpp:325
#6 0x00007fdcafebc4ea in brpc::Socket::OnRecycle (this=0x55920ac626c0) at /test/src/brpc/src/brpc/socket.cpp:1015
#7 0x00007fdcafebccad in Dereference (this=0x238) at /test/src/brpc/src/brpc/socket_inl.h:110
#8 brpc::Socket::ReleaseAdditionalReference (this=this@entry=0x55920ac626c0) at /test/src/brpc/src/brpc/socket.cpp:783
#9 0x00007fdcafebd1ee in brpc::Socket::SetFailed (this=this@entry=0x55920ac626c0, error_code=error_code@entry=9, error_fmt=error_fmt@entry=0x7fdcb00c66c0 "Fail to ResetFileDescriptor: %s")
at /test/src/brpc/src/brpc/socket.cpp:848
#10 0x00007fdcafebdcbd in brpc::Socket::Create (options=..., id=id@entry=0x55920555bc88) at /test/src/brpc/src/brpc/socket.cpp:667
#11 0x00007fdcafe96a30 in brpc::Acceptor::StartAccept (this=0x55920555bc20, listened_fd=listened_fd@entry=3, idle_timeout_sec=-1, ssl_ctx=)
at /test/src/brpc/src/brpc/acceptor.cpp:82
#12 0x00007fdcafd99cd7 in brpc::Server::StartInternal (this=this@entry=0x5592009cf080, ip=..., port_range=..., opt=opt@entry=0x0) at /test/src/brpc/src/brpc/server.cpp:919
#13 0x00007fdcafd9b020 in brpc::Server::Start (this=this@entry=0x5592009cf080, endpoint=..., opt=opt@entry=0x0) at /test/src/brpc/src/brpc/server.cpp:997
#14 0x00007fdcb58061af in test::start_brpc_server (this=this@entry=0x5592009cf040) at /test//src/test/test_manager.cpp:194
#15 0x00007fdcb580626a in test::start (this=this@entry=0x5592009cf040) at /test//src/test/test_manager.cpp:106
#16 0x00007fdcb58073da in test::run (this=this@entry=0x5592009cf000) at /test//src/test/test.cpp:189
#17 0x00007fdcb5807a1f in test::start_work_process (this=this@entry=0x5592009cf000) at /test//src/test/test.cpp:177
#18 0x00007fdcb5808257 in test::daemon_thread (arg=0x5592009cf000) at /test//src/test/test.cpp:80
#19 0x00007fdcb1759e25 in start_thread () from /lib64/libpthread.so.0
#20 0x00007fdcae6d834d in clone () from /lib64/libc.so.6
(gdb) f 5
#5 brpc::Acceptor::BeforeRecycle (this=0x55920555bc20, sock=0x55920ac626c0) at /test/src/brpc/src/brpc/acceptor.cpp:325
325 /test/src/brpc/src/brpc/acceptor.cpp: No such file or directory.
(gdb) p _map_mutex
$1 = {_native_handle = {__data = {__lock = 2, __count = 0, __owner = 88919, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
__size = "\002\000\000\000\000\000\000\000W[\001\000\001", '\000' <repeats 26 times>, __align = 2}}
(gdb) info thr 1
Id Target Id Frame
* 1 Thread 0x7fdc95e0f700 (LWP 88919) "test" brpc::Acceptor::BeforeRecycle (this=0x55920555bc20, sock=0x55920ac626c0) at /test/src/brpc/src/brpc/acceptor.cpp:325
(gdb)
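In the gdb output above, `_map_mutex` reports `__owner = 88919`, which is the LWP of the very thread that is blocked in BeforeRecycle. That suggests the thread is waiting on a non-recursive mutex it already holds: StartAccept appears to hold the lock while Socket::Create fails and synchronously triggers BeforeRecycle. A minimal sketch of that pattern, assuming this reading of the stack is right, is shown below. `MiniAcceptor` and its methods are hypothetical stand-ins, not brpc code, and an error-checking pthread mutex is used so the relock returns EDEADLK instead of hanging.

```cpp
#include <cerrno>
#include <pthread.h>

// Hypothetical stand-in for the acceptor: one mutex guards the socket map.
struct MiniAcceptor {
    pthread_mutex_t map_mutex;
    int relock_result = 0;

    MiniAcceptor() {
        // PTHREAD_MUTEX_ERRORCHECK makes a same-thread relock fail with
        // EDEADLK; a default mutex would block forever, as in the report.
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        pthread_mutex_init(&map_mutex, &attr);
        pthread_mutexattr_destroy(&attr);
    }

    ~MiniAcceptor() { pthread_mutex_destroy(&map_mutex); }

    // Plays the role of BeforeRecycle: it needs the map mutex.
    void before_recycle() {
        relock_result = pthread_mutex_lock(&map_mutex);
        if (relock_result == 0) {
            pthread_mutex_unlock(&map_mutex);
        }
    }

    // Plays the role of StartAccept: it holds the mutex while creating the
    // socket. When creation fails, the recycle callback runs on the same
    // thread while the mutex is still held.
    void start_accept(bool create_fails) {
        pthread_mutex_lock(&map_mutex);
        if (create_fails) {
            before_recycle();
        }
        pthread_mutex_unlock(&map_mutex);
    }
};

// Returns the result of the nested lock attempt for the given scenario.
int relock_result_for(bool create_fails) {
    MiniAcceptor acceptor;
    acceptor.start_accept(create_fails);
    return acceptor.relock_result;
}
```

With `create_fails` set to true, the nested lock attempt fails with EDEADLK. With a default mutex the same call sequence would never return, which matches a thread parked in `__lll_lock_wait`.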
--- Program log ---
2022-06-01 10:54:08.367247 - info test-5d6f30c3 W0601 10:54:08.367146 186761 socket.cpp:1219] Fail to add fd=4 into epoll: Bad file descriptor
2022-06-01 10:54:08.367722 - info test-5d6f30c3 E0601 10:54:08.367461 186746 socket.cpp:589] Fail to add SocketId=455 into EventDispatcher, fd 3 ret -1 errno 9 reason Bad file descriptor: Bad file descriptor
2022-06-01 10:54:08.367728 - info test-5d6f30c3 E0601 10:54:08.367470 186746 socket.cpp:669] Fail to ResetFileDescriptor: Bad file descriptor
From the logs, I have not yet been able to find out why epoll_ctl fails. So far I can only see the deadlock that follows once epoll_ctl has failed.
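For reference, errno 9 (EBADF) from epoll_ctl means that either the epoll fd or the target fd is not a valid open descriptor. The sketch below shows one way to get that error: registering a valid fd with an epoll instance that has already been closed. Whether this is what happens in the child process here is purely an assumption and is not established by the logs. `epoll_ctl_errno_after_close` is a hypothetical helper name.

```cpp
#include <cerrno>
#include <sys/epoll.h>
#include <unistd.h>

// Returns the errno observed when adding a valid fd to an epoll instance
// whose descriptor has already been closed, or -1 if setup fails.
int epoll_ctl_errno_after_close() {
    int pipefd[2];
    if (pipe(pipefd) != 0) {
        return -1;
    }
    int epfd = epoll_create1(0);
    if (epfd < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    // Assumption for illustration: the epoll fd is closed before use.
    close(epfd);

    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];

    errno = 0;
    int rc = epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev);
    int saved_errno = (rc == -1) ? errno : 0;

    close(pipefd[0]);
    close(pipefd[1]);
    return saved_errno;
}
```

If something similar is happening in the child, checking which descriptors are still open right after it starts (for example by listing /proc/self/fd) might help narrow down which of the two fds is invalid.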
@JiaoZiLang