With h2:grpc, the client (brpc) falls into an infinite loop when the server-side service (grpc) stays unavailable #1666

@romiguan

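Describe the bug
When using h2:grpc, if the server-side service (grpc) stays unavailable, the client (brpc) falls into an infinite loop. While the grpc service is unavailable, the client logs the following error: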

E0108 07:08:32.286566 111758 xxx_client.cc:166] call xxx server failed, Request to x.x.x.x:52618 failed: [E2001][11.18.42.196:52618][E112]xxx_server response :[E112]Not connected to x.x.x.x:8000 yet, server_id=xxxx [R1][E112]Not connected to x.x.x.x:8000 yet, server_id=x.x.x.x [R2][E112]Not connected to x.x.x.x:8000 yet, server_id=x.x.x.x [R3][E112]Not connected to x.x.x.x:8000 yet, server_id=x.x.x.x

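At that point there was no traffic anymore, but CPU usage on the client stayed at around 98% and did not recover. The pstack output is as follows: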

A large number of threads are stuck at this point, even though there is no traffic at all. Multiple machines show the same problem.

Thread 494 (Thread 0x7f6fa67c6700 (LWP 491633)):
#0 0x0000000009aa5cc0 in load (__m=std::memory_order_acquire, this=0x7f8767b88080) at /opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7/../../../../include/c++/7/bits/atomic_base.h:396
#1 steal (val=0x7f6fa67c2888, this=0x7f8767b88080) at external/brpc/src/bthread/work_stealing_queue.h:116
#2 bthread::TaskControl::steal_task (this=0x7f9eef03f000, tid=tid@entry=0x7f6fa67c2888, seed=seed@entry=0x7f88a4714050, offset=) at external/brpc/src/bthread/task_control.cpp:347
#3 0x0000000009a9db10 in steal_task (tid=0x7f6fa67c2888, this=0x7f88a4714000) at external/brpc/src/bthread/task_group.h:224
#4 bthread::TaskGroup::wait_task (this=this@entry=0x7f88a4714000, tid=tid@entry=0x7f6fa67c2888) at external/brpc/src/bthread/task_group.cpp:123
#5 0x0000000009aa3a6f in bthread::TaskGroup::run_main_task (this=this@entry=0x7f88a4714000) at external/brpc/src/bthread/task_group.cpp:150
#6 0x0000000009aa702d in bthread::TaskControl::worker_thread (arg=0x7f9eef03f000) at external/brpc/src/bthread/task_control.cpp:73
#7 0x00007f9fed8aadc5 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f9febe9aced in clone () from /lib64/libc.so.6

Thread 549 (Thread 0x7f6fc1ffd700 (LWP 491578)):
#0 0x0000000009aa5ca8 in bthread::TaskControl::steal_task (this=0x7f9eef03f000, tid=tid@entry=0x7f994d7f7cc8, seed=seed@entry=0x7f9fb3c0d1d0, offset=) at external/brpc/src/bthread/task_control.cpp:344
#1 0x0000000009aa4266 in steal_task (tid=0x7f994d7f7cc8, this=0x7f9fb3c0d180) at external/brpc/src/bthread/task_group.h:224
#2 bthread::TaskGroup::sched (pg=pg@entry=0x7f994d7f7d48) at external/brpc/src/bthread/task_group.cpp:590
#3 0x0000000009aa43b0 in bthread::TaskGroup::usleep (pg=pg@entry=0x7f994d7f7d48, timeout_us=timeout_us@entry=100000) at external/brpc/src/bthread/task_group.cpp:827
#4 0x0000000009a98b4c in bthread_usleep (microseconds=microseconds@entry=100000) at external/brpc/src/bthread/bthread.cpp:358
#5 0x0000000009839bf0 in brpc::policy::XXXNamingService::RunNamingService (this=0x7f86af3f3f50, service_name=0x7f86aad8af18 "service_xxxx", actions=0x7f86ac6251e0) at external/brpc/src/brpc/policy/xxx_naming_service.cpp:111
#6 0x00000000097cdbca in brpc::NamingServiceThread::Run (this=0x7f86ac625140) at external/brpc/src/brpc/details/naming_service_thread.cpp:365
#7 0x00000000097cdcf9 in brpc::NamingServiceThread::RunThis (arg=) at external/brpc/src/brpc/details/naming_service_thread.cpp:268
#8 0x0000000009aa3207 in bthread::TaskGroup::task_runner (skip_remained=) at external/brpc/src/bthread/task_group.cpp:309
#9 0x0000000009aba771 in bthread_make_fcontext ()
#10 0x0000000000000000 in ?? ()
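
To check how many threads actually share the innermost frames shown above, the pstack output can be grouped by its top few function names. The following is a minimal sketch, not part of brpc: the function name `top_frames`, the `depth` parameter, and the regular expressions are assumptions chosen to match the trace format in this report.

```python
import re
from collections import Counter

# Assumed line formats, based on the traces quoted in this issue.
THREAD_RE = re.compile(r"^Thread \d+ \(")
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\S+)")


def top_frames(pstack_text, depth=3):
    """Count threads by the names of their innermost `depth` frames."""
    counts = Counter()
    frames = None  # frames of the thread currently being read
    for line in pstack_text.splitlines():
        line = line.strip()
        if THREAD_RE.match(line):
            # A new thread header closes the previous thread, if any.
            if frames:
                counts[" <- ".join(frames[:depth])] += 1
            frames = []
        elif frames is not None:
            m = FRAME_RE.match(line)
            if m:
                frames.append(m.group(1))
    # Close the last thread in the dump.
    if frames:
        counts[" <- ".join(frames[:depth])] += 1
    return counts
```

For example, with the dump saved to a file (the name `pstack.txt` is hypothetical), `top_frames(open("pstack.txt").read()).most_common(5)` lists the five most common innermost call chains and the number of threads in each.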

To Reproduce

Expected behavior

Versions
OS:
Compiler:
brpc:
protobuf:

Additional context/screenshots

Labels: bug (the code does not work as expected)
