Description
All HTTP requests to the BE webserver port are processed in the HTTP thread pool, including stream load tasks. Sometimes stream load tasks occupy all the threads in the HTTP thread pool for a long time, which blocks other HTTP requests.
Case:
In our cluster, Prometheus sometimes raises false alarms that a BE is down. After tracing the problem, we found that all the threads in the HTTP thread pool were occupied by stream load tasks, so Prometheus's requests to pull metrics from the BE were blocked. As a result, Prometheus mistakenly concluded that the BE had died.
The HTTP thread pool has 32 threads, and all of them were occupied by stream load tasks.

By checking the stacks, we found that all the stream load threads are waiting for the load execution plan to finish, so the threads cannot be released, as shown below.
#0 0x00007fae2a7b7bf9 in syscall () from /lib64/libc.so.6
#1 0x00000000027afb68 in std::__atomic_futex_unsigned_base::_M_futex_wait_until (this=this@entry=0x4eb309130, __addr=__addr@entry=0x4eb309130, __val=2147483648, __has_timeout=__has_timeout@entry=false, __s=..., __s@entry=..., __ns=..., __ns@entry=...) at ../../../.././libstdc++-v3/src/c++11/futex.cc:55
#2 0x0000000001484ce0 in _M_load_and_test_until (__ns=..., __s=..., __has_timeout=<optimized out>, __mo=<optimized out>, __equal=<optimized out>, __operand=<optimized out>, __assumed=<optimized out>, this=<optimized out>) at /usr/include/c++/7.3.0/bits/atomic_futex.h:102
#3 _M_load_and_test (__mo=<optimized out>, __equal=<optimized out>, __operand=<optimized out>, __assumed=<optimized out>, this=<optimized out>) at /usr/include/c++/7.3.0/bits/atomic_futex.h:122
#4 _M_load_when_equal (__mo=std::memory_order_acquire, __val=1, this=0x4eb309130) at /usr/include/c++/7.3.0/bits/atomic_futex.h:162
#5 wait (this=0x4eb309120) at /usr/include/c++/7.3.0/future:337
#6 _M_get_result (this=0x7c4100cb8) at /usr/include/c++/7.3.0/future:717
#7 get (this=0x7c4100cb8) at /usr/include/c++/7.3.0/future:796
#8 doris::StreamLoadAction::_handle (this=this@entry=0x86603940, ctx=ctx@entry=0x7c4100000) at /builds/olap/doris/be/src/http/action/stream_load.cpp:157
#9 0x00000000014851ae in doris::StreamLoadAction::handle (this=0x86603940, req=0x462a00360) at /builds/olap/doris/be/src/http/action/stream_load.cpp:111
#10 0x0000000001bd4d6c in evhttp_handle_request (req=0xfdf9e54a0, arg=<optimized out>) at http.c:3454
#11 0x0000000001bd29f8 in evhttp_read_body (evcon=0xa26bb8d80, req=0xfdf9e54a0) at http.c:1103
#12 0x0000000001bd58e6 in bufferevent_trigger_nolock_ (options=0, iotype=2, bufev=0x47ddd9b00) at bufferevent-internal.h:411
#13 bufferevent_readcb (fd=26653, event=<optimized out>, arg=0x47ddd9b00) at bufferevent_sock.c:219
#14 0x0000000001bc375f in event_persist_closure (ev=<optimized out>, base=0x7d8db340) at event.c:1608
#15 event_process_active_single_queue (base=base@entry=0x7d8db340, activeq=0x253f5530, max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0) at event.c:1667
#16 0x0000000001bc407f in event_process_active (base=0x7d8db340) at event.c:1768
#17 event_base_loop (base=base@entry=0x7d8db340, flags=flags@entry=0) at event.c:1991
#18 0x0000000001bc430c in event_base_dispatch (event_base=event_base@entry=0x7d8db340) at event.c:1802
#19 0x0000000001456393 in operator() (__closure=0x81c685d8) at /builds/olap/doris/be/src/http/ev_http_server.cpp:105
#20 std::_Function_handler<void(), doris::EvHttpServer::start()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/7.3.0/bits/std_function.h:316
#21 0x00000000011aa902 in operator() (this=0x81c685d8) at /usr/include/c++/7.3.0/bits/std_function.h:706
#22 run (this=0x81c685d0) at /builds/olap/doris/be/src/util/threadpool.cpp:42
#23 doris::ThreadPool::dispatch_thread (this=0x7e4ba780) at /builds/olap/doris/be/src/util/threadpool.cpp:551
#24 0x00000000011a2768 in operator() (this=0x7e8f8798) at /usr/include/c++/7.3.0/bits/std_function.h:706
#25 doris::Thread::supervise_thread (arg=0x7e8f8780) at /builds/olap/doris/be/src/util/thread.cpp:385
#26 0x00007fae2a4b1dc5 in start_thread () from /lib64/libpthread.so.0
#27 0x00007fae2a7bd73d in clone () from /lib64/libc.so.6
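
The blocking pattern in frames #5 to #8 (a pool thread waiting on a `std::future` inside the request handler) can be reproduced in isolation. The following is a minimal sketch, not Doris code: `FixedPool`, `StarvationResult`, and `simulate_starvation` are hypothetical names introduced only for illustration, and the pool is a simplified stand-in for the HTTP thread pool. It fills every worker with a task that waits on a shared future (standing in for the load execution plan), then queues a short task (standing in for the metrics pull) and checks whether it gets served.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical fixed-size pool; a simplified stand-in, not doris::ThreadPool.
class FixedPool {
public:
    explicit FixedPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }
    ~FixedPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stop requested and queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // a blocking task pins this worker until it returns
        }
    }
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stop_ = false;
};

struct StarvationResult {
    bool served_while_blocked;  // short task ran while all workers were waiting
    bool served_after_release;  // short task ran once the workers were freed
};

StarvationResult simulate_starvation(std::size_t pool_size) {
    assert(pool_size > 0);
    FixedPool pool(pool_size);

    // Stands in for the load execution plan that every handler waits on.
    std::promise<void> plan_done;
    std::shared_future<void> plan = plan_done.get_future().share();

    // One blocking "stream load" task per worker: the pool is now saturated.
    for (std::size_t i = 0; i < pool_size; ++i) {
        pool.submit([plan] { plan.wait(); });
    }

    // A short "metrics" task queued behind the blocking tasks.
    auto metrics = std::make_shared<std::promise<void>>();
    std::future<void> served = metrics->get_future();
    pool.submit([metrics] { metrics->set_value(); });

    bool early = served.wait_for(std::chrono::milliseconds(200)) ==
                 std::future_status::ready;
    plan_done.set_value();  // the plan finishes, freeing every worker
    bool late = served.wait_for(std::chrono::seconds(5)) ==
                std::future_status::ready;
    return {early, late};
}
```

Calling `simulate_starvation(32)` mirrors the pool size reported above: the short task cannot start until at least one blocking task returns, which is the same condition under which the metrics pull times out.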