-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Closed
Description
Describe the bug
Recently we found BE occasionally crash inside brpc with the following stacktrace.
#0 dec_ref (this=0x2000000000134) at /home/kangkaisen/palo/thirdparty/src/incubator-brpc-0.9.5/src/butil/iobuf.cpp:248
#1 butil::IOBuf::clear (this=this@entry=0x7f0e2f176830) at /home/kangkaisen/palo/thirdparty/src/incubator-brpc-0.9.5/src/butil/iobuf.cpp:723
#2 0x0000000002189662 in ~IOBuf (this=0x7f0e2f176830, __in_chrg=<optimized out>) at /home/kangkaisen/palo/thirdparty/src/incubator-brpc-0.9.5/src/butil/iobuf.h:108
#3 brpc::Controller::IssueRPC (this=this@entry=0x4d70ba788, start_realtime_us=start_realtime_us@entry=1576230170582720)
at /home/kangkaisen/palo/thirdparty/src/incubator-brpc-0.9.5/src/brpc/controller.cpp:1096
#4 0x0000000002180977 in brpc::Channel::CallMethod (this=0x692ed700, method=0x66dca380, controller_base=0x4d70ba788, request=0x8571315a0, response=0x4d70ba9d8, done=0x4d70ba780)
at /home/kangkaisen/palo/thirdparty/src/incubator-brpc-0.9.5/src/brpc/channel.cpp:529
#5 0x000000000115b31f in palo::PInternalService_Stub::transmit_data (this=<optimized out>, controller=0x4d70ba788, request=0x8571315a0, response=0x4d70ba9d8, done=0x4d70ba780)
at /home/kangkaisen/palo/gensrc/build/gen_cpp/palo_internal_service.pb.cc:299
#6 0x00000000013a4651 in doris::DataStreamSender::Channel::send_batch (this=this@entry=0x8571314a0, batch=batch@entry=0x0, eos=eos@entry=true)
at /home/kangkaisen/palo/be/src/runtime/data_stream_sender.cpp:232
#7 0x00000000013a515c in doris::DataStreamSender::Channel::close_internal (this=0x8571314a0) at /home/kangkaisen/palo/be/src/runtime/data_stream_sender.cpp:289
#8 0x00000000013a5309 in close (state=0x39763a600, this=<optimized out>) at /home/kangkaisen/palo/be/src/runtime/data_stream_sender.cpp:296
#9 doris::DataStreamSender::close (this=0x7305669c0, state=0x39763a600, exec_status=...) at /home/kangkaisen/palo/be/src/runtime/data_stream_sender.cpp:607
#10 0x0000000000e944cc in doris::PlanFragmentExecutor::open_internal (this=this@entry=0x5e61ab670) at /home/kangkaisen/palo/be/src/runtime/plan_fragment_executor.cpp:348
#11 0x0000000000e946b4 in doris::PlanFragmentExecutor::open (this=this@entry=0x5e61ab670) at /home/kangkaisen/palo/be/src/runtime/plan_fragment_executor.cpp:281
#12 0x0000000000e23797 in doris::FragmentExecState::execute (this=0x5e61ab600) at /home/kangkaisen/palo/be/src/runtime/fragment_mgr.cpp:204
#13 0x0000000000e24f9c in doris::FragmentMgr::exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>) (this=0x6ebc000, exec_state=..., cb=...)
at /home/kangkaisen/palo/be/src/runtime/fragment_mgr.cpp:368
#14 0x0000000000e28364 in operator() (a2=<error reading variable: access outside bounds of object referenced via synthetic pointer>, a1=..., p=<optimized out>, this=<optimized out>)
at /home/kangkaisen/palo/thirdparty/installed/include/boost/bind/mem_fn_template.hpp:280
#15 operator()<boost::_mfi::mf2<void, doris::FragmentMgr, std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)> >, boost::_bi::list0> (a=<synthetic pointer>, f=...,
this=<optimized out>) at /home/kangkaisen/palo/thirdparty/installed/include/boost/bind/bind.hpp:398
#16 operator() (this=<optimized out>) at /home/kangkaisen/palo/thirdparty/installed/include/boost/bind/bind.hpp:1294
#17 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf2<void, doris::FragmentMgr, std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)> >, boost::_bi::list3<boost::_bi::value<doris::FragmentMgr*>, boost::_bi::value<std::shared_ptr<doris::FragmentExecState> >, boost::_bi::value<std::function<void (doris::PlanFragmentExecutor*)> > > >, void>::invoke(boost::detail::function::function_buffer&) (function_obj_ptr=...) at /home/kangkaisen/palo/thirdparty/installed/include/boost/function/function_template.hpp:159
#18 0x0000000000ddf08a in operator() (this=0x7f0e2f177020) at /home/kangkaisen/palo/thirdparty/installed/include/boost/function/function_template.hpp:759
#19 doris::ThreadPool::work_thread (this=0x6ebc080, thread_id=<optimized out>) at /home/kangkaisen/palo/be/src/util/thread_pool.hpp:120
#20 0x00000000016f194d in thread_proxy ()
#21 0x00007f0e74130dd5 in start_thread () from /lib64/libpthread.so.0
#22 0x00007f0e73535ead in clone () from /lib64/libc.so.6
Content of the problematic butil::IOBuf is pasted below, both 0x2000000000134 and 0x30000000002a6 are invalid memory locations causing the segment fault we saw.
{_bv = {magic = 0, start = 400949248, refs = 0x2000000000134, nref = 0, cap_mask = 2281504768,
nbytes = 844424930132646}, _sv = {refs = {{offset = 0, length = 400949248, block = 0x2000000000134}, {offset = 0, length = 2281504768, block = 0x30000000002a6}}}}
The context of DataStreamSender::Channel::send_batch is that _packet_seq == 1 which means the DataStreamSender::Channel is closed right after init.
What's also strange is that _closure->_refs == 1, does is mean the response of the RPC has been received?
(gdb) p _closure->_refs
$13 = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = 1}, <No data fields>}
(gdb) p *_closure
$14 = {<google::protobuf::Closure> = {<No data fields>}, cntl = {<google::protobuf::RpcController> = {<No data fields>}, static FLAGS_IGNORE_EOVERCROWDED = 1, static FLAGS_SECURITY_MODE = 2,
static FLAGS_ADDED_CONCURRENCY = 4, static FLAGS_READ_PROGRESSIVELY = 8, static FLAGS_PROGRESSIVE_READER = 16, static FLAGS_BACKUP_REQUEST = 32, static FLAGS_DESTROY_CID_IN_DONE = 128,
static FLAGS_CLOSE_CONNECTION = 256, static FLAGS_LOG_ID = 512, static FLAGS_REQUEST_CODE = 1024, static FLAGS_PB_BYTES_TO_BASE64 = 2048, static FLAGS_ALLOW_DONE_TO_RUN_IN_PLACE = 4096,
static FLAGS_USED_BY_RPC = 8192, static FLAGS_REQUEST_WITH_AUTH = 32768, static FLAGS_PB_JSONIFY_EMPTY_ARRAY = 65536, static FLAGS_ENABLED_CIRCUIT_BREAKER = 131072, _span = 0x0, _flags = 10240,
_error_code = 0, _error_text = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
_M_p = 0x4d70ba7b0 ""}, _M_string_length = 0, {_M_local_buf = "\000\377\377\377\377\377\377\377\000\000\000\000\000\000\000", _M_allocated_capacity = 18446744073709551360}}, _remote_side = {ip = {
s_addr = 713365002}, port = 8060}, _local_side = {ip = {s_addr = 361312522}, port = 54256}, _session_local_data = 0x0, _server = 0x0, _oncancel_id = {value = 0}, _auth_context = 0x0,
_mongo_session_data = {px = 0x0}, _rpc_dump_meta = 0x0, _request_protocol = brpc::PROTOCOL_BAIDU_STD, _max_retry = 3, _retry_policy = 0x0, _correlation_id = {value = 501484676562425},
_connection_type = brpc::CONNECTION_TYPE_SINGLE, _fail_limit = -123456789, _pipelined_count = 0, _timeout_ms = 20000, _connect_timeout_ms = 200, _backup_request_ms = -1, _abstime_us = 1576230190582720,
_timeout_id = 0, _begin_time_us = 1576230170582720, _end_time_us = 1576230170584451, _tos = 0, _preferred_index = 1, _request_compress_type = brpc::COMPRESS_TYPE_NONE,
_response_compress_type = brpc::COMPRESS_TYPE_NONE, _log_id = 0, _pchan_sub_count = 0, _response = 0x4d70ba9d8, _done = 0x4d70ba780, _sender = 0x0, _request_code = 0, _single_server_id = 640, _lb = {
px = 0x0}, _tmp_completion_info = {id = {value = 0}, responded = false}, _current_call = {nretry = 0, need_feedback = false, enable_circuit_breaker = false, touched_by_stream_creator = false,
peer_id = 640, begin_time_us = 1576230170582720, sending_sock = {_M_t = {
_M_t = {<std::_Tuple_impl<0, brpc::Socket*, brpc::SocketDeleter>> = {<std::_Tuple_impl<1, brpc::SocketDeleter>> = {<std::_Head_base<1, brpc::SocketDeleter, true>> = {<brpc::SocketDeleter> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, brpc::Socket*, false>> = {_M_head_impl = 0x0}, <No data fields>}, <No data fields>}}}, stream_user_data = 0x0}, _unfinished_call = 0x0,
_accessed = 0x0, _stream_creator = 0x0,
_pack_request = 0x21b69c0 <brpc::policy::PackRpcRequest(butil::IOBuf*, brpc::SocketMessage**, unsigned long, google::protobuf::MethodDescriptor const*, brpc::Controller*, butil::IOBuf const&, brpc::Authenticator const*)>, _method = 0x66dca380, _auth = 0x0, _request_buf = {static DEFAULT_BLOCK_SIZE = 8192, static INITIAL_CAP = 32, static INVALID_AREA = 0, {_bv = {magic = 6100, start = 34, refs = 0x4dae30000,
nref = 0, cap_mask = 0, nbytes = 0}, _sv = {refs = {{offset = 6100, length = 34, block = 0x4dae30000}, {offset = 0, length = 0, block = 0x0}}}}}, _idl_names = {request_name = 0x2f5d3f0 "req",
response_name = 0x2aed848 "res"}, _idl_result = 12345678987654321, _http_request = 0x0, _http_response = 0x0, _request_attachment = {static DEFAULT_BLOCK_SIZE = 8192, static INITIAL_CAP = 32,
static INVALID_AREA = 0, {_bv = {magic = 0, start = 0, refs = 0x0, nref = 0, cap_mask = 0, nbytes = 0}, _sv = {refs = {{offset = 0, length = 0, block = 0x0}, {offset = 0, length = 0, block = 0x0}}}}},
_response_attachment = {static DEFAULT_BLOCK_SIZE = 8192, static INITIAL_CAP = 32, static INVALID_AREA = 0, {_bv = {magic = 0, start = 0, refs = 0x0, nref = 0, cap_mask = 0, nbytes = 0}, _sv = {refs = {{
offset = 0, length = 0, block = 0x0}, {offset = 0, length = 0, block = 0x0}}}}}, _wpa = {px = 0x0}, _rpa = {px = 0x0}, _request_stream = 18446744073709551615,
_response_stream = 18446744073709551615, _remote_stream_settings = 0x0, _thrift_method_name = {static npos = 18446744073709551615,
_M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x4d70ba9c8 ""}, _M_string_length = 0, {
_M_local_buf = "\000\000\000\000\000\000\000\000`\005\000\000\000\000\000", _M_allocated_capacity = 0}}}, result = {<google::protobuf::Message> = {<No data fields>}, static kIndexInFileMessages = 1,
static kStatusFieldNumber = 1,
_internal_metadata_ = {<google::protobuf::internal::InternalMetadataWithArenaBase<google::protobuf::UnknownFieldSet, google::protobuf::internal::InternalMetadataWithArena>> = {ptr_ = 0x0,
static kPtrTagMask = 1, static kPtrValueMask = -2}, <No data fields>}, _has_bits_ = {has_bits_ = {0}}, _cached_size_ = 0, status_ = 0x0}, _refs = {<std::__atomic_base<int>> = {
static _S_alignment = 4, _M_i = 1}, <No data fields>}}
To Reproduce
Right row I can't find a stable way to reproduce the bug. The version we use is 0.10 with #1830 patched.
acelyc111 and hxianshun
Metadata
Metadata
Assignees
Labels
No labels