Description
【Symptom】
We currently collect metrics with bvar and export them through Prometheus. However, the process now crashes as soon as it starts. The stack trace shows that it dies in describe_exposed:
【Stack trace】
Thread 7 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f317effd700 (LWP 947843)]
0x00007f31957441b4 in bvar::Variable::describe_exposed (name="bvar_sampler_collector_usage", os=..., quote_string=true, display_filter=bvar::DISPLAY_ON_PLAIN_TEXT)
at external/com_github_brpc_brpc/src/bvar/variable.cpp:256
256 external/com_github_brpc_brpc/src/bvar/variable.cpp: No such file or directory.
(gdb) bt
#0 0x00007f31957441b4 in bvar::Variable::describe_exposed (name="bvar_sampler_collector_usage", os=..., quote_string=true, display_filter=bvar::DISPLAY_ON_PLAIN_TEXT)
at external/com_github_brpc_brpc/src/bvar/variable.cpp:256
#1 0x00007f3195744b7e in bvar::Variable::dump_exposed (dumper=0x7f315fbdb900, poptions=0x0) at external/com_github_brpc_brpc/src/bvar/variable.cpp:513
#2 0x00007f319627908e in brpc::PrometheusMetricsService::default_method (this=0x56544725da70, cntl_base=0x7f3174132770, done=0x7f3174132bb0)
at external/com_github_brpc_brpc/src/brpc/builtin/prometheus_metrics_service.cpp:186
#3 0x00007f3194f8a195 in brpc::metrics::CallMethod (this=0x56544725da70, method=0x565447249a40, controller=0x7f3174132770, request=0x7f3174132b70, response=0x7f3174132b90,
done=0x7f3174132bb0) at bazel-out/k8-dbg/genfiles/external/com_github_brpc_brpc/src/brpc/builtin_service.pb.cc:10526
#4 0x00007f31963433a7 in brpc::policy::ProcessHttpRequest (msg=0x7f3174131a80) at external/com_github_brpc_brpc/src/brpc/policy/http_rpc_protocol.cpp:1483
#5 0x00007f31962ea636 in brpc::ProcessInputMessage (void_arg=0x7f3174131a80) at external/com_github_brpc_brpc/src/brpc/input_messenger.cpp:133
#6 0x00007f31962ecd54 in brpc::RunLastMessage::operator() (this=0x7f315fbdbd88, last_msg=0x7f3174131a80) at external/com_github_brpc_brpc/src/brpc/input_messenger.cpp:139
#7 0x00007f31962ecebb in std::unique_ptr<brpc::InputMessageBase, brpc::RunLastMessage>::~unique_ptr (this=0x7f315fbdbd88, __in_chrg=)
at /usr/include/c++/6/bits/unique_ptr.h:239
#8 0x00007f31962eb39f in brpc::InputMessenger::OnNewMessages (m=0x7f3180128880) at external/com_github_brpc_brpc/src/brpc/input_messenger.cpp:331
#9 0x00007f31963dfbbd in brpc::Socket::ProcessEvent (arg=0x7f3180128880) at external/com_github_brpc_brpc/src/brpc/socket.cpp:1108
#10 0x00007f31958478be in bthread::TaskGroup::task_runner (skip_remained=0) at external/com_github_brpc_brpc/src/bthread/task_group.cpp:293
#11 0x00007f31958175b1 in bthread_make_fcontext () at /usr/include/c++/6/typeinfo:100
#12 0x00010102464c457f in ?? ()
#13 0x0000000000000000 in ?? ()
【Analysis】
I searched the issues for this problem and found similar reports:
#697
#208
Those issues suggest that fork may be the cause. Our code does use fork, but I still have a few questions.
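To make the fork hypothesis concrete, here is a minimal sketch, not brpc code. It assumes the general POSIX behavior that fork() duplicates only the calling thread, so a background thread started before the fork (as the SamplerCollector thread presumably is) no longer runs in the child. Python is used only because the crashing process is "python"; the function name `thread_survives_fork` is hypothetical.

```python
import os
import threading
import warnings


def thread_survives_fork():
    """Start a background thread, fork, and report whether the thread is
    alive in the parent and in the child. Returns (parent_alive, child_alive)."""
    stop = threading.Event()
    # Stand-in for a long-lived sampling thread created before the fork.
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()

    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Newer Python versions warn when forking a multi-threaded process.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()

    if pid == 0:
        # Child: only the forking thread was copied; report what we see.
        try:
            os.close(read_fd)
            os.write(write_fd, b"1" if worker.is_alive() else b"0")
        finally:
            os._exit(0)

    # Parent: read the child's answer, then clean up.
    os.close(write_fd)
    data = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    parent_alive = worker.is_alive()
    stop.set()
    worker.join()
    return parent_alive, data == b"1"


if __name__ == "__main__":
    print(thread_survives_fork())
```

If the same holds for the bvar sampling thread, any state that thread is expected to maintain would stay stale or uninitialized in the forked child, which would be consistent with the reports in the linked issues.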
【Questions】
1. If we comment out the following two lines of code, the problem goes away. What does the bvar_sampler_collector_usage metric mean?

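2. What does the SamplerCollector thread do? If the SamplerCollector thread does not run in the forked child process, how does that affect the other metrics?
3. Is there a way to defer the creation of this thread until after main?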
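【Additional notes】
Also, if the process is forked, shouldn't a Prometheus scrape crash it 100% of the time? In our experiments, however, if we start the process first and then start Prometheus, there seems to be no problem; but if Prometheus is started first and the process is started afterward, the process crashes. This also puzzles us.
I would appreciate it if someone familiar with this code could help analyze and answer these questions. Thanks.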