Skip to content

Work-around for deadlock in DbcCtl instance initialization when loading plugins.#10142

Closed
ywkaras wants to merge 0 commit intoapache:masterfrom
ywkaras:DbgCtl_deadlock
Closed

Work-around for deadlock in DbcCtl instance initialization when loading plugins.#10142
ywkaras wants to merge 0 commit intoapache:masterfrom
ywkaras:DbgCtl_deadlock

Conversation

@ywkaras
Copy link
Copy Markdown
Contributor

@ywkaras ywkaras commented Aug 2, 2023

No description provided.

@ywkaras ywkaras self-assigned this Aug 2, 2023
@ywkaras ywkaras added the Debug Support for system debugging label Aug 2, 2023
@ywkaras ywkaras added this to the 10.0.0 milestone Aug 2, 2023
@ywkaras ywkaras linked an issue Aug 2, 2023 that may be closed by this pull request
@bneradt
Copy link
Copy Markdown
Contributor

bneradt commented Aug 2, 2023

Your proposal makes sense, but I still reproduced the deadlock with this patch. Here's the relevant threads:

Thread 10 (Thread 0x7eff5bbfc6c0 (LWP 157468) "[LOG_FLUSH]"):                  
#0  futex_wait (private=0, expected=2, futex_word=0x7eff71dbda08 <_rtld_local+2568>) at ../sysdeps/nptl/futex-internal.h:146                                   
#1  __GI___lll_lock_wait (futex=futex@entry=0x7eff71dbda08 <_rtld_local+2568>, private=0) at lowlevellock.c:49                                                 
#2  0x00007eff70fcae77 in lll_mutex_lock_optimized (mutex=0x7eff71dbda08 <_rtld_local+2568>) at pthread_mutex_lock.c:48                                                                                                                                                                                                       
#3  ___pthread_mutex_lock (mutex=mutex@entry=0x7eff71dbda08 <_rtld_local+2568>) at pthread_mutex_lock.c:128           
#4  0x00007eff70f7adaa in __cxa_thread_atexit_impl (func=<optimized out>, obj=0x7eff5bbfc6a0, dso_symbol=0xd48008) at cxa_thread_atexit_impl.c:117                                                                                                                                                                            
#5  0x00007eff712d35dd in __cxxabiv1::__cxa_thread_atexit (dtor=<optimized out>, obj=<optimized out>, dso_handle=<optimized out>) at ../../../../libstdc++-v3/libsupc++/atexit_thread.cc:69
#6  0x0000000000d15a86 in get_jit_stack (data=0x0) at Regex.cc:43                                                                                              
#7  0x00007eff71b3b965 in _pcre_jit_exec (offset_count=2, offsets=0x7eff5bbfaf10, options=0, start_offset=0, length=12, subject=0xdb58d7 "log-periodic", extra_data=0x31779e0) at /usr/src/debug/pcre-8.45-1.fc38.3.x86_64/pcre_jit_compile.c:11650
#8  pcre_exec (argument_re=0x32f8160, extra_data=0x31779e0, subject=<optimized out>, length=12, start_offset=0, options=<optimized out>, offsets=0x7eff5bbfaf10, offsetcount=30) at /usr/src/debug/pcre-8.45-1.fc38.3.x86_64/pcre_exec.c:6489
#9  0x0000000000d15cca in Regex::exec (this=0x32f8260, str="log-periodic", ovector=0x7eff5bbfaf10, ovecsize=30) at Regex.cc:119
#10 0x0000000000d15c64 in Regex::exec (this=0x32f8260, str="log-periodic") at Regex.cc:111                                                                                                                                                                                                                                    
#11 0x0000000000d16014 in DFA::match (this=0x32f8140, str="log-periodic") at Regex.cc:188                                                                      
#12 0x0000000000ceb7d3 in Diags::tag_activated (this=0x31e8e90, tag=0xdb58d7 "log-periodic", mode=DiagsTagType_Debug) at Diags.cc:379                                                                                                                                                                                         
#13 0x0000000000ce9632 in DbgCtl::_new_reference (tag=0xdb58d7 "log-periodic") at DbgCtl.cc:182                                                                                                                                                                                                                               
#14 0x000000000085b76c in DbgCtl::DbgCtl (this=0x1001cf0 <Log::periodic_tasks(long)::Debug_ctl>, tag=0xdb58d7 "log-periodic") at ../../include/tscore/DbgCtl.h:36
#15 0x0000000000a63984 in Log::periodic_tasks (time_now=1691014388) at Log.cc:224              
#16 0x0000000000a6ed19 in Log::flush_thread_main () at Log.cc:1415                                                                                                                                                                                                                                                            
#17 0x0000000000a6f95d in LoggingFlushContinuation::mainEvent (this=0x3a8a400) at Log.cc:284                                                                                                                                                                                                                                  
#18 0x0000000000863ac9 in Continuation::handleEvent (this=0x3a8a400, event=1, data=0x3378820) at /var/tmp/trafficserver/iocore/eventsystem/I_Continuation.h:228
#19 0x0000000000cdf621 in EThread::execute (this=0x3c18050) at UnixEThread.cc:318                                                     
#20 0x0000000000cddf66 in spawn_thread_internal (a=0x3a8a460) at Thread.cc:79                                                                                  
#21 0x00007eff70fc7907 in start_thread (arg=<optimized out>) at pthread_create.c:444                                                                                                                                                                                                                                          
#22 0x00007eff7104d774 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100                                                                                                                                                                                        
Thread 1 (Thread 0x7eff70f37340 (LWP 157458) "traffic_server"):
#0  futex_wait (private=0, expected=2, futex_word=0x31779b0) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x31779b0, private=0) at lowlevellock.c:49
#2  0x00007eff70fcae21 in lll_mutex_lock_optimized (mutex=0x31779b0) at pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x31779b0) at pthread_mutex_lock.c:93
#4  0x000000000086e5e9 in __gthread_mutex_lock (__mutex=0x31779b0) at /usr/include/c++/13/x86_64-redhat-linux/bits/gthr-default.h:749
#5  0x000000000086e868 in std::mutex::lock (this=0x31779b0) at /usr/include/c++/13/bits/std_mutex.h:113
#6  0x0000000000ce9af8 in DbgCtl::_RegistryAccessor::_RegistryAccessor (this=0x7ffdb17aeb4f) at DbgCtl.cc:108
#7  0x0000000000ce953f in DbgCtl::_new_reference (tag=0x7eff70081828 "TSVConnFd") at DbgCtl.cc:165
#8  0x000000000089ccb8 in TSDbgCtlCreate (tag=0x7eff70081828 "TSVConnFd") at traffic_server/InkAPI.cc:10136
#9  0x00007eff700798d4 in __static_initialization_and_destruction_0 () at gold_tests/pluginTest/TSVConnFd/TSVConnFd.cc:449
#10 0x00007eff700799f2 in _GLOBAL__sub_I_TSVConnFd.cc(void) () at gold_tests/pluginTest/TSVConnFd/TSVConnFd.cc:864
#11 0x00007eff71d8f17f in call_init (env=0x31762b0, argv=0x7ffdb17b0f68, argc=1, l=<optimized out>) at dl-init.c:70
#12 call_init (l=<optimized out>, argc=1, argv=0x7ffdb17b0f68, env=0x31762b0) at dl-init.c:26
#13 0x00007eff71d8f27d in _dl_init (main_map=0x3a8a890, argc=1, argv=0x7ffdb17b0f68, env=0x31762b0) at dl-init.c:117
#14 0x00007eff71d8b5c2 in __GI__dl_catch_exception (exception=exception@entry=0x0, operate=operate@entry=0x7eff71d95ea0 <call_dl_init>, args=args@entry=0x7ffdb17aeda0) at dl-catch.c:211
#15 0x00007eff71d95e3c in dl_open_worker (a=a@entry=0x7ffdb17aef50) at dl-open.c:808
#16 0x00007eff71d8b523 in __GI__dl_catch_exception (exception=exception@entry=0x7ffdb17aef30, operate=operate@entry=0x7eff71d95da0 <dl_open_worker>, args=args@entry=0x7ffdb17aef50) at dl-catch.c:237
#17 0x00007eff71d961b4 in _dl_open (file=0x7ffdb17af300 "/tmp/sb/TSVConnFd/ts/plugin/TSVConnFd.so", mode=<optimized out>, caller_dlopen=0xbcc915 <plugin_dso_load(char const*, void*&, void*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)+44>, nsid=<optimized out>, argc=1, argv=0x7ffdb17b0f68, env=0x31762b0) at dl-open.c:884
#18 0x00007eff70fc36d4 in dlopen_doit (a=a@entry=0x7ffdb17af200) at dlopen.c:56
#19 0x00007eff71d8b523 in __GI__dl_catch_exception (exception=exception@entry=0x7ffdb17af140, operate=0x7eff70fc3670 <dlopen_doit>, args=0x7ffdb17af200) at dl-catch.c:237
#20 0x00007eff71d8b679 in _dl_catch_error (objname=0x7ffdb17af1a8, errstring=0x7ffdb17af1b0, mallocedp=0x7ffdb17af1a7, operate=<optimized out>, args=<optimized out>) at dl-catch.c:256
#21 0x00007eff70fc31b3 in _dlerror_run (operate=operate@entry=0x7eff70fc3670 <dlopen_doit>, args=args@entry=0x7ffdb17af200) at dlerror.c:138
#22 0x00007eff70fc378f in dlopen_implementation (dl_caller=<optimized out>, mode=<optimized out>, file=<optimized out>) at dlopen.c:71
#23 ___dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:81
#24 0x0000000000bcc915 in plugin_dso_load (path=0x7ffdb17af300 "/tmp/sb/TSVConnFd/ts/plugin/TSVConnFd.so", handle=@0x7ffdb17af2d8: 0xd11604 <Layout::relative[abi:cxx11](std::basic_string_view<char, std::char_traits<char> >)+80>, init=@0x7ffdb17af2d0: 0x0, error="") at Plugin.cc:105
#25 0x0000000000bcce9c in single_plugin_init (argc=3, argv=0x7ffdb17b0560, validateOnly=false) at Plugin.cc:153
#26 0x0000000000bcdbe3 in plugin_init (validateOnly=false) at Plugin.cc:351
#27 0x00000000008c3e07 in main (argv=0x7ffdb17b0f68) at traffic_server/traffic_server.cc:2160

Note that thread 10 is still trying to initialize the PCRE type for the thread here:

#13 0x0000000000ce9632 in DbgCtl::_new_reference (tag=0xdb58d7 "log-periodic") at DbgCtl.cc:182                                                                                                                                                                                                                               

Comment thread src/tscore/DbgCtl.cc Outdated
// The work-around here is to make sure, in each thread, the Regex thread_local variable is initialized
// before any locks of the Registry mutex.
if (thread_local bool done{false}; !done) {
Regex r;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still seems to deadlock.

Do we need to make this Regex thread local?

@ywkaras ywkaras marked this pull request as draft August 2, 2023 22:42
@ywkaras ywkaras closed this Aug 3, 2023
@ywkaras ywkaras deleted the DbgCtl_deadlock branch August 3, 2023 02:08
@ywkaras ywkaras restored the DbgCtl_deadlock branch August 3, 2023 02:08
@zwoop zwoop removed this from the 10.0.0 milestone Feb 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Debug Support for system debugging

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deadlock between DbgCtl and PCRE

3 participants