We have some 7.0.0 boxes that end up completely wedged: all ET_NET threads get stuck on the same lock (i.e., a deadlock). Frame #6 from several of the threads:
#6 HostDBProcessor::getbyname_imm (this=<optimized out>, cont=cont@entry=0x2ab037b1d420, process_hostdb_info=<optimized out>, hostname=<optimized out>, len=<optimized out>, opt=...) at HostDB.cc:816
#6 HostDBProcessor::getbyname_imm (this=<optimized out>, cont=cont@entry=0x2aabc1e66a00, process_hostdb_info=<optimized out>, hostname=<optimized out>, len=<optimized out>, opt=...) at HostDB.cc:816
...
The full backtrace is the same in every thread:
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00002aaaad73e5d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00002aaaad73e4a7 in __pthread_mutex_lock (mutex=0x2aaab098a290) at pthread_mutex_lock.c:61
#3 0x00002aaaaadca986 in ink_mutex_acquire (m=0x2aaab098a290) at ../../lib/ts/ink_mutex.h:90
#4 Mutex_lock (t=0x2aaab160db40, m=0x2aaab098a280) at ../../iocore/eventsystem/I_Lock.h:410
#5 MutexLock::MutexLock (t=0x2aaab160db40, am=0x2aaab098a280, this=0x2aaab470a890) at ../../iocore/eventsystem/I_Lock.h:497
#6 HostDBProcessor::getbyname_imm (this=<optimized out>, cont=cont@entry=0x2aab91432580, process_hostdb_info=<optimized out>, hostname=<optimized out>, len=<optimized out>, opt=...) at HostDB.cc:816
#7 0x00002aaaaacae21c in HttpSM::do_hostdb_lookup (this=this@entry=0x2aab91432580) at HttpSM.cc:4133
#8 0x00002aaaaacc0093 in HttpSM::set_next_state (this=0x2aab91432580) at HttpSM.cc:7248
#9 0x00002aaaaacad47a in HttpSM::call_transact_and_set_next_state (this=this@entry=0x2aab91432580, f=f@entry=0x0) at HttpSM.cc:7111
#10 0x00002aaaaacb7baf in HttpSM::handle_api_return (this=0x2aab91432580) at HttpSM.cc:1604
#11 0x00002aaaaacba5eb in HttpSM::state_api_callout (this=0x2aab91432580, event=0, data=0x0) at HttpSM.cc:1542
#12 0x00002aaaaacbf62b in HttpSM::set_next_state (this=0x2aab91432580) at HttpSM.cc:7144
#13 0x00002aaaaacad47a in HttpSM::call_transact_and_set_next_state (this=this@entry=0x2aab91432580, f=f@entry=0x0) at HttpSM.cc:7111
#14 0x00002aaaaacb9910 in HttpSM::state_hostdb_lookup (this=0x2aab91432580, event=500, data=0x2aebe3144800) at HttpSM.cc:2217
#15 0x00002aaaaacc165d in HttpSM::main_handler (this=0x2aab91432580, event=500, data=0x2aebe3144800) at HttpSM.cc:2661
#16 0x00002aaaaadc7f37 in Continuation::handleEvent (data=0x2aebe3144800, event=500, this=0x2aab91432580) at ../../iocore/eventsystem/I_Continuation.h:153
#17 reply_to_cont (cont=0x2aab91432580, r=0x2aebe3144800, is_srv=<optimized out>) at HostDB.cc:474
#18 0x00002aaaaadcc79d in HostDBContinuation::dnsEvent (this=<optimized out>, event=<optimized out>, e=<optimized out>) at HostDB.cc:1450
#19 0x00002aaaaade3821 in Continuation::handleEvent (data=<optimized out>, event=600, this=<optimized out>) at ../../iocore/eventsystem/I_Continuation.h:153
#20 DNSEntry::postEvent (this=this@entry=0x2aaab76b4e00) at DNS.cc:1269
#21 0x00002aaaaade880b in dns_result (h=h@entry=0x2aaabafc9ec0, e=e@entry=0x2aaab76b4e00, ent=<optimized out>, ent@entry=0x2aaaee3aa440, retry=retry@entry=false) at DNS.cc:1221
#22 0x00002aaaaadeb189 in dns_process (len=<optimized out>, buf=0x2aaaee3aa440, handler=0x2aaabafc9ec0) at DNS.cc:1587
#23 DNSHandler::recv_dns (this=this@entry=0x2aaabafc9ec0) at DNS.cc:782
#24 0x00002aaaaadebac9 in DNSHandler::mainEvent (this=0x2aaabafc9ec0, event=<optimized out>, e=<optimized out>) at DNS.cc:794
#25 0x00002aaaaaf0758e in Continuation::handleEvent (data=0x2aaab1788980, event=5, this=<optimized out>) at I_Continuation.h:153
#26 EThread::process_event (calling_code=5, e=0x2aaab1788980, this=0x2aaab160db40) at UnixEThread.cc:143
#27 EThread::execute (this=0x2aaab160db40) at UnixEThread.cc:270
#28 0x00002aaaaaf06136 in spawn_thread_internal (a=0x2aaab09981f0) at Thread.cc:84
#29 0x00002aaaad73caa1 in start_thread (arg=0x2aaab470c700) at pthread_create.c:301
#30 0x00002aaaae5f393d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
We're not sure whether this is related to HostDB sync, but the boxes we encountered this on did have syncing enabled.