I am running on current master and occasionally run into deadlocks, so I am still trying to figure out why that is happening.
During my analysis of the goroutine dump, I think I spotted a case where we would not properly shut down the `PingManager`. My goroutine dump shows a lot of goroutines in this state:
```
goroutine 6923 [semacquire, 1414 minutes]: 78 times: [[6923, 6955, 6407671, 8264, 1995338, 6985, 5699487, 2235206, 188409, 1831425, 1033199, 5585420, 5101298, 1298527, 545536, 1791165, 7811, 11501, 7226, 5840461, 8304, 6100015, 10513, 9998, 7264008, 427482, 11612, 4264510, 2840924, 26964, 2246775, 8000, 8211518, 1517528, 7207645, 6342585, 11856, 3211788, 5671775, 3811495, 8397, 5869298, 31165, 1565189, 2760834, 3320264, 826060, 5796262, 5773187, 5411665, 5574157, 1340207, 972090, 7094789, 5734595, 5466438, 6074809, 5675977, 5349029, 7288363, 3033861, 5264593, 3854985, 5360125, 6618159, 4593483, 3875138, 7845605, 3935443, 6839478, 3965553, 3309451, 7784032, 7977750, 5722528, 4147913, 6250425, 8063005]
sync.runtime_Semacquire(0x4004f19d08?)
	runtime/sema.go:62 +0x2c
sync.(*WaitGroup).Wait(0x40003ab248)
	sync/waitgroup.go:116 +0x74
github.com/lightningnetwork/lnd/peer.(*PingManager).Stop.func1()
	github.com/lightningnetwork/lnd/peer/ping_manager.go:210 +0x34
sync.(*Once).doSlow(0x4001526d80?, 0x4013b89f20?)
	sync/once.go:74 +0x100
sync.(*Once).Do(...)
	sync/once.go:65
github.com/lightningnetwork/lnd/peer.(*PingManager).Stop(0x40039151a0?)
	github.com/lightningnetwork/lnd/peer/ping_manager.go:208 +0x4c
github.com/lightningnetwork/lnd/peer.(*Brontide).Disconnect(0x400376ca80, {0x1c98940?, 0x4008c90080})
	github.com/lightningnetwork/lnd/peer/brontide.go:1192 +0x194
github.com/lightningnetwork/lnd/peer.NewBrontide.func4({0x1c98940?, 0x4008c90060})
	github.com/lightningnetwork/lnd/peer/brontide.go:579 +0x130
github.com/lightningnetwork/lnd/peer.(*PingManager).start.func1()
	github.com/lightningnetwork/lnd/peer/ping_manager.go:160 +0x300
created by github.com/lightningnetwork/lnd/peer.(*PingManager).start in goroutine 6916
	github.com/lightningnetwork/lnd/peer/ping_manager.go:120 +0x138
```
I think we wait endlessly for the `WaitGroup` counter to hit zero because we never return from the goroutine launched by the ping manager's `start` function: that goroutine is not able to consume the quit channel of the `PingManager`. According to the stack above, it is the very goroutine that called `Stop` (`start.func1` calls a closure defined in `NewBrontide`, which calls `Brontide.Disconnect`, which calls `PingManager.Stop`), so it ends up waiting on the `WaitGroup` for its own exit.

We are here: https://github.com/lightningnetwork/lnd/blob/master/peer/ping_manager.go#L160
and waiting here: https://github.com/lightningnetwork/lnd/blob/master/peer/ping_manager.go#L210
Could you please verify, @ProofOfKeags?