[bug]: PingManager not shutting down correctly #8379

@ziggie1984

I am running current master and occasionally hit deadlocks; I am still trying to figure out why.

While analyzing the goroutine dump, I think I spotted a case where we do not properly shut down the PingManager.

My goroutine dump shows a lot of goroutines in this state:

goroutine 6923 [semacquire, 1414 minutes]: 78 times: [[6923, 6955, 6407671, 8264, 1995338, 6985, 5699487, 2235206, 188409, 1831425, 1033199, 5585420, 5101298, 1298527, 545536, 1791165, 7811, 11501, 7226, 5840461, 8304, 6100015, 10513, 9998, 7264008, 427482, 11612, 4264510, 2840924, 26964, 2246775, 8000, 8211518, 1517528, 7207645, 6342585, 11856, 3211788, 5671775, 3811495, 8397, 5869298, 31165, 1565189, 2760834, 3320264, 826060, 5796262, 5773187, 5411665, 5574157, 1340207, 972090, 7094789, 5734595, 5466438, 6074809, 5675977, 5349029, 7288363, 3033861, 5264593, 3854985, 5360125, 6618159, 4593483, 3875138, 7845605, 3935443, 6839478, 3965553, 3309451, 7784032, 7977750, 5722528, 4147913, 6250425, 8063005]
sync.runtime_Semacquire(0x4004f19d08?)
        runtime/sema.go:62 +0x2c
sync.(*WaitGroup).Wait(0x40003ab248)
        sync/waitgroup.go:116 +0x74
github.com/lightningnetwork/lnd/peer.(*PingManager).Stop.func1()
        github.com/lightningnetwork/lnd/peer/ping_manager.go:210 +0x34
sync.(*Once).doSlow(0x4001526d80?, 0x4013b89f20?)
        sync/once.go:74 +0x100
sync.(*Once).Do(...)
        sync/once.go:65
github.com/lightningnetwork/lnd/peer.(*PingManager).Stop(0x40039151a0?)
        github.com/lightningnetwork/lnd/peer/ping_manager.go:208 +0x4c
github.com/lightningnetwork/lnd/peer.(*Brontide).Disconnect(0x400376ca80, {0x1c98940?, 0x4008c90080})
        github.com/lightningnetwork/lnd/peer/brontide.go:1192 +0x194
github.com/lightningnetwork/lnd/peer.NewBrontide.func4({0x1c98940?, 0x4008c90060})
        github.com/lightningnetwork/lnd/peer/brontide.go:579 +0x130
github.com/lightningnetwork/lnd/peer.(*PingManager).start.func1()
        github.com/lightningnetwork/lnd/peer/ping_manager.go:160 +0x300
created by github.com/lightningnetwork/lnd/peer.(*PingManager).start in goroutine 6916
        github.com/lightningnetwork/lnd/peer/ping_manager.go:120 +0x138

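I think we wait forever for the WaitGroup counter to reach zero. Per the stack trace, the goroutine spawned in the PingManager's start function invokes the callback set up in NewBrontide, which calls Brontide.Disconnect and, in turn, PingManager.Stop. Stop then blocks in WaitGroup.Wait, but the goroutine it is waiting for is the one making the call: it can never return from start, and it can never consume the PingManager's quit channel.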

We are here: https://github.com/lightningnetwork/lnd/blob/master/peer/ping_manager.go#L160

and waiting here:

https://github.com/lightningnetwork/lnd/blob/master/peer/ping_manager.go#L210
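
A minimal Go sketch of the suspected pattern, not the actual lnd code: a worker goroutine tracked by a `sync.WaitGroup` runs a failure callback, and that callback calls a stop method that waits on the same WaitGroup. The type `pingLoop`, its fields, and the helper `stopReturns` are hypothetical names chosen for illustration; only the overall shape (start goroutine, callback, `Once`-guarded stop that calls `Wait`) is taken from the stack trace above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pingLoop is a hypothetical, stripped-down stand-in for a manager that
// owns one worker goroutine. It is NOT lnd's PingManager.
type pingLoop struct {
	wg        sync.WaitGroup
	quit      chan struct{}
	stopOnce  sync.Once
	onFailure func() // plays the role of the failure callback
}

// start launches the worker goroutine and tracks it in wg.
func (p *pingLoop) start(fail <-chan struct{}) {
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		select {
		case <-fail:
			// The callback runs on the worker goroutine itself,
			// so wg's counter is still 1 while it executes.
			p.onFailure()
		case <-p.quit:
		}
	}()
}

// stop signals the worker to quit and waits for it to exit.
func (p *pingLoop) stop() {
	p.stopOnce.Do(func() {
		close(p.quit)
		p.wg.Wait()
	})
}

// stopReturns reports whether stop() returns within a short timeout when
// it is called from the worker's own callback (fromWorker == true) or
// from an unrelated goroutine (fromWorker == false).
func stopReturns(fromWorker bool) bool {
	fail := make(chan struct{})
	done := make(chan struct{})
	p := &pingLoop{quit: make(chan struct{})}
	p.onFailure = func() {
		p.stop() // the worker waits for itself here
		close(done)
	}
	p.start(fail)

	if fromWorker {
		close(fail) // trigger the callback on the worker goroutine
	} else {
		go func() {
			p.stop()
			close(done)
		}()
	}

	select {
	case <-done:
		return true
	case <-time.After(200 * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("stop from outside returns:", stopReturns(false))
	fmt.Println("stop from worker returns: ", stopReturns(true))
}
```

Under these assumptions, the first call should report that stop returns and the second that it does not: the worker's deferred `Done` can only run after the callback returns, and the callback is blocked in `Wait`. The blocked goroutine is leaked, which would match the many long-lived `semacquire` goroutines in the dump.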

Could you please verify, @ProofOfKeags?

Labels: brontide, bug (Unintended code behaviour)
