Skip to content

[bug]: htlcswitch may crash upon shutdown because of a race in WaitGroup #9308

@starius

Description

@starius

Background

The htlcswitch package uses sync.WaitGroup in a manner that allows parallel Add(1) and Wait() calls. Specifically:

  • GetAttemptResult calls wg.Add(1) before launching a goroutine,
  • Stop() calls wg.Done().

This is explicitly prohibited by the WaitGroup documentation.

Note that calls with a positive delta that occur when the counter is zero must happen before a Wait.

Your environment

  • version of lnd: master (fbeab72)
  • which operating system (uname -a on *Nix): Linux host 6.1.75-1.qubes.fc32.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Feb 5 06:49:25 CET 2024 x86_64 GNU/Linux
  • version of btcd, bitcoind, or other backend: not used
  • any other relevant environment details

Steps to reproduce

I reproduced this in branch reproduce-race in my fork. I added test TestSwitchGetAttemptResultStress which runs many GetAttemptResult() calls sequentially and a Stop() call in parallel.

How to run it. Checkout the branch and run the following commands:

$ cd htlcswitch
$ go test -race -run TestSwitchGetAttemptResultStress

Expected behaviour

I expect TestSwitchGetAttemptResultStress not to crash, i.e. that there are no race condition between GetAttemptResult and Stop methods.

Actual behaviour

The stress test crashed

htlcswitch$ go test -race -run TestSwitchGetAttemptResultStress
==================
WARNING: DATA RACE
Read at 0x00c0001d4118 by goroutine 21:
  runtime.raceread()
      <autogenerated>:1 +0x1e
  github.com/lightningnetwork/lnd/htlcswitch.(*Switch).GetAttemptResult()
      /home/user/lnd/htlcswitch/switch.go:496 +0x1c4
  github.com/lightningnetwork/lnd/htlcswitch.TestSwitchGetAttemptResultStress.func1()
      /home/user/lnd/htlcswitch/switch_test.go:3211 +0x168

Previous write at 0x00c0001d4118 by goroutine 22:
  runtime.racewrite()
      <autogenerated>:1 +0x1e
  github.com/lightningnetwork/lnd/htlcswitch.(*Switch).Stop()
      /home/user/lnd/htlcswitch/switch.go:1995 +0x1e9
  github.com/lightningnetwork/lnd/htlcswitch.TestSwitchGetAttemptResultStress.func2()
      /home/user/lnd/htlcswitch/switch_test.go:3232 +0xae

Goroutine 21 (running) created at:
  github.com/lightningnetwork/lnd/htlcswitch.TestSwitchGetAttemptResultStress()
      /home/user/lnd/htlcswitch/switch_test.go:3203 +0x356
  testing.tRunner()
      /home/user/.goroot/src/testing/testing.go:1690 +0x226
  testing.(*T).Run.gowrap1()
      /home/user/.goroot/src/testing/testing.go:1743 +0x44

Goroutine 22 (finished) created at:
  github.com/lightningnetwork/lnd/htlcswitch.TestSwitchGetAttemptResultStress()
      /home/user/lnd/htlcswitch/switch_test.go:3222 +0x45c
  testing.tRunner()
      /home/user/.goroot/src/testing/testing.go:1690 +0x226
  testing.(*T).Run.gowrap1()
      /home/user/.goroot/src/testing/testing.go:1743 +0x44
==================
--- FAIL: TestSwitchGetAttemptResultStress (0.08s)
    testing.go:1399: race detected during execution of test
FAIL
exit status 1
FAIL    github.com/lightningnetwork/lnd/htlcswitch      0.380s

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions