Skip to content

coordinator: fix race condition in authority test#1371

Merged
burgerdev merged 1 commit into
mainfrom
burgerdev/deadlock
Apr 15, 2025
Merged

coordinator: fix race condition in authority test#1371
burgerdev merged 1 commit into
mainfrom
burgerdev/deadlock

Conversation

@burgerdev
Copy link
Copy Markdown
Member

@burgerdev burgerdev commented Apr 11, 2025

It turns out that #1317 introduced two race conditions in test code:

  1. If the testclock was stepped before the watcher called clock.After, the notification about the step would go nowhere and the watcher would wait forever to restart the loop.
  2. In badStore.Watch we sent a notification about the call happening before actually handing out the channel. This could lead to the caller receiving the new channel, not the one the test logic was closing.

I fixed both, restructured the test code so that the main test logic is easier to understand (by hiding the channel sends and receives in functions of store and clock), and switched from clock.After to clock.NewTimer because clock.After is still documented to leak resources (not sure whether this is still the case, though, as this was fixed at least in the time package).

Tested:

$ go test -count 10000 -failfast -run '^TestBadStoreWatcherIsRestarted$' github.com/edgelesssys/contrast/coordinator/internal/authority -race 
ok  	github.com/edgelesssys/contrast/coordinator/internal/authority	104.262s

@burgerdev burgerdev added the no changelog PRs not listed in the release notes label Apr 11, 2025
@burgerdev burgerdev added this to the v1.8.0 milestone Apr 11, 2025
@burgerdev burgerdev requested a review from katexochen April 11, 2025 14:05
@burgerdev burgerdev added the needs: review Someone needs to review to bring this forward label Apr 14, 2025
Copy link
Copy Markdown
Contributor

@katexochen katexochen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise lgtm

}
}
m.logger.Error("WatchLatestTransitions failed, starting a new watcher", "error", err)
t := m.clock.NewTimer(5 * time.Second)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think clock.After should be fine, it makes the code much more readable and shouldn't leak anything at this point.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, reverted back to clock.After.

@burgerdev burgerdev force-pushed the burgerdev/deadlock branch from bfc3be9 to 209fc37 Compare April 15, 2025 09:13
@katexochen katexochen removed the needs: review Someone needs to review to bring this forward label Apr 15, 2025
@burgerdev burgerdev merged commit 6404a41 into main Apr 15, 2025
17 checks passed
@burgerdev burgerdev deleted the burgerdev/deadlock branch April 15, 2025 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no changelog PRs not listed in the release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants