Skip to content

Conversation

@jhump
Copy link
Member

@jhump jhump commented May 12, 2025

The previous formulation had a race condition between the prober completing its first probe in one goroutine and it being cancelled by a concurrent goroutine calling the Close() function (which would cancel the prober's context).

You can see two failures in a row here:
https://github.com/bufbuild/httplb/actions/runs/14962744579/attempts/3
It would also happen pretty reliable if you did go test ./health -count 10. So I'm a little surprised we haven't seen this flake be more of an issue. It's possible that the context-related fix in the PR from that failed CI run (#80) actually tickled things just enough to make it happen more frequently.

…obe and it being cancelled by concurrent call to prober.Close()
// StateUnhealthy (HTTP error)
connection := make(fakeConnChan)
close(connection)
err := checker.New(ctx, connection, tracker).Close()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call to New spawns a goroutine that launches a probe. But we'd then immediately call Close(), which cancels the prober's context. Both the prober goroutine and this test goroutine are running concurrently, so it's a race as to whether the initial probe can complete before the context is cancelled.

@jhump jhump requested a review from a user May 12, 2025 14:19
@jhump jhump changed the title Fix flaky TestPollingChcker in health package Fix flaky TestPollingChecker in health package May 12, 2025
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very tricky. Hopefully this is solid now.

@jhump jhump merged commit 2fe4e17 into main May 13, 2025
7 checks passed
@jhump jhump deleted the jh/fix-flaky-test branch May 13, 2025 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants