Race condition in flightcontrol causing occasional test failure

When running unrelated tests, I saw a one-off failure in `TestCancelBoth` of `util/flightcontrol/flightcontrol_test.go`:
```
=== CONT  TestCancelBoth
    flightcontrol_test.go:199:
                Error Trace:    flightcontrol_test.go:199
                Error:          Not equal:
                                expected: "bar"
                                actual  : "baz"

                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -1 +1 @@
                                -bar
                                +baz
                Test:           TestCancelBoth
    flightcontrol_test.go:204:
                Error Trace:    flightcontrol_test.go:204
                Error:          Not equal:
                                expected: 3
                                actual  : 4
                Test:           TestCancelBoth
--- FAIL: TestCancelBoth (0.45s)
```

It doesn't reproduce very often, but I found if I cd into `util/flightcontrol` and run `go test -run TestCancelBoth -count 250` I'll almost always get at least 1 failure (same as the one above, it's always the same errors on the same lines). Others might need to increase `-count` depending on the hardware being run on I suppose.

Not 100% sure but I think it happens when:
1. `g.Do` is called [here](https://github.com/moby/buildkit/blob/b2195cec759428a7775a6fe01799b961d14a81bc/util/flightcontrol/flightcontrol_test.go#L192). It returns, but the go routine it created internally is still [here and hasn't been scheduled to run yet](https://github.com/moby/buildkit/blob/b2195cec759428a7775a6fe01799b961d14a81bc/util/flightcontrol/flightcontrol.go#L71).
2. This second `g.Do` gets called [here](https://github.com/moby/buildkit/blob/b2195cec759428a7775a6fe01799b961d14a81bc/util/flightcontrol/flightcontrol_test.go#L197) (with different func as callback). It gets all the way to [here](https://github.com/moby/buildkit/blob/b2195cec759428a7775a6fe01799b961d14a81bc/util/flightcontrol/flightcontrol.go#L143) and thus returns the value from the previous `g.Do` call, even though the callback func is different.

@tonistiigi If the above is correct then I think this was introduced in [this commit](https://github.com/moby/buildkit/pull/2195). What do you think the right behavior is here? Should it be that once a call to `g.Do` returns then any state for the key is completely removed? Or do you think that would undo the improvements to go routine contention from the previous commit?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race condition in flightcontrol causing occasional test failure #2270

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Race condition in flightcontrol causing occasional test failure #2270

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions