-
Notifications
You must be signed in to change notification settings - Fork 327
Configurable chunks interval #3521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5703a49 to
81c136c
Compare
| mb := mockBootstrap(t) | ||
| mb.Expect().Once().AndCallFunc(func(c *bintest.Call) { | ||
| start := time.Now() | ||
| for time.Since(start) < 10*time.Second { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL this construct. handy!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i reckon
| ratio := float64(avg2s) / float64(avg1s) | ||
| t.Logf("Ratio of 2s/1s intervals: %.2f", ratio) | ||
|
|
||
| if ratio < 1.4 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 1.4? we're not jittering the interval internally, so shouldn't the interval on the 2s always be 2x longer than the interval on 1s? i suppose the chunk collector will sometimes collect more than its size limit and preempt the serverside timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah 1.4 is a bit of a weird number... I had a bit of a struggle with testing this tbh.
that code in run_job.go streamJobLogsAfterProcessStart has a few funky interactions
- the jitter sleep is up-to the
processInterval, usingrand.N - the jitter is applied after the ticker
- the ticker will keep ticking on its schedule irrespective of the jitter
- we also have this
firstchannel to immediately try and push out a chunk without waiting
this all kinda means that the intervals themselves are kinda all over the joint depending on the dice-roll of the jitter itself which can be pathologically lumpy
eg. with 1s interval
| Ticker | Jitter | Final Time | Time since last |
|---|---|---|---|
| 1.0s | 0.9s | 1.9s | N/A |
| 2.0s | 0.1s | 2.1s | 0.2s |
| 3.0s | 0.9s | 3.9s | 1.8s |
| 4.0s | 0.1s | 4.1s | 0.2s |
| 5.0s | 0.5s | 5.5s | 1.4s |
Ideally it would be straightforward to control or disable the jitter in the tests too, but that also requires some surgery or exposing internals that aren't very integration test-ey which I don't love either. Sigh.
| avg1s := runTestWithInterval(t, 1) | ||
| avg2s := runTestWithInterval(t, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test suite is currently pretty quick — 1.6 seconds on my machine at current. This test needs(?) to wait for 10s to complete, but i think we should probably do both of these runs concurrently to minimise the amount of time we're lengthening this suite by. something like:
times := map[int]time.Duration{}
wg := sync.WaitGroup{}
wg.Add(2)
go func() {
defer wg.Done()
times[1] = runTestWithInterval(t, 1)
}()
go func() {
defer wg.Done()
times[2] = runTestWithInterval(t, 2)
}()
wg.Wait()
avg1s := times[1]
avg2s := times[2]There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah good idea
moskyb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! exciting.
Description
To give us more control over log chunk, I've added an additional field to the agent job representation to allow us to remotely dictate the log chunk interval.
This PR accepts that field and uses it to determine the chunk ticker interval at the beginning of the job.
Testing
go test ./...). Buildkite employees may check this if the pipeline has run automatically.go fmt ./...)Disclosures / Credits
I wrote the app code, and had
ampgenerate the test, which is kind of a crap test, but this codepath didn't have great test plumbing around it, and needs a bit of a refactor to make it easier to test I think. I'm keen to hear ideas from @buildkite/agent-stewards of how I can pin this down a bit more (mostly around the jitter, and first chunk plumbing) to make the tests more closer to the intent of the change.