Configurable chunks interval #3521

catkins · 2025-10-12T23:33:20Z

Description

To give us more control over log chunking, I've added a field to the agent job representation that lets us dictate the log chunk interval remotely.

This PR accepts that field and uses it to determine the chunk ticker interval at the beginning of the job.

Testing

Tests have run locally (with go test ./...). Buildkite employees may check this if the pipeline has run automatically.
Code is formatted (with go fmt ./...)

Disclosures / Credits

I wrote the app code, and had amp generate the test, which is kind of a crap test, but this codepath didn't have great test plumbing around it, and I think it needs a bit of a refactor to make it easier to test. I'm keen to hear ideas from @buildkite/agent-stewards on how I can pin this down a bit more (mostly around the jitter and first-chunk plumbing) to bring the tests closer to the intent of the change.

moskyb · 2025-10-12T23:52:51Z

agent/integration/job_runner_integration_test.go

+		mb := mockBootstrap(t)
+		mb.Expect().Once().AndCallFunc(func(c *bintest.Call) {
+			start := time.Now()
+			for time.Since(start) < 10*time.Second {


TIL this construct. handy!

moskyb · 2025-10-12T23:56:07Z

agent/integration/job_runner_integration_test.go

+		ratio := float64(avg2s) / float64(avg1s)
+		t.Logf("Ratio of 2s/1s intervals: %.2f", ratio)
+
+		if ratio < 1.4 {


why 1.4? we're not jittering the interval internally, so shouldn't the interval on the 2s always be 2x longer than the interval on 1s? i suppose the chunk collector will sometimes collect more than its size limit and preempt the serverside timeout?

catkins · 2025-10-13

yeah 1.4 is a bit of a weird number... I had a bit of a struggle with testing this tbh.

that code in run_job.go streamJobLogsAfterProcessStart has a few funky interactions:

the jitter sleep is up to the processInterval, using rand.N

the jitter is applied after the ticker

the ticker will keep ticking on its schedule irrespective of the jitter

we also have this first channel to immediately try and push out a chunk without waiting

this all kinda means the intervals themselves are all over the joint, depending on the dice roll of the jitter, which can be pathologically lumpy

e.g. with a 1s interval:

Ticker  Jitter  Final time  Time since last
1.0s    0.9s    1.9s        N/A
2.0s    0.1s    2.1s        0.2s
3.0s    0.9s    3.9s        1.8s
4.0s    0.1s    4.1s        0.2s
5.0s    0.5s    5.5s        1.4s

Ideally it would be straightforward to control or disable the jitter in the tests too, but that also requires some surgery or exposing internals that aren't very integration test-ey which I don't love either. Sigh.

moskyb · 2025-10-12T23:59:33Z

agent/integration/job_runner_integration_test.go

+		avg1s := runTestWithInterval(t, 1)
+		avg2s := runTestWithInterval(t, 2)


this test suite is currently pretty quick: 1.6 seconds on my machine. This test needs(?) to wait for 10s to complete, but i think we should probably do both of these runs concurrently to minimise how much time we add to the suite. something like:

	// separate variables rather than a shared map: concurrent map writes
	// from two goroutines are a data race in Go
	var avg1s, avg2s time.Duration
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		avg1s = runTestWithInterval(t, 1)
	}()
	go func() {
		defer wg.Done()
		avg2s = runTestWithInterval(t, 2)
	}()
	wg.Wait()

catkins · 2025-10-13

yeah good idea

moskyb

LGTM! exciting.

catkins requested a review from a team October 12, 2025 23:33

DrJosh9000 self-requested a review October 12, 2025 23:36

catkins added 2 commits October 13, 2025 10:46

allow defining log chunk interval from server

b1f650f

add integration test for the job chunk interval property

81c136c

catkins force-pushed the catkins/configurable-chunks-interval branch from 5703a49 to 81c136c Compare October 12, 2025 23:46

moskyb reviewed Oct 13, 2025

rewrite test to assert on chunk counts not intervals

c1e5bf7

moskyb self-requested a review October 15, 2025 01:20

moskyb approved these changes Oct 15, 2025

DrJosh9000 approved these changes Oct 15, 2025

catkins merged commit 6f99559 into main Oct 15, 2025
1 check passed

catkins deleted the catkins/configurable-chunks-interval branch October 15, 2025 11:21

DrJosh9000 mentioned this pull request Oct 22, 2025

Bump version and changelog for v3.110.0 #3550

Merged
