@scadu scadu commented Nov 13, 2025

Description

In #3570 the idle logic changed so that idleMon.markIdle is only triggered via AcceptAndRunJob().
As a result, agents that never received a job were never marked as idle.

Context

Issue discovered in buildkite/elastic-ci-stack-for-aws#1652 where the test stacks couldn't be destroyed.
I found that a single instance would prevent the ASG from being destroyed.
I tested agent 3.112.0 on a separate stack and could confirm this behavior.
Agents that received jobs were terminated properly, while agents that never received a job were never marked as idle and hung indefinitely (or until a job was assigned to them).

Changes

Testing

  • Tests have run locally (with go test ./...). Buildkite employees may check this if the pipeline has run automatically.
  • Code is formatted (with go fmt ./...)

I tested build 11168 of the agent on an Elastic CI Stack for AWS instance.
It now disconnects immediately when idle, as set in my agent's configuration.

Disclosures / Credits

I did not use AI tools at all.

@scadu scadu marked this pull request as ready for review November 13, 2025 13:42
@scadu scadu requested a review from DrJosh9000 November 13, 2025 14:36
@moskyb moskyb left a comment

Nice catch, thanks @scadu!

@moskyb moskyb merged commit 006265e into main Nov 13, 2025
1 check passed
@moskyb moskyb deleted the fix/idle_tracking_agent_without_jobs branch November 13, 2025 23:45
@DrJosh9000
Just checking, this is deliberately a behaviour change, not just a fix? As far as I can tell, disconnect-after-idle-timeout has until now not timed out agents that haven't yet run a job?

@DrJosh9000
Actually, you're right, it seems that it used to set the lastActionTime at the start of the method, meaning that the agent effectively implicitly starts idle.

https://github.com/buildkite/agent/blob/1e8fa75d6293b2d8e90c071bdf9d3d6df1b5c7f0/agent/agent_worker.go#L271C2-L271C16
