add SEMAPHORE_AGENT_ASG_MAX_INSTANCE_LIFETIME #169

Merged
lucaspin merged 3 commits into renderedtext:master from cchristous:max-instance-lifetime on Mar 31, 2025

@cchristous
Contributor

@cchristous cchristous commented Mar 29, 2025

Change description

This pull request introduces a new configuration option, SEMAPHORE_AGENT_ASG_MAX_INSTANCE_LIFETIME, that sets the maximum instance lifetime for the Semaphore agent auto-scaling group. Users can specify the maximum time an instance may exist before it is terminated and replaced. This helps catch the case where an instance is stuck in a bad state with the Semaphore agent not running and the only way to recover is to terminate the instance manually. We sometimes see this happen when userdata fails to run properly (for example, if the IMDS returns a 500 error, which also prevents the error-handling self-cleanup from running).

The default value is 0, which indicates no maximum lifetime. See https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-max-instance-lifetime.html for more details.
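For readers who want to see the semantics of the default, here is a minimal sketch of how such a value could be read and validated. It is not the code from this pull request: `parseMaxInstanceLifetime` is a hypothetical helper, the value is assumed to be in seconds (as in the AWS setting), and the one-day minimum for non-zero values is my reading of the AWS documentation linked above.

```typescript
// Hypothetical helper for illustration; not the implementation in this PR.
// Assumption: the value is expressed in seconds, matching the AWS setting.
const ONE_DAY_SECONDS = 86_400;

function parseMaxInstanceLifetime(raw: string | undefined): number {
  // Unset or empty falls back to the default of 0 (no maximum lifetime).
  if (raw === undefined || raw.trim() === "") {
    return 0;
  }

  const seconds = Number(raw);
  if (!Number.isInteger(seconds) || seconds < 0) {
    throw new Error(`invalid max instance lifetime: ${raw}`);
  }

  // Assumption: AWS accepts either 0 or a value of at least one day.
  if (seconds !== 0 && seconds < ONE_DAY_SECONDS) {
    throw new Error(`max instance lifetime must be 0 or >= ${ONE_DAY_SECONDS}`);
  }

  return seconds;
}
```

With this sketch, leaving the variable unset or setting it to `0` both mean "never recycle instances by age", while something like `604800` would replace instances after seven days.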

I considered an approach where the value would be determined programmatically as SEMAPHORE_AGENT_DISCONNECT_AFTER_IDLE_TIMEOUT + 24 hours (because 24 hours is the maximum duration of a Semaphore job). However, I decided against it because it would have required more complicated conditional logic when only SEMAPHORE_AGENT_DISCONNECT_AFTER_IDLE_TIMEOUT was set, and it wouldn't give users as much control.
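To make the rejected alternative concrete, this is roughly what the derived value would have looked like. It is only a sketch: `derivedMaxInstanceLifetime` is a hypothetical name, the idle timeout is assumed to be in seconds, and treating an unset or non-positive timeout as "no maximum lifetime" is an assumption rather than something the PR specifies.

```typescript
// Hypothetical sketch of the rejected approach; nothing like this was merged.
// 24 hours is the maximum duration of a Semaphore job, per the description above.
const MAX_JOB_DURATION_SECONDS = 24 * 60 * 60;

function derivedMaxInstanceLifetime(
  disconnectAfterIdleTimeout: number | undefined
): number {
  // Assumption: without an idle timeout there is nothing to derive from,
  // so fall back to 0 (no maximum lifetime).
  if (disconnectAfterIdleTimeout === undefined || disconnectAfterIdleTimeout <= 0) {
    return 0;
  }
  // Idle timeout plus the longest possible job.
  return disconnectAfterIdleTimeout + MAX_JOB_DURATION_SECONDS;
}
```

Even in this small form the function needs a branch for the unset case and leaves users no way to choose a different lifetime, which is the trade-off that motivated the explicit setting instead.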

Testing

Added a unit test.

@cchristous cchristous marked this pull request as ready for review March 31, 2025 19:53
@lucaspin lucaspin self-requested a review March 31, 2025 19:56
@lucaspin
Contributor

/sem-approve

@lucaspin
Contributor

/sem-approve

@lucaspin lucaspin merged commit 31b8d87 into renderedtext:master Mar 31, 2025
1 check passed
@cchristous cchristous deleted the max-instance-lifetime branch July 24, 2025 19:22