chore(acp): raise default idle timeout from 320s to 620s#566
Merged
Conversation
tlongwell-block
pushed a commit
that referenced
this pull request
May 13, 2026
Per review on #566: the 620 literal was repeated at four call sites (the real default in `from_args`, the `resolve_idle_timeout` test helper that re-implements precedence, and two `test_config()` fixtures) plus two test assertions. The helper was the real drift hazard — a future bump would update the prod default but silently leave the test helper on the old value, masking the regression. Single `pub(crate) const DEFAULT_IDLE_TIMEOUT_SECS: u64 = 620` in config.rs with the rationale in its docstring. All real call sites now reference it; the test assertions derive from it so they auto-track. No behavior change.
The default idle timeout was sized as goose's 300s turn timeout + 20s buffer. In practice agents now go silent on the outer ACP channel for much longer stretches — sprout-agents running other agents as tools, codex/claude doing multi-minute single tool calls, model providers with slow generation — and 320s causes spurious cancel+respawn on otherwise healthy turns. This bumps the default to 620s (600s working budget + 20s buffer). The hard turn cap (`max_turn_duration`, default 3600s) is unchanged, so the safety valve still fires for genuinely wedged agents. Operators with custom `max_turn_duration` <= 620 and no explicit `--idle-timeout` will now fail the existing `idle < max_turn` validation at startup; they should either raise `max_turn_duration` or set `--idle-timeout` explicitly. Also corrects a stale README entry that documented the default as 300. Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
Per review on #566: the 620 literal was repeated at four call sites (the real default in `from_args`, the `resolve_idle_timeout` test helper that re-implements precedence, and two `test_config()` fixtures) plus two test assertions. The helper was the real drift hazard — a future bump would update the prod default but silently leave the test helper on the old value, masking the regression. Single `pub(crate) const DEFAULT_IDLE_TIMEOUT_SECS: u64 = 620` in config.rs with the rationale in its docstring. All real call sites now reference it; the test assertions derive from it so they auto-track. No behavior change. Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
3ecdc62 to
807fcdb
Compare
This was referenced May 28, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Raise the default ACP idle timeout from 320s → 620s, and extract the value into a single
DEFAULT_IDLE_TIMEOUT_SECSconstant.Why the bump
The default was sized as "goose's 300s turn timeout + 20s buffer". That's no longer a good fit for the agent shapes we actually run:
Each of these triggers
idle timeout (320s) — no agent activityfollowed bysession/canceland a full agent subprocess respawn — losing session state, paying cold-start cost, and counting against the circuit breaker. The agent was healthy the whole time; it just wasn't talking on the outer channel.Why the constant
The 620 literal lived at four real call sites: the actual default in
from_args, theresolve_idle_timeouttest helper that re-implements precedence, and twotest_config()fixtures. The helper was a genuine drift hazard — a future bump would update the prod default but silently leave the test helper on the old value, masking the regression. Onepub(crate) const DEFAULT_IDLE_TIMEOUT_SECS: u64 = 620with the rationale in its docstring fixes that. Two test assertions also now derive from the constant so they auto-track.Commits
chore(acp): raise default idle timeout from 320s to 620s— the bump itself, plus a fix for a stale README entry that documented the default as300(pre-existing drift).refactor(acp): extract DEFAULT_IDLE_TIMEOUT_SECS constant— single source of truth, no behavior change.What does not change
max_turn_duration(the absolute wall-clock cap) stays at3600s. A genuinely wedged agent still gets killed at the hard cap.idle_timeout < max_turn_durationvalidation at startup is unchanged.SPROUT_ACP_IDLE_TIMEOUT/--idle-timeoutenv-var and flag overrides are unchanged.Operator-facing notes
Edge case worth flagging: operators who have set a custom
--max-turn-duration≤ 620 and rely on the default idle timeout will now fail validation at startup with:Resolution: either raise
--max-turn-durationabove 620, or set--idle-timeoutexplicitly to a value below your custom max. The default-default combo (idle=620,max=3600) is well within the constraint.Test plan
cargo build -p sprout-acp— clean.cargo clippy -p sprout-acp --all-targets -- -D warnings— clean.cargo test -p sprout-acp— 259 passed, 0 failed.What I considered and decided against
320inidle_timeout_error_includes_duration(acp.rs:1688). That test uses 320 as an arbitraryDurationvalue to verify error-message formatting; it doesn't reference the default. Touching it would be unrelated churn.SPROUT_ACP_IDLE_TIMEOUTif their workload needs more.