feat(session): watchdog for stuck tool/session recovery #20104

Open
ESRE-dev wants to merge 1 commit into anomalyco:dev from ESRE-dev:pr/session-watchdog

ESRE-dev commented Mar 30, 2026

Issue for this PR

Closes #20099

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Adds a runtime watchdog safety net for stuck tools and sessions, complementing startup recovery work.

This PR introduces:

  1. Session activity tracking
    • Adds a lightweight activity tracker used to determine session staleness.

  2. Startup orphan cleanup
    • Marks orphaned running tool parts from prior process exits as errored on bootstrap.

  3. Runtime watchdog tick
    • Periodically scans for long-running stuck tool parts and cancels affected sessions.
    • Uses leaf filtering so task tools waiting on child sessions are not force-failed prematurely.
    • Applies tool_timeout and task_timeout as independent cutoffs per tool type, so a short tool timeout is not overridden by a longer task timeout.

  4. Idle session detection
    • Cancels stale idle subagent sessions based on activity timestamps.
    • Root (top-level) sessions are never idle-cancelled; only child sessions spawned by the task tool are subject to idle detection.

  5. Config wiring
    • Uses experimental.tool_timeout, experimental.task_timeout, and experimental.idle_timeout to tune watchdog behavior.
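
To make the leaf-filtering and independent-cutoff rules above concrete, here is a minimal sketch of one watchdog scan as a pure function. It is not the code in this PR: `RunningPart`, `WatchdogTimeouts`, `findStuckLeaves`, the `childSessionID` field, and the millisecond units are all hypothetical, and it assumes a task part counts as "waiting" whenever its child session still has a running part.

```typescript
// Hypothetical shapes for illustration only; the real opencode types differ.
interface RunningPart {
  id: string;
  sessionID: string;
  tool: string; // "task" marks a tool that spawns a child session (assumption)
  startedAt: number; // epoch milliseconds
  childSessionID?: string; // set only on task parts (assumption)
}

interface WatchdogTimeouts {
  toolTimeoutMs: number; // stands in for experimental.tool_timeout
  taskTimeoutMs: number; // stands in for experimental.task_timeout
}

// Returns the running parts that one watchdog tick would treat as stuck.
function findStuckLeaves(
  running: RunningPart[],
  timeouts: WatchdogTimeouts,
  now: number,
): RunningPart[] {
  // Sessions that currently own at least one running part.
  const busySessions = new Set(running.map((part) => part.sessionID));

  return running.filter((part) => {
    const isTask = part.tool === "task";
    // Each tool type is measured against its own cutoff, so the longer
    // task timeout never extends the deadline of an ordinary tool.
    const cutoff = isTask ? timeouts.taskTimeoutMs : timeouts.toolTimeoutMs;
    if (now - part.startedAt <= cutoff) return false;

    // Leaf filter: a task part whose child session is still busy is only
    // waiting, so it is left alone and the stuck child part is reported.
    const waitingOnChild =
      isTask &&
      part.childSessionID !== undefined &&
      busySessions.has(part.childSessionID);
    return !waitingOnChild;
  });
}
```

With a 1 s tool cutoff and a 10 s task cutoff, a task part and its child's tool part that both started at time 0 yield only the child's part at 2 s. The task part is reported only once it has passed its own cutoff and no child part is still running.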

This is intentionally scoped as a runtime watchdog feature and is complementary to #19023 (startup recovery).

How did you verify your code works?

  • Added and ran packages/opencode/test/session/watchdog.test.ts (21 tests).
  • Verified leaf-filter behavior (stuck leaf tools are recovered; waiting task tools are preserved).
  • Verified idle detection and activity tracking behavior.
  • Verified root sessions are excluded from idle cancellation.
  • Verified startup orphan cleanup transitions stale running parts to terminal error state.
  • Typecheck passes clean (bun typecheck).
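
The idle-detection and root-exclusion behavior verified above can be illustrated with a small sketch. The names `ActivityTracker`, `SessionInfo`, `findIdleSessions`, and the `parentID` field are assumptions made for this example rather than the PR's API, and the sketch assumes a session with no recorded activity is skipped, not cancelled.

```typescript
// Hypothetical tracker: records the last activity timestamp per session.
class ActivityTracker {
  private readonly lastSeen = new Map<string, number>();

  touch(sessionID: string, now: number): void {
    this.lastSeen.set(sessionID, now);
  }

  lastActive(sessionID: string): number | undefined {
    return this.lastSeen.get(sessionID);
  }
}

// Hypothetical session shape: a root session has no parentID.
interface SessionInfo {
  id: string;
  parentID?: string;
}

// Returns IDs of child sessions whose last activity is older than the
// idle threshold. Root sessions are never returned.
function findIdleSessions(
  sessions: SessionInfo[],
  tracker: ActivityTracker,
  idleTimeoutMs: number, // stands in for experimental.idle_timeout
  now: number,
): string[] {
  return sessions
    .filter((session) => session.parentID !== undefined)
    .filter((session) => {
      const last = tracker.lastActive(session.id);
      // Assumption: sessions that were never tracked are left alone.
      return last !== undefined && now - last > idleTimeoutMs;
    })
    .map((session) => session.id);
}
```

Keeping the check a pure function of the tracker state and the current time makes cases such as "root session, long idle" easy to cover in a unit test without real timers.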

Screenshots / recordings

Not a UI change.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

If you do not follow this template your PR will be automatically rejected.

github-actions bot added and removed the needs:compliance label Mar 30, 2026
github-actions commented

Thanks for updating your PR! It now meets our contributing guidelines. 👍

- Add SessionActivity tracker for session liveness monitoring
- Add cleanupOrphanedParts() to recover tool parts stuck from prior crashes
- Add periodic watchdog that detects stuck tools beyond configured timeout
- Leaf-filtering: only force-errors actual stuck tools, not task tools waiting on children
- Idle detection: cancels sessions with no activity beyond idle threshold
- Config support for tool_timeout, task_timeout, idle_timeout
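
As a rough illustration of the orphan-cleanup step in this commit message, the sketch below moves every part still marked running at startup into a terminal error state. It is not the body of `cleanupOrphanedParts()`: the `PartState` and `ToolPart` shapes, the `markOrphansErrored` name, and the error message are all hypothetical.

```typescript
// Hypothetical part states for illustration; the real schema may differ.
type PartState =
  | { status: "running"; startedAt: number }
  | { status: "completed"; startedAt: number; endedAt: number }
  | { status: "error"; startedAt: number; endedAt: number; error: string };

interface ToolPart {
  id: string;
  state: PartState;
}

// At bootstrap no tool from a previous process can still be executing,
// so any part left in "running" is an orphan and is marked as errored.
function markOrphansErrored(parts: ToolPart[], now: number): ToolPart[] {
  return parts.map((part): ToolPart => {
    if (part.state.status !== "running") return part;
    return {
      ...part,
      state: {
        status: "error",
        startedAt: part.state.startedAt,
        endedAt: now,
        error: "Tool execution interrupted by process exit", // illustrative message
      },
    };
  });
}
```

Parts that already reached a terminal state pass through unchanged, so the function is safe to run on every startup.
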
ESRE-dev force-pushed the pr/session-watchdog branch from 753a4b4 to 2423da1 April 1, 2026 17:50
avion23 pushed a commit to avion23/opencode that referenced this pull request Apr 2, 2026
…nomalyco#20104, anomalyco#20103)

- SessionActivity tracker for per-session activity timestamps
- Watchdog tick (60s) with leaf-filtering to force-error stuck tools
- Idle detection (default 5min) cancels unresponsive subagent sessions
- Orphan cleanup on startup for crash recovery
- raceSignal() for abort-aware tool execution with configurable timeouts
- Non-task tools: 15min global timeout (configurable)
- Task tools: 4hr default timeout with partial output recovery
- Fixed pre-existing typecheck error in app.tsx (removed externalOutputMode)
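
The commit message above names `raceSignal()` but does not show its signature. The following is one plausible sketch of abort-aware execution with a timeout, under the assumption that the helper races a promise against an `AbortSignal` and a timer. The parameter list and error messages are guesses, not the referenced commit's code.

```typescript
// Hypothetical signature: settles with the work's outcome, or rejects
// as soon as the signal aborts or the timeout elapses.
function raceSignal<T>(
  work: Promise<T>,
  signal: AbortSignal,
  timeoutMs: number,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("aborted"));
      return;
    }

    const onAbort = () => {
      cleanup();
      reject(new Error("aborted"));
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`timed out after ${timeoutMs}ms`));
    }, timeoutMs);
    // Always release the timer and the listener so nothing leaks
    // once the race has a winner.
    const cleanup = () => {
      clearTimeout(timer);
      signal.removeEventListener("abort", onAbort);
    };

    signal.addEventListener("abort", onAbort, { once: true });
    work.then(
      (value) => {
        cleanup();
        resolve(value);
      },
      (error) => {
        cleanup();
        reject(error);
      },
    );
  });
}
```

Note that this only stops waiting on the work; the underlying operation keeps running unless it also observes the same signal.
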
avion23 pushed a commit to avion23/opencode that referenced this pull request Apr 3, 2026

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Runtime watchdog for stuck tool/session recovery (complements #19023 startup recovery)