Skip to content

fix(copilot): harden idle detection, add session timeout, and fix startup race#25

Merged
jrob5756 merged 4 commits intomainfrom
fix/copilot-idle-detection-and-startup-race
Mar 6, 2026
Merged

fix(copilot): harden idle detection, add session timeout, and fix startup race#25
jrob5756 merged 4 commits intomainfrom
fix/copilot-idle-detection-and-startup-race

Conversation

@jrob5756
Copy link
Copy Markdown
Collaborator

@jrob5756 jrob5756 commented Mar 6, 2026

Summary

  • Smarter idle detection: Exclude internal bookkeeping events (pending_messages.modified, session.start, session.info) from resetting the idle clock, so the watchdog can properly detect hangs during stuck MCP initialization.
  • Session timeout: Add a hard wall-clock max_session_seconds limit (default 30 min) to IdleRecoveryConfig that terminates sessions even when non-idle events keep flowing indefinitely.
  • Startup race fix: Protect _ensure_client_started() with an asyncio.Lock to prevent concurrent for-each items from racing to start the same client subprocess multiple times.

Test plan

  • Verify idle detection triggers correctly when only pending_messages.modified events are flowing
  • Verify sessions exceeding max_session_seconds are terminated with a descriptive ProviderError
  • Verify concurrent for-each execution doesn't trigger duplicate client starts
  • Run existing test suite: uv run pytest tests/test_providers/

🤖 Generated with Claude Code

Jason Robert and others added 4 commits March 6, 2026 14:16
…rtup race

- Exclude bookkeeping events (pending_messages.modified, session.start,
  session.info) from resetting the idle clock so the watchdog can detect
  hangs during stuck MCP initialization.
- Add a hard wall-clock max_session_seconds limit (default 30 min) that
  terminates sessions even when non-idle events keep flowing.
- Protect _ensure_client_started() with an asyncio.Lock to prevent
  concurrent for-each items from racing to start the subprocess twice.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ace PR

- Extract _IDLE_IGNORED_EVENTS to module-level frozenset to avoid
  per-event set allocation inside on_event() callback
- Change is_retryable to False on wall-clock timeout ProviderError to
  prevent retrying sessions that will hit the same root cause
- Remove redundant pre-lock `if not self._started` guard since the
  internal lock already handles the fast path
- Add time-since-last-event to session timeout error for diagnostics
- Document wall-clock check granularity (max ≈ max_session + idle_timeout)
- Broaden _ensure_client_started docstring to cover parallel groups
- Update Raises docstring for wall-clock timeout path
- Add 10 new tests: _IDLE_IGNORED_EVENTS membership, session timeout
  behavior, and concurrent _ensure_client_started race safety

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change idle_timeout_seconds from 60s to 0.05s in
  test_session_timeout_fires_even_with_flowing_events so the loop
  iterates quickly instead of blocking for a full minute (62s → 2s)
- Add missing is_retryable assertion to flowing-events timeout test
- Fix misleading "fast-path" comment on _ensure_client_started call

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jrob5756 jrob5756 merged commit 56ee947 into main Mar 6, 2026
7 checks passed
@jrob5756 jrob5756 deleted the fix/copilot-idle-detection-and-startup-race branch March 6, 2026 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant