fix(copilot): add permission handler, event logging, and idle detection race fix#15
Conversation
…on race fix Three fixes for the Copilot SDK provider: 1. Add on_permission_request auto-approve handler to session config. The Copilot SDK now requires this handler when creating sessions. Without it, every workflow fails with 'An on_permission_request handler is required'. Auto-approve is correct for non-interactive conductor workflows. 2. Add SDK event logging via logger.debug() in the on_event callback. Every SDK event (including tool execution starts) is logged at DEBUG level, visible when --log-file is passed. This enables post-mortem debugging of session stalls. Works with --web-bg because bg_runner.py already forwards --log-file to the background process. 3. Fix done.clear() race condition in _wait_with_idle_detection. If session.idle arrives between a timeout check and done.clear(), the completion signal is lost, causing the loop to wait another full idle timeout (5 minutes) before checking again. Added done.is_set() guards at three locations before done.clear() and an early-return check at the top of the loop.
62b33b8 to
2426156
Compare
| "model": model, | ||
| # Auto-approve all permission requests (shell, write, mcp, read, url) | ||
| # since Conductor workflows run non-interactively. | ||
| "on_permission_request": lambda _req, _ctx: {"kind": "approved"}, |
There was a problem hiding this comment.
🔴 Lint failure (CI blocker): This inline lambda is causing the ruff format check to fail in CI. Extract it to a named static method with proper type hints and a docstring.
Suggestion:
@staticmethod
def _default_permission_handler(
request: dict[str, Any],
invocation: dict[str, str],
) -> dict[str, Any]:
"""Default permission handler that approves all requests.
SDK v0.1.28+ requires a permission handler on session creation.
In orchestration mode, we approve all tool permissions since the
workflow author controls which tools are available to each agent.
"""
return {"kind": "approved"}Then reference it here as self._default_permission_handler.
| # Build session config with MCP servers from workflow configuration | ||
| session_config: dict[str, Any] = { | ||
| "model": model, | ||
| # Auto-approve all permission requests (shell, write, mcp, read, url) |
There was a problem hiding this comment.
🔴 Missing resume_session handler: SDK v0.1.28+ also requires on_permission_request when calling resume_session(). Without it, resuming workflows will still raise the permission denied error.
Around line 429 where resume_session is called, pass a config dict:
session = await self._client.resume_session(
resume_sid,
{"on_permission_request": self._default_permission_handler},
)| session_config: dict[str, Any] = { | ||
| "model": model, | ||
| # Auto-approve all permission requests (shell, write, mcp, read, url) | ||
| # since Conductor workflows run non-interactively. |
There was a problem hiding this comment.
🟡 No audit logging on permission approvals. Since this handler silently auto-approves dangerous categories (shell, write, mcp), consider adding a logger.debug so approvals are visible in --log-file output:
logger.debug("auto-approved permission request: %s", request)
return {"kind": "approved"}| tool_info = "" | ||
| if event_type == "tool.execution_start": | ||
| tn = getattr(event.data, "tool_name", None) or getattr( | ||
| event.data, "name", "?" |
There was a problem hiding this comment.
🟢 Minor: The getattr(event.data, ...) chain will raise AttributeError if the event object itself has no data attribute (e.g., a malformed SDK event). Consider adding a defensive guard:
if event_type == "tool.execution_start" and hasattr(event, "data") and event.data is not None:Not a regression (existing code has the same pattern), but since this is new code it's worth hardening.
| @@ -1267,6 +1280,11 @@ | |||
| idle_timeout = self._idle_recovery_config.idle_timeout_seconds | |||
|
|
|||
| while True: | |||
There was a problem hiding this comment.
✅ Good fix. This early-return guard closes a real race window where session.idle could arrive between a previous done.clear() and the await wait_for(), causing the completion signal to be lost and an unnecessary full idle-timeout wait.
| @@ -1288,11 +1306,15 @@ | |||
| # just hasn't finished yet. Reset recovery counter (new task) | |||
There was a problem hiding this comment.
✅ Good fix. Without this guard, if wait_for raises TimeoutError but done was set concurrently (e.g., session.idle arrived during the timeout window), the unconditional clear() would erase the legitimate completion signal. The check-then-act is safe here since there's no await between is_set() and clear().
| @@ -1321,9 +1343,10 @@ | |||
There was a problem hiding this comment.
✅ Good fix. Between await session.send() and done.clear(), the SDK could process the recovery prompt fast enough that session.idle fires and sets done before we reach clear(). Without this guard, we'd erase that signal and loop again, sending a redundant recovery prompt.
…ume_session - Extract inline lambda to static `_default_permission_handler` with proper type hints and docstring (fixes ruff format CI failure) - Pass `on_permission_request` to `resume_session()` (was only on `create_session`, causing permission denied on resumed workflows) - Add `logger.debug` audit logging on permission approvals - Add defensive `hasattr(event, "data")` guard in event logging Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three fixes for the Copilot SDK provider:
Add on_permission_request auto-approve handler - The Copilot SDK now requires this handler when creating sessions. Without it, every workflow fails. Auto-approve is correct for non-interactive conductor workflows.
Add SDK event logging via logger.debug() - Every SDK event is logged at DEBUG level, visible when --log-file is passed. Enables post-mortem debugging of session stalls. Works with --web-bg.
Fix done.clear() race condition - If session.idle arrives between a timeout check and done.clear(), the completion signal is lost, causing the loop to wait another full idle timeout (5 min). Added done.is_set() guards at three locations.