Summary
Serialize create_session calls in CopilotProvider so that concurrent for-each agents don't hit the race condition described in #27.
Problem
The copilot-sdk's CopilotClient.create_session() has a race window: the CLI starts using a session before the Python SDK registers it in _sessions. When the CLI sends a permission.request for the not-yet-registered session, the SDK raises ValueError("unknown session"), which the CLI surfaces as "Permission denied" to the model. The agent then retries every tool for up to 30 minutes before timing out.
This only happens when multiple create_session calls overlap — i.e., in for_each groups with max_concurrent > 1.
Proposed Fix
Add a lock around create_session in CopilotProvider.execute() so sessions are created one at a time. Sessions still run in parallel — only creation is serialized.
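import asyncio

class CopilotProvider:
    def __init__(self, ...):
        ...
        # One lock per provider instance, shared by all execute() calls.
        self._session_create_lock = asyncio.Lock()

    async def execute(self, ...):
        ...
        # Hold the lock only while the session is being created.
        async with self._session_create_lock:
            session = await self._client.create_session(session_config)
        ...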
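Cost: ~200 ms per session creation × N sessions. In the worst case, when all N agents start together, the last one waits for the N − 1 creations ahead of it, which is negligible against workflow runtimes of 10+ minutes.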
Benefit: Eliminates the race window entirely. Each session is fully registered in _sessions with its permission handler before the next create_session begins, so no permission.request can arrive for an unknown session.
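The intended behavior (creation serialized, execution still parallel) can be checked with a small simulation. This is only a sketch: FakeProvider, its counters, and the sleep durations are hypothetical stand-ins invented for illustration, and nothing here calls the real copilot-sdk.

```python
import asyncio


class FakeProvider:
    """Hypothetical stand-in for CopilotProvider; no real SDK calls are made."""

    def __init__(self):
        self._session_create_lock = asyncio.Lock()
        self.creating = 0      # creations currently in flight
        self.running = 0       # sessions currently doing work
        self.max_creating = 0  # peak number of overlapping creations
        self.max_running = 0   # peak number of overlapping sessions

    async def _create_session(self):
        # Simulated creation; the 0.01 s delay is arbitrary.
        self.creating += 1
        self.max_creating = max(self.max_creating, self.creating)
        await asyncio.sleep(0.01)
        self.creating -= 1

    async def _run_session(self):
        # Simulated agent work, deliberately longer than creation.
        self.running += 1
        self.max_running = max(self.max_running, self.running)
        await asyncio.sleep(0.05)
        self.running -= 1

    async def execute(self):
        # Only creation happens under the lock.
        async with self._session_create_lock:
            await self._create_session()
        # The session itself runs outside the lock, so sessions overlap.
        await self._run_session()


async def demo(n=4):
    provider = FakeProvider()
    await asyncio.gather(*(provider.execute() for _ in range(n)))
    return provider.max_creating, provider.max_running


max_creating, max_running = asyncio.run(demo())
print(f"peak concurrent creations: {max_creating}")
print(f"peak concurrent sessions:  {max_running}")
```

The peak number of concurrent creations stays at 1 while the peak number of concurrent sessions is greater than 1, which is the property the lock is meant to provide.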
Related