## Summary

When running `for_each` groups with `max_concurrent > 1`, Copilot sessions intermittently receive "Permission denied and could not request permission from user" on all tool calls. This causes agents to spin uselessly for the full 30-minute `max_session_seconds` timeout before failing, turning a 13-minute workflow into a 60-minute timeout.
## Root Cause

There is a race condition in copilot-sdk's `CopilotClient.create_session()` (`client.py`, lines 442–451):

```python
response = await self._client.request("session.create", payload)  # 1. CLI creates session
session_id = response["sessionId"]
session = CopilotSession(session_id, self._client, workspace_path)
session._register_tools(tools)
if on_permission_request:
    session._register_permission_handler(on_permission_request)  # 2. Handler registered
with self._sessions_lock:
    self._sessions[session_id] = session  # 3. Session added to lookup dict
```
The CLI process starts the session at step 1 and can immediately begin sending `permission.request` JSON-RPC messages, but the Python SDK doesn't register the session in `_sessions` until step 3. If a `permission.request` arrives between steps 1 and 3:

1. `_handle_permission_request()` looks up the session in `_sessions` and finds nothing
2. It raises `ValueError(f"unknown session {session_id}")`
3. `_dispatch_request()` catches the exception and sends a JSON-RPC error response (`-32603`)
4. The CLI interprets this error as a permission denial and returns "Permission denied" to the model
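The failure mode can be demonstrated outside the SDK with a minimal sketch. All names here are illustrative, not the real copilot-sdk API: a dispatcher that resolves requests through a sessions dict will turn any request arriving before registration into an error, even though the identical request succeeds a moment later.

```python
import asyncio

sessions = {}  # session_id -> permission handler, mirroring the SDK's _sessions dict

async def dispatch(session_id, request):
    """Mimics the dispatch path: requests for unknown sessions become JSON-RPC errors."""
    handler = sessions.get(session_id)
    if handler is None:
        # This is the failure mode: the CLI treats this error as a denial.
        return {"error": {"code": -32603, "message": f"unknown session {session_id}"}}
    return {"result": await handler(request)}

async def main():
    # The CLI "creates" the session and immediately sends permission.request,
    # before the client has added the session to its lookup dict.
    early = await dispatch("sess-1", {"method": "permission.request"})
    assert early["error"]["code"] == -32603  # race lost: denied

    async def approve(request):
        return {"allowed": True}

    sessions["sess-1"] = approve  # registration finally happens
    late = await dispatch("sess-1", {"method": "permission.request"})
    assert late == {"result": {"allowed": True}}  # same request now succeeds

asyncio.run(main())
```

The sketch shows why the bug is timing-dependent: nothing about the request is invalid, only its arrival relative to registration.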
With 5 concurrent `create_session` calls (from `max_concurrent: 5` in a for-each group), the race window widens significantly. Once a session's first permission request is denied, the model starts retrying every tool it has, each retry is also denied, and the agent burns its entire 1800s session timeout on futile retries.
## Evidence

Comparing CI runs of the same workflow on the same Copilot CLI 1.0.2:

| Date | `gather_sources` duration | "Permission denied" count | Total run |
|------|---------------------------|---------------------------|-----------|
| Mar 7 | 396s (10/10 succeeded) | 0 | 13 min ✅ |
| Mar 8 | 3,050s (9/10 succeeded, 1 timed out at 1800s) | hundreds | 59 min ⚠️ |
| Mar 9 | never completed (agents stuck in retry loops) | 83+ | 60 min ❌ cancelled |
The pattern is consistent: agents get "Permission denied" on their very first tool call, then every subsequent tool call also fails. Other agents in the same batch work fine — they won the race.
## Suggested Fixes

### In copilot-sdk (root cause)

Register the session in `_sessions` before sending `session.create` to the CLI, or use a placeholder entry:
```python
# Pre-register with a placeholder so permission requests can find the session
session = CopilotSession(None, self._client, None)
if on_permission_request:
    session._register_permission_handler(on_permission_request)

# Now create on the CLI side
response = await self._client.request("session.create", payload)
session_id = response["sessionId"]
session._session_id = session_id
with self._sessions_lock:
    self._sessions[session_id] = session
```
Alternatively, queue incoming `permission.request` messages for unknown sessions and replay them once the session is registered.
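The queue-and-replay alternative could look something like the following sketch. `PendingRequestBuffer` and its methods are hypothetical, not copilot-sdk APIs; the point is that an unknown-session request parks on a future instead of failing, and registration resolves it.

```python
import asyncio
from collections import defaultdict

class PendingRequestBuffer:
    """Parks permission requests for session IDs that aren't registered yet,
    and replays them once registration completes (hypothetical sketch)."""

    def __init__(self):
        self._sessions = {}                # session_id -> handler
        self._pending = defaultdict(list)  # session_id -> [(request, future)]

    async def handle_permission_request(self, session_id, request):
        handler = self._sessions.get(session_id)
        if handler is None:
            # Unknown session: wait for registration instead of denying.
            fut = asyncio.get_running_loop().create_future()
            self._pending[session_id].append((request, fut))
            return await fut
        return await handler(request)

    async def register(self, session_id, handler):
        self._sessions[session_id] = handler
        # Replay anything that arrived before registration.
        for request, fut in self._pending.pop(session_id, []):
            fut.set_result(await handler(request))

async def demo():
    buf = PendingRequestBuffer()
    # permission.request arrives before create_session() finishes registering.
    early = asyncio.create_task(buf.handle_permission_request("sess-1", {"tool": "bash"}))
    await asyncio.sleep(0)  # the request is now parked, not denied

    async def allow(request):
        return {"allowed": True}

    await buf.register("sess-1", allow)  # registration replays the parked request
    assert await early == {"allowed": True}

asyncio.run(demo())
```

A real implementation would also want a timeout on the parked future, so a CLI that sends a request for a session that never gets created cannot leave it pending forever.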
### In Conductor (mitigation)

- **Expose `max_session_seconds` in workflow YAML**: the hardcoded 1800s is far too long for a for-each item that should take ~60s. A 5-minute cap would limit the damage to 5 minutes instead of 30 per stuck agent.
- **Detect permission-denied loops**: if an agent receives "Permission denied" on N consecutive tool calls, fail fast instead of waiting for the session timeout.
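The loop-detection mitigation could be a small guard in Conductor's tool-result path. This is a hypothetical sketch (the class name, threshold, and matched string are assumptions, not existing Conductor code): count consecutive denials and abort well before the session timeout.

```python
class PermissionDenialGuard:
    """Fail fast when every tool call in a session is being denied.

    Hypothetical Conductor-side guard: track consecutive "Permission denied"
    tool results and abort the session once a threshold is reached, instead
    of letting the agent retry until max_session_seconds expires.
    """

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self._consecutive = 0

    def record(self, tool_result: str) -> None:
        if "Permission denied" in tool_result:
            self._consecutive += 1
            if self._consecutive >= self.threshold:
                raise RuntimeError(
                    f"{self._consecutive} consecutive permission denials; "
                    "aborting session instead of waiting for the timeout"
                )
        else:
            self._consecutive = 0  # any successful tool call resets the streak

# A healthy session never trips the guard; a stuck one fails within N calls.
guard = PermissionDenialGuard(threshold=3)
guard.record("file contents: ...")  # success resets the counter
```

With a threshold of 3 to 5, a stuck agent would fail within seconds of its first tool call rather than burning the full 1800s.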
## Reproduction

Any workflow with a `for_each` group using `max_concurrent >= 2` and the Copilot provider can hit this intermittently; higher concurrency means higher probability.
## Environment

- Conductor: installed from `main`
- github-copilot-sdk: 0.1.18
- Copilot CLI: 1.0.2
- Runtime: GitHub Actions `ubuntu-latest`