Context
When startAgent is called for an agent that's already in starting status (between sessionCount++ and session.create()), the current code calls stopAgent() to kill the old session before proceeding. However, stopAgent() cannot actually cancel a startup that hasn't created a sessionId yet. The original startAgent() call keeps going, subscribes to events, and can leave an extra live session that is no longer tracked in the agents map.
This was identified during PR #1336 review (comment by kilo-code-bot).
Current behavior
In container/src/process-manager.ts:startAgent():
if (existing && (existing.status === 'running' || existing.status === 'starting')) {
  await stopAgent(request.agentId).catch(...);
}
If existing.status === 'starting', stopAgent tries to kill it but has no sessionId to target. The original startup continues in the background, creating an orphaned session.
Expected behavior
When a new startAgent request arrives for an agent in starting status, the system should do one of the following:
1. Wait for the existing startup to complete (with a timeout), then stop the resulting session
2. Use an AbortController to cancel the in-flight startup
3. Track the pending startup promise and await it before stopping
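Option 2 is cleanest: thread an AbortController through the startup sequence so that stopAgent can signal cancellation before session.create() completes, and have the startup check the signal once session.create() resolves so it can tear down the new session instead of subscribing to events.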
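A minimal sketch of the AbortController approach follows. It is not the real process-manager code: the AgentEntry shape, the liveSessions map, createSession(), and closeSession() are hypothetical stand-ins, and createSession() only simulates the delay of session.create(). The point it illustrates is that stopAgent can abort an entry that has no sessionId yet, and the superseded startup cleans up after itself once its session exists.

```typescript
// Hypothetical, simplified stand-ins for the real process-manager types.
type AgentStatus = 'starting' | 'running' | 'stopped';

interface Session {
  id: string;
}

interface AgentEntry {
  status: AgentStatus;
  abort: AbortController; // one controller per startup attempt
  sessionId?: string;
}

const agents = new Map<string, AgentEntry>();
const liveSessions = new Map<string, Session>();
let nextSessionId = 0;

// Stand-in for session.create(): resolves after a delay (assumption).
async function createSession(delayMs: number): Promise<Session> {
  await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  const session: Session = { id: `s${++nextSessionId}` };
  liveSessions.set(session.id, session);
  return session;
}

// Stand-in for killing a session (assumption).
function closeSession(id: string): void {
  liveSessions.delete(id);
}

async function stopAgent(agentId: string): Promise<void> {
  const entry = agents.get(agentId);
  if (!entry) return;
  // Signal any in-flight startup, even if no sessionId exists yet.
  entry.abort.abort();
  if (entry.sessionId) closeSession(entry.sessionId);
  entry.status = 'stopped';
  agents.delete(agentId);
}

async function startAgent(agentId: string, delayMs = 50): Promise<string | null> {
  const existing = agents.get(agentId);
  if (existing && (existing.status === 'running' || existing.status === 'starting')) {
    await stopAgent(agentId);
  }

  const entry: AgentEntry = { status: 'starting', abort: new AbortController() };
  agents.set(agentId, entry);
  const { signal } = entry.abort;

  const session = await createSession(delayMs);
  if (signal.aborted) {
    // A newer startAgent superseded this one while the session was being
    // created: close the session instead of subscribing to its events.
    closeSession(session.id);
    return null;
  }

  entry.sessionId = session.id;
  entry.status = 'running';
  return session.id;
}
```

With this shape, two overlapping startAgent calls for the same agent should leave exactly one live session, owned by the later call. If the real session.create() accepts a cancellation signal, passing it through would shorten the window further, but that is not assumed here.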
Impact
Low. This race requires two startAgent calls for the same agent within the roughly 1-2 second window between sessionCount++ and session.create(). In practice this is rare because the reconciler runs every 5s and the DO serializes RPC calls. However, it could happen during rapid container eviction/restart cycles.
Parent: #204