Skip to content

CLI: processQueuedItems() lacks error handling — session.idle may never be emitted #794

@PureWeen

Description

@PureWeen

Summary

The Copilot CLI's processQueuedItems() method lacks a try-catch/try-finally wrapper. If any unhandled exception occurs during message processing (or in the post-loop code that collects background task info), session.idle is never emitted. This causes sendAndWait() in all SDKs to hang until the 60-second timeout.

This may be a contributing factor behind #558 ("TimeoutError: Timeout after XXXs waiting for session.idle"), though the exact root cause has not been confirmed through a real CLI reproduction.

Note: This issue was filed on copilot-sdk for visibility, but the vulnerabilities described are in the Copilot CLI (@github/copilot). The correct tracker for a CLI fix would be github/copilot-cli.

Confidence Level

Medium confidence. The code analysis below is based on reading the minified CLI source (@github/copilot v1.0.3-0). The vulnerabilities identified are real code patterns, but:

Identified Vulnerabilities

Vulnerability 1: Missing try-catch in processQueuedItems()

Package: @github/copilot v1.0.3-0

The de-minified processQueuedItems() looks like this:

async processQueuedItems() {
    if (this.isProcessing || this.itemQueue.length === 0) return;

    this.isProcessing = true;
    await this.initializeMcpHost();

    while (this.itemQueue.length > 0) {
        let item = this.itemQueue.shift();
        // ... process commands and messages ...
        // For messages: await this.runAgenticLoop(...)
    }

    // ⚠️ ALL OF THIS IS OUTSIDE ANY TRY-CATCH:
    this.isProcessing = false;
    let agents = this.taskRegistry.list({ includeCompleted: false }).filter(o => o.type === "agent");
    let shells = this.shellContext?.getRunningCommandsInfo() ?? [];
    let backgroundTasks = (agents.length > 0 || shells.length > 0) ? { agents: ..., shells: ... } : undefined;
    this.emitEphemeral("session.idle", { backgroundTasks });  // ← Never reached if anything above throws
}

Impact: If taskRegistry.list(), getRunningCommandsInfo(), or any post-loop code throws, session.idle is never emitted AND isProcessing remains true, blocking all future messages.

Likelihood: Low. These are simple accessor methods unlikely to throw in normal operation. runAgenticLoop() has its own thorough try-catch-finally and does NOT re-throw, so errors during agentic processing don't escape to this level.

Vulnerability 2: Silent JSON-RPC notification drops

The event forwarding layer catches and silently drops notification failures:

for (let c of this.connections)
    c.sendNotification(SESSION_EVENT, payload).catch(err => {
        debug(`Failed to send event notification: ${err}`);
    });

Impact: Even if session.idle IS emitted internally by the CLI, it can be lost in transit if the JSON-RPC connection has issues. This is logged at debug level only — invisible unless debug logging is enabled.

Likelihood: Medium. This is the more plausible explanation for intermittent failures in GitHub Actions, where process lifecycle and resource pressure could cause transport-level failures.

Suggested CLI Fix

Wrap processQueuedItems() in a try-finally that guarantees session.idle emission:

async processQueuedItems() {
    if (this.isProcessing || this.itemQueue.length === 0) return;

    this.isProcessing = true;
    try {
        await this.initializeMcpHost();
        while (this.itemQueue.length > 0) {
            // ... existing processing logic ...
        }

        let backgroundTasks;
        try {
            let agents = this.taskRegistry.list({ includeCompleted: false }).filter(o => o.type === "agent");
            let shells = this.shellContext?.getRunningCommandsInfo() ?? [];
            backgroundTasks = (agents.length > 0 || shells.length > 0)
                ? { agents: ..., shells: ... }
                : undefined;
        } catch (err) {
            // Log but don't block idle emission
        }

        this.emitEphemeral("session.idle", { backgroundTasks });
    } catch (err) {
        // Ensure idle is emitted even on unexpected errors
        this.emitEphemeral("session.idle", {});
    } finally {
        this.isProcessing = false;
    }
}

What We Tested (SDK-Side Mock)

We built a mock CLI server that deliberately omits session.idle after assistant.turn_end. This does not reproduce the real CLI bug — it hard-codes the symptom to validate the SDK-side mitigation:

  • Happy path (session.idle sent): sendAndWait() resolves in ~130ms
  • session.idle omitted: sendAndWait() hangs until timeout (60s default), or resolves via SDK fallback timer (~5s)

The mock is useful for testing the SDK fix but is not evidence of a specific CLI bug.

SDK-Level Mitigation

We've implemented an idle fallback timer in all 4 SDKs (Node.js, Python, Go, .NET) as defense-in-depth. The timer fires a synthetic session.idle event if assistant.turn_end arrives but session.idle doesn't follow within a configurable grace period (default 5s). This prevents indefinite hangs regardless of root cause.

Related Issues

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions