Copilot CLI should retry on transient CAPIError 400 Bad Request during agentic workflow execution #25313

@pholleran


Summary

When running the Copilot CLI inside a gh-aw agentic workflow, the CLI fails with a non-retried CAPIError: 400 Bad Request from the Copilot inference API (CAPI) mid-session. The CLI (or the gh-aw execution wrapper) should implement retry logic for transient 400 errors that occur after the session has already completed successful turns.

Reproduction

Error

Execution failed: CAPIError: 400 400 Bad Request
 (Request ID: C818:3ED713:19D401B:1C446B7:69D653CA)

The CLI had been running normally for ~3.5 minutes — completing 8+ tool calls (reading files, listing issues, exploring the codebase) — when CAPI returned this 400 error. The CLI exited with code 1, failing the workflow.

Expected Behavior

A 400 that occurs mid-conversation (after multiple successful inference turns) is likely transient rather than a genuinely malformed request. The CLI should:

  1. Detect CAPIError: 400 during an active session (after at least one successful turn)
  2. Retry the failed inference request with exponential backoff (e.g., 3-5 attempts)
  3. Only fail the session if retries are exhausted
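The three steps above could be sketched roughly as follows. This is only an illustration, not the CLI's actual implementation: `CAPIError` here is a hypothetical stand-in class, and `infer_with_retry`, `is_retryable`, `successful_turns`, `max_attempts`, and `base_delay` are assumed names that do not come from the Copilot CLI or gh-aw. The small random jitter is also an addition not requested in this issue.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CAPIError(Exception):
    """Hypothetical stand-in for the CLI's CAPI error type."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status} {message}")
        self.status = status


def is_retryable(err: CAPIError, successful_turns: int) -> bool:
    # Step 1: treat a 400 as transient only if the session has already
    # completed at least one successful inference turn.
    return err.status == 400 and successful_turns >= 1


def infer_with_retry(
    send_request: Callable[[], T],
    successful_turns: int,
    max_attempts: int = 4,      # assumed default, within the suggested 3-5
    base_delay: float = 1.0,    # assumed: seconds before the first retry
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send_request()
        except CAPIError as err:
            # Step 3: give up on non-retryable errors or when retries run out.
            if not is_retryable(err, successful_turns) or attempt == max_attempts:
                raise
            # Step 2: exponential backoff (1x, 2x, 4x, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

In this sketch a 400 on the very first turn is still raised immediately, since at that point a genuinely malformed request is the more plausible cause. The `sleep` parameter exists only so the backoff can be exercised without real waiting.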

Context
