Add copilot driver with retry logic for partial session failures#25329

Merged
pelikhan merged 6 commits into main from copilot/fix-retry-logic-capierror-400
Apr 8, 2026

Contributor

Copilot AI commented Apr 8, 2026

Copilot CLI exits with CAPIError: 400 Bad Request mid-session (after successful tool calls), wasting the entire premium request. The error is transient — not a malformed request — and warrants retry with --resume to continue the session. More broadly, any failure that occurs after the session has partially executed (produced output) is now eligible for a resume retry.

Changes

New: actions/setup/js/copilot_driver.cjs

Transparent subprocess wrapper for the Copilot CLI:

  • Forwards all args, stdin/stdout/stderr unchanged
  • Retries with the --resume flag whenever a run was partially executed (produced output), regardless of error type (up to 3 retries after the initial run, 5s→10s→20s backoff, 60s cap)
  • CAPIError 400 is detected and named explicitly in logs as the well-known transient case; other partial-execution failures are labelled "partial execution"
  • No retry when the process produced no output (failed to start / auth error before any work)
  • Extensive structured logging with ISO timestamps and [copilot-driver] prefix (grep-friendly in agent-stdio.log):
    • Startup config (maxRetries, delays, backoff multiplier)
    • Per-attempt: pid, args (with --prompt value redacted), stdout/stderr byte counts, exit code, signal, duration
    • Retry decisions with context (isCAPIError400, hasOutput, retries remaining, named reason)
    • Final summary with total wall-clock duration via human-readable formatDuration() (e.g. 3m 12s)
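
For illustration, the backoff schedule quoted above can be sketched as follows. This is a minimal sketch, not the code in copilot_driver.cjs: the constant and function names are hypothetical, and the multiplier of 2 is an assumption inferred from the 5s→10s→20s sequence.

```javascript
// Hypothetical constants mirroring the figures quoted in the PR description.
const INITIAL_DELAY_MS = 5000; // 5s before the first retry
const BACKOFF_MULTIPLIER = 2; // assumed from 5s→10s→20s
const MAX_DELAY_MS = 60000; // 60s cap

/**
 * Delay to wait before retry number `retryIndex` (0-based).
 * @param {number} retryIndex
 * @returns {number} delay in milliseconds
 */
function retryDelayMs(retryIndex) {
  const uncapped = INITIAL_DELAY_MS * Math.pow(BACKOFF_MULTIPLIER, retryIndex);
  return Math.min(uncapped, MAX_DELAY_MS);
}

// Delays for the three retries that follow the initial run.
const schedule = [0, 1, 2].map(retryDelayMs);
console.log(schedule.map(ms => `${ms / 1000}s`).join("→"));
```

With only three retries the 60s cap is never reached; it would matter only if the retry count were raised.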

New: DriverProvider interface (pkg/workflow/agentic_engine.go)

Optional interface engines can implement to supply a JS driver script:

type DriverProvider interface {
    GetDriverScriptName() string // e.g. "copilot_driver.cjs"
}

CopilotEngine wired to driver (copilot_engine.go, copilot_engine_execution.go)

CopilotEngine implements DriverProvider. GetExecutionSteps() now prefixes the copilot invocation with node ${RUNNER_TEMP}/gh-aw/actions/copilot_driver.cjs in both AWF-sandbox and standard execution modes:

# Before
-- /bin/bash -c '/usr/local/bin/copilot --add-dir ... --prompt "..."'

# After
-- /bin/bash -c 'node ${RUNNER_TEMP}/gh-aw/actions/copilot_driver.cjs /usr/local/bin/copilot --add-dir ... --prompt "..."'

The driver file is accessible inside AWF via the existing ${RUNNER_TEMP}/gh-aw:ro mount, and node is available via --env-all + chroot PATH passthrough.


Changeset

  • Type: patch
  • Description: Add a Copilot CLI driver wrapper that retries partial-session failures with --resume, improving reliability for transient mid-session errors after output has begun.

Generated by Changeset Generator for issue #25329 ·



✨ PR Review Safe Output Test - Run 24146254696

💥 [THE END] — Illustrated by Smoke Claude · ● 289.1K ·

…rrors

- Create actions/setup/js/copilot_driver.cjs: Node.js driver that wraps
  copilot CLI subprocess, detects CAPIError 400, and retries with --resume
  flag using exponential backoff (3 retries, 5s initial delay)
- Add DriverProvider interface to pkg/workflow/agentic_engine.go
- Implement GetDriverScriptName() on CopilotEngine returning copilot_driver.cjs
- Modify GetExecutionSteps() to prepend driver to copilot command in both
  sandbox (AWF) and non-sandbox modes
- Update golden test fixtures and recompile lock files

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/7f63d96b-1922-4261-888a-e40d5845c0f8

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix retry logic for transient CAPIError 400 Bad Request" to "Add copilot driver with retry logic for transient CAPIError 400 errors" Apr 8, 2026
Copilot AI requested a review from pelikhan April 8, 2026 15:43
Collaborator

pelikhan commented Apr 8, 2026

@copilot add extensive logging

- Add log() helper with ISO timestamps and [copilot-driver] prefix
- Log startup config (maxRetries, delays, backoff)
- Log each attempt: pid, args (with --prompt redacted), stdout/stderr bytes, exit code, duration
- Log retry decisions with context (isCAPIError400, hasOutput, retriesRemaining)
- Log final summary with total duration
- Add formatDuration() for human-readable elapsed times
- Extend tests to cover formatDuration and log format

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/262cb7ee-9401-4cc8-bfd1-ffe0b6516d29

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 8, 2026

@copilot add extensive logging

Done in ee0b538. Added a log() helper that writes [copilot-driver] <ISO-timestamp> <message> to stderr so lines are easy to grep from agent-stdio.log. Now logs:

  • Startup: command, maxRetries, initial delay, backoff multiplier, max delay cap
  • Per attempt: pid, args (with --prompt value redacted), duration, stdout/stderr byte counts, exit code, signal
  • Retry decisions: isCAPIError400, hasOutput, retries remaining — and the specific reason when a retry is skipped
  • Final summary: exit code and total wall-clock duration via formatDuration() (e.g. 3m 12s)
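
A minimal sketch of what such logging helpers might look like, assuming the line format described above. The function bodies, the formatLogLine and redactArgs names, and the "<redacted>" placeholder are illustrative assumptions, not the actual driver code:

```javascript
/**
 * Build a grep-friendly log line: "[copilot-driver] <ISO-timestamp> <message>".
 * @param {string} message
 * @param {Date} [now]
 */
function formatLogLine(message, now = new Date()) {
  return `[copilot-driver] ${now.toISOString()} ${message}`;
}

/** Write a driver log line to stderr so it does not mix with forwarded stdout. */
function log(message) {
  process.stderr.write(formatLogLine(message) + "\n");
}

/**
 * Human-readable duration, e.g. 192000 ms becomes "3m 12s".
 * @param {number} ms
 */
function formatDuration(ms) {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes > 0 ? `${minutes}m ${seconds}s` : `${seconds}s`;
}

/**
 * Replace the value following --prompt so prompts never reach the log.
 * The "<redacted>" placeholder is an assumption.
 * @param {string[]} args
 */
function redactArgs(args) {
  return args.map((arg, i) => (i > 0 && args[i - 1] === "--prompt" ? "<redacted>" : arg));
}
```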

Collaborator

pelikhan commented Apr 8, 2026

@copilot attempts should resume the session if a run was already partially executed.

Previously the driver only retried when a CAPIError 400 was detected.
Now it retries with --resume whenever the session produced any output
(hasOutput), which covers all cases where a run was partially executed.
CAPIError 400 is still detected and named in the retry log message as
the well-known transient case; other failures report "partial execution"
as the reason. No-output failures (process failed to start) are still
not retried. Added 5 new unit tests covering the broadened policy.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/1022cc92-f38f-403a-83c5-4b5027719a76

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 8, 2026

@copilot attempts should resume the session if a run was already partially executed.

Done in c58aa0d. The retry condition is now broadened: the driver retries with --resume whenever the session produced any output (hasOutput), not just on CAPIError 400. This covers all cases where a run was partially executed. CAPIError 400 is still detected and labelled as "CAPIError 400 (transient)" in the log; other partial-execution failures are labelled "partial execution". Processes that produced no output (failed to start / auth error before any work) are still not retried.
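
The broadened policy described above can be sketched as a pure decision function. This is an illustrative sketch under stated assumptions: the decideRetry name, its parameter shape, and the non-retry reason strings are hypothetical; only the "CAPIError: 400" signature and the two retry labels come from this thread.

```javascript
// Signature of the well-known transient failure, as named in the PR discussion.
const CAPI_ERROR_400 = /CAPIError: 400/;

/**
 * Decide whether a failed attempt should be retried with --resume.
 * @param {{exitCode: number, hasOutput: boolean, output: string, attempt: number, maxRetries: number}} run
 *   `attempt` is 0 for the initial run, 1 for the first retry, and so on.
 * @returns {{retry: boolean, reason: string}}
 */
function decideRetry({ exitCode, hasOutput, output, attempt, maxRetries }) {
  if (exitCode === 0) return { retry: false, reason: "success" };
  if (attempt >= maxRetries) return { retry: false, reason: "retries exhausted" };
  // No output means the process failed before doing any work (for example,
  // it failed to start or hit an auth error), so there is nothing to resume.
  if (!hasOutput) return { retry: false, reason: "no output" };
  const reason = CAPI_ERROR_400.test(output) ? "CAPIError 400 (transient)" : "partial execution";
  return { retry: true, reason };
}
```

Keeping the decision separate from process spawning makes each branch of the policy easy to unit test.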

Copilot AI changed the title from "Add copilot driver with retry logic for transient CAPIError 400 errors" to "Add copilot driver with retry logic for partial session failures" Apr 8, 2026
@pelikhan pelikhan marked this pull request as ready for review April 8, 2026 16:21
Copilot AI review requested due to automatic review settings April 8, 2026 16:21
@pelikhan pelikhan added the smoke label Apr 8, 2026
Contributor

github-actions bot commented Apr 8, 2026

📰 BREAKING: Smoke Copilot is now investigating this pull request. Sources say the story is developing...

@github-actions github-actions bot removed the smoke label Apr 8, 2026
Contributor

github-actions bot commented Apr 8, 2026

🎬 THE END - Smoke Claude MISSION: ACCOMPLISHED! The hero saves the day! ✨

Contributor

github-actions bot commented Apr 8, 2026

✅ All tools validated successfully! Agent Container Smoke Test confirms agent container is ready.

Contributor

github-actions bot commented Apr 8, 2026

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟

Contributor

github-actions bot commented Apr 8, 2026

Agent Container Tool Check

Tool Status Version
bash 5.2.21
sh available
git 2.53.0
jq 1.7
yq v4.52.5
curl 8.5.0
gh 2.89.0
node v20.20.2
python3 3.12.3
go 1.24.13
java 10.0.201
dotnet 10.0.201

Result: 12/12 tools available ✅

Overall Status: PASS

🔧 Tool validation by Agent Container Smoke Test · ● 155.8K ·

Contributor

Copilot AI left a comment

Pull request overview

Adds a Node.js “copilot driver” wrapper to make Copilot CLI runs resilient to transient mid-session failures by retrying with --resume when a run has partially executed (produced output).

Changes:

  • Introduces actions/setup/js/copilot_driver.cjs to wrap Copilot CLI execution with resume-based retry and structured logging.
  • Adds a DriverProvider interface and wires CopilotEngine to invoke Copilot via the driver in both sandbox and non-sandbox execution paths.
  • Updates compiled workflow lockfiles and WASM golden fixtures to reflect the new node .../copilot_driver.cjs ... invocation.
File Description
actions/setup/js/copilot_driver.cjs New Node wrapper that retries Copilot runs with --resume after partial execution.
pkg/workflow/agentic_engine.go Adds DriverProvider optional interface for engines to provide a JS driver script name.
pkg/workflow/copilot_engine.go CopilotEngine implements GetDriverScriptName() returning copilot_driver.cjs.
pkg/workflow/copilot_engine_execution.go Prefixes Copilot CLI invocation with node ${RUNNER_TEMP}/gh-aw/actions/copilot_driver.cjs ....
pkg/workflow/copilot_engine_test.go Adds unit test asserting driver script name and that execution steps include the driver.
pkg/workflow/testdata/TestWasmGolden_CompileFixtures/basic-copilot.golden Updates expected AWF sandbox command to include the Node driver wrapper.
pkg/workflow/testdata/TestWasmGolden_CompileFixtures/with-imports.golden Updates expected AWF sandbox command to include the Node driver wrapper.
.github/workflows/workflow-health-manager.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/workflow-generator.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/weekly-safe-outputs-spec-review.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/weekly-blog-post-writer.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/update-astro.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/test-workflow.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/test-project-url-default.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/test-dispatcher.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/terminal-stylist.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/technical-doc-writer.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/super-linter.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/sub-issue-closer.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/smoke-update-cross-repo-pr.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/smoke-service-ports.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/smoke-create-cross-repo-pr.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/security-review.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/security-compliance.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/research.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/repository-quality-improver.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/repo-tree-map.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/repo-audit-analyzer.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/refiner.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/q.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/python-data-charts.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/pr-triage-agent.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/pr-nitpick-reviewer.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/portfolio-analyst.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/plan.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/pdf-summary.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/org-health-report.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/notion-issue-summary.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/metrics-collector.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/jsweep.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/issue-triage-agent.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/issue-monster.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/gpclean.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/glossary-maintainer.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/github-remote-mcp-auth-test.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/functional-pragmatist.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/firewall.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/example-permissions-warning.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/docs-noob-tester.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/dictation-prompt.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/dev.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/dependabot-go-checker.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/dependabot-burner.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/dead-code-remover.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/daily-workflow-updater.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/daily-team-status.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/daily-semgrep-scan.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/daily-secrets-analysis.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/daily-regulatory.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/daily-performance-summary.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/daily-malicious-code-scan.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/daily-integrity-analysis.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/daily-firewall-report.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/daily-cli-tools-tester.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/daily-cli-performance.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/daily-assign-issue-to-user.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/daily-architecture-diagram.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/craft.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/copilot-token-optimizer.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/copilot-token-audit.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/copilot-pr-merged-report.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/contribution-check.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/constraint-solving-potd.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/code-simplifier.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/code-scanning-fixer.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/ci-coach.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/brave.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/bot-detection.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/artifacts-summary.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/archie.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/agentic-observability-kit.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/agent-persona-explorer.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/agent-performance-analyzer.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.
.github/workflows/ace-editor.lock.yml Updates locked workflow command to invoke Copilot via the Node driver.

Copilot's findings

  • Files reviewed: 128/128 changed files
  • Comments generated: 4

Comment on lines +133 to +138
child.on("exit", (code, signal) => {
  const durationMs = Date.now() - startTime;
  const exitCode = code ?? 1;
  log(`attempt ${attempt + 1}: process exited` + ` exitCode=${exitCode}` + (signal ? ` signal=${signal}` : "") + ` duration=${formatDuration(durationMs)}` + ` stdout=${stdoutBytes}B stderr=${stderrBytes}B hasOutput=${hasOutput}`);
  resolve({ exitCode, output: collectedOutput, hasOutput, durationMs });
});

Copilot AI Apr 8, 2026

In runProcess, resolving on the child process exit event can miss trailing stdout/stderr data because exit may fire before stdio streams are fully drained/closed. That can incorrectly set hasOutput=false (skipping retries) and can miss the CAPIError: 400 pattern. Prefer resolving on the close event (which waits for stdio to close) and keep exit only for logging if needed.

Comment on lines +108 to +129
let collectedOutput = "";
let hasOutput = false;
let stdoutBytes = 0;
let stderrBytes = 0;

child.stdout.on(
  "data",
  /** @param {Buffer} data */ data => {
    hasOutput = true;
    stdoutBytes += data.length;
    collectedOutput += data.toString();
    process.stdout.write(data);
  }
);

child.stderr.on(
  "data",
  /** @param {Buffer} data */ data => {
    hasOutput = true;
    stderrBytes += data.length;
    collectedOutput += data.toString();
    process.stderr.write(data);

Copilot AI Apr 8, 2026

collectedOutput concatenates the full stdout+stderr into a single string. Copilot sessions can emit large outputs, which can cause high memory usage or GC churn in the wrapper process. Since the driver only needs to detect a small error signature, consider capping what you retain (e.g., keep a rolling tail buffer / scan incrementally) while still computing hasOutput and byte counts.
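
One way the suggestion above might look, as a hedged sketch: keep only a bounded tail of the output and scan it incrementally, while still tracking hasOutput and byte counts. The createOutputTracker name, the 4096-character limit, and the returned shape are all hypothetical.

```javascript
// Hypothetical cap on retained output, in characters. It must be at least as
// long as the signature so a match split across two chunks is still found.
const TAIL_LIMIT = 4096;

/**
 * Track output without retaining all of it.
 * @param {RegExp} signature pattern to detect, e.g. /CAPIError: 400/
 * @param {number} [limit]
 */
function createOutputTracker(signature, limit = TAIL_LIMIT) {
  let tail = "";
  let bytes = 0;
  let sawSignature = false;
  return {
    /** @param {Buffer} chunk */
    push(chunk) {
      bytes += chunk.length;
      // Note: decoding per chunk can split a multi-byte character at a chunk
      // boundary; that is harmless for an ASCII signature like this one.
      tail = (tail + chunk.toString()).slice(-limit);
      // Sticky flag: the signature stays detected even after it scrolls out.
      if (!sawSignature && signature.test(tail)) sawSignature = true;
    },
    get hasOutput() {
      return bytes > 0;
    },
    get bytes() {
      return bytes;
    },
    get tail() {
      return tail;
    },
    get sawSignature() {
      return sawSignature;
    },
  };
}
```

Memory use is then bounded by the limit rather than by the total session output.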

Comment on lines +117 to +121
// GetDriverScriptName returns the filename of the JavaScript driver script that wraps
// the Copilot CLI with retry logic for transient CAPIError 400 errors.
func (e *CopilotEngine) GetDriverScriptName() string {
    return "copilot_driver.cjs"
}

Copilot AI Apr 8, 2026

The comment says the driver wraps the CLI "with retry logic for transient CAPIError 400 errors", but the driver actually retries any non-zero exit after partial execution (any output), regardless of error type. Updating the doc comment to match the broader behavior will avoid confusion for future maintainers.

Comment on lines +164 to +167
//
// When a driver script is provided (GetDriverScriptName), wrap the copilot invocation with
// `node <driver> <commandName> <args>` to enable retry logic for transient CAPIError 400 errors.
driverScriptName := e.GetDriverScriptName()

Copilot AI Apr 8, 2026

This comment frames the driver wrapper as enabling retries specifically for transient CAPIError 400, but the driver retries any partially-executed run (any output + non-zero exit). Consider adjusting the wording here to reflect the actual policy (partial execution resume) so the behavior is discoverable in the Go code.

Contributor

github-actions bot commented Apr 8, 2026

Smoke test run 24146254813
Merged PRs: #25333 Fix Changeset Generator missing github network preset in allowed-domains config; #25314 fix: filter AWF infrastructure lines from engine failure context
GitHub MCP ✅
Serena activate+find_symbol ✅
Playwright title check ❌
Web-fetch check ❌
File write + cat + build ✅
Overall: FAIL

🔮 The oracle has spoken through Smoke Codex ·

Contributor

github-actions bot commented Apr 8, 2026

🤖 Smoke test result for PR #25329 (@pelikhan @Copilot)

Test
GitHub MCP
mcpscripts-gh
Serena MCP
Playwright
Web Fetch
File Write + Bash
Discussion + Build
Artifact Upload
PR Review

Overall: PASS

📰 BREAKING: Report filed by Smoke Copilot · ● 1.5M ·

Contributor

@github-actions github-actions bot left a comment

Reviewing PR #25329 — the copilot driver with retry logic for CAPIError 400 partial session failures. The implementation is clean and well-documented. Left 2 inline comments: one suggesting making MAX_RETRIES configurable, and another about the logging verbosity. Overall, solid approach to handling transient failures.

📰 BREAKING: Report filed by Smoke Copilot · ● 1.5M

const { spawn } = require("child_process");

// Maximum number of retry attempts after the initial run
const MAX_RETRIES = 3;

The MAX_RETRIES = 3 constant is hardcoded here. Consider making this configurable via an environment variable (e.g., COPILOT_DRIVER_MAX_RETRIES) to allow tuning in CI environments without code changes.
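
A minimal sketch of this suggestion, assuming the COPILOT_DRIVER_MAX_RETRIES variable name proposed in the comment. The resolveMaxRetries helper and its fallback rules are hypothetical, and the thread does not say whether the merged driver reads any such variable.

```javascript
const DEFAULT_MAX_RETRIES = 3;

/**
 * Read the retry limit from the environment, falling back to the default
 * when the variable is unset, empty, or not a non-negative integer.
 * @param {Record<string, string | undefined>} [env]
 * @returns {number}
 */
function resolveMaxRetries(env = process.env) {
  const raw = env.COPILOT_DRIVER_MAX_RETRIES;
  if (raw === undefined || raw.trim() === "") return DEFAULT_MAX_RETRIES;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : DEFAULT_MAX_RETRIES;
}
```

Falling back to the default on invalid input, rather than throwing, keeps a misconfigured CI variable from breaking every workflow run.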

* grep out of the combined agent-stdio.log.
* @param {string} message
*/
function log(message) {

The log function writes directly to process.stderr. Consider also supporting a DEBUG env var check to suppress verbose driver logs unless debugging is needed — helps keep agent-stdio.log readable in normal runs.

Contributor

github-actions bot commented Apr 8, 2026

Commit pushed: 448ff50

Generated by Changeset Generator

Contributor

github-actions bot commented Apr 8, 2026

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

Collaborator

pelikhan commented Apr 8, 2026

@copilot make sure the copilot exit code is propagated to the exit

Contributor

github-actions bot commented Apr 8, 2026

Smoke Test Run 24146254696 — PARTIAL ✅

Core: 1✅ 2✅ 3✅ 4✅ 5✅ 6✅ 7✅ 8✅ 9✅ 10✅ 11✅ 12✅
PR review: 13✅ 14✅ 15✅ 16⚠️ 17✅ 18✅ 19⚠️

All non-skipped tests passed. Tests 16 & 19 skipped (expected).

💥 [THE END] — Illustrated by Smoke Claude · ● 289.1K ·

Contributor

@github-actions github-actions bot left a comment

💥 Automated smoke test review - all systems nominal! (Run 24146254696)

💥 [THE END] — Illustrated by Smoke Claude · ● 289.1K

hasOutput = true;
stderrBytes += data.length;
collectedOutput += data.toString();
process.stderr.write(data);

Smoke test review comment #1 — The collectedOutput buffer captures full stdout/stderr for retry detection. Worth noting that the hasOutput check (any output → retry eligible) is correct for the broader retry policy described in this PR. Consider adding a max buffer cap to guard against extremely verbose sessions.

// the Copilot CLI with retry logic for transient CAPIError 400 errors.
func (e *CopilotEngine) GetDriverScriptName() string {
return "copilot_driver.cjs"
}

Smoke test review comment #2 — The comment references CAPIError 400 specifically, but the actual retry policy in copilot_driver.cjs now retries on any partial execution (any output + non-zero exit). Updating this doc comment to say "partial execution retry" would keep Go and JS in sync semantically.

…ut are fully captured

The 'exit' event can fire before stdout/stderr buffers are fully
drained. Resolving on 'close' instead guarantees that collectedOutput
and hasOutput are complete before the retry decision is made, and that
the copilot exit code is faithfully propagated through the driver.

The 'exit' handler is kept for early diagnostic logging only.
Both events pass (code, signal); the close handler is now the single
source of truth for exitCode used in process.exit().

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/5d9da918-aa26-45c4-b173-ead01a7a9163

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 8, 2026

@copilot make sure the copilot exit code is propagated to the exit

Fixed in f04bbbd. The root cause was resolving the promise on the exit event, which can fire before stdout/stderr buffers are fully flushed, meaning the captured exit code or output could be incomplete. The promise now resolves on close instead, which Node.js emits only after the process has ended and its stdio streams have closed. The exit handler is kept for early diagnostic logging. The copilot exit code (code ?? 1 for signal-terminated processes) flows from close → result.exitCode → lastExitCode → process.exit(lastExitCode).
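
The propagation chain described above might be sketched like this. It is a simplified illustration, not the merged code: toExitCode is a hypothetical helper name, and this runProcess omits the logging, output capture, and retry loop of the real driver.

```javascript
const { spawn } = require("child_process");

/**
 * Map the `code` argument of the close event to a process exit code.
 * `code` is null when the child was terminated by a signal, so fall back to 1.
 * @param {number | null} code
 */
function toExitCode(code) {
  return code ?? 1;
}

/**
 * Run a command once, forwarding its output, and resolve only on "close",
 * which fires after the child's stdio streams have been closed.
 * @param {string} command
 * @param {string[]} args
 * @returns {Promise<{exitCode: number, signal: string | null, hasOutput: boolean}>}
 */
function runProcess(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ["inherit", "pipe", "pipe"] });
    let hasOutput = false;
    child.stdout.on("data", data => {
      hasOutput = true;
      process.stdout.write(data);
    });
    child.stderr.on("data", data => {
      hasOutput = true;
      process.stderr.write(data);
    });
    child.on("error", reject); // e.g. the command could not be spawned
    child.on("close", (code, signal) => {
      resolve({ exitCode: toExitCode(code), signal, hasOutput });
    });
  });
}

// Usage (not executed here): propagate the child's exit code to the caller.
// runProcess("/usr/local/bin/copilot", process.argv.slice(2))
//   .then(result => process.exit(result.exitCode));
```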

@pelikhan pelikhan merged commit 599e471 into main Apr 8, 2026
@pelikhan pelikhan deleted the copilot/fix-retry-logic-capierror-400 branch April 8, 2026 16:42
Development

Successfully merging this pull request may close these issues.

Copilot CLI should retry on transient CAPIError 400 Bad Request during agentic workflow execution

3 participants