feat: increase logging in copilot driver for silent startup failures #25390
…(#issue) Agent-Logs-Url: https://github.com/github/gh-aw/sessions/4d1135bb-2e56-4a55-99ad-bad9a86b30fc Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Pull request overview
This PR improves diagnostics in the Copilot CLI driver to make “silent” startup failures (exit code 1 with no stdout/stderr) easier to debug in CI/sandbox environments.
Changes:
- Add a pre-flight accessibility/executability check for the Copilot command and log the result.
- Improve `spawn` failure logging by including `err.code` and `err.syscall`.
- Enrich startup and no-output logs with Node version/platform and actionable “possible causes” text.
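As a rough illustration of the second change, the sketch below formats a spawn failure so that `err.code` and `err.syscall` appear in the log line. The helper names `formatSpawnError` and `spawnAndLog` and the message wording are assumptions made for this example; they are not taken from the driver.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical helper (not the driver's code): builds a log line that
// includes the errno-style fields Node attaches to spawn failures.
function formatSpawnError(err) {
  const code = err.code ?? "unknown"; // e.g. ENOENT, EACCES
  const syscall = err.syscall ?? "unknown"; // e.g. "spawn copilot"
  return `spawn failed: ${err.message} (code=${code} syscall=${syscall})`;
}

// Hypothetical wrapper: a command that cannot be started emits "error"
// on the child process object, which is where the fields are read.
function spawnAndLog(command, args, log = console.error) {
  const child = spawn(command, args);
  child.on("error", (err) => log(formatSpawnError(err)));
  return child;
}
```

Calling `spawnAndLog("some-missing-binary", [])` would route the resulting `error` event through `formatSpawnError` instead of leaving only an exit code behind.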
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/copilot_driver.cjs | Adds pre-flight command checks and expands driver logs for startup/spawn/no-output diagnostics. |
| actions/setup/js/copilot_driver.test.cjs | Adds small unit-style assertions for new log message fields/content. |
Copilot's findings
Comments suppressed due to low confidence (2)
actions/setup/js/copilot_driver.cjs:90
- The pre-flight log messages in the `catch` blocks hard-code a single cause (“binary does not exist” / “permission denied”), but `fs.access` can fail for other reasons (e.g., EACCES on the F_OK check due to directory permissions, ENOTDIR, etc.). Capturing the thrown error and logging `err.code`/`err.message` would avoid misleading diagnostics.
try {
await fs.promises.access(command, fs.constants.F_OK);
} catch {
log(`pre-flight: command not found: ${command} (F_OK check failed — binary does not exist at this path)`);
return false;
}
try {
await fs.promises.access(command, fs.constants.X_OK);
log(`pre-flight: command is accessible and executable: ${command}`);
return true;
} catch {
log(`pre-flight: command exists but is not executable: ${command} (X_OK check failed — permission denied)`);
return false;
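One way to act on this finding is sketched below: the same two-step check as the snippet above, but each `catch` captures the error and reports `err.code` and `err.message` rather than asserting a single cause. The injectable `log` parameter and the message wording are assumptions for illustration, not the PR's code.

```javascript
const fs = require("node:fs");

// Sketch only: reports the actual errno code (ENOENT, EACCES, ENOTDIR, ...)
// instead of hard-coding "does not exist" or "permission denied".
async function checkCommandAccessible(command, log = console.log) {
  try {
    await fs.promises.access(command, fs.constants.F_OK);
  } catch (err) {
    log(`pre-flight: F_OK check failed for ${command}: code=${err.code} message=${err.message}`);
    return false;
  }
  try {
    await fs.promises.access(command, fs.constants.X_OK);
    log(`pre-flight: command is accessible and executable: ${command}`);
    return true;
  } catch (err) {
    log(`pre-flight: X_OK check failed for ${command}: code=${err.code} message=${err.message}`);
    return false;
  }
}
```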
actions/setup/js/copilot_driver.cjs:203
- `checkCommandAccessible()` returns a boolean but the result is ignored (`await checkCommandAccessible(command);`). Either use the return value to change control flow (e.g., exit early when the command path is definitely unusable) or make the helper `void` to avoid implying it affects execution.
log(`starting: command=${command} maxRetries=${MAX_RETRIES} initialDelayMs=${INITIAL_DELAY_MS}` + ` backoffMultiplier=${BACKOFF_MULTIPLIER} maxDelayMs=${MAX_DELAY_MS}` + ` nodeVersion=${process.version} platform=${process.platform}`);
await checkCommandAccessible(command);
- Files reviewed: 2/2 changed files
- Comments generated: 1
async function checkCommandAccessible(command) {
  try {
    await fs.promises.access(command, fs.constants.F_OK);
  } catch {
    log(`pre-flight: command not found: ${command} (F_OK check failed — binary does not exist at this path)`);
    return false;
  }
`checkCommandAccessible()` is called with `command` exactly as passed to the driver. When `command` is a PATH-resolved binary like `copilot` (the usage example suggests this), `fs.access('copilot', ...)` checks only the current working directory and will incorrectly log “command not found”. Consider skipping the fs-based checks unless `command` includes a path separator / is absolute, or resolve the executable via PATH (e.g., `which`/`command -v`) before calling `fs.access`.
This issue also appears in the following locations of the same file:
- line 78
- line 200
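Both options from this comment can be sketched as follows. The helpers `looksLikePath` and `resolveOnPath` are hypothetical names introduced here; the lookup is a simplified stand-in for `which`/`command -v` and ignores Windows `PATHEXT` for brevity.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: true when the command names a file location
// (absolute, or containing a separator) rather than a bare name like "copilot".
function looksLikePath(command) {
  return path.isAbsolute(command) || command.includes("/") || command.includes(path.sep);
}

// Hypothetical helper: returns the command unchanged if it is already a path;
// otherwise searches each PATH directory for an executable regular file and
// returns the first match, or null when nothing is found.
function resolveOnPath(command, envPath = process.env.PATH ?? "") {
  if (looksLikePath(command)) return command;
  for (const dir of envPath.split(path.delimiter)) {
    if (!dir) continue;
    const candidate = path.join(dir, command);
    try {
      fs.accessSync(candidate, fs.constants.X_OK);
      if (fs.statSync(candidate).isFile()) return candidate;
    } catch {
      // Not present or not executable in this directory; keep looking.
    }
  }
  return null;
}
```

With something like this in place, the pre-flight check could run `fs.access` on the resolved path, or be skipped entirely when `looksLikePath(command)` is false, so that a bare `copilot` is no longer reported as missing just because it is not in the working directory.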
…eScript compatibility (#25406)
* feat: increase logging in copilot driver for silent startup failures (#issue) (#25390)
* feat(logging): add debug logging to 5 CLI files for improved troubleshooting (#25393)
* fix: add parentheses to JSDoc type cast in copilot_driver.cjs for TypeScript compatibility
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/34f7e8b3-df09-41bc-b786-8bb4b22ebb7e
Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
…25366)
* Initial plan
* Initial plan for CLI proxy: start difc-proxy on host, pass new AWF flags
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/cd08abe8-65f6-4cd4-aca7-a2cfa59d7e81
Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
* feat: replace --enable-cli-proxy with --difc-proxy-host, start difc-proxy on host
When features.cli-proxy is enabled, the compiler now:
1. Starts a difc-proxy container on the host before AWF execution
2. Passes --difc-proxy-host host.docker.internal:18443 and --difc-proxy-ca-cert /tmp/gh-aw/difc-proxy-tls/ca.crt to AWF
3. Injects GH_TOKEN into the AWF step env with --exclude-env GH_TOKEN
4. Stops the CLI proxy container after AWF execution
Removed deprecated flags: --enable-cli-proxy, --cli-proxy-policy. Minimum AWF version bumped to v0.26.0 for CLI proxy support.
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/cd08abe8-65f6-4cd4-aca7-a2cfa59d7e81
Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
* fix: address code review feedback for CLI proxy
- Handle empty policy gracefully in start_cli_proxy.sh (proxy starts without guard filtering when no policy is configured)
- Exit with error when proxy fails to start (prevents AWF from running with a non-functional proxy)
- Rename hasCliProxyNeeded to isCliProxyNeeded for naming consistency
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/cd08abe8-65f6-4cd4-aca7-a2cfa59d7e81
Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
* fix: address review feedback and recompile agentic workflows
- Bump DefaultFirewallVersion to v0.26.0 to align with AWFCliProxyMinVersion
- Gate addCliProxyGHTokenToEnv on awfSupportsCliProxy and awfSupportsExcludeEnv to prevent leaking GH_TOKEN into the agent container on older AWF versions
- Make start_cli_proxy.sh idempotent by removing any leftover container first
- Update changeset to describe current behavior (difc-proxy-host flags)
- Recompile all agentic workflows with updated DefaultFirewallVersion
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/e59645aa-2981-470c-bd44-1075fd88317a
Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>
* fix: update golden files and lock file for AWF v0.26.0 version bump (#25400)
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/f16055db-4d7a-479e-acae-0713caf5344d
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
* fix: update DefaultFirewallVersion to v0.25.17, fix shell quoting and docstring
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/35642b32-32d1-4a2d-bea7-8041bed78e77
Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
* fix: add parentheses to JSDoc type cast in copilot_driver.cjs for TypeScript compatibility (#25406)
  * feat: increase logging in copilot driver for silent startup failures (#issue) (#25390)
  * feat(logging): add debug logging to 5 CLI files for improved troubleshooting (#25393)
  * fix: add parentheses to JSDoc type cast in copilot_driver.cjs for TypeScript compatibility
  Agent-Logs-Url: https://github.com/github/gh-aw/sessions/34f7e8b3-df09-41bc-b786-8bb4b22ebb7e
  Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
  ---------
  Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
  Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Landon Cox <landon.cox@microsoft.com>
Summary
Investigated failure at https://github.com/github/gh-aw/actions/runs/24158694003/job/70504330302#step:22
Root cause of the failure: The Copilot CLI (pid=147) started inside the sandbox container but exited after just 1 second with exit code 1 and zero bytes of output (stdout=0B stderr=0B). The driver correctly did not retry since there was nothing to resume, but the lack of output made it impossible to diagnose the root cause (no indication of whether it was a missing binary, permission issue, auth failure, or silent crash).
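To make the failure signature concrete, here is a minimal sketch of how a non-zero exit with zero bytes on both streams might be classified and annotated. The function name `describeExit`, its input shape, and the exact wording are assumptions for illustration and do not reproduce the driver's real log format.

```javascript
// Hypothetical sketch: flags the "silent failure" case (non-zero exit, no
// stdout, no stderr) and appends a list of possible causes to the summary.
function describeExit({ exitCode, stdoutBytes, stderrBytes }) {
  const base = `exit code=${exitCode} stdout=${stdoutBytes}B stderr=${stderrBytes}B`;
  const silent = exitCode !== 0 && stdoutBytes === 0 && stderrBytes === 0;
  if (!silent) return base;
  return (
    `${base}; no output produced, possible causes: ` +
    "binary not found, permission denied, auth failure, or silent startup crash"
  );
}
```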
Changes
All changes are in `actions/setup/js/copilot_driver.cjs`:
- Pre-flight check (`checkCommandAccessible`): Before spawning, verify the command exists (`F_OK`) and is executable (`X_OK`), logging the result. This will surface `binary not found` or `permission denied` issues immediately rather than leaving them as a silent exit-code-1.
- Better `error` event logging: Cast `err` to `NodeJS.ErrnoException` to extract `err.code` and `err.syscall`. These fields (e.g. `code=ENOENT syscall=spawn`) are critical for diagnosing spawn failures.
- Actionable "no output" message: When the process exits with no output, the log now includes a list of possible causes: `binary not found, permission denied, auth failure, or silent startup crash`.
- Node.js version + platform at startup: Adds `nodeVersion=v22.x.x platform=linux` to the starting log line for environment diagnostics.

What the new logs would look like in a "binary not found" scenario