[aw-failures] Smoke CI failure cluster 2026-05-02: 4 failures — Crush EROFS, Gemini API key, safe_outputs permissions #29665

@github-actions

Overview

The Smoke CI batch on branch copilot/add-edit-wiki-support (PR #29626) at ~01:00Z on 2026-05-02 produced 4 failures across all non-Copilot engines (Crush, Gemini, Claude, Codex). Two failures are P0 (the agent crashed before producing any output); two are P1 (the agent succeeded but the safe_outputs job failed). No prior tracking issue could be identified for these failure signatures, because the issues API was unreachable during the investigation (see Existing Issue Correlation).

Failure Clusters

| Priority | Run ID | Workflow | Failing Job | Root Cause |
|---|---|---|---|---|
| P0 | §25239718625 | Smoke Crush | agent | EROFS: read-only filesystem blocks Crush binary install into node_modules |
| P0 | §25239718609 | Smoke Gemini | agent | GEMINI_API_KEY invalid: HTTP 400 API_KEY_INVALID on first API call |
| P1 | §25239718599 | Smoke Claude | safe_outputs | resolve_pull_request_review_thread: Resource not accessible by integration |
| P1 | §25239718605 | Smoke Codex | safe_outputs | edit_wiki HTTPS push: fatal: could not read Username (no git credentials) |
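To make the clustering above repeatable, the four signatures can be matched mechanically against log text. The following is a minimal sketch, not the investigator's actual implementation: the patterns are the error strings quoted in this issue, and the names `SIGNATURES` and `classify_failure` are hypothetical.

```python
import re

# Patterns come from the error strings quoted in this issue; priorities
# mirror the failure cluster table. All names here are illustrative.
SIGNATURES = [
    (re.compile(r"EROFS: read-only file system"), "P0", "Crush EROFS install"),
    (re.compile(r"API_KEY_INVALID"), "P0", "Gemini API key invalid"),
    (re.compile(r"Resource not accessible by integration"), "P1",
     "safe_outputs permission"),
    (re.compile(r"could not read Username"), "P1",
     "safe_outputs wiki credentials"),
]


def classify_failure(log_text):
    """Return (priority, label) for the first matching signature, else None."""
    for pattern, priority, label in SIGNATURES:
        if pattern.search(log_text):
            return priority, label
    return None
```

A log that matches none of the patterns returns `None`, which would flag a failure signature not covered by this cluster.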

Evidence

P0 — Crush: EROFS crash details

From agent-stdio.log for run 25239718625:

EROFS: read-only file system, mkdir
  '/opt/hostedtoolcache/node/24.14.1/x64/lib/node_modules/@charmland/crush/bin'

Crush v0.59.0 downloaded successfully, but the npm post-install script then tries to copy the extracted binary into the node module's bin/ subdirectory inside the hosted tool cache, which is mounted read-only inside the chroot/sandbox. A secondary error also occurred: Failed to transfer /host/home/runner/work/_temp/gh-aw/safeoutputs ownership to chroot user. The agent recorded zero turns and zero tool calls, with a total runtime of ~1m.
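One way to avoid this class of crash is to probe the target directory before installing and fall back to a writable temporary directory. The sketch below only illustrates that idea; it is not how the Crush installer works. The function name `choose_install_dir` and the `crush-bin-` prefix are assumptions made for the example.

```python
import os
import tempfile


def choose_install_dir(preferred):
    """Return `preferred` if it can be created and written to,
    otherwise return a freshly created temporary directory."""
    try:
        os.makedirs(preferred, exist_ok=True)
        if os.access(preferred, os.W_OK):
            return preferred
    except OSError:
        # Covers EROFS (read-only mount) as well as permission errors.
        pass
    # Hypothetical fallback location; the caller would also need to put
    # this directory on PATH for the binary to be found.
    return tempfile.mkdtemp(prefix="crush-bin-")
```

The fallback directory is unique per call, so the caller has to record the returned path rather than assume a fixed location.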

P0 — Gemini: API key invalid

From gemini-client-error-generateJson-api-2026-05-02T01-00-16-489Z.json and gemini-client-error-Turn.run-sendMessageStream-2026-05-02T01-00-16-721Z.json for run 25239718609:

{"error": {"code": 400, "status": "INVALID_ARGUMENT",
  "message": "API key not valid. Please pass a valid API key.",
  "details": [{"reason": "API_KEY_INVALID", "domain": "googleapis.com"}]}}

Both generateJson (routing/classification) and sendMessageStream (main conversation stream) hit the error within the same second, at 01:00:16Z. The Crush run's stdio log also warned that GEMINI_API_KEY is not set (lines 15-16), which is consistent with the key being absent or no longer valid in the CI secret store.
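A pre-flight check could detect this condition from the error payload and fail fast with a clear message instead of starting the agent. The sketch below parses the payload shape quoted above; the function name `is_invalid_api_key` is hypothetical, and the check assumes other Gemini errors use the same `error.details[].reason` layout.

```python
import json


def is_invalid_api_key(payload):
    """True if an error payload reports HTTP 400 with reason API_KEY_INVALID."""
    error = json.loads(payload).get("error", {})
    reasons = {detail.get("reason") for detail in error.get("details", [])}
    return error.get("code") == 400 and "API_KEY_INVALID" in reasons


# The payload quoted in this issue, used as sample input.
sample = '''{"error": {"code": 400, "status": "INVALID_ARGUMENT",
  "message": "API key not valid. Please pass a valid API key.",
  "details": [{"reason": "API_KEY_INVALID", "domain": "googleapis.com"}]}}'''
```

Calling `is_invalid_api_key(sample)` evaluates to `True` for the payload above.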

P1 — Smoke Claude: safe_outputs permission failure

From workflow-logs/safe_outputs/8_Process Safe Outputs.txt for run 25239718599:

The GitHub App/token used by the safe_outputs job lacks the pull_request_review write permission needed to resolve review threads. This permission is not required for most safe output operations, but the Smoke Claude task exercises it.

Additional finding: the agent was resource-heavy for a Triage-domain task — 39 turns vs. typical baseline, classified resource_heavy_for_domain (high) and poor_agentic_control (medium).

P1 — Smoke Codex: edit_wiki credential failure + missing MCP

From workflow-logs/safe_outputs/9_Process Safe Outputs.txt for run 25239718605:

fatal: could not read Username for 'https://github.com': No such device or address

The safe_outputs job clones the wiki over HTTPS and applies a patch, but no git credential helper is configured in that job context (unlike the main agent job). 3 of 4 outputs succeeded (created issue #29655, closed #29638, added comment on PR #29626); the wiki edit failure caused the overall job to exit as failure.
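One common way to authenticate git over HTTPS without a credential helper is to pass a per-command `http.<url>.extraheader` setting carrying a token. The sketch below builds such a command line. It is an assumption about a possible fix, not the safe_outputs implementation: the helper name `git_with_token` is hypothetical, and whether the job's token is allowed to push to the wiki is not established by the logs.

```python
import base64


def git_with_token(token, *args):
    """Build a git argv list that sends `token` as an HTTP basic auth
    header for github.com, without touching any credential helper."""
    basic = base64.b64encode(f"x-access-token:{token}".encode()).decode()
    header = f"http.https://github.com/.extraheader=AUTHORIZATION: basic {basic}"
    # `-c` applies the setting to this one invocation only.
    return ["git", "-c", header, *args]
```

For example, `git_with_token(token, "push", "origin", "HEAD")` returns a list that can be passed to `subprocess.run`. Because the token ends up in the process arguments, the command line should not be echoed to logs.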

Additional finding: Codex does not have the web-fetch MCP tool configured. The agent called safeoutputs.missing_tool to signal this and fell back to playwright.browser_navigate targeting github.com. The firewall also blocked 5 requests: 4 to ab.chatgpt.com:443 and 1 to chatgpt.com:443.

Existing Issue Correlation

The GitHub issues API was not reachable during this investigation (HTTP 403 on the localhost proxy), so no de-duplication against existing tracking issues was possible. Smoke test result issues #29655 and #29658 were created by the agents themselves as part of normal operation.

Proposed Fix Roadmap

| Priority | Item | Owner Area |
|---|---|---|
| P0 | Crush install path: write binary to writable temp dir instead of node_modules | Crush integration / workflow setup |
| P0 | Rotate or re-configure GEMINI_API_KEY in CI secrets | Secrets / infra |
| P1 | Grant resolve PR review thread permission to safe_outputs job token | Workflow permissions |
| P1 | Configure git credentials for wiki HTTPS push in safe_outputs job | safe_outputs infra |
| P2 | Add web-fetch MCP to Codex engine config | Engine config |

Sub-Issues Created

  • #aw_p0fix — P0 crashes: Crush EROFS install + Gemini API key invalid

Generated by [aw] Failure Investigator (6h) · ● 639.1K ·

  • expires on May 9, 2026, 1:29 AM UTC
