[workflow-analysis] Weekly Workflow Analysis — 2026-04-06 #24847
Closed
This discussion was automatically closed because it expired on 2026-04-07T09:49:24.823Z.
Analysis of the 20 most recent workflow runs from the past week (all on 2026-04-06). Data covers 1.4h of wall time, 9.6M tokens (10.7M effective), 198 agent turns, and 58 GitHub API calls.
Overview
75% success rate — 15 of 20 runs succeeded (3 additional are still in-flight/baseline). Five failures cluster around three distinct root causes: infrastructure/dependency installation errors, a complete firewall misconfiguration, and a `safe_outputs` job crash after a successful agent run.
Summary
Failures
❌ Schema Feature Coverage Checker — Firewall Misconfiguration
Run: §24022436614 · Codex · 2.8m
This is the most severe failure: 100% of network requests were blocked (43/43). The workflow has no `network.allowed` list in its frontmatter, so the Codex agent cannot reach `api.githubcopilot.com`, `api.github.com`, `json-schema.org`, `json.schemastore.org`, or any other domain.
Fix: Add a `network` block to the workflow frontmatter, with an `allowed` list that covers the domains above.
❌ Daily News — Exit Code 127 (Command Not Found)
Run: §24025680379 · Copilot · 4.6m
Failed at `agent/Write Safe Outputs Tools` with exit code 127 (command not found). Baseline comparison shows a significant regression: the last successful run (2026-03-24) completed 10 turns with `write_capable` posture; this run completed 0 turns in `read_only` posture, meaning the agent never started.
Likely cause: A missing tool or binary in the runner environment, possibly a recently removed or renamed dependency.
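Exit code 127 means the shell could not find a command, so one way to surface this earlier is a preflight check that runs before the agent setup step. The sketch below is only an illustration: the report does not say which binary was missing, so `REQUIRED_TOOLS` is a hypothetical list and `find_missing_tools` is a made-up helper name.

```python
import shutil

# Hypothetical tool list: the report does not name the missing command.
REQUIRED_TOOLS = ["node", "npm", "gh"]


def find_missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be resolved on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return [tool for tool in tools if which(tool) is None]


def preflight_exit_code(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return 127 (mirroring 'command not found') if any tool is absent, else 0."""
    missing = find_missing_tools(tools, which)
    if missing:
        print("missing tools: " + ", ".join(missing))
        return 127
    return 0
```

Running something like `preflight_exit_code()` as the first step of the job would turn a late, opaque failure into an early message that names the missing binary.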
❌ CI Cleaner — Agent Exit Code 1
Run: §24022270848 · Claude Code · 3.8m
The `agent` job failed (exit code 1), also with 0 turns completed. Baseline (2026-04-04, successful) ran 22 turns with `write_capable` posture. Same pattern as Daily News — the agent never reached execution.
❌ Dev — SDK Installation Failure
Run: §24025857784 · Copilot · 1.5m
Failed at the `indexing` job in the `Install@tobilu_qmdSDK` step. This is a pre-agent setup step; the agent was never reached (all downstream jobs skipped). Likely a broken or missing package version.
❌ Contribution Check — Safe Outputs Job Failure
Run: §24020210365 · Copilot (claude-sonnet-4.6) · 6.3m
Unusual failure: the agent ran successfully (4m, 19 turns, 8 write actions), the detection job also passed, but the `safe_outputs` job itself failed. The agent produced valid outputs (2 comments, 1 label, 1 issue) but they were not applied.
This is a post-agent infrastructure failure unrelated to agent behavior. It may be the same underlying issue as the Daily News/CI Cleaner failures given the same date.
Contribution Check — Agentic Assessment
The audit also flagged this workflow for optimization (independent of the failure):
Move `list_pull_requests`/`pull_request_read` calls to a pre-agent shell step writing JSON to `/tmp/gh-aw/agent/`, reducing inference turns and cost.
Performance Highlights (Successful Runs)
Failure Patterns
Three distinct root causes account for all 5 failures:
Infrastructure failures (Daily News, CI Cleaner, Dev) — Exit code 127/1 in pre-agent or agent setup steps. These likely share a common infrastructure or dependency cause (all occurred on 2026-04-06 within a 2-hour window). Worth checking if a runner image update or package removal happened around 07:00 UTC that day.
Firewall misconfiguration (Schema Feature Coverage Checker) — Codex workflow missing `network.allowed` entirely. 100% block rate. Simple fix: add required domains to workflow frontmatter.
Safe outputs job failure (Contribution Check) — Agent completed successfully; post-agent infrastructure failed. May be related to the same infrastructure event as pattern #1.
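The firewall pattern above is the kind of problem a small lint could catch before a run. The following is a minimal sketch, assuming the workflow frontmatter has already been parsed into a dictionary and that domains are matched by exact string. The real firewall may accept wildcards or presets that this check does not model, and `missing_domains` is a hypothetical helper, not part of any existing tool.

```python
# Domains the report lists as unreachable for Schema Feature Coverage Checker.
REQUIRED_DOMAINS = [
    "api.githubcopilot.com",
    "api.github.com",
    "json-schema.org",
    "json.schemastore.org",
]


def missing_domains(frontmatter, required=REQUIRED_DOMAINS):
    """Return required domains absent from frontmatter['network']['allowed'].

    Assumes exact string matching; a missing or malformed `network`
    block is treated as an empty allow-list.
    """
    network = frontmatter.get("network")
    allowed = network.get("allowed") if isinstance(network, dict) else None
    if not isinstance(allowed, list):
        allowed = []
    return [domain for domain in required if domain not in allowed]
```

For a workflow with no `network` block at all, as in the failing run, the function returns every required domain, which matches the 100% block rate observed.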
Recommendations
Immediate: Fix Schema Feature Coverage Checker firewall config — add `network.allowed` domains including `api.githubcopilot.com`, `api.github.com`, `json-schema.org`, `json.schemastore.org`.
Investigate infrastructure event ~07:00 UTC 2026-04-06 — Daily News, CI Cleaner, Dev, and Contribution Check all failed within a 3-hour window. Check for runner image changes, package removals, or dependency version bumps that could explain exit code 127 and SDK install failures.
Optimize Contribution Check — Move PR data-fetching to a pre-agent deterministic step. Estimated savings: ~73% fewer agent turns, lower token cost.
Monitor Daily News baseline regression — The behavior shift from `write_capable/10 turns` → `read_only/0 turns` suggests the workflow isn't even starting its agent loop. Once the infrastructure issue is resolved, verify it recovers.
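The Contribution Check optimization recommended above (fetching PR data in a deterministic pre-agent step) could look roughly like the sketch below. The report describes a shell step; this version uses Python instead to show the same idea, calling the GitHub CLI's `gh pr list --json`. The output file name `pull_requests.json`, the selected fields, and both function names are assumptions made for illustration.

```python
import json
import subprocess
from pathlib import Path


def fetch_open_prs(limit=30):
    """Fetch open PRs via the GitHub CLI (requires an authenticated `gh`)."""
    result = subprocess.run(
        ["gh", "pr", "list", "--state", "open", "--limit", str(limit),
         "--json", "number,title,author,labels"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def write_snapshot(prs, out_dir="/tmp/gh-aw/agent"):
    """Write the PR list as JSON so the agent reads a file instead of calling tools."""
    out_path = Path(out_dir) / "pull_requests.json"  # hypothetical file name
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(prs, indent=2))
    return out_path
```

A pre-agent step would then call `write_snapshot(fetch_open_prs())`, and the agent prompt would point at the resulting file rather than issuing `list_pull_requests`/`pull_request_read` calls turn by turn.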