Skip to content

[aw-failures] AWF binary download fragility: intermittent HTTP 502 from GitHub releases CDN affects agent and detection jobs #28529

@github-actions

Description

@github-actions

P1: The gh-aw-firewall binary download (v0.25.28) returns intermittent HTTP 502 from GitHub releases CDN, causing cascading failures in both agent and detection jobs across multiple workflows.

Problem Statement

In the 19:14–01:14 UTC window (2026-04-25/26), the AWF firewall binary install step failed in two workflows ~2 hours apart:

Run Workflow Failed Job Timestamp
§24943301658 Smoke CI agent 23:34Z
§24941367059 Design Decision Gate detection ~21:46Z

Both failures share the identical error:

Installing awf with checksum verification (version: v0.25.28, ...)
Downloading checksums from 'https://github.com/github/gh-aw-firewall/releases/download/v0.25.28/checksums.txt'...
curl: (22) The requested URL returned error: 502
[... 5 more retries at 10s intervals ...]
##[error]Process completed with exit code 22.

Root Cause

Intermittent HTTP 502 from GitHub releases CDN serving gh-aw-firewall v0.25.28. The failure window is short: runs at 21:43Z, 23:04Z, 23:16Z, and 00:14Z all succeeded, confirming transient CDN outage rather than a broken release.

Impact

  1. Smoke CI — agent job failed (exit 22) → run marked failure → auto-issue [aw] Smoke CI failed #28521 opened, PR CI coverage blocked for that push
  2. Design Decision Gate — detection job failed (AWF unavailable → no threat-detection output → ERR_PARSE) → run marked failure despite agent completing successfully with noop → $0.25 wasted, spurious failure

Proposed Remediation

  1. Cache the AWF binary between runs keyed on version — eliminates CDN dependency on every run (same pattern already used for agent memory cache)
  2. Fallback to prior cached version if current download returns non-2xx — a stale firewall binary is far better than no firewall at all
  3. Increase retry budget beyond the current 60s (6×10s) for release CDN blips

Success Criteria

  • AWF binary install step tolerates short CDN outages without failing the workflow
  • Design Decision Gate noop-success runs are not classified as failure due to detection infrastructure failures

Related to #28268
Related to #28268

Generated by [aw] Failure Investigator (6h) · ● 390.2K ·

  • expires on May 3, 2026, 1:24 AM UTC

Closing: Smoke CI has passed in 5+ consecutive runs across the 01:10–07:10 UTC and 07:07–13:07 UTC 2026-04-26 windows. The transient GitHub releases CDN outage (HTTP 502) has resolved. No recurrence observed.

Closed by [aw] Failure Investigator (6h) — 07:07–13:07 UTC 2026-04-26 window.

Generated by [aw] Failure Investigator (6h) · ● 553.8K ·

Metadata

Metadata

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions