feat(ci): add CI Failure Doctor workflow by Copilot · Pull Request #340 · github/gh-aw-firewall

Copilot · 2026-01-18T01:39:04Z

Creates an automated CI Failure Doctor workflow that investigates CI/CD failures, analyzes logs, identifies root causes, and creates investigation issues.

Changes Made

New workflow: .github/workflows/ci-doctor.md - an agentic workflow that:
- Triggers on workflow_run completion for all 23 workflows in the repository (GitHub Actions doesn't support wildcards)
- Only activates when conclusion == 'failure'
- Uses GitHub Actions tools to fetch and analyze job logs
- Searches cache-memory for similar historical failures
- Creates investigation issues with root cause analysis and recommended actions
- Labels issues with bug, ci
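The trigger conditions above can be sketched in Python as a small predicate over a `workflow_run` webhook payload. This is an illustration only, not the workflow's actual implementation (which lives in the workflow frontmatter). The function name `should_investigate` is hypothetical, and the set below contains only the four workflow names visible in the first review diff, not the full list of 23.

```python
from typing import Any, Mapping

# Illustrative subset: the four names shown in the review diff.
# The merged workflow lists all 23 workflows explicitly.
MONITORED_WORKFLOWS = frozenset({
    "TypeScript Type Check",
    "Test Coverage",
    "Test Setup Action",
    "Examples Test",
})


def should_investigate(
    event: Mapping[str, Any],
    monitored: frozenset = MONITORED_WORKFLOWS,
) -> bool:
    """Return True only for a completed, failed run of a monitored workflow."""
    run = event.get("workflow_run") or {}
    return (
        event.get("action") == "completed"
        and run.get("name") in monitored
        and run.get("conclusion") == "failure"
    )
```

A run that completes with any other conclusion, or that belongs to a workflow missing from the list, is ignored.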

Design Decisions

- All workflows monitored: Since GitHub Actions doesn't support wildcards for workflow_run, all workflow names are explicitly listed with a maintenance comment
- Concise instructions: The agent prompt is intentionally minimal (~40 lines) to give the agent maximum freedom to explore and discover insights rather than following a rigid investigation structure
- Key patterns provided: Common AWF-specific failure patterns (Docker network conflicts, container cleanup issues, iptables problems, Squid proxy failures) are listed as hints without prescriptive investigation steps
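As a rough illustration of what "patterns as hints" could look like when applied to a job log, here is a minimal Python sketch. The PR names the four categories but not the exact log strings, so every regular expression below is an assumption, and `match_hints` is a hypothetical helper rather than anything in the workflow.

```python
import re

# Hypothetical regexes: the category names come from the PR description,
# the expressions themselves are illustrative guesses at typical log text.
FAILURE_HINTS = {
    "docker-network-conflict": re.compile(r"pool overlaps|subnet.*overlap", re.I),
    "container-cleanup": re.compile(
        r"container name .* is already in use"
        r"|removal of container .* is already in progress",
        re.I,
    ),
    "iptables": re.compile(r"iptables.*(permission denied|failed)", re.I),
    "squid-proxy": re.compile(r"squid.*(unhealthy|failed to start|timed out)", re.I),
}


def match_hints(log_text: str) -> dict:
    """Map each hint category to the log lines that matched it."""
    hits = {}
    for line in log_text.splitlines():
        for category, pattern in FAILURE_HINTS.items():
            if pattern.search(line):
                hits.setdefault(category, []).append(line.strip())
    return hits
```

Matches of this kind only point at where to look; consistent with the design decision above, they do not prescribe the investigation steps.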

Configuration

- Timeout: 10 minutes
- Tools: GitHub API (with actions toolset), cache-memory
- Safe outputs: Issue creation with "🏥 CI Failure" prefix, comments (max 1)

Original prompt

This section details the original issue you should resolve

<issue_title>[plan] Implement CI Failure Doctor workflow</issue_title>
<issue_description>## Objective

Create an automated workflow that investigates CI/CD failures, analyzes logs, identifies root causes, and creates detailed investigation reports with remediation steps.

Context

This repository has complex Docker/networking tests that frequently fail with opaque errors like "subnet pool overlap" and container cleanup race conditions. Manual log analysis wastes developer time and accumulated failure patterns aren't leveraged.

Approach

Create workflow file: .github/workflows/ci-doctor.md

Configure triggers:

workflow_run on completion of: test-integration, test-coverage, test-action

Only trigger when conclusion == 'failure'

Implement investigation protocol:

Fetch workflow run details and job logs via GitHub API

Analyze for Docker network issues (subnet pool exhaustion, overlaps)

Check for container cleanup race conditions

Detect iptables rule conflicts

Identify Squid proxy startup failures

Search for similar past failures using cache-memory

Create investigation report:

Detailed root cause analysis

Specific remediation steps

Link to similar past failures

Label with bug, ci, needs-investigation

Store failure patterns: Update cache-memory with new patterns
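One way to picture the "search, then store" loop for failure patterns is a fingerprint index kept in a JSON file. The sketch below is an assumption throughout: it supposes cache-memory can be treated as a writable file, and the helper names `fingerprint` and `record_failure`, the masking rules, and the index layout are all hypothetical.

```python
import hashlib
import json
import re
from pathlib import Path


def fingerprint(error_line: str) -> str:
    """Hash an error line after masking run-specific details."""
    # Mask long hex strings (container IDs, SHAs), then any remaining numbers.
    masked = re.sub(r"\b[0-9a-f]{7,64}\b", "<hex>", error_line.lower())
    masked = re.sub(r"\d+", "<n>", masked)
    return hashlib.sha256(masked.encode("utf-8")).hexdigest()[:16]


def record_failure(index_path: Path, error_line: str, run_url: str) -> list:
    """Store run_url under the line's fingerprint.

    Returns the runs previously recorded with the same fingerprint.
    """
    index = json.loads(index_path.read_text()) if index_path.exists() else {}
    key = fingerprint(error_line)
    earlier = list(index.get(key, []))
    index.setdefault(key, []).append(run_url)
    index_path.write_text(json.dumps(index, indent=2))
    return earlier
```

Masking IDs and numbers before hashing lets two runs that fail the same way share a fingerprint even when container IDs or exit details differ.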

Files to Create

.github/workflows/ci-doctor.md - Main workflow

Reference: agentics/ci-doctor.md template

Domain-Specific Focus Areas

Docker network pool exhaustion (172.30.0.0/24 subnet conflicts)

Container cleanup race conditions (timeout kills leave orphaned resources)

iptables rule conflicts (NET_ADMIN capability issues)

Squid proxy healthcheck failures

GitHub Actions runner Docker version incompatibilities

Acceptance Criteria

Workflow triggers automatically on CI failures for specified test workflows

Creates detailed investigation issues with root cause analysis

Identifies Docker/networking specific failure patterns

Searches cache-memory for similar historical failures

Provides actionable remediation steps

Completes within 10 minutes timeout

Success Metrics

Reduce failure diagnosis time from manual (hours) to <30 minutes

Build knowledge base of 10+ common failure patterns within first month
Related to [plan] Enhance agentic workflow maturity to Level 4 (Optimized) #332

AI generated by Plan Command for discussion #328

Comments on the Issue (you are @copilot in this section)

Fixes [plan] Implement CI Failure Doctor workflow #333


Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>

Mossaka · 2026-01-18T21:23:39Z

.github/workflows/ci-doctor.md

+
+on:
+  workflow_run:
+    workflows: ["TypeScript Type Check", "Test Coverage", "Test Setup Action", "Examples Test"]


@copilot can you just run this for every workflow in this repo? Is there a "*" syntax in agentic workflows?

GitHub Actions workflow_run doesn't support wildcards - each workflow must be listed explicitly. Updated to list all 23 workflows in the repo in 1c080d6.
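Because the list is maintained by hand, it can drift when workflows are added or renamed. A minimal Python sketch of a drift check follows, assuming the workflow names have already been collected from the repository. The helper `unlisted_workflows` and its `exclude` parameter are hypothetical and not part of this PR.

```python
def unlisted_workflows(
    monitored: list,
    repo_workflows: list,
    exclude: frozenset = frozenset(),
) -> list:
    """Return repo workflow names missing from the monitored list, sorted.

    Names in `exclude` (for example, the doctor workflow itself) are skipped.
    """
    listed = set(monitored)
    return sorted(
        name
        for name in repo_workflows
        if name not in listed and name not in exclude
    )
```

An empty result means every workflow in the repository is covered by the explicit list.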

Mossaka · 2026-01-18T21:25:03Z

.github/workflows/ci-doctor.md

+- **Head SHA**: ${{ github.event.workflow_run.head_sha }}
+- **Head Branch**: ${{ github.event.workflow_run.head_branch }}
+
+## Investigation Protocol


@copilot modify the instructions to make them much more concise. The instructions should be general rather than specific, to give the agent the maximum degree of freedom to explore by itself and discover insights, instead of instructing it to follow a rigid structure.

Simplified the instructions from 268 lines to ~40 lines. Now provides just context, a 5-step mission, key patterns to watch for, and expected output format. Agent has full freedom to explore and discover insights. See 1c080d6.

Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>

Initial plan

033c489

Copilot AI assigned Copilot and Mossaka Jan 18, 2026

Copilot started work on behalf of Mossaka January 18, 2026 01:39 View session

Copilot AI and others added 2 commits January 18, 2026 01:43

feat(ci): add CI Failure Doctor workflow

b16f1f0

Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>

fix: add issues:write permission for CI Doctor

794f427

Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>

Copilot AI requested a review from Mossaka January 18, 2026 01:45

Copilot finished work on behalf of Mossaka January 18, 2026 01:45

Mossaka reviewed Jan 18, 2026


Copilot started work on behalf of Mossaka January 18, 2026 21:25 View session

Copilot AI and others added 2 commits January 18, 2026 21:28

refactor(ci): simplify CI Doctor instructions and add all workflows

1c080d6

Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>

docs: add maintenance comment for workflow list

dcdc2e1

Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Create automated CI failure doctor workflow~~ feat(ci): add CI Failure Doctor workflow Jan 18, 2026

Copilot AI requested a review from Mossaka January 18, 2026 21:30

Copilot finished work on behalf of Mossaka January 18, 2026 21:30

Mossaka marked this pull request as ready for review January 18, 2026 22:42

Mossaka merged commit 64b4e40 into main Jan 18, 2026
28 checks passed

Mossaka deleted the copilot/add-ci-failure-doctor-workflow branch January 18, 2026 22:51

github-actions bot mentioned this pull request Jan 18, 2026

feat(ci): add script to transform workflows for local AWF testing #356

Closed

5 tasks

github-actions bot added the smoke-claude label Jan 19, 2026

Mossaka merged 5 commits into main from copilot/add-ci-failure-doctor-workflow
