121 changes: 121 additions & 0 deletions .github/aw/create-agentic-workflow.md
@@ -664,6 +664,127 @@ This gives users the choice of triggering via comment (`/deploy`) or via label,
- `slash_command` full reference: https://github.github.com/gh-aw/reference/command-triggers/
- `label_command` and LabelOps: https://github.github.com/gh-aw/patterns/label-ops/

## Creating Monitoring Workflows

Monitoring workflows react automatically to pipeline events. The primary trigger for **GitHub Actions-internal** monitoring is `workflow_run`. Use it when you want to detect failures in another workflow in the same repository and take action — for example, posting a comment, opening an issue, or sending a notification. This is the recommended pattern for **DevOps monitoring** scenarios such as CI/CD failure detection.

> **`deployment_status` vs `workflow_run`**: Use `deployment_status` for **external deployment services** (Heroku, Vercel, Railway, Fly.io, etc.) that post status back to GitHub via the Deployments API. Use `workflow_run` for **GitHub Actions-internal** pipelines. See reference: @.github/aw/deployment-status.md for the `deployment_status` pattern.

### workflow_run: React to CI/CD pipeline results

`workflow_run` fires when a workflow named in `workflows:` completes; the `requested` and `in_progress` activity types cover earlier stages of a run. Pair it with an `if:` condition on `github.event.workflow_run.conclusion` to act only on failures.

**Key context variables available in the prompt:**

| Expression | Description |
|---|---|
| `${{ github.event.workflow_run.conclusion }}` | Final result, such as `success`, `failure`, `cancelled`, `skipped`, or `timed_out` |
| `${{ github.event.workflow_run.name }}` | Name of the workflow that ran |
| `${{ github.event.workflow_run.id }}` | Run ID (use with `gh run view`) |
| `${{ github.event.workflow_run.html_url }}` | Direct link to the run |
| `${{ github.event.workflow_run.head_branch }}` | Branch the run was triggered on |
| `${{ github.event.workflow_run.head_commit.message }}` | Commit message of the triggering commit |
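
To make the table concrete, here is a minimal Python sketch of how those expressions map onto the `workflow_run` event payload. The function names and the sample payload are hypothetical and exist only for illustration; the workflow itself uses the `${{ ... }}` expressions directly and needs no such code.

```python
from typing import Any, Dict


def run_details(event: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the fields listed in the table from a workflow_run payload."""
    run = event["workflow_run"]
    return {
        "conclusion": run.get("conclusion"),
        "name": run.get("name"),
        "id": run.get("id"),
        "html_url": run.get("html_url"),
        "head_branch": run.get("head_branch"),
        # head_commit is a nested object; guard against it being absent.
        "commit_message": (run.get("head_commit") or {}).get("message"),
    }


def is_failure(event: Dict[str, Any]) -> bool:
    """Mirror of `github.event.workflow_run.conclusion == 'failure'`."""
    return event["workflow_run"].get("conclusion") == "failure"


# Hypothetical payload: every value below is made up for illustration.
sample_event = {
    "workflow_run": {
        "conclusion": "failure",
        "name": "CI",
        "id": 12345,
        "html_url": "https://example.invalid/runs/12345",
        "head_branch": "feature/x",
        "head_commit": {"message": "Fix flaky test"},
    }
}

if __name__ == "__main__":
    if is_failure(sample_event):
        print(run_details(sample_event))
```

The `is_failure` helper corresponds to the `if:` condition used in the examples that follow.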

**Example 1 — Notify on CI failure (minimal, no pre-steps):**

This is the simplest monitoring workflow. It activates whenever the "CI" workflow completes with a failure and posts a comment on the triggering PR.

```aw wrap
---
on:
  workflow_run:
    workflows: ["CI"]
    types: [completed]
if: ${{ github.event.workflow_run.conclusion == 'failure' }}
permissions:
  contents: read
tools:
  github:
    toolsets: [default]
safe-outputs:
  add-comment:
    max: 1
---

The CI workflow failed for branch `${{ github.event.workflow_run.head_branch }}`.

Run details:
- **Run ID**: ${{ github.event.workflow_run.id }}
- **Conclusion**: ${{ github.event.workflow_run.conclusion }}
- **Link**: ${{ github.event.workflow_run.html_url }}

Use the GitHub MCP tools to find the open pull request for branch `${{ github.event.workflow_run.head_branch }}`. Post a concise comment on that PR summarising the failure and suggesting next steps for the author.
```

**Example 2 — Fetch CI logs, diagnose root cause, and notify (with pre-steps):**

This pattern fetches the workflow logs before the agent runs, keeping the agent focused on analysis rather than API calls. Suitable for DevOps teams that need actionable failure summaries with root-cause analysis.

```aw wrap
---
on:
  workflow_run:
    workflows: ["CI", "Deploy"]
    types: [completed]
if: ${{ github.event.workflow_run.conclusion == 'failure' }}
permissions:
  contents: read
  actions: read  # required to download workflow run logs
tools:
  github:
    toolsets: [default]
  cache-memory: true  # deduplication: skip already-diagnosed run IDs
steps:
  - name: Fetch failed run logs
    env:
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      RUN_ID: ${{ github.event.workflow_run.id }}
    run: |
      mkdir -p /tmp/gh-aw/agent
      gh run view "$RUN_ID" --log-failed > /tmp/gh-aw/agent/ci-logs.txt 2>&1 || true
      tail -500 /tmp/gh-aw/agent/ci-logs.txt > /tmp/gh-aw/agent/ci-logs-trimmed.txt
safe-outputs:
  add-comment:
    max: 1
---

The `${{ github.event.workflow_run.name }}` workflow failed on branch `${{ github.event.workflow_run.head_branch }}`.

**Run details:**
- Run ID: ${{ github.event.workflow_run.id }}
- Link: ${{ github.event.workflow_run.html_url }}
- Commit: ${{ github.event.workflow_run.head_commit.message }}

**Instructions:**

1. Check `/tmp/gh-aw/cache-memory/seen-runs.json` (a JSON array of run ID strings, e.g. `["12345","67890"]`). If `${{ github.event.workflow_run.id }}` is already listed, stop — this run was already processed.

2. Read `/tmp/gh-aw/agent/ci-logs-trimmed.txt` and identify the root cause of the failure.

3. Use GitHub MCP tools to find the open pull request for branch `${{ github.event.workflow_run.head_branch }}`.

4. Post a comment on that PR with:
   - A one-sentence summary of what failed
   - The likely root cause
   - Suggested next steps for the author
   - A link to the failed run: ${{ github.event.workflow_run.html_url }}

5. Append `${{ github.event.workflow_run.id }}` to `/tmp/gh-aw/cache-memory/seen-runs.json` so this run is not re-processed on retries.
```
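
The deduplication in steps 1 and 5 of Example 2 is carried out by the agent following the prompt, not by code. As a rough illustration of the intended behaviour, the Python sketch below implements the same check-then-append logic against a JSON array of run ID strings. The function names are hypothetical, and treating a missing or malformed file as "nothing seen yet" is an assumption rather than documented behaviour.

```python
import json
from pathlib import Path
from typing import List


def load_seen(path: Path) -> List[str]:
    """Return the stored run IDs, or an empty list if the file is missing or invalid."""
    try:
        data = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return []
    if not isinstance(data, list):
        return []
    return [str(item) for item in data]


def already_processed(path: Path, run_id: str) -> bool:
    """Step 1: has this run ID been recorded before?"""
    return str(run_id) in load_seen(path)


def mark_processed(path: Path, run_id: str) -> None:
    """Step 5: append the run ID, creating the file if needed and avoiding duplicates."""
    seen = load_seen(path)
    key = str(run_id)
    if key not in seen:
        seen.append(key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(seen))
```

Recording the run ID only after the comment has been posted, as the prompt does, means a run that fails partway through can still be diagnosed on a retry.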

**When to use `workflow_run` for monitoring:**

- ✅ Monitoring GitHub Actions CI pipelines (test, lint, build workflows)
- ✅ Monitoring deploy workflows that run inside GitHub Actions
- ✅ Alerting on `timed_out` or `cancelled` runs in addition to `failure`
- ✅ Creating issues or posting comments automatically on pipeline failure
- ⚠️ Only works for workflows in the **same repository**
- ❌ Not suitable for external deployment services — use `deployment_status` instead
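
Both examples above filter on `failure` only. To also alert on `timed_out` or `cancelled`, the `if:` condition has to cover each conclusion. The Python sketch below builds such a condition string; the helper name is hypothetical, and it assumes the `if:` field accepts the standard GitHub Actions `||` operator, which this document does not confirm.

```python
from typing import List


def conclusion_condition(conclusions: List[str]) -> str:
    """Build an `if:` expression that matches any of the given conclusions."""
    if not conclusions:
        raise ValueError("at least one conclusion is required")
    field = "github.event.workflow_run.conclusion"
    clauses = " || ".join(f"{field} == '{c}'" for c in conclusions)
    return "${{ " + clauses + " }}"


if __name__ == "__main__":
    # Prints a condition covering failed and timed-out runs.
    print(conclusion_condition(["failure", "timed_out"]))
```

With a single conclusion, `conclusion_condition(["failure"])` yields the same condition used in Examples 1 and 2.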

**Guiding the user when they ask for DevOps monitoring:**

When a user asks for "notify me when my pipeline fails", "alert on CI failures", "deployment failure notification", or similar — default to this `workflow_run` pattern. Ask which workflow(s) to monitor (the `workflows:` list) and whether they want log-based root-cause analysis (Example 2) or a lightweight notification (Example 1).

## Best Practices

### Improver Coding Agents in Large Repositories