From 8f4ec80ee638506cb4e5b56dd280507da91d3ac5 Mon Sep 17 00:00:00 2001 From: Sasa Junuzovic Date: Wed, 18 Mar 2026 22:58:40 -0700 Subject: [PATCH 1/2] docs: capture reverts, orchestrator v1/v2 work, and responder lessons Changelog: added entries for orchestrator v1/v2 (PRs #140-#143), the v3/responder/trigger-disable reverts, and key discoveries. Agentic-workflows: updated orchestrator section (v1+v2 on main, v3 reverted), responder status (target '*' fix, workflow_dispatch), agent inventory (added orchestrator, updated responder/quality-gate), history (3 new entries covering March 17-18), and 3 new pitfalls (safe output target, review trigger loops, over-specifying agents). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- docs/agentic-workflows.md | 72 +++++++++++++++++++++++++++++++++------ docs/changelog.md | 24 +++++++++++++ 2 files changed, 85 insertions(+), 11 deletions(-) diff --git a/docs/agentic-workflows.md b/docs/agentic-workflows.md index 3be3aad..99836f3 100644 --- a/docs/agentic-workflows.md +++ b/docs/agentic-workflows.md @@ -388,23 +388,32 @@ done This is a known limitation for solo repos. Agent PRs don't need this — the quality gate approves them. -### Pipeline Orchestrator (removed) +### Pipeline Orchestrator (bash-based, partially implemented) -We built a gh-aw agent orchestrator (`pipeline-orchestrator.md`, PR #130) to shepherd PRs through the full lifecycle. It was removed (PR #137) after failing in production — 7-10 minute runs for deterministic logic, auth failures preventing thread resolution, wrong action ordering. See `docs/auto_pr_orchestrator_aw.md` for the full postmortem. +The gh-aw agent orchestrator was removed (PR #137, see `docs/auto_pr_orchestrator_aw.md`). Replaced by a regular GitHub Action (`pipeline-orchestrator.yml`) that runs in seconds, not minutes. -**Replacement**: A regular GitHub Action (`pipeline-orchestrator.yml`) will handle the same logic in bash. See issue #135. +**Current state on main (v1 + v2)**: +- **v1 — Thread resolution**: Triggered by `workflow_run` after Review Responder completes. Queries review threads via GraphQL, resolves threads where the last commenter is not `copilot-pull-request-reviewer` (meaning someone addressed it). Tested on PR #113 — resolved 2 threads in 3 seconds. +- **v2 — Auto-rebase**: Triggered by `push: branches: [main]`. Detects PRs behind main via `mergeStateStatus: BEHIND`, rebases onto `origin/main`, force-pushes with lease. On conflict: adds `aw-needs-rebase` label. Tested on PR #113 — rebased and auto-merge fired in 7 seconds. -**Current state**: Issue dispatch (implementer) and CI fixer dispatch are inactive until #135 is implemented. Review-responder and quality-gate continue to work normally. +**Reverted (was v3)**: Issue dispatch, cron trigger, and review loop management were merged (PR #144) then reverted — untested code caused loops. Being reworked on `fix/responder-v2` branch. -### Review Responder Thread ID Lookup +**Planned**: +- v3: Issue dispatch + cron + review loop (in progress on branch) +- v4: CI fixer dispatch +- v5: Stale PR cleanup -The Review Responder instructions include a step to query real `PRRT_` thread IDs via `gh api graphql` before resolving. Without this, the agent hallucinates thread IDs because the MCP server doesn't expose them (#114). The responder runs `gh api graphql` to fetch thread IDs upfront, then uses those real IDs in resolve calls. +See issue #135 for the full roadmap. -No `bash:` tool config is needed — the responder already has `--allow-all-tools` from the compiler default (adding explicit `bash:` would restrict the allowlist and break CI commands like `uv`, `ruff`, `pyright`, `pytest`). +### Review Responder — Current Status -**Limitation**: This fix only applies to PRs whose head branch has the updated `.md` (loaded via `{{#runtime-import}}`). Existing PRs need a rebase onto main to pick it up. +The review-responder agent can read threads, fix code, and reply. However, the safe output handlers require proper context configuration: -This is a workaround until gh-aw upgrades their pinned MCP server (`github/gh-aw#21130`). +- `target: "triggering"` (default) only works with `pull_request_review` triggers — fails with `workflow_dispatch` +- `target: "*"` works with `workflow_dispatch` — the agent includes the PR number in its messages +- The `pull_request_review` trigger caused infinite loops because it fires on ANY review submission (Copilot, quality gate, humans), not just Copilot reviews + +**Fix in progress** (`fix/responder-v2` branch): Switch to `workflow_dispatch` trigger with `target: "*"` on safe outputs. Orchestrator dispatches the responder when needed. Successfully tested on PR #152. @@ -464,6 +473,15 @@ When agents post comments or replies using `GH_AW_WRITE_TOKEN` (a PAT), the comm ### 17. The `if:` frontmatter field gates at the infrastructure level Adding `if: "contains(github.event.pull_request.labels.*.name, 'aw')"` to a workflow's frontmatter compiles to a job-level `if:` on the activation job. When the condition is false, the workflow skips entirely at the GitHub Actions level — zero tokens burned, no agent activation. This is fundamentally different from checking labels in the agent prompt (which still activates the agent, burns compute, then noops). +### 18. Safe output `target` config determines PR context resolution +Safe outputs like `reply-to-pull-request-review-comment` and `push-to-pull-request-branch` default to `target: "triggering"`, which looks up the PR from `github.event.pull_request`. This only works with event-based triggers (`pull_request_review`, `pull_request`). With `workflow_dispatch`, there is no PR in the event context and safe outputs fail with "not running in a pull request context." Fix: set `target: "*"` in the safe output config — the agent includes `pull_request_number` in each message. Also requires `checkout: { fetch: ["*"], fetch-depth: 0 }` for push operations. Use `labels: [aw]` to restrict which PRs can receive pushes. + +### 19. `pull_request_review` trigger fires on ALL review submissions +The `pull_request_review` trigger fires when ANY actor submits a review — not just the intended reviewer. Combined with `roles: all` (workaround for gh-aw#21098), this means Copilot reviews, quality gate approvals, and human comments ALL trigger the workflow. This caused infinite loops: responder fires → pushes → Copilot reviews → responder fires again. Fix: use `workflow_dispatch` instead and have the orchestrator decide when to run the responder. + +### 20. Don't over-specify agent instructions +The responder originally worked with simple instructions: "Read the unresolved review comment threads" and "Reply to the comment thread." Adding explicit `gh api graphql` queries, ordering constraints, and MCP avoidance notes broke the agent's ability to discover threads. The agent is capable of figuring out how to read threads on its own — telling it exactly which API to use interfered with that. + --- @@ -516,8 +534,9 @@ gh run view --log-failed # View failed job logs | `code-health.md` | schedule (daily) / manual | Find refactoring/cleanup opportunities | `create-issue` (max 2), `dispatch-workflow` (implementer) | | `issue-implementer.md` | `workflow_dispatch` (issue number) | Implement fix from issue spec, open PR | `create-pull-request` (draft: false, auto-merge), `push-to-pull-request-branch` | | `ci-fixer.md` | `workflow_dispatch` (PR number) | Fix CI failures on agent PRs | `push-to-pull-request-branch`, `add-labels`, `add-comment` | -| `review-responder.md` | `pull_request_review` | Address Copilot review comments | `push-to-pull-request-branch`, `reply-to-review-comment`, `resolve-thread`, `add-labels` | -| `quality-gate.md` | `pull_request_review` | Evaluate quality + blast radius, approve or block | `submit-pull-request-review`, `add-comment` | +| `review-responder.md` | `pull_request_review` (moving to `workflow_dispatch`) | Address review comments | `push-to-pull-request-branch`, `reply-to-review-comment`, `add-labels` | +| `quality-gate.md` | `pull_request_review` | Evaluate quality + blast radius, approve or close | `submit-pull-request-review`, `close-pull-request`, `add-comment`, `add-labels` | +| `pipeline-orchestrator.yml` | `workflow_run` / `push` / `workflow_dispatch` | Resolve threads, rebase PRs | N/A (bash, not gh-aw) | ### Loop prevention @@ -631,4 +650,35 @@ The enhanced PR rescue (#116) went through three complete rewrites: - Removed dispatch references from code-health and test-analysis prompts. - **Key lesson**: gh-aw agents are for judgment (code review, implementation). Deterministic orchestration (check state → dispatch → resolve) should be regular bash workflows. +### 2026-03-17 — Bash pipeline orchestrator v1+v2 + quality gate close + +- PR #140: Quality gate can now close poor-quality PRs (`close-pull-request` safe-output). Aligned gh-aw version to v0.60.0 in `copilot-setup-steps.yml`. +- PRs #125, #124, #123: Dependabot bumps (checkout v6, setup-uv v7, codeql-action v4). Rebased and merged sequentially. +- PR #141: Pipeline orchestrator v1 — bash GitHub Action resolves addressed review threads via GraphQL `resolveReviewThread`. Triggered by `workflow_run` after responder completes. Tested on PR #113: resolved 2 threads in 3 seconds. +- PR #142: Pipeline orchestrator v2 — auto-rebase. Detects PRs behind main via `mergeStateStatus: BEHIND`, rebases and force-pushes. Tested on PR #113: rebased and auto-merge fired in 7 seconds. +- PR #143: Fix for git fetch not creating remote tracking refs during rebase. +- PR #113: First end-to-end orchestrator success — rebased a stuck PR, CI passed, auto-merge fired. +- Issues closed: #89 (quality gate close), #88 (gh-aw outdated), #117 (thread resolution), #66 (code quality cleanups via PR #113). +- **Key insight**: Bash orchestrator in 7 seconds vs gh-aw agent in 7-10 minutes. Same logic, 60x faster. + +### 2026-03-17/18 — Orchestrator v3 attempt, responder investigation, revert + +- PR #144: Merged orchestrator v3 (issue dispatch, cron, review loop) + docs + daily test-analysis. Not adequately tested before merge. +- PR #147: Attempted to fix responder by reverting to working version. Merged untested. +- PR #150: Panic-disabled all orchestrator triggers to stop loops. Disabled v1/v2 triggers that were working. +- All three reverted after discovering cascading issues. +- **Responder investigation findings**: + - The responder agent CAN read review threads and fix code inside the sandbox (confirmed by examining agent job logs). + - The safe output handlers (`reply-to-pull-request-review-comment`, `push-to-pull-request-branch`) default to `target: "triggering"` which requires `pull_request_review` event context. With `workflow_dispatch`, no PR context exists and safe outputs fail. + - Setting `target: "*"` in safe output config lets the agent specify the PR number in each message, enabling `workflow_dispatch`. + - The `pull_request_review` trigger fires on ANY review submission (Copilot, quality gate, humans) — not just Copilot reviews. Combined with `roles: all` workaround, this caused infinite responder loops. + - Successfully tested fix on PR #152: responder read thread via REST API, fixed code (renamed variable), replied to comment, pushed commit, orchestrator (v1) auto-resolved the thread. +- **Copilot CLI accountability**: Multiple failures caused by CLI agent: stating the responder had worked when it never had, pushing to implementer PR branches, merging without permission, creating branches and PRs without approval, not verifying changes before claiming they worked. +- **Key lessons**: + - Never merge untested workflow changes to main — always test from branch first. + - Safe output `target` config is critical: `"triggering"` for event triggers, `"*"` for `workflow_dispatch`. + - Don't over-specify agent instructions — the simple original version worked; adding explicit API calls and ordering constraints broke it. + - `workflow_dispatch` is the right trigger for the responder — the orchestrator decides when to run it, eliminating trigger loops. + - Always verify claims by reading actual data (run logs, thread state) before proceeding. + diff --git a/docs/changelog.md b/docs/changelog.md index ec00f80..5a138dc 100644 --- a/docs/changelog.md +++ b/docs/changelog.md @@ -4,6 +4,30 @@ Append-only history of repo-level changes (CI, infra, shared config). Tool-speci --- +## revert: undo orchestrator v3, responder changes, and trigger disable — 2026-03-18 + +**Problem**: Three PRs were merged to main without adequate testing, creating cascading failures: +- PR #144 (orchestrator v3 + docs + daily test-analysis): Added cron trigger, issue dispatch, and review loop management — none tested end-to-end before merge. +- PR #147 (responder revert): Attempted to restore responder to working version but was merged untested. +- PR #150 (disable triggers): Panic fix to stop orchestrator loops, but disabled v1/v2 triggers that were working fine. + +The responder was never able to address review threads in production. Investigation revealed: +1. The responder agent CAN read threads and fix code inside the sandbox. +2. The safe output handlers (`reply-to-pull-request-review-comment`, `push-to-pull-request-branch`) fail outside `pull_request_review` context because they default to `target: "triggering"` which requires PR event data. +3. The `pull_request_review` trigger caused infinite loops — every review from any actor (Copilot, quality gate, humans) fired the responder. + +**Fix**: Reverted all three PRs to restore main to post-v2 state (orchestrator v1 thread resolution + v2 auto-rebase, both tested and proven). Responder fix being developed on `fix/responder-v2` branch with controlled testing. + +**Key discovery**: Setting `target: "*"` on safe outputs + switching to `workflow_dispatch` trigger allows the responder to work from any context. Successfully tested on PR #152 — responder read thread, fixed code, replied, pushed, and orchestrator resolved the thread automatically. + +**Lessons**: +- Never merge untested workflow changes to main. +- The safe output `target` config is critical — `"triggering"` only works with event-based triggers, `"*"` works with `workflow_dispatch`. +- Don't over-specify agent instructions — the original simple responder worked; adding GraphQL queries and ordering constraints broke it. +- Copilot CLI churn (lying about what worked, pushing to wrong branches, merging without permission) was the root cause of most failures. + +--- + ## chore: remove pipeline orchestrator agent — 2026-03-17 **Problem**: The gh-aw orchestrator agent (PR #130) took 7-10 minutes per run for deterministic if/else logic. Over 22+ overnight runs it failed to resolve a single thread — auth failures, wrong action ordering, redundant review requests. Burned significant Opus inference tokens with no results. From b0c9c9795a58b9d658d0ff9446805884435c6e27 Mon Sep 17 00:00:00 2001 From: Sasa Junuzovic Date: Wed, 18 Mar 2026 23:16:03 -0700 Subject: [PATCH 2/2] fix: correct safe output name in agent inventory table Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- docs/agentic-workflows.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/agentic-workflows.md b/docs/agentic-workflows.md index 99836f3..2a7a3a6 100644 --- a/docs/agentic-workflows.md +++ b/docs/agentic-workflows.md @@ -534,7 +534,7 @@ gh run view --log-failed # View failed job logs | `code-health.md` | schedule (daily) / manual | Find refactoring/cleanup opportunities | `create-issue` (max 2), `dispatch-workflow` (implementer) | | `issue-implementer.md` | `workflow_dispatch` (issue number) | Implement fix from issue spec, open PR | `create-pull-request` (draft: false, auto-merge), `push-to-pull-request-branch` | | `ci-fixer.md` | `workflow_dispatch` (PR number) | Fix CI failures on agent PRs | `push-to-pull-request-branch`, `add-labels`, `add-comment` | -| `review-responder.md` | `pull_request_review` (moving to `workflow_dispatch`) | Address review comments | `push-to-pull-request-branch`, `reply-to-review-comment`, `add-labels` | +| `review-responder.md` | `pull_request_review` (moving to `workflow_dispatch`) | Address review comments | `push-to-pull-request-branch`, `reply-to-pull-request-review-comment`, `add-labels` | | `quality-gate.md` | `pull_request_review` | Evaluate quality + blast radius, approve or close | `submit-pull-request-review`, `close-pull-request`, `add-comment`, `add-labels` | | `pipeline-orchestrator.yml` | `workflow_run` / `push` / `workflow_dispatch` | Resolve threads, rebase PRs | N/A (bash, not gh-aw) |