feat: make agent-fix pipeline fully autonomous (end-to-end) #758

Merged: PureWeen merged 1 commit into main from fix/agent-fix-end-to-end on Apr 23, 2026

Conversation

@PureWeen
Owner

Summary

Closes the last gaps in the agent-fix pipeline so it runs fully end-to-end without human intervention:

label issue → agent implements fix → self-review → non-draft PR → integration tests + expert review dispatched → auto-fix if tests fail

Changes

  1. draft: false — PRs are now created as non-draft, so review-on-open.agent can trigger on ready_for_review
  2. Dispatch review.agent directly — bypasses the action_required approval gate that blocks pull_request-triggered workflows for bot-created PRs
  3. Increase dispatch max from 2 → 3 (verify-build + integration + review)
  4. Updated Step 8 — instructions now include dispatching the expert review with pr_number
  5. Recompiled lock file: gh aw compile auto-detected review.agent as a gh-aw workflow and added the .lock.yml extension mapping

Pipeline Flow (after this PR)

Issue labeled 'agent-fix'
  → Agent reads issue, explores code, implements fix
  → 3-model self-review (Opus 4.6, Sonnet 4.6, GPT-5.3-Codex)
  → Non-draft PR created
  → Dispatches: verify-build + polypilot-integration + review.agent
  → If tests fail: auto-fix-on-failure dispatches /fix
  → Result: fully reviewed, tested PR ready for merge

Context

This is the final piece of the pipeline work started across PRs #755 (integration test fixes), #756 (auto-fix-on-failure), and #757 (run-name embedding for PR discovery). Those PRs fixed bugs that prevented the existing pipeline from completing.

- Change draft: true → draft: false so PRs are created as non-draft
- Add review.agent to dispatch-workflow list (dispatches expert code
  review directly, bypassing the action_required approval gate that
  blocks pull_request-triggered workflows for bot-created PRs)
- Increase dispatch max from 2 to 3 (verify-build + integration + review)
- Update Step 8 instructions to dispatch review.agent with pr_number
- Recompile lock file via gh aw compile

This completes the end-to-end pipeline: label issue → agent implements
fix → self-review → PR created (non-draft) → integration tests +
expert review dispatched → auto-fix-on-failure if tests fail.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen merged commit 393b73c into main on Apr 23, 2026
1 check passed
@PureWeen deleted the fix/agent-fix-end-to-end branch on April 23, 2026 19:33
Contributor

@github-actions github-actions Bot left a comment


Review Summary

Reviewed the full workflow source, lock file, review-on-open.agent, review.agent, and auto-fix-on-failure.yml.

✅ No issues found

  • Lock file consistency: Both config.json and GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG correctly reflect draft: false, max: 3, and the added review.agent workflow. The review_agent tool schema, aw_context_workflows, and workflow_files entries are all correct.
  • No infinite loop risk: review.agent does NOT have dispatch_workflow capability — it can only post review comments. It cannot re-trigger itself or agent-fix.
  • No race conditions: The 3 dispatched workflows (verify-build, polypilot-integration, review.agent) are fully independent. The review workflows share a concurrency group with cancel-in-progress: false, so concurrent runs serialize safely rather than conflicting.
  • Auto-fix loop protection: auto-fix-on-failure.yml only triggers on verify-build and polypilot-integration failures, not review workflow completions, so the review dispatch doesn't create a new failure loop vector.
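
The loop-protection point above can be pictured with a minimal Python sketch. The function name and the constant are hypothetical, chosen only for illustration; the workflow names come from this PR, and the real trigger condition lives in auto-fix-on-failure.yml, which is not reproduced here.

```python
# Hypothetical model of the auto-fix trigger described in the review summary.
# Only the workflow names come from the PR; everything else is illustrative.
AUTO_FIX_SOURCES = frozenset({"verify-build", "polypilot-integration"})


def should_auto_fix(workflow_name: str, conclusion: str) -> bool:
    """Return True only when one of the watched test workflows has failed."""
    return workflow_name in AUTO_FIX_SOURCES and conclusion == "failure"


if __name__ == "__main__":
    for name, conclusion in [
        ("verify-build", "failure"),
        ("polypilot-integration", "success"),
        ("review.agent", "failure"),
    ]:
        print(name, conclusion, should_auto_fix(name, conclusion))
```

Under this model a failed review.agent run never feeds back into auto-fix, which is the property the bullet relies on.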

Findings (2)

| # | Sev | Issue |
|---|-----|-------|
| 1 | 🟡 | Duplicate reviews: draft: false causes review-on-open.agent to trigger on PR creation (opened event), AND Step 8 explicitly dispatches review.agent. Both share concurrency group review-{PR} with cancel-in-progress: false, so they serialize, producing two full review runs on the same diff. |
| 2 | 🟡 | pr_number type mismatch: Prompt example shows string "<PR number>" but the tool schema declares type: number. Agent may send a string value that fails schema validation. |

Generated by Expert Code Review (auto) for issue #758 · ● 9.5M

  create-pull-request:
    auto-merge: false
-   draft: true
+   draft: false

🟡 MODERATE — Duplicate review runs

With draft: false, the PR is created as non-draft immediately. This fires the pull_request: [opened] event, which triggers review-on-open.agent (it matches opened + draft == false). Then in Step 8, agent-fix also explicitly dispatches review.agent via workflow_dispatch.

Both workflows share the concurrency group review-${{ PR_NUMBER }} with cancel-in-progress: false, so they serialize rather than deduplicate: the second review queues behind the first and runs in full. This doubles compute cost (~90 min × 2) and produces two separate review comment threads on the same PR for the same diff.

Mitigation options (pick one):

  1. Keep draft: true and rely solely on the explicit review.agent dispatch (original approach minus the approval-gate problem).
  2. Keep draft: false but remove the explicit dispatch_workflow for review.agent in Step 8, relying on review-on-open.agent to auto-trigger.
  3. Add a conditional label or environment flag so review-on-open.agent skips PRs created by agent-fix (e.g., if: github.actor != 'copilot-agentic-workflow[bot]').
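
To compare the three options, here is a small Python sketch that counts which review workflows would run when agent-fix opens a PR. It is a simplified model, not the actual workflow logic: `PipelineConfig`, `review_runs`, and the `skip_agent_prs` flag are hypothetical names, and the model assumes, as this comment does, that review-on-open.agent fires for any non-draft PR on the opened event.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    draft: bool              # create-pull-request draft setting
    explicit_dispatch: bool  # Step 8 dispatches review.agent
    skip_agent_prs: bool     # hypothetical guard on review-on-open.agent


def review_runs(cfg: PipelineConfig) -> list[str]:
    """List the review workflows expected to run when agent-fix opens a PR."""
    runs = []
    # Assumption from the comment above: review-on-open.agent matches
    # the opened event only when the PR is not a draft.
    if not cfg.draft and not cfg.skip_agent_prs:
        runs.append("review-on-open.agent")
    if cfg.explicit_dispatch:
        runs.append("review.agent")
    return runs


if __name__ == "__main__":
    scenarios = {
        "as merged": PipelineConfig(draft=False, explicit_dispatch=True, skip_agent_prs=False),
        "option 1": PipelineConfig(draft=True, explicit_dispatch=True, skip_agent_prs=False),
        "option 2": PipelineConfig(draft=False, explicit_dispatch=False, skip_agent_prs=False),
        "option 3": PipelineConfig(draft=False, explicit_dispatch=True, skip_agent_prs=True),
    }
    for label, cfg in scenarios.items():
        print(label, review_runs(cfg))
```

In this model the merged configuration yields two review runs, while each of the three mitigations yields exactly one.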

"pr_number": "<PR number>"
}
})

🟡 MODERATE — pr_number type mismatch between prompt example and tool schema

The prompt example passes pr_number as a string ("<PR number>"), but the compiled tool schema in the lock file defines pr_number with "type": "number". If the agent sends a string (e.g. "42" instead of 42), schema validation may reject the dispatch call at runtime.

The review.agent.md frontmatter also declares pr_number as type: number, so the lock file schema is correct. However, the prompt example should make it clear this must be an integer, not a quoted string.

Suggested fix: Change the example to use an unquoted placeholder:

"pr_number": <PR number>

or add a note: "Pass as an integer, not a string."
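
The difference between "42" and 42 can be shown with a short Python sketch. It mimics a JSON Schema type: number check by hand rather than calling the real safe-outputs handler, whose validation code is not shown in this PR; `is_json_number` and `coerce_pr_number` are hypothetical helpers introduced for illustration.

```python
import json


def is_json_number(value: object) -> bool:
    """Mimic a JSON Schema `type: number` check on already-parsed JSON."""
    # bool is a subclass of int in Python, but JSON true/false are not numbers.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def coerce_pr_number(value: object) -> int:
    """Hypothetical helper: accept 42 or "42" and always return the int 42."""
    if isinstance(value, bool):
        raise TypeError("pr_number must not be a boolean")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        text = value.strip()
        # Restrict to ASCII digits so int() cannot fail on exotic characters.
        if text.isascii() and text.isdigit():
            return int(text)
    raise TypeError(f"pr_number must be an integer, got {value!r}")


if __name__ == "__main__":
    quoted = json.loads('{"pr_number": "42"}')["pr_number"]
    unquoted = json.loads('{"pr_number": 42}')["pr_number"]
    print(is_json_number(quoted), is_json_number(unquoted))
    print(coerce_pr_number(quoted))
```

Whether the handler rejects the quoted form or coerces it is not established in this thread, so the unquoted placeholder remains the safer prompt wording.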

  create-pull-request:
    auto-merge: false
-   draft: true
+   draft: false

🟡 MODERATE — Duplicate reviews on bot-created PRs (Flagged by: 3/3 reviewers)

With draft: false, a non-draft PR created by the agent will immediately trigger review-on-open.agent (via pull_request: opened / ready_for_review). Then Step 8 also explicitly dispatches review.agent via workflow_dispatch.

Both workflows share the concurrency group review-<PR#> with cancel-in-progress: false, so they serialize rather than cancel — both run to completion, posting independent review comment sets on the same PR and doubling compute cost.

Concrete scenario: Agent-fix creates PR #100 (non-draft) → review-on-open.agent triggers on opened → Step 8 dispatches review.agent → two full expert reviews post on the same PR.

Suggested fix: Add a bot-user guard to review-on-open.agent (e.g., && github.event.pull_request.user.type != 'Bot') so it skips bot-created PRs where the explicit dispatch is the intended path. Alternatively, remove the explicit dispatch from Step 8 and rely solely on review-on-open.agent.
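
The serialize-versus-cancel behavior described above can be sketched as a toy simulation in Python. This is a simplified model that assumes the documented GitHub Actions concurrency semantics (one running and at most one pending run per group) and assumes every run arrives while the first is still in progress; `simulate_group` is a hypothetical name and the arrival order is illustrative.

```python
from __future__ import annotations


def simulate_group(arrivals: list[str], cancel_in_progress: bool) -> dict[str, list[str]]:
    """Simulate runs joining one concurrency group while the first still runs."""
    running = None
    pending = None
    cancelled = []
    for run in arrivals:
        if running is None:
            running = run
        elif cancel_in_progress:
            # The newcomer cancels whatever is currently running.
            cancelled.append(running)
            running = run
        else:
            # The newcomer waits; it replaces any run that was already pending.
            if pending is not None:
                cancelled.append(pending)
            pending = run
    completed = [r for r in (running, pending) if r is not None]
    return {"completed": completed, "cancelled": cancelled}


if __name__ == "__main__":
    reviews = ["review-on-open.agent", "review.agent"]
    print(simulate_group(reviews, cancel_in_progress=False))
    print(simulate_group(reviews, cancel_in_progress=True))
```

With cancel-in-progress: false both reviews complete in this model, which matches the duplicate-review scenario; the guard or the removal of the explicit dispatch avoids the second arrival entirely.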

Comment on lines +197 to +198

dispatch_workflow({

🟢 MINOR — Review dispatched concurrently with build; may review stale code (Flagged by: 1/3 reviewers initially, confirmed 3/3 after follow-up)

All three dispatches (verify-build, polypilot-integration, review.agent) fire simultaneously. If verify-build fails, auto-fix-on-failure dispatches fix.agent which pushes new commits — but the review has already started (or completed) on the pre-fix code. The resulting review comments will refer to code that no longer exists.

This is correctly a minor concern: the happy path (build passes) is unaffected, and adding sequencing would increase latency for every pipeline run. But it may be worth documenting as a known trade-off with a comment here.
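
If the trade-off is documented, a staleness check could look roughly like the Python sketch below. It is not part of this PR: `ReviewRun`, `is_stale`, and the commit SHAs are hypothetical, and the sketch assumes each review records the head commit it started on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewRun:
    pr_number: int
    reviewed_sha: str  # head commit the review started on (assumed to be recorded)


def is_stale(review: ReviewRun, current_head_sha: str) -> bool:
    """A review is stale once the PR head has moved past the reviewed commit."""
    return review.reviewed_sha != current_head_sha


if __name__ == "__main__":
    review = ReviewRun(pr_number=100, reviewed_sha="aaa111")
    print(is_stale(review, "aaa111"))  # build passed, no new commits
    print(is_stale(review, "bbb222"))  # fix.agent pushed a follow-up commit
```

On the happy path the head never moves, so the check only matters after auto-fix-on-failure has pushed new commits.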

dispatch_workflow({
"workflow": "review.agent",
"inputs": {
"pr_number": "<PR number>"

🟡 MODERATE — pr_number placeholder will be interpreted as a string (Flagged by: 3/3 reviewers)

The placeholder "<PR number>" is enclosed in double-quotes, making it look like a JSON string. The compiled lock file's schema for review_agent declares "pr_number": { "type": "number" }.

An LLM following this template will likely substitute as "pr_number": "42" (string) instead of "pr_number": 42 (number), which may fail strict JSON Schema validation in the safe-outputs handler.

Suggested fix: Remove the quotes to signal numeric intent:

"pr_number": <PR number as integer, e.g. 42>
