feat: make agent-fix pipeline fully autonomous (end-to-end)#758
feat: make agent-fix pipeline fully autonomous (end-to-end)#758
Conversation
- Change draft: true → draft: false so PRs are created as non-draft - Add review.agent to dispatch-workflow list (dispatches expert code review directly, bypassing the action_required approval gate that blocks pull_request-triggered workflows for bot-created PRs) - Increase dispatch max from 2 to 3 (verify-build + integration + review) - Update Step 8 instructions to dispatch review.agent with pr_number - Recompile lock file via gh aw compile This completes the end-to-end pipeline: label issue → agent implements fix → self-review → PR created (non-draft) → integration tests + expert review dispatched → auto-fix-on-failure if tests fail. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Review Summary
Reviewed the full workflow source, lock file, review-on-open.agent, review.agent, and auto-fix-on-failure.yml.
✅ No issues found
- Lock file consistency: Both
config.jsonandGH_AW_SAFE_OUTPUTS_HANDLER_CONFIGcorrectly reflectdraft: false,max: 3, and the addedreview.agentworkflow. Thereview_agenttool schema,aw_context_workflows, andworkflow_filesentries are all correct. - No infinite loop risk:
review.agentdoes NOT havedispatch_workflowcapability — it can only post review comments. It cannot re-trigger itself or agent-fix. - No race conditions: The 3 dispatched workflows (
verify-build,polypilot-integration,review.agent) are fully independent. The review workflows share a concurrency group withcancel-in-progress: false, so concurrent runs serialize safely rather than conflicting. - Auto-fix loop protection:
auto-fix-on-failure.ymlonly triggers onverify-buildandpolypilot-integrationfailures, not review workflow completions, so the review dispatch doesn't create a new failure loop vector.
Findings (2)
| # | Sev | Issue |
|---|---|---|
| 1 | 🟡 | Duplicate reviews: draft: false causes review-on-open.agent to trigger on PR creation (opened event), AND Step 8 explicitly dispatches review.agent. Both share concurrency group review-{PR} with cancel-in-progress: false, so they serialize — producing two full review runs on the same diff. |
| 2 | 🟡 | pr_number type mismatch: Prompt example shows string "<PR number>" but the tool schema declares type: number. Agent may send a string value that fails schema validation. |
Generated by Expert Code Review (auto) for issue #758 · ● 9.5M
| create-pull-request: | ||
| auto-merge: false | ||
| draft: true | ||
| draft: false |
There was a problem hiding this comment.
🟡 MODERATE — Duplicate review runs
With draft: false, the PR is created as non-draft immediately. This fires the pull_request: [opened] event, which triggers review-on-open.agent (it matches opened + draft == false). Then in Step 8, agent-fix also explicitly dispatches review.agent via workflow_dispatch.
Both workflows share the concurrency group review-$\{\{ PR_NUMBER }} with cancel-in-progress: false, so they serialize rather than deduplicate — the second review queues behind the first and runs in full. This doubles compute cost (~90 min × 2) and produces two separate review comment threads on the same PR for the same diff.
Mitigation options (pick one):
- Keep
draft: trueand rely solely on the explicitreview.agentdispatch (original approach minus the approval-gate problem). - Keep
draft: falsebut remove the explicitdispatch_workflowforreview.agentin Step 8, relying onreview-on-open.agentto auto-trigger. - Add a conditional label or environment flag so
review-on-open.agentskips PRs created byagent-fix(e.g.,if: github.actor != 'copilot-agentic-workflow[bot]').
| "pr_number": "<PR number>" | ||
| } | ||
| }) | ||
| ``` |
There was a problem hiding this comment.
🟡 MODERATE — pr_number type mismatch between prompt example and tool schema
The prompt example passes pr_number as a string ("<PR number>"), but the compiled tool schema in the lock file defines pr_number with "type": "number". If the agent sends a string (e.g. "42" instead of 42), schema validation may reject the dispatch call at runtime.
The review.agent.md frontmatter also declares pr_number as type: number, so the lock file schema is correct. However, the prompt example should make it clear this must be an integer, not a quoted string.
Suggested fix: Change the example to use an unquoted placeholder:
"pr_number": <PR number>
or add a note: "Pass as an integer, not a string."
| create-pull-request: | ||
| auto-merge: false | ||
| draft: true | ||
| draft: false |
There was a problem hiding this comment.
🟡 MODERATE — Duplicate reviews on bot-created PRs (Flagged by: 3/3 reviewers)
With draft: false, a non-draft PR created by the agent will immediately trigger review-on-open.agent (via pull_request: opened / ready_for_review). Then Step 8 also explicitly dispatches review.agent via workflow_dispatch.
Both workflows share the concurrency group review-<PR#> with cancel-in-progress: false, so they serialize rather than cancel — both run to completion, posting independent review comment sets on the same PR and doubling compute cost.
Concrete scenario: Agent-fix creates PR #100 (non-draft) → review-on-open.agent triggers on opened → Step 8 dispatches review.agent → two full expert reviews post on the same PR.
Suggested fix: Add a bot-user guard to review-on-open.agent (e.g., && github.event.pull_request.user.type != 'Bot') so it skips bot-created PRs where the explicit dispatch is the intended path. Alternatively, remove the explicit dispatch from Step 8 and rely solely on review-on-open.agent.
|
|
||
| dispatch_workflow({ |
There was a problem hiding this comment.
🟢 MINOR — Review dispatched concurrently with build; may review stale code (Flagged by: 1/3 reviewers initially, confirmed 3/3 after follow-up)
All three dispatches (verify-build, polypilot-integration, review.agent) fire simultaneously. If verify-build fails, auto-fix-on-failure dispatches fix.agent which pushes new commits — but the review has already started (or completed) on the pre-fix code. The resulting review comments will refer to code that no longer exists.
This is correctly a minor concern: the happy path (build passes) is unaffected, and adding sequencing would increase latency for every pipeline run. But it may be worth documenting as a known trade-off with a comment here.
| dispatch_workflow({ | ||
| "workflow": "review.agent", | ||
| "inputs": { | ||
| "pr_number": "<PR number>" |
There was a problem hiding this comment.
🟡 MODERATE — pr_number placeholder will be interpreted as a string (Flagged by: 3/3 reviewers)
The placeholder "<PR number>" is enclosed in double-quotes, making it look like a JSON string. The compiled lock file's schema for review_agent declares "pr_number": { "type": "number" }.
An LLM following this template will likely substitute as "pr_number": "42" (string) instead of "pr_number": 42 (number), which may fail strict JSON Schema validation in the safe-outputs handler.
Suggested fix: Remove the quotes to signal numeric intent:
"pr_number": <PR number as integer, e.g. 42>
Summary
Closes the last gaps in the agent-fix pipeline so it runs fully end-to-end without human intervention:
label issue → agent implements fix → self-review → non-draft PR → integration tests + expert review dispatched → auto-fix if tests fail
Changes
draft: false— PRs are now created as non-draft, soreview-on-open.agentcan trigger onready_for_reviewreview.agentdirectly — bypasses theaction_requiredapproval gate that blockspull_request-triggered workflows for bot-created PRspr_numbergh aw compileauto-detectedreview.agentas a gh-aw workflow and added.lock.ymlextension mappingPipeline Flow (after this PR)
Context
This is the final piece of the pipeline work started across PRs #755 (integration test fixes), #756 (auto-fix-on-failure), and #757 (run-name embedding for PR discovery). Those PRs fixed bugs that prevented the existing pipeline from completing.