
refactor: slim expert-reviewer prompt — remove redundant instructions #656

Merged

PureWeen merged 1 commit into main from fix/slim-reviewer-prompt on Apr 22, 2026
Conversation

@PureWeen
Owner

Based on analysis of 3 actual review runs (PRs #619, #639, #635), several prompt instructions are redundant and add tokens without improving output quality.

What was removed:

  • 3x "Read copilot-instructions.md" — engine auto-loads it; sub-agents found deep domain bugs without it
  • Generic preamble ("You are a thorough PR reviewer for PolyPilot")
  • MCP tool usage hint (agent discovers available tools)
  • Duplicate path validation instruction
  • Verbose REQUEST_CHANGES explanation (one line is sufficient)
  • Repo-specific sub-agent prompt → generic ("expert code reviewer" not "expert PolyPilot code reviewer")

What was kept:

  • Security warning (treat PR content as untrusted) — changes behavior
  • No test messages rule — prevents permanent damage
  • Full adversarial consensus methodology
  • Path/line validation rules
  • COMMENT-only policy

72 lines → 60 lines. Same review quality — the evidence from real runs shows the agent finds domain issues from reading code, not from hint bullets.

@PureWeen force-pushed the fix/slim-reviewer-prompt branch from 3da7c24 to d092245 on April 21, 2026 at 19:55
@PureWeen
Owner Author

PureWeen commented Apr 21, 2026

Multi-Model Code Review — PR #656 (merged)

Reviewers: 3 independent reviewers, 3 review rounds
Status: ✅ Merged after all 8 findings resolved

Post-Merge Verification

The instruction-drift workflow triggered twice after merge:

  • Run 1 (24742556757): ✅ All jobs green, but agent didn't call noop tool
  • Run 2 (24756951467): ✅ All jobs green, same issue — agent runs Check-Staleness.ps1 correctly (FRESH result) but doesn't emit the noop safe-output

Root cause: This is a gh-aw agent compliance issue — the agent understands the task and executes the scripts correctly, but doesn't follow through on calling the noop MCP tool despite explicit 🚨 CRITICAL instructions. The workflow succeeds silently (no crash, no false issue created), but leaves no audit trail. May need a post-steps: fallback that auto-emits noop if no safe-output was produced.

All review findings from 3 rounds remain resolved. The workflow infrastructure is correct — only the noop compliance needs iteration.
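The suggested fallback could take roughly this shape (a sketch only — `post-steps:` is the mechanism proposed above, and the safe-outputs file path and JSONL record format shown here are assumptions about gh-aw internals, not a confirmed API):

```yaml
# Sketch: auto-emit a noop safe-output when the agent produced none.
# ASSUMPTIONS: safe-outputs are collected as JSONL in the file named by
# GH_AW_SAFE_OUTPUTS, and a {"type":"noop"} entry is a valid noop record —
# verify both against the gh-aw documentation before relying on this.
post-steps:
  - name: Fallback noop if no safe-output was produced
    if: always()
    run: |
      out="${GH_AW_SAFE_OUTPUTS:-/tmp/gh-aw/safe-outputs.jsonl}"
      # Only emit the fallback when the agent left the file empty/absent.
      if [ ! -s "$out" ]; then
        echo '{"type":"noop","reason":"fallback: agent emitted no safe-output"}' >> "$out"
      fi
```

This would preserve the audit trail without masking the underlying compliance issue, since the fallback record names itself as a fallback.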

@PureWeen
Owner Author

/review

@github-actions
Contributor

github-actions Bot commented Apr 21, 2026

Expert Code Review failed. Please review the logs for details.

@PureWeen
Owner Author

/review

@github-actions
Contributor

github-actions Bot commented Apr 22, 2026

Expert Code Review completed successfully!

@github-actions
Contributor

Design-Level Findings (outside diff hunks)

These findings affect code outside the changed diff hunks and cannot be posted as inline comments.


🟢 MINOR — Architecture security table contradicts new min-integrity guidance (3/3 reviewers after follow-up)

File: .github/skills/gh-aw-guide/references/architecture.md line 86

The security boundaries table still says "Requires explicit min-integrity configuration" in the "What it doesn't do" column. This directly contradicts the new guidance added 10 lines later: "Do NOT set min-integrity explicitly." An agent reading this table would receive conflicting instructions.

Fix: Update the table cell to: "Automatic lockdown may not match manual policy; blocked-users/trusted-users still require explicit config"


🟢 MINOR — Cap-3 disputed-findings rule only in review-shared.md, not expert-reviewer.agent.md (3/3 reviewers after follow-up)

File: .github/agents/expert-reviewer.agent.md § 3 (lines 41-47)

review-shared.md adds "Cap at 3 disputed findings" in § 3, but expert-reviewer.agent.md § 3 has no such cap. If the expert-reviewer agent is invoked directly (not via review workflows), the cap won't apply. Consider adding the cap rule to expert-reviewer.agent.md § 3 for consistency, or noting that review-shared.md is the authoritative source for orchestration-level caps.

Generated by Expert Code Review for issue #656 ·


@github-actions Bot left a comment


Expert Code Review — PR #656

Methodology: 3 independent reviewers with adversarial consensus. Disputed findings (flagged by only 1/3) were escalated to the other 2 reviewers for validation. All findings below achieved 2/3+ agreement.

Findings Summary

| # | Severity | Consensus | Finding | File |
|---|----------|-----------|---------|------|
| 1 | 🟡 MODERATE | 3/3 | Concurrency groups don't match — /review won't cancel auto-review | review-on-open.agent.md:17 |
| 2 | 🟡 MODERATE | 2/3 | Sub-agent recursion guard incomplete in standalone agent | expert-reviewer.agent.md:37 |
| 3 | 🟡 MODERATE | 3/3 ✱ | Disputed-finding cap has no severity ordering | review-shared.md:88 |
| 4 | 🟢 MINOR | 3/3 ✱ | Architecture security table contradicts min-integrity guidance | architecture.md:86 (outside diff) |
| 5 | 🟢 MINOR | 3/3 ✱ | Cap-3 rule divergence between agent file and shared config | expert-reviewer.agent.md § 3 (outside diff) |

✱ = initially flagged by 1/3, confirmed by follow-up reviewers

What Looks Good

  • min-integrity removal is well-justified — compiler v0.62.2 bug is real, runtime determine-automatic-lockdown is the standard pattern
  • Draft PR guard (if: github.event.pull_request.draft == false) is correct in both source and compiled lock file
  • Sub-agent recursion prevention in review-shared.md is strong (explicit "do NOT dispatch sub-agents or use the task tool" guard)
  • Token optimizations (no pre-reading, cap disputed findings, 2-model follow-ups) are reasonable cost-reduction measures
  • Noop guidance fix in instruction-drift workflow addresses a real failure mode
  • Prompt slimming removes genuinely redundant instructions without losing review dimensions
  • Lock files for review-on-open match the source changes (draft guard, concurrency group both present)

Assessment

No 🔴 CRITICAL issues. The PR makes well-motivated improvements to the review workflow system. The three 🟡 MODERATE findings are all low-effort fixes that should be addressed before or shortly after merge:

  1. Concurrency mismatch (Finding 1) — the stated design goal (shared group) is not achieved. Add concurrency: to review.agent.md and recompile.
  2. Recursion guard (Finding 2) — one-line fix to expert-reviewer.agent.md line 37.
  3. Severity ordering (Finding 3) — add prioritization to the cap rule.

CI & Test Coverage

This PR modifies only documentation/workflow files (.md, .yml, .ps1). No application code or tests are affected. No CI test failures expected.

Generated by Expert Code Review for issue #656

```yaml
contents: read
pull-requests: read

# Intentional: shared group with review.agent.md — /review cancels in-progress auto-review.
```

🟡 MODERATE — Concurrency groups don't match between review workflows (Flagged by: 3/3 reviewers)

The comment says "shared group with review.agent.md — /review cancels in-progress auto-review" but review.agent.md has no concurrency: block. The compiled lock files prove the mismatch:

  • review-on-open.agent.lock.yml:46 — `group: review-${{ ... }}`
  • review.agent.lock.yml:51 — `group: "gh-aw-${{ github.workflow }}-${{ ... }}"`

These groups never collide, so /review and auto-review run in parallel — posting duplicate reviews.

Fix: Add a matching concurrency: block to review.agent.md and recompile:

```yaml
concurrency:
  group: "review-${{ github.event.issue.number || github.event.pull_request.number || github.run_id }}"
  cancel-in-progress: false
```

> Read `.github/copilot-instructions.md` for project conventions.
> Read `.github/copilot-instructions.md` for project conventions and architecture.
>
> For each finding: file path, line number (within a `@@` diff hunk — mark "outside diff" if not), severity (🔴 CRITICAL, 🟡 MODERATE, 🟢 MINOR), concrete failing scenario, and fix suggestion. Return findings as text — do NOT call safe-output tools.

🟡 MODERATE — Sub-agent recursion guard incomplete (Flagged by: 2/3 reviewers)

This line says "do NOT call safe-output tools" but does not prohibit the task tool. The review-shared.md prompt (line 75) has the full guard: "do NOT dispatch sub-agents or use the task tool — act as an individual reviewer only." But when expert-reviewer.agent.md is invoked standalone (not via review workflows), sub-agents receive only this incomplete guard and could recursively spawn further sub-agents.

Fix: Append to this line:

Return findings as text — do NOT call safe-output tools, do NOT dispatch sub-agents or use the task tool.

3. **Only 1/3 flagged** → dispatch **exactly 2** follow-up sub-agents (the other 2 models that didn't flag it) asking: "Reviewer X found this issue: [finding]. Do you agree or disagree? Explain why."
- If 2+ now agree → include
- If still 1/3 → discard (note as "discarded — single reviewer only")
- **Cap at 3 disputed findings** — if more than 3 findings are 1/3, discard the rest without follow-up to preserve token budget for posting.

🟡 MODERATE — Disputed-finding cap has no severity ordering (Flagged by: 3/3 reviewers after follow-up)

The cap discards excess 1/3 findings "without follow-up" but specifies no ordering. If 4+ findings are disputed and only 3 get follow-up slots, the selection is arbitrary — a 🔴 CRITICAL finding could be silently dropped while 🟢 MINOR ones consume the budget.

Fix: Add severity prioritization:

- **Cap at 3 disputed findings** (prioritized by severity: 🔴 > 🟡 > 🟢) — if more than 3 findings
  are 1/3, discard the lowest-severity remainder without follow-up. Note discarded findings in the
  review body.

…abs learnings

Bundled changes from 3 review rounds (8 findings, all resolved):

Expert reviewer:
- Slimmed prompt (72→62 lines) — removed redundant copilot-instructions refs
- Restored PolyPilot identity + copilot-instructions read in sub-agent prompt
- Added anti-recursion guard: 'Do NOT dispatch sub-agents'
- Restored REQUEST_CHANGES rationale

min-integrity fix:
- Removed explicit min-integrity: approved from review-shared.md
  (compiler v0.62.2 crashes MCP Gateway — missing repos field)
- Updated all skill docs, instructions, and security scanner
- Rely on runtime determine-automatic-lockdown instead

maui-labs learnings:
- Draft PR guard on review-on-open (if: draft == false)
- Concurrency groups matching between review.agent.md and review-on-open
- cancel-in-progress: false on both (slash_command safety)

Other:
- CLI Commands section in SKILL.md (gh aw trial/run/audit)
- Strengthened noop guidance in instruction-drift workflow
- Fixed defense table stale min-integrity text

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
