Add PR review agentic workflow with expert-reviewer agent by PureWeen · Pull Request #593 · PureWeen/PolyPilot

PureWeen · 2026-04-16T15:14:14Z

Summary

Adds a /review slash command agentic workflow modeled after dotnet/msbuild's approach, adapted for PolyPilot with domain-specific review dimensions.

Files Added

File	Purpose
`.github/workflows/review.agent.md`	Slash command trigger (`/review` on PR comments)
`.github/workflows/shared/review-shared.md`	Shared config — permissions, tools, safe-outputs
`.github/agents/expert-reviewer.agent.md`	Expert reviewer with 12 PolyPilot-specific review dimensions

Review Dimensions

IsProcessing State Safety (BLOCKING) — 9 companion fields, FlushCurrentResponse, diagnostic tags
SDK Event Handling (BLOCKING) — IDLE-DEFER, TurnEnd flush, error event cleanup
Thread Safety & Concurrency (BLOCKING) — InvokeOnUI, ConcurrentDictionary vs List
Multi-Agent Orchestration (MAJOR) — 5-phase lifecycle, PendingOrchestration, worker failures
Session Persistence & Data Safety (MAJOR) — merge-based saves, atomic writes, test isolation
Bridge Protocol & Remote Mode (MAJOR) — IsRemoteMode guards, optimistic adds
MAUI / Blazor Hybrid Platform Safety (MAJOR) — no static readonly platform APIs, lazy properties
Test Coverage & Quality (MAJOR) — SetBaseDirForTesting, Demo mode, Compile Include
Performance & Render Pipeline (MODERATE) — debounce flush, cache invalidation
Watchdog & Timeout Logic (MAJOR) — 3-tier timeouts, generation guards
Connection & Server Management (MODERATE) — ReconnectAsync, PID files
Code Quality & Conventions (NIT) — lazy properties, CSS conventions

Review Workflow

Uses msbuild's 4-wave pattern:

Find — parallel sub-agents per dimension
Validate — prove/disprove each finding with code traces and proof-of-concept tests
Post — inline review comments on specific diff lines
Summary — verdict table with COMMENT or REQUEST_CHANGES

Multi-model consensus (≥2/3) for borderline findings using claude-opus-4.6, claude-sonnet-4.6, and gpt-5.3-codex.

Adds a /review slash command workflow modeled after dotnet/msbuild's approach, adapted for PolyPilot with domain-specific review dimensions: - .github/workflows/review.agent.md: Slash command trigger - .github/workflows/shared/review-shared.md: Shared config (permissions, tools, safe-outputs) - .github/agents/expert-reviewer.agent.md: Expert reviewer with 12 PolyPilot-specific review dimensions, multi-model consensus validation, and 4-wave review workflow (Find → Validate → Post → Summary) Review dimensions cover: IsProcessing safety, SDK event handling, thread safety, multi-agent orchestration, session persistence, bridge protocol, MAUI/Blazor platform safety, test coverage, performance, watchdog logic, connection management, and code conventions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen · 2026-04-16T17:09:33Z

🔍 Multi-Model Code Review — PR #593

PR: Add PR review agentic workflow with expert-reviewer agent
CI Status: ⚠️ No CI checks reported on this branch
Reviewers: 3 independent reviewers | 7 review rounds across 12 commits

Review History

Round	Commits	Result	Key Changes
1	#1	⚠️ Request Changes	2 criticals (permissions, field list), 5 moderates, 4 minors
2	#2–3	✅ Approve	10/11 fixed
3	#4	✅ Approve	13/14 fixed
4	#5	✅ Approve	Security paragraph, IsResumed timeout, ~22 fields refs
5	#6–8	✅ Approve	Dim 10 CHECK fixed, triggers refined, label removal
6	#9	✅ Approve	Label guard + detection
7	#10–12	✅ Approve	Safe-output keys standardized — see below

🔄 Re-Review Round 7 (2026-04-16T23:13Z) — Commits 10–12

Commits:

72d94d0 — Standardize safe-output keys to underscores
703bcbc — Use model family names instead of versioned identifiers
56d2473 — Revert to explicit model versions

Net change: Only the safe-output key rename survives (commits 11+12 cancel out).

Previous Finding — Final Status

#	Finding	Status
#3	🟢 Safe-output naming mismatch (hyphens vs underscores)	✅ FIXED — All YAML keys now use underscores, matching tool names in agent text

Verified: create_pull_request_review_comment, submit_pull_request_review, add_comment, remove_label — consistent across both review-shared.md and expert-reviewer.agent.md.

Complete Finding Tracker — All 7 Rounds, 17 Findings

#	Finding	Final Status
1	🔴 `pull-requests: write` missing	✅ Fixed (R2)
2	�� Companion fields list wrong	✅ Fixed (R2)
3	🟢 Safe-output naming mismatch	✅ Fixed (R7)
4	🟡 Watchdog timeouts inaccurate	✅ Fixed (R2)
5	🟡 HasUsedToolsThisTurn wrong	✅ Fixed (R2)
6	🟡 Organization.Sessions wrong	✅ Fixed (R2)
7	🟡 Cost explosion	✅ Fixed (R2)
8	🟡 60-min timeout	✅ Fixed (R2)
9	🟢 Redundant permissions	✅ Fixed (R2)
10	🟢 Fallback logic hole	✅ Fixed (R2)
11	🟢 Skill references	✅ Fixed (R2)
R2-1	🟡 Auto-trigger cost	✅ Fixed (R3–R5)
R2-2	🟢 LastUpdatedAt omitted	✅ Fixed (R3)
R2-3	🟢 Skill ref qualifier	✅ Fixed (R3)
R4-1	🟢 Dim 10 CHECK contradiction	✅ Fixed (R5)
R5-1	🟡 `labeled` fires for all labels	✅ Fixed (R6)
R5-2	🟢 Label detection underspecified	✅ Fixed (R6)

✅ 17/17 findings resolved.

Recommended Action

✅ Approve

All 17 findings across 7 review rounds have been resolved. The PR delivers a comprehensive, well-designed review workflow:

✅ Accurate review dimensions (verified against codebase across 5 rounds)
✅ Proper permissions (pull-requests: write, issues: write)
✅ Consistent naming (safe-output keys match tool identifiers)
✅ Cost controls (Wave 0 triage, Sonnet scanning, Opus validation)
✅ Security hardening (prompt injection defense)
✅ Robust fallback logic (partial output detection)
✅ Smart triggers (opened, ready_for_review, labeled with guard)
✅ Label lifecycle management (auto-remove for re-triggering)

Fixes from PR review: - Fix pull-requests permission to 'write' (was 'read', would 403) - Remove redundant permissions block from review.agent.md - Fix IsProcessing companion fields to match actual ClearProcessingState() (~22 fields/ops, not 9 phantom fields) - Fix Organization.Sessions threading to _organizationLock pattern - Fix watchdog timeouts to actual 8 constants (30/35/60/120/180/600/3600s) - Fix HasUsedToolsThisTurn timeout to 180s (not 600s) - Add Wave 0 triage to avoid cost explosion on docs/test-only PRs - Use Sonnet for Wave 1 scanning, Opus for Wave 2 validation - Increase timeout from 60 to 90 minutes - Add repo file read step for authoritative field lists - Add partial-output fallback detection - Remove redundant permissions block New: - Add review-on-open.agent.md for automatic review on PRs from contributors with write access (trusted forks + direct pushes) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Combine pull_request (auto) and slash_command (/review) triggers into one workflow file. The body is identical — no need for two files. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add 'if accessible' qualifier to Principle 1 skill reference - Add LastUpdatedAt to companion fields list - Add cost/noise note for auto-trigger on push Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ection - Replace 'verify all 9 companion fields' with ClearProcessingState() reference in Wave 1/Wave 2 sub-agent prompts (was stale pre-refactor count) - Add IsResumed as a 600s timeout condition in Dimension 10 Rule 2 (code gives 600s for ActiveToolCallCount>0 OR IsResumed+non-quiescent) - Add prompt-injection defense preamble: treat all PR content as untrusted Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The CHECK parenthetical still said 'only ActiveToolCallCount > 0 gives 600s' after Rule 2 was updated to include IsResumed. Now consistent. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace synchronize (every push) with smarter triggers: - opened: first review on new PRs - ready_for_review: re-review when draft goes ready - labeled: re-review when 'review' label added - /review: on-demand via slash command (unchanged) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add issues:write permission and remove-label safe-output so the workflow can remove the triggering label, making it re-addable for future review requests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Step 1b: exit early if labeled event is not the 'review' label, preventing unrelated labels from triggering the pipeline - Step 6: specify how to detect the trigger label via github.event.action and github.event.label.name Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Match the tool name convention (create_pull_request_review_comment, submit_pull_request_review, add_comment, remove_label) used in the agent instructions and GitHub MCP tool names. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace claude-opus-4.6, claude-sonnet-4.6, gpt-5.3-codex with claude-opus, claude-sonnet, gpt so the workflow automatically uses the latest available version of each model family. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Bare family names (claude-opus, claude-sonnet, gpt) are not confirmed to work in gh-aw. Revert to explicit versions matching the msbuild reference pattern. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

gh-aw safe-output keys must use hyphens (kebab-case), not underscores. The compiler normalizes them to underscores in the lock file. Using underscores in the source caused the compiler to silently drop them, resulting in no PR review tools in the compiled workflow. Reverts the underscore change from PR #593 commit 10 — msbuild's original hyphenated format was correct. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen and others added 11 commits April 16, 2026 12:20

Consolidate into single review workflow with both triggers

d08b117

Combine pull_request (auto) and slash_command (/review) triggers into one workflow file. The body is identical — no need for two files. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Address re-review feedback: minor fixes

b2bc87d

- Add 'if accessible' qualifier to Principle 1 skill reference - Add LastUpdatedAt to companion fields list - Add cost/noise note for auto-trigger on push Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fix Dimension 10 CHECK item to match updated Rule 2

e68fd28

The CHECK parenthetical still said 'only ActiveToolCallCount > 0 gives 600s' after Rule 2 was updated to include IsResumed. Now consistent. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Auto-remove 'review' label after review completes

c7cc10e

Add issues:write permission and remove-label safe-output so the workflow can remove the triggering label, making it re-addable for future review requests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Revert to explicit model versions

56d2473

Bare family names (claude-opus, claude-sonnet, gpt) are not confirmed to work in gh-aw. Revert to explicit versions matching the msbuild reference pattern. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen merged commit 879339d into main Apr 16, 2026

PureWeen deleted the feature/pr-review-workflow branch April 16, 2026 23:31

PureWeen mentioned this pull request Apr 17, 2026

Fix review workflow: add engine:copilot block and workflow_dispatch #594

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add PR review agentic workflow with expert-reviewer agent#593

Add PR review agentic workflow with expert-reviewer agent#593
PureWeen merged 12 commits intomainfrom
feature/pr-review-workflow

PureWeen commented Apr 16, 2026

Uh oh!

PureWeen commented Apr 16, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

PureWeen commented Apr 16, 2026

Summary

Files Added

Review Dimensions

Review Workflow

Uh oh!

PureWeen commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔍 Multi-Model Code Review — PR #593

Review History

🔄 Re-Review Round 7 (2026-04-16T23:13Z) — Commits 10–12

Previous Finding — Final Status

Complete Finding Tracker — All 7 Rounds, 17 Findings

✅ 17/17 findings resolved.

Recommended Action

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

PureWeen commented Apr 16, 2026 •

edited

Loading