Skip to content

Add PR review agentic workflow with expert-reviewer agent#593

Merged
PureWeen merged 12 commits intomainfrom
feature/pr-review-workflow
Apr 16, 2026
Merged

Add PR review agentic workflow with expert-reviewer agent#593
PureWeen merged 12 commits intomainfrom
feature/pr-review-workflow

Conversation

@PureWeen
Copy link
Copy Markdown
Owner

Summary

Adds a /review slash command agentic workflow modeled after dotnet/msbuild's approach, adapted for PolyPilot with domain-specific review dimensions.

Files Added

File Purpose
.github/workflows/review.agent.md Slash command trigger (/review on PR comments)
.github/workflows/shared/review-shared.md Shared config — permissions, tools, safe-outputs
.github/agents/expert-reviewer.agent.md Expert reviewer with 12 PolyPilot-specific review dimensions

Review Dimensions

  1. IsProcessing State Safety (BLOCKING) — 9 companion fields, FlushCurrentResponse, diagnostic tags
  2. SDK Event Handling (BLOCKING) — IDLE-DEFER, TurnEnd flush, error event cleanup
  3. Thread Safety & Concurrency (BLOCKING) — InvokeOnUI, ConcurrentDictionary vs List
  4. Multi-Agent Orchestration (MAJOR) — 5-phase lifecycle, PendingOrchestration, worker failures
  5. Session Persistence & Data Safety (MAJOR) — merge-based saves, atomic writes, test isolation
  6. Bridge Protocol & Remote Mode (MAJOR) — IsRemoteMode guards, optimistic adds
  7. MAUI / Blazor Hybrid Platform Safety (MAJOR) — no static readonly platform APIs, lazy properties
  8. Test Coverage & Quality (MAJOR) — SetBaseDirForTesting, Demo mode, Compile Include
  9. Performance & Render Pipeline (MODERATE) — debounce flush, cache invalidation
  10. Watchdog & Timeout Logic (MAJOR) — 3-tier timeouts, generation guards
  11. Connection & Server Management (MODERATE) — ReconnectAsync, PID files
  12. Code Quality & Conventions (NIT) — lazy properties, CSS conventions

Review Workflow

Uses msbuild's 4-wave pattern:

  1. Find — parallel sub-agents per dimension
  2. Validate — prove/disprove each finding with code traces and proof-of-concept tests
  3. Post — inline review comments on specific diff lines
  4. Summary — verdict table with COMMENT or REQUEST_CHANGES

Multi-model consensus (≥2/3) for borderline findings using claude-opus-4.6, claude-sonnet-4.6, and gpt-5.3-codex.

Adds a /review slash command workflow modeled after dotnet/msbuild's
approach, adapted for PolyPilot with domain-specific review dimensions:

- .github/workflows/review.agent.md: Slash command trigger
- .github/workflows/shared/review-shared.md: Shared config (permissions,
  tools, safe-outputs)
- .github/agents/expert-reviewer.agent.md: Expert reviewer with 12
  PolyPilot-specific review dimensions, multi-model consensus validation,
  and 4-wave review workflow (Find → Validate → Post → Summary)

Review dimensions cover: IsProcessing safety, SDK event handling, thread
safety, multi-agent orchestration, session persistence, bridge protocol,
MAUI/Blazor platform safety, test coverage, performance, watchdog logic,
connection management, and code conventions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen
Copy link
Copy Markdown
Owner Author

PureWeen commented Apr 16, 2026

🔍 Multi-Model Code Review — PR #593

PR: Add PR review agentic workflow with expert-reviewer agent
CI Status: ⚠️ No CI checks reported on this branch
Reviewers: 3 independent reviewers | 7 review rounds across 12 commits


Review History

Round Commits Result Key Changes
1 #1 ⚠️ Request Changes 2 criticals (permissions, field list), 5 moderates, 4 minors
2 #2–3 ✅ Approve 10/11 fixed
3 #4 ✅ Approve 13/14 fixed
4 #5 ✅ Approve Security paragraph, IsResumed timeout, ~22 fields refs
5 #6–8 ✅ Approve Dim 10 CHECK fixed, triggers refined, label removal
6 #9 ✅ Approve Label guard + detection
7 #10–12 Approve Safe-output keys standardized — see below

🔄 Re-Review Round 7 (2026-04-16T23:13Z) — Commits 10–12

Commits:

  • 72d94d0 — Standardize safe-output keys to underscores
  • 703bcbc — Use model family names instead of versioned identifiers
  • 56d2473 — Revert to explicit model versions

Net change: Only the safe-output key rename survives (commits 11+12 cancel out).

Previous Finding — Final Status

# Finding Status
#3 🟢 Safe-output naming mismatch (hyphens vs underscores) FIXED — All YAML keys now use underscores, matching tool names in agent text

Verified: create_pull_request_review_comment, submit_pull_request_review, add_comment, remove_label — consistent across both review-shared.md and expert-reviewer.agent.md.


Complete Finding Tracker — All 7 Rounds, 17 Findings

# Finding Final Status
1 🔴 pull-requests: write missing ✅ Fixed (R2)
2 �� Companion fields list wrong ✅ Fixed (R2)
3 🟢 Safe-output naming mismatch Fixed (R7)
4 🟡 Watchdog timeouts inaccurate ✅ Fixed (R2)
5 🟡 HasUsedToolsThisTurn wrong ✅ Fixed (R2)
6 🟡 Organization.Sessions wrong ✅ Fixed (R2)
7 🟡 Cost explosion ✅ Fixed (R2)
8 🟡 60-min timeout ✅ Fixed (R2)
9 🟢 Redundant permissions ✅ Fixed (R2)
10 🟢 Fallback logic hole ✅ Fixed (R2)
11 🟢 Skill references ✅ Fixed (R2)
R2-1 🟡 Auto-trigger cost ✅ Fixed (R3–R5)
R2-2 🟢 LastUpdatedAt omitted ✅ Fixed (R3)
R2-3 🟢 Skill ref qualifier ✅ Fixed (R3)
R4-1 🟢 Dim 10 CHECK contradiction ✅ Fixed (R5)
R5-1 🟡 labeled fires for all labels ✅ Fixed (R6)
R5-2 🟢 Label detection underspecified ✅ Fixed (R6)

✅ 17/17 findings resolved.


Recommended Action

Approve

All 17 findings across 7 review rounds have been resolved. The PR delivers a comprehensive, well-designed review workflow:

  • ✅ Accurate review dimensions (verified against codebase across 5 rounds)
  • ✅ Proper permissions (pull-requests: write, issues: write)
  • ✅ Consistent naming (safe-output keys match tool identifiers)
  • ✅ Cost controls (Wave 0 triage, Sonnet scanning, Opus validation)
  • ✅ Security hardening (prompt injection defense)
  • ✅ Robust fallback logic (partial output detection)
  • ✅ Smart triggers (opened, ready_for_review, labeled with guard)
  • ✅ Label lifecycle management (auto-remove for re-triggering)

PureWeen and others added 11 commits April 16, 2026 12:20
Fixes from PR review:
- Fix pull-requests permission to 'write' (was 'read', would 403)
- Remove redundant permissions block from review.agent.md
- Fix IsProcessing companion fields to match actual ClearProcessingState()
  (~22 fields/ops, not 9 phantom fields)
- Fix Organization.Sessions threading to _organizationLock pattern
- Fix watchdog timeouts to actual 8 constants (30/35/60/120/180/600/3600s)
- Fix HasUsedToolsThisTurn timeout to 180s (not 600s)
- Add Wave 0 triage to avoid cost explosion on docs/test-only PRs
- Use Sonnet for Wave 1 scanning, Opus for Wave 2 validation
- Increase timeout from 60 to 90 minutes
- Add repo file read step for authoritative field lists
- Add partial-output fallback detection
- Remove redundant permissions block

New:
- Add review-on-open.agent.md for automatic review on PRs from
  contributors with write access (trusted forks + direct pushes)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Combine pull_request (auto) and slash_command (/review) triggers into
one workflow file. The body is identical — no need for two files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add 'if accessible' qualifier to Principle 1 skill reference
- Add LastUpdatedAt to companion fields list
- Add cost/noise note for auto-trigger on push

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ection

- Replace 'verify all 9 companion fields' with ClearProcessingState()
  reference in Wave 1/Wave 2 sub-agent prompts (was stale pre-refactor count)
- Add IsResumed as a 600s timeout condition in Dimension 10 Rule 2
  (code gives 600s for ActiveToolCallCount>0 OR IsResumed+non-quiescent)
- Add prompt-injection defense preamble: treat all PR content as untrusted

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The CHECK parenthetical still said 'only ActiveToolCallCount > 0 gives
600s' after Rule 2 was updated to include IsResumed. Now consistent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace synchronize (every push) with smarter triggers:
- opened: first review on new PRs
- ready_for_review: re-review when draft goes ready
- labeled: re-review when 'review' label added
- /review: on-demand via slash command (unchanged)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add issues:write permission and remove-label safe-output so the
workflow can remove the triggering label, making it re-addable
for future review requests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Step 1b: exit early if labeled event is not the 'review' label,
  preventing unrelated labels from triggering the pipeline
- Step 6: specify how to detect the trigger label via
  github.event.action and github.event.label.name

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Match the tool name convention (create_pull_request_review_comment,
submit_pull_request_review, add_comment, remove_label) used in the
agent instructions and GitHub MCP tool names.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace claude-opus-4.6, claude-sonnet-4.6, gpt-5.3-codex with
claude-opus, claude-sonnet, gpt so the workflow automatically
uses the latest available version of each model family.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bare family names (claude-opus, claude-sonnet, gpt) are not confirmed
to work in gh-aw. Revert to explicit versions matching the msbuild
reference pattern.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen PureWeen merged commit 879339d into main Apr 16, 2026
@PureWeen PureWeen deleted the feature/pr-review-workflow branch April 16, 2026 23:31
PureWeen added a commit that referenced this pull request Apr 17, 2026
gh-aw safe-output keys must use hyphens (kebab-case), not underscores.
The compiler normalizes them to underscores in the lock file. Using
underscores in the source caused the compiler to silently drop them,
resulting in no PR review tools in the compiled workflow.

Reverts the underscore change from PR #593 commit 10 — msbuild's
original hyphenated format was correct.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant