fix: Re-check PR badge when session becomes active by PureWeen · Pull Request #459 · PureWeen/PolyPilot

PureWeen · 2026-03-31T04:30:24Z

When clicking a session in the sidebar or switching expanded views, invalidate the PrLinkService cache and re-fetch if no PR was previously found. This ensures PR badges appear promptly after creating a PR or switching branches, without aggressive polling.

Changes:

SessionListItem: Track IsActive transitions; on activation, invalidate cache + re-fetch if no PR cached
ExpandedSessionView: Re-check on session switch if no PR cached

No change to SessionCard (dashboard) — the 5-minute cache TTL handles that naturally.
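
The activation flow described above can be sketched roughly as follows. This is a minimal Python illustration, not the project's C# code: `PrLinkCache`, `peek`, `on_session_activated`, and the injected `fetch` and `clock` callables are hypothetical names. The only details taken from the description are the 5-minute TTL and the rule "on activation, invalidate and re-fetch only if no PR was cached".

```python
import time
from typing import Callable, Dict, Optional, Tuple

TTL_SECONDS = 5 * 60  # the description mentions a 5-minute cache TTL


class PrLinkCache:
    """Hypothetical stand-in for a PR-link cache keyed by session id."""

    def __init__(self, fetch: Callable[[str], Optional[str]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch    # assumed lookup: session id -> PR URL or None
        self._clock = clock    # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}

    def peek(self, session_id: str) -> Optional[str]:
        """Return the cached PR URL without fetching (None if absent or no PR)."""
        entry = self._entries.get(session_id)
        return entry[1] if entry is not None else None

    def get(self, session_id: str) -> Optional[str]:
        """Return a fresh cached value, fetching when missing or expired."""
        entry = self._entries.get(session_id)
        if entry is not None and self._clock() - entry[0] < TTL_SECONDS:
            return entry[1]  # may be a cached "no PR" result (None)
        url = self._fetch(session_id)
        self._entries[session_id] = (self._clock(), url)
        return url

    def invalidate(self, session_id: str) -> None:
        self._entries.pop(session_id, None)


def on_session_activated(cache: PrLinkCache, session_id: str,
                         was_active: bool, is_active: bool) -> Optional[str]:
    """Re-check only on an inactive -> active transition with no PR cached."""
    if is_active and not was_active and cache.peek(session_id) is None:
        cache.invalidate(session_id)  # drop a stale "no PR" entry
    return cache.get(session_id)
```

Under these assumptions, a session that already has a PR cached never triggers an extra fetch on activation, which is how the sketch avoids aggressive polling.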

…runtime validation

Updated both worker charters and orchestrator routing to address gaps where multi-agent sessions failed but single-agent sessions succeeded:

Implementer charter now requires:
- Implementing EVERY requirement from the original prompt (completeness)
- Launching runnable apps and verifying at runtime (not just build+test)
- Performing any validation steps specified in the prompt

Challenger charter now requires:
- Cross-referencing original prompt requirements vs implementation
- Runtime validation (launching the app, not just static review)
- Performing the same validation steps the prompt specifies

Orchestrator routing now requires:
- Forwarding the COMPLETE original prompt to workers (no summarizing)
- Always including full original requirements for completeness checks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…Challenger

Implementer now follows 4 steps: Plan → Implement → Validate → Self-review. Creates a requirements checklist before writing code and verifies every item before reporting completion.

Challenger now follows 4 steps: Build checklist → Code review → Completeness check → Runtime validation. Extracts requirements into a numbered checklist and verifies each item individually, matching the approach from proven multi-agent orchestration patterns.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen · 2026-04-03T05:17:12Z

🔍 R1 Review — PR #459

Reviewer: PP PR REVIEWER-worker-5 (3-model consensus)
Models: Claude Opus 4.6 · Claude Sonnet 4.6 · GPT-5.3-Codex
CI: No checks configured
Prior reviews: None

⚠️ Important: PR title/description do NOT match the code

The PR title ("fix: Re-check PR badge when session becomes active") and body (describing SessionListItem IsActive tracking, ExpandedSessionView cache invalidation, PrLinkService changes) describe completely different code than what is in the diff. The actual diff only changes worker prompt text in ModelCapabilities.cs for the "Implement & Challenge" group preset.

PR #469 (fix/pr-badge-refresh) is the one with the actual PR badge code changes.

Consensus Findings

#	Finding	Severity	Models	File / Lines
1	Misleading PR title/description — diff contains prompt improvements, not PR badge changes. Creates merge confusion risk: reviewers may approve thinking they reviewed a badge fix. Also pollutes `git log --grep` history.	🟡 MODERATE	3/3 (Opus + Sonnet + Codex)	PR metadata
2	Runtime validation hard requirement may cause stalls — New prompts mandate "MUST launch it and verify it works at runtime" for both Implementer and Challenger. In headless CI or contexts where runtime is unavailable, this could cause repeated failed loops instead of completing valid code-only tasks.	🟢 MINOR	2/3 (Opus + Codex)	`ModelCapabilities.cs` lines ~354-357, ~389-392

Non-consensus observations (1/3, informational only)

Sonnet noted that the new raw string literals introduce leading/trailing \n in the prompt strings (cosmetic, models tolerate it)
Sonnet noted Challenger "build the checklist" instruction relies on session history retention across iterations (fragile assumption, but works with current persistent sessions)
Codex noted increased token/context pressure from forwarding full original request + full worker output each iteration (up to 10 reflections)

What's clean ✅

All 3 models confirm: no code bugs, regressions, security issues, data loss, or race conditions
[[GROUP_REFLECT_COMPLETE]] sentinel preserved — reflection loop won't break
WorkerSystemPrompts array length unchanged (2) — existing tests pass
MaxReflectIterations, DefaultWorktreeStrategy unchanged
Prompt content is well-structured and the intent (planning + completeness checking + runtime validation) is sound

Test coverage

No new code paths requiring tests — this is a prompt-text-only change. Existing tests (ImplementAndChallenge_Preset_HasDistinctPersonas, WorkerSystemPrompts_MatchWorkerCount) remain valid.

Recommended Action: ⚠️ Request Changes

Fix the PR title and description to match the actual code (e.g., "improve: Structured planning & validation steps for Implement & Challenge preset"). This is the only blocker.
Consider softening the "MUST launch" language to "SHOULD launch when runtime is available" to avoid stalls in headless contexts (minor, non-blocking).

The code changes themselves are safe to merge once the metadata is corrected.

…eview

Implement & Challenge:
- Implementer Step 2: Examine existing files before coding to match patterns
- Challenger Step 4: Must cite exact commands and output as evidence

PR Review Squad:
- Zero tolerance for test failures — always request changes, even for pre-existing/flaky tests. Every PR should leave the suite greener.
- Report ALL findings including minor nits. Every PR is an opportunity to improve the codebase.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen · 2026-04-04T00:22:39Z

🔍 Multi-Model Code Review — PR #459 R2

PR: fix: Re-check PR badge when session becomes active (title unchanged)
Diff: 107 lines, 1 file (ModelCapabilities.cs) — prompt text only
CI: ⚠️ No checks configured
Prior reviews: R1 (⚠️ Request Changes — 2 findings)
Models: claude-opus-4.6 · claude-sonnet-4.6 · gpt-5.3-codex

R1 Findings Status

#	R1 Finding	Status
1	🟡 Misleading PR title/description (3/3)	❌ Still present — title still says "Re-check PR badge"
2	🟢 "MUST launch" hard requirement may stall (2/3)	❌ Still present — see Finding #3 below

Consensus Findings

1. 🟡 MODERATE — PR title/description still mismatches diff (R1 carryover)

Flagged by: Opus, Sonnet, Codex (3/3)

Title says "fix: Re-check PR badge when session becomes active." Diff only changes prompt text for the PR Reviewer and Implement & Challenge presets. No badge-related code. Misleads reviewers, pollutes git history, breaks git log --grep.

Fix: Rename to something like chore: strengthen reviewer and implement-challenge preset prompts.

2. 🔴 CRITICAL — Section 4b directly contradicts SharedContext delivered to the same agent

File: ModelCapabilities.cs, new section 4b (~lines 233–237) vs SharedContext (~lines 274–276)
Flagged by: Opus, Sonnet, Codex (3/3)

New 4b instructs: "Report ALL findings regardless of severity — even minor nits, naming inconsistencies, missing docs" and "Do not dismiss anything as 'too minor to mention.'"

Existing SharedContext (injected into every worker of this preset) instructs: "Only flag real issues: bugs, security holes, logic errors" and "NEVER comment on style, formatting, naming conventions, or documentation."

These are directly contradictory — the agent receives both "report all nits" and "NEVER comment on naming/docs" in the same prompt. Behavior becomes model-dependent and unpredictable. This also undermines the 2/3 consensus filter that is the multi-reviewer workflow's core value proposition.

Fix: Either (a) update SharedContext to match the new 4b philosophy, or (b) remove/soften 4b to align with the existing "real issues only" filter. The two sections must be consistent.

3. 🟡 MODERATE — "MUST launch" + Section 4a conflict with CI/headless contexts and consensus mechanism

File: ModelCapabilities.cs, 4a (~lines 227–232), Implementer prompt (~lines 357–359), Challenger prompt (~lines 389–392)
Flagged by: Opus, Sonnet, Codex (3/3)

"MUST launch": Both Implementer and Challenger mandate runtime launch ("Building alone is NOT sufficient"). In headless CI, container agents, or sessions without a display server, this is impossible. Agents will either stall attempting dotnet run, fabricate evidence, or reject valid work. Should be conditional: "If a runtime environment is available, launch and verify."
Section 4a: "ALWAYS request changes if ANY test fails, including pre-existing flaky tests" — the PR Reviewer operates on gh pr diff and may not have test output. Also creates tension with the existing CI distinction (PR-specific vs pre-existing failures).

Non-Consensus (1/3, informational)

Observation	Model
4a "no exceptions" bypasses adversarial consensus gate for test failures	Opus

What's Clean ✅

Implement & Challenge structured 4-step prompts (Plan, Implement, Validate, Self-review) are well-designed
[[GROUP_REFLECT_COMPLETE]] sentinel preserved
WorkerSystemPrompts array length unchanged (2)
RoutingContext improvements (forward COMPLETE request, verify completeness) are correct
No runtime code changes, no regressions, no security issues

Verdict: ⚠️ Request Changes

Three actions needed before merge:

Fix the PR title and description to match the actual diff content
Resolve the 4b ↔ SharedContext contradiction — these instructions are delivered to the same agent and directly conflict
Soften "MUST launch" to conditional — headless agents cannot comply

The prompt engineering improvements are valuable — the structured planning, checklist verification, and completeness checking are good additions. Just need consistency with the existing review standards.

R2 re-review · consensus threshold: 2/3 models must agree
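
The 2/3 consensus threshold used throughout these reviews can be sketched as a simple vote count. This is a hedged Python illustration, not the reviewer tooling itself: `consensus_findings` and the finding labels are hypothetical, and matching findings across models by an exact shared label is an assumption made to keep the example small.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def consensus_findings(reports: Dict[str, Iterable[str]],
                       threshold: int = 2) -> Tuple[List[str], List[str]]:
    """Split findings into (consensus, informational) by number of models agreeing."""
    raised_by: Dict[str, Set[str]] = defaultdict(set)
    for model, findings in reports.items():
        for finding in findings:
            raised_by[finding].add(model)  # a set, so each model counts once
    consensus = sorted(f for f, models in raised_by.items()
                       if len(models) >= threshold)
    informational = sorted(f for f, models in raised_by.items()
                           if len(models) < threshold)
    return consensus, informational


# Labels loosely modeled on the R1 review; the exact strings are illustrative.
reports = {
    "opus": ["title-mismatch", "must-launch-may-stall"],
    "sonnet": ["title-mismatch", "raw-string-newlines"],
    "codex": ["title-mismatch", "must-launch-may-stall"],
}
consensus, informational = consensus_findings(reports)
```

With these inputs, the two findings raised by at least two models land in `consensus`, and the single-model observation lands in `informational`, mirroring the "Consensus Findings" and "Non-consensus" sections of the reviews.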

…ional runtime

- Update PR Review Squad SharedContext to flag ALL severities including minor nits (was 'NEVER comment on style' which contradicted 4b)
- Soften 'MUST launch' to 'launch when runtime is available' for headless/CI contexts

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen · 2026-04-04T00:57:36Z

🔍 Multi-Model Code Review — PR #459 R3

PR: fix: Re-check PR badge when session becomes active (title unchanged)
Diff: 122 lines, 1 file (ModelCapabilities.cs) — 4 commits
CI: ⚠️ No checks configured
Prior reviews: R1 (⚠️), R2 (⚠️ — 3 findings)
Models: claude-opus-4.6 · claude-sonnet-4.6 · gpt-5.3-codex

R2 Findings Status

#	R2 Finding	Status
1	🟡 PR title/description mismatches diff	❌ Still open (3/3 confirm)
2	🔴 Section 4b ↔ SharedContext contradiction	✅ Fixed — SharedContext updated to "Flag ALL issues regardless of severity" (3/3 confirm)
3	🟡 "MUST launch" stalls headless agents	✅ Fixed — now conditional: "when a runtime environment is available" / "when possible" (2/3 confirm; Codex notes residual "MUST perform exact validation steps" for user-specified steps, but this is about following explicit user instructions, not general launch)
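
The conditional gating noted in finding 3 can be illustrated with a small sketch. This Python example is an assumption-laden illustration, not the prompt text or any project code: `plan_validation` and the step strings are hypothetical. It encodes only what the review states: runtime launch applies when a runtime environment is available, while validation steps the user explicitly specified remain mandatory.

```python
from typing import List


def plan_validation(runtime_available: bool, user_steps: List[str]) -> List[str]:
    """Return the ordered validation steps for a worker (hypothetical helper)."""
    steps = ["build", "run tests"]
    if runtime_available:
        steps.append("launch the app and verify at runtime")
    else:
        # Headless/CI context: record the skip instead of stalling or guessing.
        steps.append("report that runtime launch was skipped (no runtime environment)")
    # Steps the user explicitly asked for are kept regardless of environment.
    steps.extend(user_steps)
    return steps
```

For example, `plan_validation(False, ["run the smoke script"])` would keep the user's smoke-script step while replacing the launch step with an explicit skip note.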

Remaining Finding

🟡 MODERATE — PR title still mismatches diff content (R1 → R2 → R3 unresolved)

Flagged by: Opus, Sonnet, Codex (3/3)

Title: fix: Re-check PR badge when session becomes active
Diff: Prompt improvements for Implement & Challenge + PR Reviewer presets. Zero badge/session-active code.

Breaks git log traceability and reviewer triage. Should be: refine: Strengthen Implement & Challenge charters and PR reviewer prompts (or similar).

What's Clean ✅

SharedContext and 4b are now fully consistent — both say "flag all severities"
"MUST launch" conditionally gated for headless environments
[[GROUP_REFLECT_COMPLETE]] sentinel preserved
WorkerSystemPrompts array length unchanged (2)
Structured 4-step prompts (Plan → Implement → Validate → Self-review) well-designed
RoutingContext improvements (forward complete request, verify completeness) correct
No runtime code changes, no regressions, no security issues

Verdict: ✅ Approve (with title fix requested)

The two substantive R2 findings (SharedContext contradiction, headless stall) are properly resolved. The only remaining item is the misleading PR title — this is a metadata issue, not a code issue. The code changes themselves are safe and ready to merge.

Recommendation: Fix the PR title before or at merge time (e.g., edit via GitHub UI or gh pr edit 459 --title "refine: Strengthen Implement & Challenge charters and PR reviewer prompts").

R3 re-review · consensus threshold: 2/3 models must agree

PureWeen force-pushed the fix/pr-badge-refresh-on-activate branch from a30850d to a1e608e on April 1, 2026 20:13

PureWeen force-pushed the fix/pr-badge-refresh-on-activate branch from cb41220 to a49c977 on April 1, 2026 21:02

PureWeen merged commit 32fe00e into main Apr 4, 2026

PureWeen deleted the fix/pr-badge-refresh-on-activate branch April 4, 2026 00:58

PureWeen merged 4 commits into main from fix/pr-badge-refresh-on-activate
