Skip to content

Simplify built-in presets: keep PR Review Squad + add Implement & Challenge#225

Merged
PureWeen merged 29 commits intomainfrom
bridge-history-hardening-v2
Feb 27, 2026
Merged

Simplify built-in presets: keep PR Review Squad + add Implement & Challenge#225
PureWeen merged 29 commits intomainfrom
bridge-history-hardening-v2

Conversation

@PureWeen
Copy link
Copy Markdown
Owner

Summary

Simplify built-in multi-agent presets from 6 down to 2:

Kept

  • 📋 PR Review Squad — Orchestrator mode, 5 specialized reviewers
  • ⚔️ Implement & Challenge (new) — OrchestratorReflect adversarial loop

Removed

  • Code Review Team (overlapped with PR Review Squad)
  • Multi-Perspective Analysis (broadcast mode, niche use case)
  • Quick Reflection Cycle (overlapped with Implement & Challenge)
  • Deep Research (niche use case)

Implement & Challenge Preset

Uses OrchestratorReflect mode with two workers:

  • Implementer (claude-sonnet-4.6): Codes the solution
  • Challenger (claude-opus-4.6): Reviews, finds issues, suggests improvements

The loop iterates until the challenger is satisfied ([[GROUP_REFLECT_COMPLETE]]), with stall detection and max 5 iterations (overridable with --max N).

Users who need the removed presets can recreate them as custom presets or via .squad/ team definitions.

Test Changes

  • Updated all test references from removed presets to remaining ones
  • All 1400 tests pass

@PureWeen
Copy link
Copy Markdown
Owner Author

TODO: Remaining work for this PR and follow-ups

What this PR does

Simplifies built-in multi-agent presets from 6 → 2:

  • 📋 PR Review Squad (kept) — Orchestrator mode with 5 specialized code reviewers
  • ⚔️ Implement & Challenge (new) — OrchestratorReflect adversarial loop with an implementer (claude-sonnet-4.6) and challenger (claude-opus-4.6)

Removed: Code Review Team, Multi-Perspective Analysis, Quick Reflection Cycle, Deep Research (redundant or niche).

How Implement & Challenge works

  • Uses OrchestratorReflect mode — loops until challenger emits [[GROUP_REFLECT_COMPLETE]] sentinel
  • Default MaxIterations = 5, user can set up to 100 via the Iterations spinner in the group header
  • Stall detection: exits if 2 consecutive iterations have >90% Jaccard token similarity
  • Challenger system prompt instructs it to be adversarial — find bugs, edge cases, missing tests
  • Implementer system prompt instructs it to write production-quality code and address all feedback

Related PRs

Open items from code review of PR #214 (addressed in PR #215)

  • HistoryLock + GetHistorySnapshot() added to AgentSessionInfo for thread-safe reads
  • All 6+ background-thread write sites in CopilotService.Bridge.cs now lock on HistoryLock
  • Server-side MaxHistoryPayloadMessages = 500 cap in WsBridgeServer.SendSessionHistoryToClient
  • Retry logic for failed history requests in SyncRemoteSessions

Still outstanding (future work)

  • Silent return on snapshot failure — If GetHistorySnapshot() fails in SendSessionHistoryToClient, we log + return without sending. Client has _requestedHistorySessions keyed, so it won't auto-retry. PR Harden bridge History thread-safety with comprehensive HistoryLock coverage #215 added retry logic in SyncRemoteSessions but a more robust retry/backoff mechanism would help.
  • UI-thread History reads still unlocked — ~60 sites in Dashboard.razor read session.History without locking. These tolerate stale data but could theoretically see torn reads. A proper fix would be a snapshot method or ReadOnlyCollection wrapper on AgentSessionInfo.History so all callers benefit.
  • LoadFullRemoteHistoryAsync has no server-side cap — passes limit: null. Server now caps at 500 via MaxHistoryPayloadMessages, but the client-side API still allows requesting unlimited.
  • Consider raising UI max iterations above 100 — Currently the Iterations spinner in Dashboard.razor and SessionSidebar.razor caps at max="100". For truly long-running implement & challenge loops, this could be raised or made unlimited.
  • Implement & Challenge could benefit from shared worktree context — The preset sets DefaultWorktreeStrategy = WorktreeStrategy.Shared and SharedContext explaining the workflow, but the orchestrator routing prompt could be enhanced with more specific task-splitting guidance.

Key files

  • PolyPilot/Models/ModelCapabilities.csGroupPreset definitions, BuiltIn array
  • PolyPilot/Models/ReflectionCycle.cs — Reflection state machine, stall detection, sentinel parsing
  • PolyPilot/Services/CopilotService.Organization.csSendViaOrchestratorReflectAsync, orchestration engine
  • PolyPilot/Models/AgentSessionInfo.csHistoryLock, GetHistorySnapshot() (in PR Harden bridge History thread-safety with comprehensive HistoryLock coverage #215)
  • PolyPilot/Services/CopilotService.Bridge.cs — Remote mode event handlers, history sync
  • PolyPilot/Services/WsBridgeServer.cs — WebSocket server, SendSessionHistoryToClient

Testing

All 1400 tests pass. Test files updated:

  • SessionOrganizationTests.cs — preset count, group creation scenarios
  • MultiAgentRegressionTests.cs — renamed persona test to Implement & Challenge
  • SquadDiscoveryTests.cs — updated built-in preset name reference
  • multi-agent-scenarios.json — all scenario preset references updated

PureWeen and others added 23 commits February 26, 2026 21:53
Two-agent OrchestratorReflect preset: an Implementer builds the solution,
a Challenger reviews it, and they loop until the Challenger approves
(emits [[GROUP_REFLECT_COMPLETE]]) or max iterations reached.

- Implementer: claude-sonnet-4.6 (fast, good at coding)
- Challenger: claude-opus-4.6 (thorough, good at finding bugs)
- Orchestrator: claude-opus-4.6 (routes between the two)
- RoutingContext enforces strict alternation pattern

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…llenge

Remove 4 redundant built-in presets (Code Review Team, Multi-Perspective
Analysis, Quick Reflection Cycle, Deep Research) and add new Implement &
Challenge preset using OrchestratorReflect mode.

Built-in presets now:
- PR Review Squad (Orchestrator, 5 reviewers)
- Implement & Challenge (OrchestratorReflect, implementer + challenger loop)

The Implement & Challenge preset uses an adversarial loop where an
implementer codes a solution and a challenger reviews it, iterating
until the challenger approves via [[GROUP_REFLECT_COMPLETE]] sentinel.
Default max iterations: 5, overridable with --max N.

Update all test references to use remaining presets.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…for repo presets, update NuGet packages

- Built-in presets are now always shown in the Built-in section (repo/user presets with same name are skipped)
- Added delete button (trash icon) for From Repo presets in the preset picker
- Added UserPresets.DeleteRepoPreset() to remove .squad/ directories
- Updated NuGet packages: GitHub.Copilot.SDK 0.1.26, Markdig 1.0.0, MauiDevFlow 0.11.0, Test.Sdk 18.3.0
- Updated test: GetAll_RepoDoesNotOverride_BuiltInByName

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eset priority docs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
FlushCurrentResponse on AssistantTurnEndEvent was clearing CurrentResponse
before CompleteResponse (on SessionIdleEvent) could read it for the TCS
result. This caused SendPromptAndWaitAsync to return "" and
ParseTaskAssignments to find 0 worker assignments.

Add FlushedResponse StringBuilder to SessionState that accumulates text
flushed mid-turn. CompleteResponse now combines FlushedResponse +
CurrentResponse for the TCS result, ensuring orchestrator dispatch gets
the full plan text.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Worker sessions dispatched by orchestrators have no interactive user to
approve tool permission requests. Without OnPermissionRequest, the SDK
returns 'denied-no-approval-rule-and-could-not-request-from-user' for
every tool call (view, edit, powershell, grep, glob, etc).

Set OnPermissionRequest = AutoApprovePermissions on all SessionConfig
and ResumeSessionConfig instances so tools execute without prompting.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Update worker prompts: Implementer must make actual code changes, build,
  test, and commit. Challenger must run git diff and verify changes.
- Fix RoutingContext: reference actual worker names (worker-1, worker-2)
  instead of abstract role names (Implementer, Challenger).
- Add MaxReflectIterations to GroupPreset and SessionGroup so presets can
  configure how many reflect loops to run (Implement & Challenge = 10).
- Wire up MaxReflectIterations in Dashboard and SessionSidebar UI so the
  preset default is used when the user hasn't overridden it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The dropdown now defaults to 'Preset default' instead of 'Each agent gets
own worktree'. This lets each preset define its own optimal strategy —
e.g., Implement & Challenge defaults to Shared (workers alternate, need
to see each other's file changes), while PR Review Squad defaults to
FullyIsolated (workers run in parallel on different branches).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
AutoStartReflectionIfNeeded was only called from the group-level
input boxes (SendToMultiAgentGroup/SendToExpandedMultiAgentGroup),
but NOT from the orchestrator's own chat input path at line 1278.
This caused ReflectionState to remain null, making
SendViaOrchestratorReflectAsync fall back to plain orchestrator
mode — no worker dispatch, no reflect loop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The orchestrator prompt had two issues causing it to use SDK tools
(task, grep, edit) directly instead of producing @worker: blocks:

1. Line 989 said 'If you can handle the request entirely yourself,
   just respond normally without any @worker blocks' — giving the
   model explicit permission to skip delegation.

2. The RoutingContext used tool-like language ('run git diff', 'build,
   test, commit') that the model pattern-matched to its SDK tools.

Fix: Rewrite the planning prompt to say the orchestrator is a
DISPATCHER ONLY with no tools, and reword the RoutingContext to
describe the dispatch pattern without tool-like language.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Worker sessions that were created but never received messages have no
events.jsonl, so the CLI server returns 'Session not found' on resume.
This caused multi-agent workers to silently vanish from the sidebar
after relaunch.

Now falls back to CreateSessionAsync when resume fails with 'Session
not found', preserving the session in _sessions so it remains visible
and functional.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Session deletion triggered rapid render batch churn:
1. Session.DisposeAsync() talked to CLI, triggering SDK events
2. OnStateChanged fired from CloseSessionAsync
3. ReconcileOrganization fired another OnStateChanged

This caused Blazor render batch ordering races ('r.parentNode.removeChild'
on null) and 'unexpected acknowledgement for render batch N' errors.

Fix: move DisposeAsync to fire-and-forget after UI update, and remove
redundant ReconcileOrganization call (session already removed from
_sessions, reconciliation just caused extra state churn).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previous fix removed the JS DOM portal but kept the dialog in Blazor's
render tree. The overlay showed but the dialog was clipped/invisible
due to ancestor overflow constraints that position:fixed alone can't
escape in WebView with scoped CSS.

Adopt the approach from PR #226: render the close-confirmation dialog
entirely via JS (window.showCloseSessionDialog), creating DOM elements
imperatively and appending to document.body. Blazor never tracks these
elements, so no render batch desync on open/close/delete.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Track dispatched workers across iterations in the reflect loop
- Override [[GROUP_REFLECT_COMPLETE]] if not all workers have participated
- Include RoutingContext in synthesis prompt so orchestrator remembers I&C pattern
- Show worker participation status (missing workers) in synthesis prompt
- Add DISPATCHER ONLY language to replan prompt for iteration 2+
- Pass RoutingContext to replan prompt for consistent routing awareness

Fixes: orchestrator prematurely completing after only worker-1 (Implementer)
runs, without waiting for worker-2 (Challenger) to review. Also fixes the
loop stopping after Challenger finds issues instead of continuing to have
Implementer fix them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Track only successfully completed workers (not just dispatched) so
  failed workers don't count as having participated
- Replace continue statements with flow-through logic so stall
  detection, SaveOrganization, and UI updates always run
- Add comment documenting HashSet sequential access safety

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
SDK 0.1.26 (PR github/copilot-sdk#485) now natively maps maccatalyst
RIDs to osx RIDs for CLI download. Our _FixCopilotRidForMacCatalyst
workaround's condition ('$(_CopilotPlatform)' == '') no longer fires
since the SDK sets it first, causing _CopilotOriginalRid to never be
set, which skipped both the maccatalyst-arm64 runtime copy AND the
MonoBundle copy. Clean builds would produce an app with no copilot CLI.

Changes:
- Remove _FixCopilotRidForMacCatalyst (SDK handles this now)
- Update _CopyCopilotCliForMacCatalyst to detect maccatalyst via
  GetTargetPlatformIdentifier instead of _CopilotOriginalRid
- Update _IncludeCopilotCliInBundle with same condition fix
- Use $(RuntimeIdentifier) directly for the maccatalyst output path

Fixes: github/copilot-sdk#454

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
SDK 0.1.26 (github/copilot-sdk#485) handles the download mapping, but
MAUI still won't bundle the binary because the SDK's ContentWithTargetPath
registration lacks PublishFolderType metadata. The fix is a single target
that re-registers the item with PublishFolderType=Assembly and a flat
TargetPath so MAUI places it in MonoBundle.

Replaced 5 custom MSBuild targets (55 lines) with 1 target (10 lines).
Verified via clean build: copilot binary lands in .app/Contents/MonoBundle.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CloseSessionAsync removed sessions from _sessions dict and
active-sessions.json but NOT from Organization.Sessions. The
ReconcileOrganization call was removed earlier to fix render crashes.
On restart, the session metadata in organization.json caused the
deleted session to reappear.

Fix: explicitly remove from Organization.Sessions and call
SaveOrganization() in CloseSessionAsync, avoiding the full
ReconcileOrganization() that triggers render batch churn.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Make DISPATCHER ONLY language conditional via dispatcherOnly parameter
  so PR Review Squad orchestrator can still use tools (only I&C reflect
  mode sets dispatcherOnly=true)
- Track attempted vs successful workers separately to distinguish
  'never dispatched' from 'dispatched but failed' in override messages
- Hoist allWorkersDispatched computation before evaluator/self-eval
  branch to eliminate variable duplication

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When multiple active-sessions.json entries share the same display name
but different session IDs (e.g., from mode switches or reconnects),
CloseSessionAsync only tracked the current session's ID in
_closedSessionIds. The other entries survived the merge because their
IDs weren't in the closed set, and their session-state directories
still existed on disk.

Fix: add _closedSessionNames tracking alongside _closedSessionIds.
MergeSessionEntries now filters by both session ID and display name,
ensuring all entries for a closed session are excluded.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- CloseSessionAsync: use FlushSaveOrganization() instead of debounced
  SaveOrganization() to prevent ReconcileOrganization re-adding the
  deleted session during the 2-second debounce window
- DeleteRepoPreset: validate SourcePath ends with .squad or .ai-team
  before recursive delete to prevent accidental deletion of unrelated dirs
- SessionErrorEvent: reorder FlushCurrentResponse before FlushedResponse
  .Clear() so partial response text is preserved in history on errors
- Add 3 tests for closedNames merge filtering: by name, case-insensitive,
  and duplicate session IDs with same display name

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Strip [[GROUP_REFLECT_COMPLETE]] from worker responses in BuildSynthesisPrompt
  to prevent Challenger's approval sentinel from leaking into synthesis and
  causing premature reflection loop termination
- Remove unused allWorkersAttempted variable (dead code)
- Add guard in StartGroupReflection to skip if reflection is already active
- Add 6 tests: sentinel stripping (4), reflection guard (2)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… FlushedResponse

- SaveOrganization debounce: timer callback now re-serializes live state
  instead of writing a pre-captured stale JSON snapshot. Prevents race where
  FlushSaveOrganization writes newer state, then stale timer overwrites it.
- LoadOrganization self-healing: detects groups with IsMultiAgent=false that
  have orchestrator sessions and restores IsMultiAgent=true on load.
- GetOrCreateRepoGroup defensive check: skips groups with orchestrator sessions
  even if IsMultiAgent is somehow false, preventing repo group overwrite.
- AbortAsync: now combines FlushedResponse + CurrentResponse for partial
  response preservation (was only using CurrentResponse).
- Fix BuildOrchestratorPlanningPrompt test parameter count mismatch.
- Add 3 new tests: LoadOrganization healing, GetOrCreateRepoGroup defensive
  skip, SaveOrganization debounce writes live state.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen and others added 5 commits February 26, 2026 21:53
…story, add [Collection] to flaky test

- CopilotService.Events.cs: wrap FlushCurrentResponse(state) call in Invoke()
  in ToolExecutionStartEvent handler to ensure it runs on the UI thread
- CopilotService.cs: AbortSessionAsync uses only CurrentResponse for partial
  history save — FlushedResponse content was already committed to History by
  FlushCurrentResponse, so combining both caused duplicate history entries
- SessionOrganizationTests.cs: add [Collection("BaseDir")] to GroupingStabilityTests
  to prevent parallel SetBaseDirForTesting calls causing flaky test failures

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… guard

- SaveOrganization timer callback now marshals to UI thread via InvokeOnUI
  to prevent ThreadPool/UI thread race on List<T> serialization
- DeleteRepoPreset: added Path.GetFullPath containment check to prevent
  path traversal attacks via crafted SourcePath
- Reflection guard: added debug logging, documented UI-thread-only
- Fixed test compilation: CreateTestService -> CreateService,
  GroupPreset.DeleteRepoPreset -> UserPresets.DeleteRepoPreset
- Added 7 new tests: sentinel stripping (2), path traversal (2),
  debounce InvokeOnUI validation (1), debounce live-state (1),
  nonexistent preset (1)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move History.Add calls into Invoke/InvokeOnUI blocks to prevent concurrent
List<ChatMessage> writes from background SDK event threads and UI thread.

- ToolExecutionStartEvent: moved FlushCurrentResponse + History.Add + event
  invocations into single Invoke() block
- ApplyReasoningUpdate: wrapped new reasoning message History.Add in InvokeOnUI()

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR #222 fixed the test race that corrupts repos.json, but the damage
was already done — real repos were replaced with test data (repo-1).
The bare clones still exist on disk but aren't tracked in repos.json.

Load() now calls HealMissingRepos() which scans the repos/ directory
for bare clones that exist on disk but aren't in the state file,
reads their remote URL from git config, and re-adds them. Also
reconstructs missing worktree entries from the worktrees/ directory.

Added 4 tests for the self-healing logic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen PureWeen force-pushed the bridge-history-hardening-v2 branch from 95f7421 to c383c9f Compare February 27, 2026 03:59
- ApplyReasoningUpdate: PendingReasoningMessages ConcurrentDictionary prevents
  duplicate reasoning ChatMessages when rapid deltas arrive before InvokeOnUI
  fires. Messages registered in map before posting to UI thread; removed after
  History.Add completes. Map cleared in CompleteResponse, SendPromptAsync,
  AbortSessionAsync, error handler, watchdog, and reconnect paths.

- SendViaOrchestratorReflectAsync: moved reflectState.IsActive=false and
  CompletedAt into finally block so they always run, even on
  OperationCanceledException. Previously, cancellation would permanently
  block future reflections for that group.

- Added 4 reasoning dedup tests (ReasoningDedupTests.cs)
- Added 2 reflection cancellation tests (ReflectionCycleTests.cs)
- All 1546 tests pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen PureWeen merged commit 19219f1 into main Feb 27, 2026
@PureWeen PureWeen deleted the bridge-history-hardening-v2 branch February 27, 2026 04:49
PureWeen added a commit that referenced this pull request Mar 12, 2026
When ResumeSessionAsync fails with 'Session not found' / 'corrupt' /
'session file' errors, the fallback path was calling CreateSessionAsync
without recovering history from the old session's events.jsonl. This
caused sessions to appear empty after app relaunch.

The fix adds history recovery to the fallback path:
1. Load history from old session via LoadHistoryFromDisk(entry.SessionId)
2. Inject recovered messages into the new session's Info.History
3. Set MessageCount and LastReadMessageCount
4. Call RestoreUsageStats(entry) to preserve CreatedAt, token counts
5. Sync recovered history to chat DB under the new session ID
6. Add a system message indicating the session was recreated

Bug introduced in PR #225 (commit 19219f1), worsened by PR #308
(commit 72886a2) which expanded the catch conditions without adding
history recovery.

Adds 5 structural regression tests guarding the fallback path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen added a commit that referenced this pull request Mar 13, 2026
## Problem

When `ResumeSessionAsync` fails during app restart ("Session not found"
/ "corrupt" / "session file" errors), the fallback path called
`CreateSessionAsync` which created a **blank** session with zero
messages. The old session's `events.jsonl` was never loaded, so all
conversation history was lost.

**User impact:** Sessions appeared empty after app relaunch — the user
would see their session restored but with 0 messages, forcing them to
create duplicate sessions.

## Root Cause

Introduced in PR #225 (commit `19219f1`) which added the "Session not
found" fallback. Worsened in PR #308 which expanded the catch conditions
to include "corrupt" and "session file" errors. Neither PR added history
recovery to the fallback path.

`ResumeSessionAsync` (the success path) correctly loads history via
`LoadHistoryFromDisk(sessionId)`. The fallback path skipped this
entirely.

## Fix

In the fallback path of `RestorePreviousSessionsAsync`:

1. **Load history** from the old session's `events.jsonl` via
`LoadHistoryFromDisk(entry.SessionId)` before creating the new session
2. **Inject history** into the recreated session's `Info.History`
3. **Set message counts** (`MessageCount`, `LastReadMessageCount`) so
the UI shows the correct state
4. **Restore usage stats** (`CreatedAt`, token counts) via
`RestoreUsageStats(entry)`
5. **Sync to DB** via `BulkInsertAsync` under the new session ID
6. **Add indicator** system message: "🔄 Session recreated — conversation
history recovered from previous session."

## Duplicate Session Issue

The user also observed a duplicate session (`session-20260311-001729`)
created with the same repo as the original. This was a downstream
consequence: after seeing the original session restored with zero
messages, the user manually created a new session. With this fix, the
full history is preserved, eliminating the need for duplicates.

## Tests

Added 5 structural regression tests in `SessionPersistenceTests.cs`:
- `RestoreFallback_LoadsHistoryFromOldSession` — verifies
`LoadHistoryFromDisk` appears before `CreateSessionAsync`
- `RestoreFallback_InjectsHistoryIntoRecreatedSession` — verifies
history injection + message count
- `RestoreFallback_RestoresUsageStats` — verifies `RestoreUsageStats`
call
- `RestoreFallback_SyncsHistoryToDatabase` — verifies `BulkInsertAsync`
call
- `RestoreFallback_AddsReconnectionIndicator` — verifies system message

All 2464 tests pass.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dinisdopion-sys
Copy link
Copy Markdown

TODO: Осталась работа над этим PR и последующими действиями.

Что делает этот PR?

Упрощает настройку встроенных многоагентных предустановок с 6 до 2:

  • 📋 Команда по проверке запросов на слияние (сохранена) — режим оркестратора с 5 специализированными рецензентами кода.
  • ⚔️ Реализация и оспаривание (новое) — Цикл состязательного взаимодействия OrchestratorReflect с реализатором (claude-sonnet-4.6) и претендентом (claude-opus-4.6)

Удалено: Группа проверки кода, Многосторонний анализ, Цикл быстрой рефлексии, Глубокое исследование (избыточное или узкоспециализированное).

Как работает метод внедрения и анализа

  • Использует OrchestratorReflectрежим — цикл повторяется до тех пор, пока претендент не выпустит [[GROUP_REFLECT_COMPLETE]]сигнальный модуль.
  • По умолчанию MaxIterations = 5пользователь может установить до 100 итераций с помощью выпадающего списка в заголовке группы.
  • Обнаружение задержек: завершает работу, если две последовательные итерации имеют сходство токенов Жаккара более 90%.
  • Система Challenger дает указание действовать враждебно — находить ошибки, граничные случаи, отсутствующие тесты.
  • Система, запускаемая программистом, предписывает ему писать код производственного качества и учитывать все отзывы.

Связанные пресс-релизы

Open items from code review of PR #214 (addressed in PR #215)

  • HistoryLock + GetHistorySnapshot() added to AgentSessionInfo for thread-safe reads
  • All 6+ background-thread write sites in CopilotService.Bridge.cs now lock on HistoryLock
  • Server-side MaxHistoryPayloadMessages = 500 cap in WsBridgeServer.SendSessionHistoryToClient
  • Retry logic for failed history requests in SyncRemoteSessions

Still outstanding (future work)

  • Silent return on snapshot failure — If GetHistorySnapshot() fails in SendSessionHistoryToClient, we log + return without sending. Client has _requestedHistorySessions keyed, so it won't auto-retry. PR Harden bridge History thread-safety with comprehensive HistoryLock coverage #215 added retry logic in SyncRemoteSessions but a more robust retry/backoff mechanism would help.
  • UI-thread History reads still unlocked — ~60 sites in Dashboard.razor read session.History without locking. These tolerate stale data but could theoretically see torn reads. A proper fix would be a snapshot method or ReadOnlyCollection wrapper on AgentSessionInfo.History so all callers benefit.
  • LoadFullRemoteHistoryAsync has no server-side cap — passes limit: null. Server now caps at 500 via MaxHistoryPayloadMessages, but the client-side API still allows requesting unlimited.
  • Consider raising UI max iterations above 100 — Currently the Iterations spinner in Dashboard.razor and SessionSidebar.razor caps at max="100". For truly long-running implement & challenge loops, this could be raised or made unlimited.
  • Implement & Challenge could benefit from shared worktree context — The preset sets DefaultWorktreeStrategy = WorktreeStrategy.Shared and SharedContext explaining the workflow, but the orchestrator routing prompt could be enhanced with more specific task-splitting guidance.

Key files

  • PolyPilot/Models/ModelCapabilities.csGroupPreset definitions, BuiltIn array
  • PolyPilot/Models/ReflectionCycle.cs — Reflection state machine, stall detection, sentinel parsing
  • PolyPilot/Services/CopilotService.Organization.csSendViaOrchestratorReflectAsync, orchestration engine
  • PolyPilot/Models/AgentSessionInfo.csHistoryLock, GetHistorySnapshot() (in PR Harden bridge History thread-safety with comprehensive HistoryLock coverage #215)
  • PolyPilot/Services/CopilotService.Bridge.cs — Remote mode event handlers, history sync
  • PolyPilot/Services/WsBridgeServer.cs — WebSocket server, SendSessionHistoryToClient

Testing

All 1400 tests pass. Test files updated:

  • SessionOrganizationTests.cs — preset count, group creation scenarios
  • MultiAgentRegressionTests.cs — renamed persona test to Implement & Challenge
  • SquadDiscoveryTests.cs — updated built-in preset name reference
  • multi-agent-scenarios.json — all scenario preset references updated

arisng pushed a commit to arisng/PolyPilot that referenced this pull request Apr 4, 2026
## Problem

When `ResumeSessionAsync` fails during app restart ("Session not found"
/ "corrupt" / "session file" errors), the fallback path called
`CreateSessionAsync` which created a **blank** session with zero
messages. The old session's `events.jsonl` was never loaded, so all
conversation history was lost.

**User impact:** Sessions appeared empty after app relaunch — the user
would see their session restored but with 0 messages, forcing them to
create duplicate sessions.

## Root Cause

Introduced in PR PureWeen#225 (commit `19219f1`) which added the "Session not
found" fallback. Worsened in PR PureWeen#308 which expanded the catch conditions
to include "corrupt" and "session file" errors. Neither PR added history
recovery to the fallback path.

`ResumeSessionAsync` (the success path) correctly loads history via
`LoadHistoryFromDisk(sessionId)`. The fallback path skipped this
entirely.

## Fix

In the fallback path of `RestorePreviousSessionsAsync`:

1. **Load history** from the old session's `events.jsonl` via
`LoadHistoryFromDisk(entry.SessionId)` before creating the new session
2. **Inject history** into the recreated session's `Info.History`
3. **Set message counts** (`MessageCount`, `LastReadMessageCount`) so
the UI shows the correct state
4. **Restore usage stats** (`CreatedAt`, token counts) via
`RestoreUsageStats(entry)`
5. **Sync to DB** via `BulkInsertAsync` under the new session ID
6. **Add indicator** system message: "🔄 Session recreated — conversation
history recovered from previous session."

## Duplicate Session Issue

The user also observed a duplicate session (`session-20260311-001729`)
created with the same repo as the original. This was a downstream
consequence: after seeing the original session restored with zero
messages, the user manually created a new session. With this fix, the
full history is preserved, eliminating the need for duplicates.

## Tests

Added 5 structural regression tests in `SessionPersistenceTests.cs`:
- `RestoreFallback_LoadsHistoryFromOldSession` — verifies
`LoadHistoryFromDisk` appears before `CreateSessionAsync`
- `RestoreFallback_InjectsHistoryIntoRecreatedSession` — verifies
history injection + message count
- `RestoreFallback_RestoresUsageStats` — verifies `RestoreUsageStats`
call
- `RestoreFallback_SyncsHistoryToDatabase` — verifies `BulkInsertAsync`
call
- `RestoreFallback_AddsReconnectionIndicator` — verifies system message

All 2464 tests pass.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants