Harden bridge History thread-safety with comprehensive HistoryLock coverage#215

Closed
PureWeen wants to merge 4 commits into main from bridge-history-hardening

PureWeen commented Feb 26, 2026

Follow-up to #214. Addresses review findings around History thread-safety, server-side enforcement, retry resilience, and lock scope correctness.

Changes

1. Thread-safe History access via HistoryLock

  • Added HistoryLock + GetHistorySnapshot() to AgentSessionInfo — a single lock object and snapshot method for all callers
  • Locked all background-thread writes in CopilotService.Bridge.cs: OnContentReceived, OnToolStarted, OnToolCompleted, OnImageReceived, OnReasoningReceived, OnReasoningComplete, and SyncRemoteSessions Clear+AddRange
  • WsBridgeServer.SendSessionHistoryToClient uses GetHistorySnapshot() instead of bare ToArray()
  • UnreadCount uses GetHistorySnapshot() instead of try/catch
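
The lock-plus-snapshot pattern described above could look roughly like the following. This is a minimal sketch, not the PR's actual code: `SessionHistorySketch`, `MessageSketch`, and `Append` are hypothetical names, and only `HistoryLock`, `History`, and `GetHistorySnapshot()` come from the description.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the real chat message type.
public sealed class MessageSketch
{
    public string Content { get; set; } = "";
}

// Hypothetical stand-in for AgentSessionInfo, reduced to the history members.
public sealed class SessionHistorySketch
{
    // One lock object shared by every reader and writer of History.
    public object HistoryLock { get; } = new object();

    public List<MessageSketch> History { get; } = new List<MessageSketch>();

    // Copies the list while holding the lock. The returned array can be
    // enumerated freely because later writes to History do not touch it.
    public MessageSketch[] GetHistorySnapshot()
    {
        lock (HistoryLock)
        {
            return History.ToArray();
        }
    }

    // Hypothetical write-side helper: background handlers take the same lock.
    public void Append(MessageSketch message)
    {
        lock (HistoryLock)
        {
            History.Add(message);
        }
    }
}
```

The snapshot copies the list, not the messages: the array elements are still the same message objects, so property writes on them need the lock as well.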

2. Widened lock scopes to cover all mutations (Review Fix)

  • ToolExecutionCompleteEvent: Extended lock to cover all histToolMsg property mutations (MessageType, ImagePath, Caption, IsComplete, IsSuccess, Content) — previously the reference escaped the lock scope
  • ApplyReasoningUpdate: Extended lock to cover all reasoningMsg property mutations (ReasoningId, IsComplete, IsCollapsed, Timestamp, MergeReasoningContent)
  • CompleteReasoningMessages: Moved property mutations (IsComplete, IsCollapsed, Timestamp) inside the lock block — previously captured reference list was mutated outside lock
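
To illustrate the widened scope, here is a hedged sketch of a tool-completion path in which the lookup and every property write happen under one lock. `ToolMessageSketch`, `LockScopeSketch`, and `CompleteTool` are hypothetical names with a reduced property set; the real handlers touch more fields.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down message type for illustration only.
public sealed class ToolMessageSketch
{
    public string ToolCallId { get; set; } = "";
    public bool IsComplete { get; set; }
    public bool IsSuccess { get; set; }
    public string Content { get; set; } = "";
}

public static class LockScopeSketch
{
    // Looks up the tool message and applies every property write while still
    // holding the lock. The narrower form (lock only around LastOrDefault,
    // then mutate after the block exits) lets the reference escape the lock.
    public static bool CompleteTool(
        object historyLock,
        List<ToolMessageSketch> history,
        string toolCallId,
        bool success,
        string content)
    {
        lock (historyLock)
        {
            var msg = history.LastOrDefault(m => m.ToolCallId == toolCallId);
            if (msg == null)
            {
                return false; // nothing to complete
            }

            msg.IsSuccess = success;
            msg.Content = content;
            msg.IsComplete = true;
            return true;
        }
    }
}
```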

3. Server-side history cap (MaxHistoryPayloadMessages = 500)

  • SendSessionHistoryToClient enforces a server-side max regardless of client-requested limit
  • Even LoadFullRemoteHistory (limit: null) is capped to 500 messages server-side
  • Prevents unbounded WebSocket payloads
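
A server-side cap of this kind can be sketched as a clamp on the requested limit. The helper below is an assumption-laden illustration: `HistoryCapSketch` and `SelectPayload` are hypothetical names, and keeping the most recent messages (rather than the oldest) is a guess the description does not confirm.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HistoryCapSketch
{
    public const int MaxHistoryPayloadMessages = 500;

    // Returns at most MaxHistoryPayloadMessages items from the end of the
    // snapshot. A null limit ("send everything") is treated as the cap.
    // Assumption: the newest messages are the ones worth keeping.
    public static IReadOnlyList<T> SelectPayload<T>(IReadOnlyList<T> snapshot, int? requestedLimit)
    {
        int limit = Math.Min(requestedLimit ?? MaxHistoryPayloadMessages, MaxHistoryPayloadMessages);
        limit = Math.Max(limit, 0); // guard against negative client values
        int skip = Math.Max(snapshot.Count - limit, 0);
        return snapshot.Skip(skip).ToList();
    }
}
```

Applying the clamp to a snapshot rather than to the live list keeps the slicing outside the lock.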

4. Retry with cap for failed history requests (Review Fix)

  • SyncRemoteSessions re-requests history for sessions that were previously requested but still have no local history
  • Added retry cap (max 5 attempts) to prevent infinite retry loops — previously retried unconditionally every sync cycle
  • Prevents sessions from being permanently stuck showing "Loading conversation…"
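
The capped retry could be tracked with a per-session attempt counter, as in this sketch. `HistoryRetryTrackerSketch` and `ShouldRequest` are hypothetical; the sketch assumes the cap counts total requests per session, that the counter resets once history arrives, and that the sync cycle calls it from a single thread.

```csharp
using System.Collections.Generic;

public sealed class HistoryRetryTrackerSketch
{
    public const int MaxAttempts = 5;

    // Attempts made so far, keyed by session id.
    private readonly Dictionary<string, int> _attempts = new Dictionary<string, int>();

    // Returns true when the caller should send (or re-send) a history request
    // for this session during the current sync cycle.
    public bool ShouldRequest(string sessionId, int localMessageCount)
    {
        if (localMessageCount > 0)
        {
            // History arrived, so forget the session and reset its budget.
            _attempts.Remove(sessionId);
            return false;
        }

        _attempts.TryGetValue(sessionId, out int attempts);
        if (attempts >= MaxAttempts)
        {
            return false; // give up instead of retrying every cycle forever
        }

        _attempts[sessionId] = attempts + 1;
        return true;
    }
}
```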

5. MessageCount consistency (Review Fix)

  • OnContentReceived and OnToolStarted now update session.MessageCount after History.Add inside the lock — previously only OnReasoningReceived updated it
  • BuildSessionsListPayload and the initial connect check use MessageCount (an int field) instead of History.Count (an unsynchronized read of the list)

6. Avoid cross-thread History.Count reads

  • Bridge event handlers use MessageCount field instead of History.Count to avoid racing with background writes
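
One way to keep a count field in step with the list, and readable from any thread, is sketched below. `CountedHistorySketch` is a hypothetical type, and the use of `Volatile.Read`/`Volatile.Write` is an assumption on top of what the description states (it only says an int field is used).

```csharp
using System.Collections.Generic;
using System.Threading;

public sealed class CountedHistorySketch
{
    private readonly object _historyLock = new object();
    private readonly List<string> _history = new List<string>();
    private int _messageCount;

    // Readable without the lock: a 32-bit int read is atomic, and
    // Volatile.Read ensures the latest published value is observed.
    public int MessageCount => Volatile.Read(ref _messageCount);

    // Every mutation updates the count inside the same lock as the list,
    // so the two cannot drift apart.
    public void Add(string message)
    {
        lock (_historyLock)
        {
            _history.Add(message);
            Volatile.Write(ref _messageCount, _history.Count);
        }
    }

    public void Clear()
    {
        lock (_historyLock)
        {
            _history.Clear();
            Volatile.Write(ref _messageCount, 0);
        }
    }
}
```

A count read this way can be stale by the time it is used, which is acceptable for badge or "has any history" checks but not for indexing into the list.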

Review History

  • Round 1: 6 findings identified (TOCTOU, unbounded retry, MessageCount desync, pagination, Events.cs locking, missing MessageCount)
  • Round 2: All 6 original findings fixed; 5 new findings identified (lock scope too narrow in 3 handlers, unbounded retry persisted, MessageCount still stale in 2 handlers)
  • Round 3: Applied fixes for all 5 remaining issues in commit "Fix 5 thread-safety issues"

PureWeen force-pushed the bridge-history-hardening branch from 62625b0 to 1cd7b30 on February 26, 2026 01:10
PureWeen force-pushed the bridge-history-hardening branch 4 times, most recently from 905018b to 0083ce7, on February 27, 2026 04:40
PureWeen changed the title from "Harden bridge History thread-safety, server-side cap, and retry" to "Harden bridge History thread-safety with comprehensive HistoryLock coverage" on Feb 27, 2026
PureWeen and others added 4 commits February 26, 2026 23:00
1. Thread-safe History access:
   - Add HistoryLock + GetHistorySnapshot() to AgentSessionInfo
   - Lock all background-thread writes in Bridge.cs event handlers
     (OnContentReceived, OnToolStarted, OnToolCompleted, OnImageReceived,
     OnReasoningReceived, OnReasoningComplete, SyncRemoteSessions Clear+AddRange)
   - WsBridgeServer uses GetHistorySnapshot() instead of bare ToArray()
   - UnreadCount uses GetHistorySnapshot() instead of try/catch

2. Server-side history cap (MaxHistoryPayloadMessages = 500):
   - SendSessionHistoryToClient enforces max regardless of client-requested limit
   - Prevents any caller (including LoadFullRemoteHistory) from sending unbounded payloads

3. Retry on failed history requests:
   - SyncRemoteSessions now re-requests history for sessions that were previously
     requested but still have no local history (server snapshot may have failed)
   - Prevents sessions from being permanently stuck with no history

4. Use MessageCount instead of History.Count in WsBridgeServer to avoid
   cross-thread reads of the list for simple count checks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s mutations, reset retry cap

- Lock ApplyReasoningUpdate to protect History reads and Add under HistoryLock
- Lock CompleteReasoningMessages History.Where query under HistoryLock
- Lock tool dedup read (FirstOrDefault) and History.Add for tool messages
- Lock ToolExecutionCompleteEvent History.LastOrDefault lookup
- Lock History.Add in remote-mode SendPromptAsync
- Lock History.Add in AbortSessionAsync with MessageCount update
- Lock History.Clear/MessageCount in ClearHistory
- Clear _requestedHistorySessions entry on successful history sync (fix retry cap never resetting)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…MessageCount

- ToolExecutionCompleteEvent: extend lock to cover all histToolMsg mutations
  (was: reference escaped lock scope leaving all property writes unprotected)
- ApplyReasoningUpdate: extend lock to cover all reasoningMsg property mutations
- CompleteReasoningMessages: move property mutations inside lock with DB content snapshot
- SyncRemoteSessions: add retry cap (max 5) — was retrying indefinitely every cycle
- OnContentReceived/OnToolStarted: update MessageCount after History.Add

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…d, SyncRemoteSessions

- FlushCurrentResponse: wrap History.LastOrDefault + History.Add + MessageCount in HistoryLock
- CompleteResponse: wrap History.Add + MessageCount + LastReadMessageCount in HistoryLock
- Bridge OnTurnEnd: wrap History.LastOrDefault + IsComplete/Model mutations in HistoryLock
- SyncRemoteSessions: move messages.Count >= History.Count check inside HistoryLock to close TOCTOU

These paths ran on the UI thread (directly or via InvokeOnUI) but accessed History without
the lock that background bridge callbacks (OnContentReceived, OnToolStarted, etc.) hold, so
concurrent access could corrupt the List<T>.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen force-pushed the bridge-history-hardening branch from 9255aad to 76ef9c6 on February 27, 2026 05:06
PureWeen closed this on Mar 3, 2026