Skip to content

Fix: Restore fallback preserves conversation history#364

Merged
PureWeen merged 6 commits intomainfrom
fix/restore-fallback-preserves-history
Mar 13, 2026
Merged

Fix: Restore fallback preserves conversation history#364
PureWeen merged 6 commits intomainfrom
fix/restore-fallback-preserves-history

Conversation

@PureWeen
Copy link
Copy Markdown
Owner

Problem

When ResumeSessionAsync fails during app restart ("Session not found" / "corrupt" / "session file" errors), the fallback path called CreateSessionAsync which created a blank session with zero messages. The old session's events.jsonl was never loaded, so all conversation history was lost.

User impact: Sessions appeared empty after app relaunch — the user would see their session restored but with 0 messages, forcing them to create duplicate sessions.

Root Cause

Introduced in PR #225 (commit 19219f1) which added the "Session not found" fallback. Worsened in PR #308 which expanded the catch conditions to include "corrupt" and "session file" errors. Neither PR added history recovery to the fallback path.

ResumeSessionAsync (the success path) correctly loads history via LoadHistoryFromDisk(sessionId). The fallback path skipped this entirely.

Fix

In the fallback path of RestorePreviousSessionsAsync:

  1. Load history from the old session's events.jsonl via LoadHistoryFromDisk(entry.SessionId) before creating the new session
  2. Inject history into the recreated session's Info.History
  3. Set message counts (MessageCount, LastReadMessageCount) so the UI shows the correct state
  4. Restore usage stats (CreatedAt, token counts) via RestoreUsageStats(entry)
  5. Sync to DB via BulkInsertAsync under the new session ID
  6. Add indicator system message: "🔄 Session recreated — conversation history recovered from previous session."

Duplicate Session Issue

The user also observed a duplicate session (session-20260311-001729) created with the same repo as the original. This was a downstream consequence: after seeing the original session restored with zero messages, the user manually created a new session. With this fix, the full history is preserved, eliminating the need for duplicates.

Tests

Added 5 structural regression tests in SessionPersistenceTests.cs:

  • RestoreFallback_LoadsHistoryFromOldSession — verifies LoadHistoryFromDisk appears before CreateSessionAsync
  • RestoreFallback_InjectsHistoryIntoRecreatedSession — verifies history injection + message count
  • RestoreFallback_RestoresUsageStats — verifies RestoreUsageStats call
  • RestoreFallback_SyncsHistoryToDatabase — verifies BulkInsertAsync call
  • RestoreFallback_AddsReconnectionIndicator — verifies system message

All 2464 tests pass.

PureWeen and others added 5 commits March 12, 2026 15:26
When ResumeSessionAsync fails with 'Session not found' / 'corrupt' /
'session file' errors, the fallback path was calling CreateSessionAsync
without recovering history from the old session's events.jsonl. This
caused sessions to appear empty after app relaunch.

The fix adds history recovery to the fallback path:
1. Load history from old session via LoadHistoryFromDisk(entry.SessionId)
2. Inject recovered messages into the new session's Info.History
3. Set MessageCount and LastReadMessageCount
4. Call RestoreUsageStats(entry) to preserve CreatedAt, token counts
5. Sync recovered history to chat DB under the new session ID
6. Add a system message indicating the session was recreated

Bug introduced in PR #225 (commit 19219f1), worsened by PR #308
(commit 72886a2) which expanded the catch conditions without adding
history recovery.

Adds 5 structural regression tests guarding the fallback path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…sonl checks

Three additional bugs causing session history loss on restore:

1. RECONNECT handler didn't call SaveActiveSessionsToDisk() after updating
   SessionId, so the debounced save wrote stale IDs. On next restart,
   the stale ID's directory exists but has no events.jsonl.

2. MergeSessionEntries only deduped by SessionId (case-insensitive).
   After RECONNECT changes the ID, old persisted entries with the stale
   ID pass the check (stale != new), accumulating N+1 ghost entries per
   restart cycle.

3. Directory existence checks (Directory.Exists) were too lenient — the
   SDK creates empty session directories during ResumeSessionAsync.
   Changed to require events.jsonl file presence in both
   WriteActiveSessionsFile and RestorePreviousSessionsAsync.

Tests: 9 new regression tests (display name dedup, reconnect save guard,
events.jsonl structural checks). Entry helper default changed from
shared name to id-based to avoid false dedup in existing tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- MergeSessionEntries: only dedup persisted entries against active names,
  not against each other (fixes legitimate same-name session loss)
- WriteActiveSessionsFile: accept sessions with events.jsonl OR recently
  created dirs (< 5 min), drop stale ghost directories
- RestorePreviousSessionsAsync: remove early events.jsonl skip so
  never-used sessions reach the existing fallback path
- Fallback path: copy old events.jsonl to new session dir for durability
  across future restarts
- Fallback path: normalize stale incomplete ToolCall/Reasoning entries
  (matching ResumeSessionAsync behavior)
- Add 5 behavioral filesystem tests for sessionDirExists callback logic

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace GC-based UnobservedTaskException detection with direct task
verification. The old approach used TaskScheduler.UnobservedTaskException
(a process-global handler) + forced GC, which was susceptible to
cross-test pollution — other tests' leaked exceptions could surface
during this test's GC cycles.

New approach: await the fire-and-forget tasks, assert they completed
without faulting (proving internal exception handling works), and verify
the error return value (-1). Deterministic, no GC timing dependency.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CRITICAL: Replace File.Copy with sanitized line-by-line copy during
fallback recovery. If events.jsonl caused ResumeSessionAsync to fail
(corrupt JSON), a raw copy propagates corruption to the new session
directory, creating a retry loop on every restart. Now each line is
validated via JsonDocument.Parse — only valid JSON lines are written.

MODERATE: Guard against null DisplayName in MergeSessionEntries'
activeNames HashSet. StringComparer.OrdinalIgnoreCase.GetHashCode(null)
throws ArgumentNullException. JSON deserialization can produce null
DisplayName even though the default is empty string. Added .Where(n =>
n != null) filter.

Tests:
- Merge_NullDisplayNameInActive_DoesNotThrow
- Merge_NullDisplayNameInPersisted_DoesNotThrow
- WriteActiveSessionsFile_SanitizedCopy_Concept
- SanitizedCopy_WritesOnlyValidJsonLines
- Updated structural tests: widened search window (5000→7000 chars),
  updated File.Copy assertion to match sanitized copy comment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen
Copy link
Copy Markdown
Owner Author

PR #364 Review — Multi-Agent Consensus Report

PR: Fix: Restore fallback preserves conversation history
CI: ⚠️ No checks configured
Prior reviews: None
Models responding: 6 across 2 rounds (3× Sonnet/Gemini/GPT each round; Opus timed out both rounds)


Consensus Findings (2+ models agree)

🔴 CRITICAL — Corrupt events.jsonl propagated via File.Copy creates persistent retry loop

File: CopilotService.Persistence.cs:431-434
Models: 5/6 flagged this

When ResumeSessionAsync fails due to "corrupt" events.jsonl, File.Copy(oldEvents, newEvents) copies the entire file — including corruption — to the new session directory. On next restart:

  1. active-sessions.json has the NEW session ID
  2. Resume tries to load the copied corrupt file → fails again
  3. Fallback fires → creates yet another session → copies corrupt file again
  4. Each restart produces a new session entry with the same corrupt data

LoadHistoryFromDisk already handles corrupt lines gracefully (line-by-line parsing with try/catch). The copy should use the same approach: read each line, validate it parses as JSON, write only valid lines to the new file.

Fix: Replace File.Copy(oldEvents, newEvents) with a sanitizing copy that writes only valid JSONL lines.

🟡 MODERATE — Null DisplayName in activeNames HashSet → ArgumentNullException

File: CopilotService.Persistence.cs:165
Models: 3/6 flagged this

var activeNames = new HashSet<string>(active.Select(e => e.DisplayName), StringComparer.OrdinalIgnoreCase);

StringComparer.OrdinalIgnoreCase.GetHashCode(null) throws ArgumentNullException. While DisplayName defaults to "", JSON deserialization can set it to null if the JSON contains "DisplayName": null. This crashes MergeSessionEntries during startup — all persisted sessions would be lost.

Fix: active.Select(e => e.DisplayName).Where(n => n != null) or guard the Contains call at line 170.

🟡 MODERATE — MessageCount off-by-one after system message injection

File: CopilotService.Persistence.cs:446-460
Models: 3/6 flagged this

MessageCount and LastReadMessageCount are set at line 446-447 to History.Count, then a system message is added at line 460. The system message is NOT counted, so History.Count > MessageCount. The UI delta (MessageCount - LastReadMessageCount) reads as 0, so the "🔄 Session recreated" message may appear without an unread indicator — minor UX issue but technically a bug.

Fix: Set MessageCount/LastReadMessageCount after the system message, or increment both by 1 after line 460.

🟢 MINOR — SaveActiveSessionsToDisk() debounced, not immediate as comment claims

File: CopilotService.cs:2722-2725
Models: 2/6 flagged this

Comment says "Persist the new session ID immediately" but SaveActiveSessionsToDisk() uses a 2s debounce timer. If the app crashes within that window, the stale session ID is still on disk, causing the fallback path to trigger again on next restart. The debounce is probably fine in practice (the fallback handles it gracefully), but the comment is misleading.

Fix: Either call FlushSaveActiveSessionsToDisk() for a synchronous write, or update the comment to say "schedule persistence."


Non-Consensus Findings (noted but excluded)

  • History copy skipped when SDK already created events.jsonl (1/6 — Gemini)
  • _sessions.TryGetValue race after async CreateSessionAsync (1/6 — Sonnet)
  • Old session directory never cleaned up (1/6 — Gemini)

Test Coverage Assessment

Tested (structural regression guards):

  • LoadHistoryFromDisk called before CreateSessionAsync
  • ✅ History injection + MessageCount set
  • RestoreUsageStats called
  • BulkInsertAsync DB sync
  • File.Copy for events.jsonl
  • ✅ Incomplete tool/reasoning normalization
  • ✅ Display name dedup in MergeSessionEntries
  • sessionDirExists filesystem behavior

Missing tests:

  • ❌ Corrupt events.jsonl propagation (no test verifies corrupt lines aren't copied)
  • ❌ Null DisplayName in activeNames HashSet
  • ❌ MessageCount after system message injection
  • ❌ End-to-end fallback integration (tests are source-text-matching, not behavioral)

Verdict: ⚠️ Request Changes

  1. [MUST] Sanitize events.jsonl during copy — write only valid JSON lines to new file (prevents restart loop)
  2. [MUST] Guard against null DisplayName in activeNames HashSet construction
  3. [NICE] Fix MessageCount after system message
  4. [NICE] Clarify debounce comment or use synchronous flush

- Move MessageCount/LastReadMessageCount assignment AFTER system message
  injection so the '🔄 Session recreated' message is included in the count
  and triggers the unread indicator
- Remove duplicate pre-system-message MessageCount assignment
- Clarify SaveActiveSessionsToDisk comment: it's debounced (2s), not
  immediate. The fallback path handles the crash-within-window case.
- Add structural regression test verifying MessageCount is set after
  the system message, not before

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen PureWeen merged commit 049b205 into main Mar 13, 2026
@PureWeen PureWeen deleted the fix/restore-fallback-preserves-history branch March 13, 2026 00:05
@dinisdopion-sys
Copy link
Copy Markdown

dinisdopion-sys commented Mar 13, 2026 via email

arisng pushed a commit to arisng/PolyPilot that referenced this pull request Apr 4, 2026
## Problem

When `ResumeSessionAsync` fails during app restart ("Session not found"
/ "corrupt" / "session file" errors), the fallback path called
`CreateSessionAsync` which created a **blank** session with zero
messages. The old session's `events.jsonl` was never loaded, so all
conversation history was lost.

**User impact:** Sessions appeared empty after app relaunch — the user
would see their session restored but with 0 messages, forcing them to
create duplicate sessions.

## Root Cause

Introduced in PR PureWeen#225 (commit `19219f1`) which added the "Session not
found" fallback. Worsened in PR PureWeen#308 which expanded the catch conditions
to include "corrupt" and "session file" errors. Neither PR added history
recovery to the fallback path.

`ResumeSessionAsync` (the success path) correctly loads history via
`LoadHistoryFromDisk(sessionId)`. The fallback path skipped this
entirely.

## Fix

In the fallback path of `RestorePreviousSessionsAsync`:

1. **Load history** from the old session's `events.jsonl` via
`LoadHistoryFromDisk(entry.SessionId)` before creating the new session
2. **Inject history** into the recreated session's `Info.History`
3. **Set message counts** (`MessageCount`, `LastReadMessageCount`) so
the UI shows the correct state
4. **Restore usage stats** (`CreatedAt`, token counts) via
`RestoreUsageStats(entry)`
5. **Sync to DB** via `BulkInsertAsync` under the new session ID
6. **Add indicator** system message: "🔄 Session recreated — conversation
history recovered from previous session."

## Duplicate Session Issue

The user also observed a duplicate session (`session-20260311-001729`)
created with the same repo as the original. This was a downstream
consequence: after seeing the original session restored with zero
messages, the user manually created a new session. With this fix, the
full history is preserved, eliminating the need for duplicates.

## Tests

Added 5 structural regression tests in `SessionPersistenceTests.cs`:
- `RestoreFallback_LoadsHistoryFromOldSession` — verifies
`LoadHistoryFromDisk` appears before `CreateSessionAsync`
- `RestoreFallback_InjectsHistoryIntoRecreatedSession` — verifies
history injection + message count
- `RestoreFallback_RestoresUsageStats` — verifies `RestoreUsageStats`
call
- `RestoreFallback_SyncsHistoryToDatabase` — verifies `BulkInsertAsync`
call
- `RestoreFallback_AddsReconnectionIndicator` — verifies system message

All 2464 tests pass.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants