fix: dedupe Claude sessions imported by Codex Desktop #220

Merged
vakovalskii merged 1 commit into vakovalskii:main from NovakPAai:fix/dedupe-codex-claude-imports
May 25, 2026

@NovakPAai
Collaborator

Summary

Codex Desktop >= 0.133.0-alpha.1 introduced an External Agent Session Imports feature that ingests Claude Code sessions from ~/.claude/projects/**/*.jsonl into ~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl and registers them in ~/.codex/session_index.jsonl with a fresh UUIDv7 thread id. The original Claude file stays in place.

Codbash was loading both copies, so every imported conversation appeared twice in the dashboard — once correctly attributed to Claude (original file, UUIDv4 id) and once incorrectly attributed to Codex (the import, UUIDv7 id). On my machine that produced 50 duplicates (56 "Codex" sessions of which only 6 were genuinely Codex-native).
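The two id flavours mentioned above can be told apart by the version digit, which is the first hex character of the third group in a canonical UUID string. The helper below is only a diagnostic sketch for inspecting ids by hand; `uuidVersion` is a hypothetical name, and the fix itself relies on the ledger rather than on the UUID version.

```javascript
// Return the version number encoded in a canonical UUID string,
// or null if the string is not in 8-4-4-4-12 hex form.
// Hypothetical helper for manual inspection; not part of the patch.
function uuidVersion(id) {
  const m = /^[0-9a-f]{8}-[0-9a-f]{4}-([0-9a-f])[0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}$/i.exec(id);
  return m ? parseInt(m[1], 16) : null;
}

// The sample ids from the test plan:
console.log(uuidVersion('019e55cf-3089-7031-bc64-fd29a650def8')); // 7 (Codex import)
console.log(uuidVersion('ee956bf6-b5f6-4571-b1d0-019ce2874122')); // 4 (Claude original)
```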

Fix

  • New helper parseCodexExternalImports(codexDir) reads ~/.codex/external_agent_session_imports.json and returns a Set of imported_thread_ids whose source_path still exists on disk.
  • scanCodexSessions() skips those ids in both the history.jsonl loop and the rollout-files walk.
  • If the source Claude file has been deleted, the Codex copy is retained — no history is lost.
  • The ledger file is added to _sessionsNeedRescan / _updateScanMarkers so a two-phase Codex write (rollout/index first, ledger a tick later) cannot leave a stale dedup view between polls.
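For readers who want the shape of the first three points without opening the diff, here is a minimal sketch. It is not the merged code: the `Sketch` suffix and `dropImported` are hypothetical names, and the ledger layout (a top-level `records` array whose entries carry `imported_thread_id` and `source_path`) is an assumption inferred from the field names used in this description.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// ASSUMED ledger shape (not confirmed by any Codex documentation):
//   { "records": [ { "imported_thread_id": "...", "source_path": "..." } ] }
// Returns the ids of imports whose original Claude file is still on disk.
function parseCodexExternalImportsSketch(codexDir) {
  const imported = new Set();
  const ledgerPath = path.join(codexDir, 'external_agent_session_imports.json');

  let parsed;
  try {
    parsed = JSON.parse(fs.readFileSync(ledgerPath, 'utf8'));
  } catch {
    return imported; // missing, unreadable, or malformed ledger
  }

  const records = parsed && parsed.records;
  if (!Array.isArray(records)) return imported;

  for (const rec of records) {
    if (!rec || typeof rec.imported_thread_id !== 'string') continue;
    if (typeof rec.source_path !== 'string') continue;
    // Only dedupe while the Claude original exists; otherwise the
    // Codex copy is the sole surviving record and must stay visible.
    if (fs.existsSync(rec.source_path)) imported.add(rec.imported_thread_id);
  }
  return imported;
}

// Hypothetical stand-in for the skip inside scanCodexSessions():
// drop every session whose id is in the imported set.
function dropImported(sessions, importedIds) {
  return sessions.filter((s) => !importedIds.has(s.id));
}
```

Returning an empty Set on every failure path means the caller needs no special casing: with no ledger, nothing is skipped and all Codex sessions show as before.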

Identifying an import on disk

Imported rollouts have a distinctive meta header:

"originator": "Codex Desktop",
"source": "vscode",
"turn_id": "external-import-turn-1"

But the ledger is the authoritative source — using it avoids parsing every rollout to detect imports.
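For completeness, the header check that the ledger makes unnecessary could look roughly like this. It is a sketch only: `looksLikeExternalImport` is a hypothetical name, and it assumes the three fields sit together on one already-parsed object, which may not match how a real rollout nests them.

```javascript
// Heuristic check on an already-parsed rollout meta object.
// ASSUMPTION: originator, source and turn_id are on the same object;
// real rollout files may nest them differently.
function looksLikeExternalImport(meta) {
  return Boolean(meta) &&
    meta.originator === 'Codex Desktop' &&
    meta.source === 'vscode' &&
    meta.turn_id === 'external-import-turn-1';
}
```

Running this against every rollout would mean opening and parsing each file, which is the cost the ledger lookup avoids.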

Before / after

| | before | after |
|---|---|---|
| codex sessions | 56 (50 imports + 6 native) | 6 (native only) |
| claude sessions | 226 | 226 |
| total | 605 | 557 |
| imported id 019e55cf-3089-... | shown as Codex | hidden (Claude original ee956bf6-... is shown instead) |

Test plan

  • node -c src/data.js — syntax
  • Manual: node bin/cli.js run, /api/sessions counts before/after match expected dedup
  • Sample id 019e55cf-3089-7031-bc64-fd29a650def8 no longer appears under Codex; original ee956bf6-b5f6-4571-b1d0-019ce2874122 still appears under Claude
  • Test on a machine with no ~/.codex/external_agent_session_imports.json — fallback (helper returns empty set, all Codex sessions shown as before)
  • Test on a machine where a source Claude file has been deleted — Codex copy is kept (not skipped)

Codex Desktop >= 0.133.0-alpha.1 ingests Claude Code sessions from
~/.claude/projects/**/*.jsonl into ~/.codex/sessions/ and registers
them in session_index.jsonl with a fresh UUIDv7 thread id. The
original Claude file stays in place, so the same conversation
appeared twice in codbash — once correctly as a Claude session and
once incorrectly as a Codex session.

Read ~/.codex/external_agent_session_imports.json and skip imported
rollouts whose source path still exists on disk. When the source has
been deleted, retain the Codex copy so no history is lost.

Also watch the ledger file in the rescan-change detector to close a
race where Codex writes the rollout/index slightly before the ledger
and a poll fires in between.
Owner

@vakovalskii left a comment

LGTM ✅ Targeted dedup fix with the right defensive posture.

Verified:

  • node -c src/data.js clean
  • Full test suite still 127/0/2 pass — nothing regresses
  • CI green 6/6

Spot-checks:

  • parseCodexExternalImports is defensive at every step: missing file → empty Set, unreadable → empty Set, malformed JSON → empty Set, non-array records → empty Set
  • Conservative no-data-loss policy: only skip Codex copy when the source Claude file still exists. If user deleted the original, the Codex import becomes the sole surviving copy
  • Skip applied at both Codex code paths (history.jsonl loop and rollout files walk) — full coverage
  • Cache invalidation correctly watches the ledger's mtime and size, and handles the file-disappeared case (else if (_codexImportsLedgerMtime !== 0 || _codexImportsLedgerSize !== 0))
  • Uses the ledger as authoritative — avoids parsing every rollout to detect the Codex Desktop originator + external-import-turn-1 markers 👍
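The invalidation check in the fourth spot-check can be pictured with the sketch below. It is an illustration, not the code in src/data.js: `createLedgerWatcher` is a hypothetical name, and it folds the compare step and the marker update into one closure, whereas the PR keeps them in _sessionsNeedRescan and _updateScanMarkers.

```javascript
const fs = require('node:fs');

// Returns a function that reports whether the ledger changed since the
// previous call, based on mtime + size. Hypothetical helper for illustration.
function createLedgerWatcher(ledgerPath) {
  let lastMtime = 0; // 0/0 means "no ledger seen yet"
  let lastSize = 0;

  return function ledgerChanged() {
    let st = null;
    try {
      st = fs.statSync(ledgerPath);
    } catch {
      st = null; // treat any stat failure as "file absent"
    }

    if (st) {
      if (st.mtimeMs !== lastMtime || st.size !== lastSize) {
        lastMtime = st.mtimeMs;
        lastSize = st.size;
        return true; // created or modified
      }
    } else if (lastMtime !== 0 || lastSize !== 0) {
      // The ledger existed on an earlier poll and is gone now.
      lastMtime = 0;
      lastSize = 0;
      return true;
    }
    return false;
  };
}
```

The else-if branch is what makes a deleted ledger trigger one rescan instead of leaving the previous dedup set in place.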

Real-world impact reported in the PR (56 → 6 Codex sessions, 605 → 557 total) suggests this will be felt immediately by anyone running both Codex Desktop and Claude Code.

@vakovalskii vakovalskii merged commit 4ba985d into vakovalskii:main May 25, 2026
6 checks passed