From f26b03944fd31b9537361739885758d99c89e0ae Mon Sep 17 00:00:00 2001
From: Stephen Hellicar <shellicar@gmail.com>
Date: Mon, 6 Apr 2026 17:04:02 +1000
Subject: [PATCH 1/5] Add architecture refactor plan

Captures the agreed design direction: splitting Conversation from
ConversationStore, extracting RequestBuilder, TerminalEditor, AgentMessageHandler,
and the display components. Each substep ships independently with the CLI working.
Includes estimates, risk ratings, and test ROI notes.
---
 .claude/plans/architecture-refactor.md | 200 +++++++++++++++++++++++++
 1 file changed, 200 insertions(+)
 create mode 100644 .claude/plans/architecture-refactor.md

diff --git a/.claude/plans/architecture-refactor.md b/.claude/plans/architecture-refactor.md
new file mode 100644
index 0000000..24b375a
--- /dev/null
+++ b/.claude/plans/architecture-refactor.md
@@ -0,0 +1,200 @@
+# Architecture Refactor Plan
+
+## Context
+
+Identified 2026-04-06. The SDK and CLI work correctly but several classes carry more
+responsibilities than they should, which makes them harder to extend and impossible to
+unit test in isolation. This plan captures the agreed design direction and the ordered
+steps to get there.
+
+The core invariant: **the CLI+SDK must work at every commit**. Each substep is a
+complete, shippable unit. If a substep goes wrong, revert it — all previous substeps
+remain intact. Nothing is abandoned; rollback cost is always exactly one step.
+
+---
+
+## Target Design
+
+### SDK
+
+| Class | Responsibility |
+|-------|---------------|
+| `Conversation` | Pure data: ordered messages, role alternation, compaction trim. No I/O. |
+| `ConversationStore` | Load/save a `Conversation` from a JSONL file. |
+| `RequestBuilder` | Pure function: `(RunAgentQuery, messages) → BetaMessageStreamParams`. |
+| `StreamProcessor` | Raw Anthropic stream events → typed blocks. Already `MessageStream`, keep as-is. |
+| `ToolRunner` | Validate input, call handler, transform result. |
+| `AgentLoop` | Orchestrate: Conversation + RequestBuilder + StreamProcessor + ToolRunner. The loop itself. |
+
+Auth is already well-decomposed. No changes planned there.
+
+### CLI
+
+| Class | Responsibility |
+|-------|---------------|
+| `TerminalEditor` | Multi-line text input, cursor, word navigation. No rendering, no stream knowledge. |
+| `ConversationDisplay` | Sealed block system, streaming display, flush-to-scroll. |
+| `ToolApprovalWidget` | Pending tools list, keyboard-driven approval, async promise. |
+| `StatusBar` | Running token/cost totals. Renders to a single line. |
+| `CommandMode` | `/` command handling, clipboard, attachments. |
+| `ScreenCoordinator` | Owns the physical screen. Routes keyboard events. Assembles render output. (Slimmed `AppLayout`.) |
+| `AgentMessageHandler` | Maps `SdkMessage` events → display component calls. Extracted from `runAgent.ts`. |
+| `PermissionPolicy` | Auto-approve/deny logic. Currently split across `permissions.ts` and `runAgent.ts`. |
+
+---
+
+## Steps
+
+### Prerequisite: Test framework
+Set up vitest in the monorepo (workspace config, turbo pipeline, per-package config).
+**Estimate: 1 — do this alongside step 1a.**
+
+---
+
+### Step 1 — Split `Conversation` from `ConversationStore`
+
+**1a — Extract `Conversation` (pure data)**
+- `Conversation`: holds `#items`, `push()`, `remove()`, `messages` getter, role-alternation,
+  compaction trim. No file I/O.
+- `ConversationStore`: wraps `Conversation`, loads from JSONL in constructor, calls save on
+  every mutation.
+- `AgentRun` receives `Conversation` instead of `ConversationHistory`.
+- External API unchanged (`historyFile` option still works).
+- **Estimate: 1 | Risk: Low** — single failure mode: forgetting to call save in
+  `ConversationStore.push()`. Obvious, fast to catch.
+- **Tests: `Conversation` becomes fully unit-testable** — role alternation, compaction clear,
+  push/remove, trim logic. All pure assertions, no mocks needed. +1 for tests.
+
+**1b — History replay in TUI**
+- On startup, walk the messages loaded from file and replay them into `ConversationDisplay`
+  so prior turns are visible.
+- Requires decisions: what to show for compaction blocks, tool use/result pairs, thinking
+  blocks. Get this wrong → confusing display, not a crash.
+- Depends on 1a (cleanly) but is a separate commit.
+- **Estimate: 2 | Risk: Medium** — decisions about what to display, runtime-visible.
+- **Tests: partial** — the parse-to-display-events function is testable if extracted. +1.
+
+---
+
+### Step 2 — Extract `RequestBuilder`
+
+- Extract `AgentRun.#getMessageStream` params-building into a pure function/class:
+  tools mapping, betas resolution, context_management config, system prompts, cache_control,
+  thinking flag → `BetaMessageStreamParams`.
+- `#getMessageStream` keeps the stream call, delegates params to `RequestBuilder`.
+- No UI impact, no external API change.
+- **Estimate: 1 | Risk: Very Low** — TypeScript catches missed fields. API error on first
+  run if anything wrong. Fast to diagnose.
+- **Tests: clean** — one test per beta feature: "given compact enabled, request has compact
+  edit". Pure assertions. +1 for tests.
+
+---
+
+### Step 3 — Extract `TerminalEditor` from `AppLayout`
+
+Do these in order. Each substep compiles and runs standalone.
+
+**3a — Extract editor state**
+- Move `#editorLines`, `#cursorLine`, `#cursorCol` to `TerminalEditor`.
+- Expose accessors. `AppLayout` holds `this.#editor` and reads state from it.
+- Key handling and rendering stay in `AppLayout` for now.
+- **Estimate: 1 | Risk: Low** — TypeScript finds every missed reference at compile time.
+- **Tests: marginal at this point** — state is there but key handling isn't yet.
+
+**3b — Move key handling into `TerminalEditor.handleKey()`**
+- `AppLayout.handleKey` routes: if editor mode → `this.#editor.handleKey(key)`.
+- Must be done atomically — extract AND update call site in same commit. No gap where
+  neither has the logic.
+- Edge cases: backspace at col 0 merges lines, Enter mid-line splits, multi-line paste,
+  word jump at line boundary. These are where regressions hide.
+- **Estimate: 2 | Risk: Medium-High** — runtime edge case regressions, caught by typing.
+- **Tests: clean and valuable** — pure state machine. Test every edge case: backspace at
+  line start, Enter mid-line, Ctrl+Left, paste. High confidence. +2 for tests.
+
+**3c — Move editor rendering into `TerminalEditor.render(cols)`**
+- `AppLayout.render()` calls `this.#editor.render(cols)` for the editor region.
+- Visual regression if column width or ANSI cursor placement is wrong — visible immediately.
+- **Estimate: 1 | Risk: Medium** — fast feedback, obvious if wrong.
+- **Tests: partial** — render string for known input is assertable; ANSI codes noisy. +1.
+
+---
+
+### Step 4 — Extract `AgentMessageHandler` from `runAgent.ts`
+
+**4a — Stateless cases**
+- Move `message_thinking`, `message_text`, `message_compaction_start`, `message_compaction`,
+  `done`, `error`, `query_summary` into `AgentMessageHandler`.
+- Constructor takes `layout`, `logger`, model, cacheTtl.
+- `port.on('message', (msg) => handler.handle(msg))`.
+- **Estimate: 1 | Risk: Low** — straight delegations, TypeScript catches missing refs.
+- **Tests: clean** — mock layout components, assert right methods called for each message. +1.
+
+**4b — Stateful cases**
+- Move `usageBeforeTools` tracking and delta calculation.
+- Move `toolApprovalRequest` async function.
+- The invariant: capture usage at start of first tool batch, compute delta on next
+  `message_usage`, then null. Getting the reset timing wrong → wrong delta annotation.
+  Not a crash, but a wrong number on the tools block.
+- **Estimate: 1-2 | Risk: Medium** — runtime-visible wrong number, needs a tool-use
+  interaction to catch.
+- **Tests: clean** — fire a sequence of tool+usage messages, assert delta annotation
+  string. Catches the reset-timing bug. +1.
+
+---
+
+### Step 5 — Split display components out of `AppLayout`
+
+**5a — Extract `StatusBar`**
+- Move the 5 token/cost accumulators and the status line render logic.
+- `AppLayout` holds `this.#statusBar`, calls `updateUsage` and `render`.
+- **Estimate: 1 | Risk: Low** — pure state + string render. Visible immediately if wrong.
+- **Tests: clean** — given usage sequence, assert totals and rendered string. +1.
+
+**5b — Extract `ConversationDisplay`**
+- Move sealed blocks, active block, flush count, `transitionBlock`, `appendStreaming`,
+  `completeStreaming`, `appendToLastSealed`, render logic.
+- The flush-to-scroll boundary is subtle — blocks flushed to scroll are permanently written.
+  Getting `#flushedCount` wrong causes double-rendering or missing content.
+- **Estimate: 2 | Risk: Medium** — flush logic is the dangerous part, visible but confusing.
+- **Tests: partial** — state logic yes; flush-to-scroll needs mock screen infrastructure. +1-2.
+
+**5c — Extract `ToolApprovalWidget`**
+- Move pending tools list, selection, expand/collapse, keyboard handler, approval promise
+  queue (`#pendingApprovals`).
+- The async coordination — resolve functions in an array, keyboard handler pops them —
+  must move together. Splitting this across two commits creates a broken state.
+- **Estimate: 2 | Risk: Medium-High** — async approval flow, only caught during a
+  tool-use interaction.
+- **Tests: valuable** — async approval flow, cancel flow, keyboard navigation. +2.
+
+**5d — `ScreenCoordinator` cleanup**
+- By this point all logic has moved out. `AppLayout` becomes wiring + keyboard routing +
+  render assembly.
+- **Estimate: 1 | Risk: Low** — routing logic, visible immediately if wrong.
+- **Tests: marginal** — routing logic testable; screen output not. —
+
+---
+
+## Summary
+
+| Step | Estimate | Risk | Tests (additional) |
+|------|----------|------|-------------------|
+| Prereq: vitest setup | 1 | Low | — |
+| 1a Conversation split | 1 | Low | +1 |
+| 1b History replay | 2 | Medium | +1 |
+| 2 RequestBuilder | 1 | Very Low | +1 |
+| 3a Editor state | 1 | Low | — |
+| 3b Editor key handling | 2 | Medium-High | +2 |
+| 3c Editor rendering | 1 | Medium | +1 |
+| 4a MessageHandler stateless | 1 | Low | +1 |
+| 4b MessageHandler stateful | 1-2 | Medium | +1 |
+| 5a StatusBar | 1 | Low | +1 |
+| 5b ConversationDisplay | 2 | Medium | +1-2 |
+| 5c ToolApprovalWidget | 2 | Medium-High | +2 |
+| 5d ScreenCoordinator cleanup | 1 | Low | — |
+| **Total** | **17-19** | | **+12-13** |
+
+Refactoring alone: ~17-19 units. With tests written at each step: ~29-32 units.
+
+The steps with the best test ROI (high value, catches real bugs): **1a, 2, 3b, 4b**.
+Start there. The rest can follow.

From e09e46def2467be0fc45290275f31839b5b93432 Mon Sep 17 00:00:00 2001
From: Stephen Hellicar <shellicar@gmail.com>
Date: Mon, 6 Apr 2026 17:09:22 +1000
Subject: [PATCH 2/5] Update CLAUDE.md: current state and plan reference

Refresh current state to reflect main branch, completed OAuth2/system
prompts/query-summary work, and point to the architecture refactor plan.
Add plan reference in the architecture section so future sessions find it.
---
 .claude/CLAUDE.md | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index 927fa46..cd07fab 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -66,32 +66,39 @@ Every session has three phases: start, work, end.
 
 <!-- BEGIN:REPO:current-state -->
 ## Current State
-Branch: `feature/sdk-tooling` — pushed, clean working tree.
+Branch: `main` — clean. PR #182 open (`feature/architecture-refactor-plan`, docs only).
 
 Active development is in **`apps/claude-sdk-cli/`** — a TUI terminal app built on `@shellicar/claude-sdk`.
 
+**Architecture refactor planned** — see `.claude/plans/architecture-refactor.md` for the full
+step-by-step plan with estimates and risk ratings. Next step: **1a** (split `Conversation`
+from `ConversationStore`). Each substep ships independently; the CLI works at every commit.
+
 **Completed:**
 - Full cursor-aware multi-line editor (`AppLayout.ts`)
-- Clipboard text attachments via command mode (`ctrl+/` → `t` paste, `d` delete, chips in status bar)
+- Clipboard text/file attachments via command mode
 - `ConversationHistory.push(msg, {id?})` + `remove(id)` for tagged message pruning
 - `IAnthropicAgent.injectContext/removeContext` public API
-- `RunAgentQuery.thinking` + `pauseAfterCompact` options; `AnthropicBeta` enum cleanup
-- `BetaMessageParam` used directly in public interface (no more `JsonObject` casts)
+- `RunAgentQuery.thinking` + `pauseAfterCompact` + `systemPrompts` options
+- `BetaMessageParam` used directly in public interface
 - Ref tool + RefStore for large output ref-swapping
 - Tool approval flow (auto-approve/deny/prompt)
 - Compaction display with context high-water mark
-- File attachments via `f` command: three-stage clipboard path reading (pbpaste / VS Code code/file-list JXA / osascript furl), stat-first handler, `file`/`dir`/`missing` chips
+- OAuth2 authentication (`TokenRefreshingAnthropic` subclass)
+- System prompts hardcoded in `apps/claude-sdk-cli/src/systemPrompts.ts`
+- `SdkQuerySummary` — query context block shown before each API call (ℹ️ query)
 
 **In-progress / next:**
-- Skills system (`ActivateSkill`/`DeactivateSkill`) — primitives in place; timing design issue unresolved (see `docs/skills-design.md`)
-- Image attachments — `pngpaste` + clipboard image detection (deferred)
+- Architecture refactor step 1a — split `Conversation` from `ConversationStore`
+- History replay in TUI (step 1b) — show prior turns on startup
+- Vitest setup (prerequisite for unit tests, do alongside 1a)
 <!-- END:REPO:current-state -->
 
 <!-- BEGIN:REPO:architecture -->
 ## Architecture
 
 **Stack**: TypeScript, esbuild (bundler), `@anthropic-ai/sdk` (direct). pnpm monorepo with turbo. Two apps: active (`apps/claude-sdk-cli/`) and legacy (`apps/claude-cli/`).
-
+| `packages/claude-sdk/` | Anthropic SDK wrapper: `IAnthropicAgent`, `AnthropicAgent`, `AgentRun`, `ConversationHistory`, `MessageStream`. **Refactor planned** — see `.claude/plans/architecture-refactor.md`. |
 ### Packages
 
 | Package | Role |

From ed3d1789dda7dd2d238efd28d705d23f4c2e74c1 Mon Sep 17 00:00:00 2001
From: Stephen Hellicar <shellicar@gmail.com>
Date: Mon, 6 Apr 2026 17:40:32 +1000
Subject: [PATCH 3/5] Fix corrupted architecture table in CLAUDE.md

Dangling row was inserted between Stack line and Packages heading;
original table row was left unchanged. Replace the whole block cleanly.
---
 .claude/CLAUDE.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index cd07fab..7d652b1 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -98,14 +98,14 @@ from `ConversationStore`). Each substep ships independently; the CLI works at ev
 ## Architecture
 
 **Stack**: TypeScript, esbuild (bundler), `@anthropic-ai/sdk` (direct). pnpm monorepo with turbo. Two apps: active (`apps/claude-sdk-cli/`) and legacy (`apps/claude-cli/`).
-| `packages/claude-sdk/` | Anthropic SDK wrapper: `IAnthropicAgent`, `AnthropicAgent`, `AgentRun`, `ConversationHistory`, `MessageStream`. **Refactor planned** — see `.claude/plans/architecture-refactor.md`. |
+
 ### Packages
 
 | Package | Role |
 |---------|------|
 | `apps/claude-sdk-cli/` | **Active TUI CLI** — talks directly to `@shellicar/claude-sdk` |
 | `apps/claude-cli/` | Legacy CLI using a different SDK path (not actively developed) |
-| `packages/claude-sdk/` | Anthropic SDK wrapper: `IAnthropicAgent`, `AnthropicAgent`, `AgentRun`, `ConversationHistory`, `MessageStream` |
+| `packages/claude-sdk/` | Anthropic SDK wrapper: `IAnthropicAgent`, `AnthropicAgent`, `AgentRun`, `ConversationHistory`, `MessageStream`. **Refactor planned** — see `.claude/plans/architecture-refactor.md`. |
 | `packages/claude-sdk-tools/` | Tool definitions: `Find`, `ReadFile`, `Grep`, `Head`, `Tail`, `Range`, `SearchFiles`, `Pipe`, `EditFile`, `PreviewEdit`, `CreateFile`, `DeleteFile`, `DeleteDirectory`, `Exec`, `Ref` |
 | `packages/claude-core/` | Shared ANSI/terminal utilities: `sanitise`, `reflow`, `screen`, `status-line`, `viewport`, `renderer` |
 | `packages/typescript-config/` | Shared tsconfig base |

From fc8fa1b358c7b87cfeb587968816e1b03c003f64 Mon Sep 17 00:00:00 2001
From: Stephen Hellicar <shellicar@gmail.com>
Date: Mon, 6 Apr 2026 17:42:16 +1000
Subject: [PATCH 4/5] Add missing CommandMode extraction step to refactor plan

CommandMode (step 5d) was in the target design but absent from the
step list. Insert it between ToolApprovalWidget and ScreenCoordinator
cleanup. ScreenCoordinator becomes 5e. Totals updated.
---
 .claude/plans/architecture-refactor.md | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/.claude/plans/architecture-refactor.md b/.claude/plans/architecture-refactor.md
index 24b375a..589c884 100644
--- a/.claude/plans/architecture-refactor.md
+++ b/.claude/plans/architecture-refactor.md
@@ -167,7 +167,14 @@ Do these in order. Each substep compiles and runs standalone.
   tool-use interaction.
 - **Tests: valuable** — async approval flow, cancel flow, keyboard navigation. +2.
 
-**5d — `ScreenCoordinator` cleanup**
+**5d — Extract `CommandMode`**
+- Move `#commandMode`, `#previewMode` state, `#handleCommandKey`, `#buildCommandRow`,
+  and the clipboard/attachment logic (`t`, `f`, `d`, `← →` commands).
+- The clipboard reads are async; the attachment store interaction must move with it.
+- **Estimate: 2 | Risk: Medium** — async clipboard flow, attachment state coordination.
+- **Tests: partial** — command dispatch logic testable; clipboard reads need mocking. +1.
+
+**5e — `ScreenCoordinator` cleanup**
 - By this point all logic has moved out. `AppLayout` becomes wiring + keyboard routing +
   render assembly.
 - **Estimate: 1 | Risk: Low** — routing logic, visible immediately if wrong.
@@ -191,10 +198,11 @@ Do these in order. Each substep compiles and runs standalone.
 | 5a StatusBar | 1 | Low | +1 |
 | 5b ConversationDisplay | 2 | Medium | +1-2 |
 | 5c ToolApprovalWidget | 2 | Medium-High | +2 |
-| 5d ScreenCoordinator cleanup | 1 | Low | — |
-| **Total** | **17-19** | | **+12-13** |
+| 5d CommandMode | 2 | Medium | +1 |
+| 5e ScreenCoordinator cleanup | 1 | Low | — |
+| **Total** | **19-21** | | **+13-14** |
 
-Refactoring alone: ~17-19 units. With tests written at each step: ~29-32 units.
+Refactoring alone: ~19-21 units. With tests written at each step: ~32-35 units.
 
 The steps with the best test ROI (high value, catches real bugs): **1a, 2, 3b, 4b**.
 Start there. The rest can follow.

From d3ab8dc343d175d5bc0bd834b9ad8380c2ca2f65 Mon Sep 17 00:00:00 2001
From: Stephen Hellicar <shellicar@gmail.com>
Date: Mon, 6 Apr 2026 17:43:04 +1000
Subject: [PATCH 5/5] =?UTF-8?q?Remove=20vitest=20prerequisite=20=E2=80=94?=
 =?UTF-8?q?=20already=20present=20in=20the=20monorepo?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .claude/plans/architecture-refactor.md | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/.claude/plans/architecture-refactor.md b/.claude/plans/architecture-refactor.md
index 589c884..39fdd21 100644
--- a/.claude/plans/architecture-refactor.md
+++ b/.claude/plans/architecture-refactor.md
@@ -45,12 +45,6 @@ Auth is already well-decomposed. No changes planned there.
 
 ## Steps
 
-### Prerequisite: Test framework
-Set up vitest in the monorepo (workspace config, turbo pipeline, per-package config).
-**Estimate: 1 — do this alongside step 1a.**
-
----
-
 ### Step 1 — Split `Conversation` from `ConversationStore`
 
 **1a — Extract `Conversation` (pure data)**
@@ -186,7 +180,6 @@ Do these in order. Each substep compiles and runs standalone.
 
 | Step | Estimate | Risk | Tests (additional) |
 |------|----------|------|-------------------|
-| Prereq: vitest setup | 1 | Low | — |
 | 1a Conversation split | 1 | Low | +1 |
 | 1b History replay | 2 | Medium | +1 |
 | 2 RequestBuilder | 1 | Very Low | +1 |