diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index 927fa46..7d652b1 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -66,25 +66,32 @@ Every session has three phases: start, work, end.
 
 <!-- BEGIN:REPO:current-state -->
 ## Current State
-Branch: `feature/sdk-tooling` — pushed, clean working tree.
+Branch: `main` — clean. PR #182 open (`feature/architecture-refactor-plan`, docs only).
 
 Active development is in **`apps/claude-sdk-cli/`** — a TUI terminal app built on `@shellicar/claude-sdk`.
 
+**Architecture refactor planned** — see `.claude/plans/architecture-refactor.md` for the full
+step-by-step plan with estimates and risk ratings. Next step: **1a** (split `Conversation`
+from `ConversationStore`). Each substep ships independently; the CLI works at every commit.
+
 **Completed:**
 - Full cursor-aware multi-line editor (`AppLayout.ts`)
-- Clipboard text attachments via command mode (`ctrl+/` → `t` paste, `d` delete, chips in status bar)
+- Clipboard text/file attachments via command mode
 - `ConversationHistory.push(msg, {id?})` + `remove(id)` for tagged message pruning
 - `IAnthropicAgent.injectContext/removeContext` public API
-- `RunAgentQuery.thinking` + `pauseAfterCompact` options; `AnthropicBeta` enum cleanup
-- `BetaMessageParam` used directly in public interface (no more `JsonObject` casts)
+- `RunAgentQuery.thinking` + `pauseAfterCompact` + `systemPrompts` options
+- `BetaMessageParam` used directly in public interface
 - Ref tool + RefStore for large output ref-swapping
 - Tool approval flow (auto-approve/deny/prompt)
 - Compaction display with context high-water mark
-- File attachments via `f` command: three-stage clipboard path reading (pbpaste / VS Code code/file-list JXA / osascript furl), stat-first handler, `file`/`dir`/`missing` chips
+- OAuth2 authentication (`TokenRefreshingAnthropic` subclass)
+- System prompts hardcoded in `apps/claude-sdk-cli/src/systemPrompts.ts`
+- `SdkQuerySummary` — query context block shown before each API call (ℹ️ query)
 
 **In-progress / next:**
-- Skills system (`ActivateSkill`/`DeactivateSkill`) — primitives in place; timing design issue unresolved (see `docs/skills-design.md`)
-- Image attachments — `pngpaste` + clipboard image detection (deferred)
+- Architecture refactor step 1a — split `Conversation` from `ConversationStore`
+- History replay in TUI (step 1b) — show prior turns on startup
+- Vitest setup (prerequisite for unit tests, do alongside 1a)
 <!-- END:REPO:current-state -->
 
 <!-- BEGIN:REPO:architecture -->
@@ -98,7 +105,7 @@ Active development is in **`apps/claude-sdk-cli/`** — a TUI terminal app built
 |---------|------|
 | `apps/claude-sdk-cli/` | **Active TUI CLI** — talks directly to `@shellicar/claude-sdk` |
 | `apps/claude-cli/` | Legacy CLI using a different SDK path (not actively developed) |
-| `packages/claude-sdk/` | Anthropic SDK wrapper: `IAnthropicAgent`, `AnthropicAgent`, `AgentRun`, `ConversationHistory`, `MessageStream` |
+| `packages/claude-sdk/` | Anthropic SDK wrapper: `IAnthropicAgent`, `AnthropicAgent`, `AgentRun`, `ConversationHistory`, `MessageStream`. **Refactor planned** — see `.claude/plans/architecture-refactor.md`. |
 | `packages/claude-sdk-tools/` | Tool definitions: `Find`, `ReadFile`, `Grep`, `Head`, `Tail`, `Range`, `SearchFiles`, `Pipe`, `EditFile`, `PreviewEdit`, `CreateFile`, `DeleteFile`, `DeleteDirectory`, `Exec`, `Ref` |
 | `packages/claude-core/` | Shared ANSI/terminal utilities: `sanitise`, `reflow`, `screen`, `status-line`, `viewport`, `renderer` |
 | `packages/typescript-config/` | Shared tsconfig base |
diff --git a/.claude/plans/architecture-refactor.md b/.claude/plans/architecture-refactor.md
new file mode 100644
index 0000000..39fdd21
--- /dev/null
+++ b/.claude/plans/architecture-refactor.md
@@ -0,0 +1,201 @@
+# Architecture Refactor Plan
+
+## Context
+
+Identified 2026-04-06. The SDK and CLI work correctly but several classes carry more
+responsibilities than they should, which makes them harder to extend and impossible to
+unit test in isolation. This plan captures the agreed design direction and the ordered
+steps to get there.
+
+The core invariant: **the CLI+SDK must work at every commit**. Each substep is a
+complete, shippable unit. If a substep goes wrong, revert it — all previous substeps
+remain intact. Nothing is abandoned; rollback cost is always exactly one step.
+
+---
+
+## Target Design
+
+### SDK
+
+| Class | Responsibility |
+|-------|---------------|
+| `Conversation` | Pure data: ordered messages, role alternation, compaction trim. No I/O. |
+| `ConversationStore` | Load/save a `Conversation` from a JSONL file. |
+| `RequestBuilder` | Pure function: `(RunAgentQuery, messages) → BetaMessageStreamParams`. |
+| `StreamProcessor` | Raw Anthropic stream events → typed blocks. Already `MessageStream`, keep as-is. |
+| `ToolRunner` | Validate input, call handler, transform result. |
+| `AgentLoop` | Orchestrate: Conversation + RequestBuilder + StreamProcessor + ToolRunner. The loop itself. |
+
+Auth is already well-decomposed. No changes planned there.
+
+### CLI
+
+| Class | Responsibility |
+|-------|---------------|
+| `TerminalEditor` | Multi-line text input, cursor, word navigation. No rendering, no stream knowledge. |
+| `ConversationDisplay` | Sealed block system, streaming display, flush-to-scroll. |
+| `ToolApprovalWidget` | Pending tools list, keyboard-driven approval, async promise. |
+| `StatusBar` | Running token/cost totals. Renders to a single line. |
+| `CommandMode` | `/` command handling, clipboard, attachments. |
+| `ScreenCoordinator` | Owns the physical screen. Routes keyboard events. Assembles render output. (Slimmed `AppLayout`.) |
+| `AgentMessageHandler` | Maps `SdkMessage` events → display component calls. Extracted from `runAgent.ts`. |
+| `PermissionPolicy` | Auto-approve/deny logic. Currently split across `permissions.ts` and `runAgent.ts`. |
+
+---
+
+## Steps
+
+### Step 1 — Split `Conversation` from `ConversationStore`
+
+**1a — Extract `Conversation` (pure data)**
+- `Conversation`: holds `#items`, `push()`, `remove()`, `messages` getter, role-alternation,
+  compaction trim. No file I/O.
+- `ConversationStore`: wraps `Conversation`, loads from JSONL in constructor, calls save on
+  every mutation.
+- `AgentRun` receives `Conversation` instead of `ConversationHistory`.
+- External API unchanged (`historyFile` option still works).
+- **Estimate: 1 | Risk: Low** — single failure mode: forgetting to call save in
+  `ConversationStore.push()`. Obvious, fast to catch.
+- **Tests: `Conversation` becomes fully unit-testable** — role alternation, compaction clear,
+  push/remove, trim logic. All pure assertions, no mocks needed. +1 for tests.
+
+**1b — History replay in TUI**
+- On startup, walk the messages loaded from file and replay them into `ConversationDisplay`
+  so prior turns are visible.
+- Requires decisions: what to show for compaction blocks, tool use/result pairs, thinking
+  blocks. Get this wrong → confusing display, not a crash.
+- Depends on 1a (cleanly) but is a separate commit.
+- **Estimate: 2 | Risk: Medium** — decisions about what to display, runtime-visible.
+- **Tests: partial** — the parse-to-display-events function is testable if extracted. +1.
+
+---
+
+### Step 2 — Extract `RequestBuilder`
+
+- Extract `AgentRun.#getMessageStream` params-building into a pure function/class:
+  tools mapping, betas resolution, context_management config, system prompts, cache_control,
+  thinking flag → `BetaMessageStreamParams`.
+- `#getMessageStream` keeps the stream call, delegates params to `RequestBuilder`.
+- No UI impact, no external API change.
+- **Estimate: 1 | Risk: Very Low** — TypeScript catches missed fields. API error on first
+  run if anything wrong. Fast to diagnose.
+- **Tests: clean** — one test per beta feature: "given compact enabled, request has compact
+  edit". Pure assertions. +1 for tests.
+
+---
+
+### Step 3 — Extract `TerminalEditor` from `AppLayout`
+
+Do these in order. Each substep compiles and runs standalone.
+
+**3a — Extract editor state**
+- Move `#editorLines`, `#cursorLine`, `#cursorCol` to `TerminalEditor`.
+- Expose accessors. `AppLayout` holds `this.#editor` and reads state from it.
+- Key handling and rendering stay in `AppLayout` for now.
+- **Estimate: 1 | Risk: Low** — TypeScript finds every missed reference at compile time.
+- **Tests: marginal at this point** — state is there but key handling isn't yet.
+
+**3b — Move key handling into `TerminalEditor.handleKey()`**
+- `AppLayout.handleKey` routes: if editor mode → `this.#editor.handleKey(key)`.
+- Must be done atomically — extract AND update call site in same commit. No gap where
+  neither has the logic.
+- Edge cases: backspace at col 0 merges lines, Enter mid-line splits, multi-line paste,
+  word jump at line boundary. These are where regressions hide.
+- **Estimate: 2 | Risk: Medium-High** — runtime edge case regressions, caught by typing.
+- **Tests: clean and valuable** — pure state machine. Test every edge case: backspace at
+  line start, Enter mid-line, Ctrl+Left, paste. High confidence. +2 for tests.
+
+**3c — Move editor rendering into `TerminalEditor.render(cols)`**
+- `AppLayout.render()` calls `this.#editor.render(cols)` for the editor region.
+- Visual regression if column width or ANSI cursor placement is wrong — visible immediately.
+- **Estimate: 1 | Risk: Medium** — fast feedback, obvious if wrong.
+- **Tests: partial** — render string for known input is assertable; ANSI codes noisy. +1.
+
+---
+
+### Step 4 — Extract `AgentMessageHandler` from `runAgent.ts`
+
+**4a — Stateless cases**
+- Move `message_thinking`, `message_text`, `message_compaction_start`, `message_compaction`,
+  `done`, `error`, `query_summary` into `AgentMessageHandler`.
+- Constructor takes `layout`, `logger`, model, cacheTtl.
+- `port.on('message', (msg) => handler.handle(msg))`.
+- **Estimate: 1 | Risk: Low** — straight delegations, TypeScript catches missing refs.
+- **Tests: clean** — mock layout components, assert right methods called for each message. +1.
+
+**4b — Stateful cases**
+- Move `usageBeforeTools` tracking and delta calculation.
+- Move `toolApprovalRequest` async function.
+- The invariant: capture usage at start of first tool batch, compute delta on next
+  `message_usage`, then null. Getting the reset timing wrong → wrong delta annotation.
+  Not a crash, but a wrong number on the tools block.
+- **Estimate: 1-2 | Risk: Medium** — runtime-visible wrong number, needs a tool-use
+  interaction to catch.
+- **Tests: clean** — fire a sequence of tool+usage messages, assert delta annotation
+  string. Catches the reset-timing bug. +1.
+
+---
+
+### Step 5 — Split display components out of `AppLayout`
+
+**5a — Extract `StatusBar`**
+- Move the 5 token/cost accumulators and the status line render logic.
+- `AppLayout` holds `this.#statusBar`, calls `updateUsage` and `render`.
+- **Estimate: 1 | Risk: Low** — pure state + string render. Visible immediately if wrong.
+- **Tests: clean** — given usage sequence, assert totals and rendered string. +1.
+
+**5b — Extract `ConversationDisplay`**
+- Move sealed blocks, active block, flush count, `transitionBlock`, `appendStreaming`,
+  `completeStreaming`, `appendToLastSealed`, render logic.
+- The flush-to-scroll boundary is subtle — blocks flushed to scroll are permanently written.
+  Getting `#flushedCount` wrong causes double-rendering or missing content.
+- **Estimate: 2 | Risk: Medium** — flush logic is the dangerous part, visible but confusing.
+- **Tests: partial** — state logic yes; flush-to-scroll needs mock screen infrastructure. +1-2.
+
+**5c — Extract `ToolApprovalWidget`**
+- Move pending tools list, selection, expand/collapse, keyboard handler, approval promise
+  queue (`#pendingApprovals`).
+- The async coordination — resolve functions in an array, keyboard handler pops them —
+  must move together. Splitting this across two commits creates a broken state.
+- **Estimate: 2 | Risk: Medium-High** — async approval flow, only caught during a
+  tool-use interaction.
+- **Tests: valuable** — async approval flow, cancel flow, keyboard navigation. +2.
+
+**5d — Extract `CommandMode`**
+- Move `#commandMode`, `#previewMode` state, `#handleCommandKey`, `#buildCommandRow`,
+  and the clipboard/attachment logic (`t`, `f`, `d`, `← →` commands).
+- The clipboard reads are async; the attachment store interaction must move with it.
+- **Estimate: 2 | Risk: Medium** — async clipboard flow, attachment state coordination.
+- **Tests: partial** — command dispatch logic testable; clipboard reads need mocking. +1.
+
+**5e — `ScreenCoordinator` cleanup**
+- By this point all logic has moved out. `AppLayout` becomes wiring + keyboard routing +
+  render assembly.
+- **Estimate: 1 | Risk: Low** — routing logic, visible immediately if wrong.
+- **Tests: marginal** — routing logic testable; screen output not. —
+
+---
+
+## Summary
+
+| Step | Estimate | Risk | Tests (additional) |
+|------|----------|------|-------------------|
+| 1a Conversation split | 1 | Low | +1 |
+| 1b History replay | 2 | Medium | +1 |
+| 2 RequestBuilder | 1 | Very Low | +1 |
+| 3a Editor state | 1 | Low | — |
+| 3b Editor key handling | 2 | Medium-High | +2 |
+| 3c Editor rendering | 1 | Medium | +1 |
+| 4a MessageHandler stateless | 1 | Low | +1 |
+| 4b MessageHandler stateful | 1-2 | Medium | +1 |
+| 5a StatusBar | 1 | Low | +1 |
+| 5b ConversationDisplay | 2 | Medium | +1-2 |
+| 5c ToolApprovalWidget | 2 | Medium-High | +2 |
+| 5d CommandMode | 2 | Medium | +1 |
+| 5e ScreenCoordinator cleanup | 1 | Low | — |
+| **Total** | **19-21** | | **+13-14** |
+
+Refactoring alone: ~19-21 units. With tests written at each step: ~32-35 units.
+
+The steps with the best test ROI (high value, catches real bugs): **1a, 2, 3b, 4b**.
+Start there. The rest can follow.