diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 927fa46..7d652b1 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -66,25 +66,32 @@ Every session has three phases: start, work, end. ## Current State -Branch: `feature/sdk-tooling` — pushed, clean working tree. +Branch: `main` — clean. PR #182 open (`feature/architecture-refactor-plan`, docs only). Active development is in **`apps/claude-sdk-cli/`** — a TUI terminal app built on `@shellicar/claude-sdk`. +**Architecture refactor planned** — see `.claude/plans/architecture-refactor.md` for the full +step-by-step plan with estimates and risk ratings. Next step: **1a** (split `Conversation` +from `ConversationStore`). Each substep ships independently; the CLI works at every commit. + **Completed:** - Full cursor-aware multi-line editor (`AppLayout.ts`) -- Clipboard text attachments via command mode (`ctrl+/` → `t` paste, `d` delete, chips in status bar) +- Clipboard text/file attachments via command mode - `ConversationHistory.push(msg, {id?})` + `remove(id)` for tagged message pruning - `IAnthropicAgent.injectContext/removeContext` public API -- `RunAgentQuery.thinking` + `pauseAfterCompact` options; `AnthropicBeta` enum cleanup -- `BetaMessageParam` used directly in public interface (no more `JsonObject` casts) +- `RunAgentQuery.thinking` + `pauseAfterCompact` + `systemPrompts` options +- `BetaMessageParam` used directly in public interface - Ref tool + RefStore for large output ref-swapping - Tool approval flow (auto-approve/deny/prompt) - Compaction display with context high-water mark -- File attachments via `f` command: three-stage clipboard path reading (pbpaste / VS Code code/file-list JXA / osascript furl), stat-first handler, `file`/`dir`/`missing` chips +- OAuth2 authentication (`TokenRefreshingAnthropic` subclass) +- System prompts hardcoded in `apps/claude-sdk-cli/src/systemPrompts.ts` +- `SdkQuerySummary` — query context block shown before each API call (ℹ️ query) **In-progress / next:** -- Skills system (`ActivateSkill`/`DeactivateSkill`) — primitives in place; timing design issue unresolved (see `docs/skills-design.md`) -- Image attachments — `pngpaste` + clipboard image detection (deferred) +- Architecture refactor step 1a — split `Conversation` from `ConversationStore` +- History replay in TUI (step 1b) — show prior turns on startup +- Vitest setup (prerequisite for unit tests, do alongside 1a) @@ -98,7 +105,7 @@ Active development is in **`apps/claude-sdk-cli/`** — a TUI terminal app built |---------|------| | `apps/claude-sdk-cli/` | **Active TUI CLI** — talks directly to `@shellicar/claude-sdk` | | `apps/claude-cli/` | Legacy CLI using a different SDK path (not actively developed) | -| `packages/claude-sdk/` | Anthropic SDK wrapper: `IAnthropicAgent`, `AnthropicAgent`, `AgentRun`, `ConversationHistory`, `MessageStream` | +| `packages/claude-sdk/` | Anthropic SDK wrapper: `IAnthropicAgent`, `AnthropicAgent`, `AgentRun`, `ConversationHistory`, `MessageStream`. **Refactor planned** — see `.claude/plans/architecture-refactor.md`. | | `packages/claude-sdk-tools/` | Tool definitions: `Find`, `ReadFile`, `Grep`, `Head`, `Tail`, `Range`, `SearchFiles`, `Pipe`, `EditFile`, `PreviewEdit`, `CreateFile`, `DeleteFile`, `DeleteDirectory`, `Exec`, `Ref` | | `packages/claude-core/` | Shared ANSI/terminal utilities: `sanitise`, `reflow`, `screen`, `status-line`, `viewport`, `renderer` | | `packages/typescript-config/` | Shared tsconfig base | diff --git a/.claude/plans/architecture-refactor.md b/.claude/plans/architecture-refactor.md new file mode 100644 index 0000000..39fdd21 --- /dev/null +++ b/.claude/plans/architecture-refactor.md @@ -0,0 +1,201 @@ +# Architecture Refactor Plan + +## Context + +Identified 2026-04-06. The SDK and CLI work correctly but several classes carry more +responsibilities than they should, which makes them harder to extend and impossible to +unit test in isolation. This plan captures the agreed design direction and the ordered +steps to get there. + +The core invariant: **the CLI+SDK must work at every commit**. Each substep is a +complete, shippable unit. If a substep goes wrong, revert it — all previous substeps +remain intact. Nothing is abandoned; rollback cost is always exactly one step. + +--- + +## Target Design + +### SDK + +| Class | Responsibility | +|-------|---------------| +| `Conversation` | Pure data: ordered messages, role alternation, compaction trim. No I/O. | +| `ConversationStore` | Load/save a `Conversation` from a JSONL file. | +| `RequestBuilder` | Pure function: `(RunAgentQuery, messages) → BetaMessageStreamParams`. | +| `StreamProcessor` | Raw Anthropic stream events → typed blocks. Already `MessageStream`, keep as-is. | +| `ToolRunner` | Validate input, call handler, transform result. | +| `AgentLoop` | Orchestrate: Conversation + RequestBuilder + StreamProcessor + ToolRunner. The loop itself. | + +Auth is already well-decomposed. No changes planned there. + +### CLI + +| Class | Responsibility | +|-------|---------------| +| `TerminalEditor` | Multi-line text input, cursor, word navigation. No rendering, no stream knowledge. | +| `ConversationDisplay` | Sealed block system, streaming display, flush-to-scroll. | +| `ToolApprovalWidget` | Pending tools list, keyboard-driven approval, async promise. | +| `StatusBar` | Running token/cost totals. Renders to a single line. | +| `CommandMode` | `/` command handling, clipboard, attachments. | +| `ScreenCoordinator` | Owns the physical screen. Routes keyboard events. Assembles render output. (Slimmed `AppLayout`.) | +| `AgentMessageHandler` | Maps `SdkMessage` events → display component calls. Extracted from `runAgent.ts`. | +| `PermissionPolicy` | Auto-approve/deny logic. Currently split across `permissions.ts` and `runAgent.ts`. | + +--- + +## Steps + +### Step 1 — Split `Conversation` from `ConversationStore` + +**1a — Extract `Conversation` (pure data)** +- `Conversation`: holds `#items`, `push()`, `remove()`, `messages` getter, role-alternation, + compaction trim. No file I/O. +- `ConversationStore`: wraps `Conversation`, loads from JSONL in constructor, calls save on + every mutation. +- `AgentRun` receives `Conversation` instead of `ConversationHistory`. +- External API unchanged (`historyFile` option still works). +- **Estimate: 1 | Risk: Low** — single failure mode: forgetting to call save in + `ConversationStore.push()`. Obvious, fast to catch. +- **Tests: `Conversation` becomes fully unit-testable** — role alternation, compaction clear, + push/remove, trim logic. All pure assertions, no mocks needed. +1 for tests. + +**1b — History replay in TUI** +- On startup, walk the messages loaded from file and replay them into `ConversationDisplay` + so prior turns are visible. +- Requires decisions: what to show for compaction blocks, tool use/result pairs, thinking + blocks. Get this wrong → confusing display, not a crash. +- Depends on 1a (cleanly) but is a separate commit. +- **Estimate: 2 | Risk: Medium** — decisions about what to display, runtime-visible. +- **Tests: partial** — the parse-to-display-events function is testable if extracted. +1. + +--- + +### Step 2 — Extract `RequestBuilder` + +- Extract `AgentRun.#getMessageStream` params-building into a pure function/class: + tools mapping, betas resolution, context_management config, system prompts, cache_control, + thinking flag → `BetaMessageStreamParams`. +- `#getMessageStream` keeps the stream call, delegates params to `RequestBuilder`. +- No UI impact, no external API change. +- **Estimate: 1 | Risk: Very Low** — TypeScript catches missed fields. API error on first + run if anything wrong. Fast to diagnose. +- **Tests: clean** — one test per beta feature: "given compact enabled, request has compact + edit". Pure assertions. +1 for tests. + +--- + +### Step 3 — Extract `TerminalEditor` from `AppLayout` + +Do these in order. Each substep compiles and runs standalone. + +**3a — Extract editor state** +- Move `#editorLines`, `#cursorLine`, `#cursorCol` to `TerminalEditor`. +- Expose accessors. `AppLayout` holds `this.#editor` and reads state from it. +- Key handling and rendering stay in `AppLayout` for now. +- **Estimate: 1 | Risk: Low** — TypeScript finds every missed reference at compile time. +- **Tests: marginal at this point** — state is there but key handling isn't yet. + +**3b — Move key handling into `TerminalEditor.handleKey()`** +- `AppLayout.handleKey` routes: if editor mode → `this.#editor.handleKey(key)`. +- Must be done atomically — extract AND update call site in same commit. No gap where + neither has the logic. +- Edge cases: backspace at col 0 merges lines, Enter mid-line splits, multi-line paste, + word jump at line boundary. These are where regressions hide. +- **Estimate: 2 | Risk: Medium-High** — runtime edge case regressions, caught by typing. +- **Tests: clean and valuable** — pure state machine. Test every edge case: backspace at + line start, Enter mid-line, Ctrl+Left, paste. High confidence. +2 for tests. + +**3c — Move editor rendering into `TerminalEditor.render(cols)`** +- `AppLayout.render()` calls `this.#editor.render(cols)` for the editor region. +- Visual regression if column width or ANSI cursor placement is wrong — visible immediately. +- **Estimate: 1 | Risk: Medium** — fast feedback, obvious if wrong. +- **Tests: partial** — render string for known input is assertable; ANSI codes noisy. +1. + +--- + +### Step 4 — Extract `AgentMessageHandler` from `runAgent.ts` + +**4a — Stateless cases** +- Move `message_thinking`, `message_text`, `message_compaction_start`, `message_compaction`, + `done`, `error`, `query_summary` into `AgentMessageHandler`. +- Constructor takes `layout`, `logger`, model, cacheTtl. +- `port.on('message', (msg) => handler.handle(msg))`. +- **Estimate: 1 | Risk: Low** — straight delegations, TypeScript catches missing refs. +- **Tests: clean** — mock layout components, assert right methods called for each message. +1. + +**4b — Stateful cases** +- Move `usageBeforeTools` tracking and delta calculation. +- Move `toolApprovalRequest` async function. +- The invariant: capture usage at start of first tool batch, compute delta on next + `message_usage`, then null. Getting the reset timing wrong → wrong delta annotation. + Not a crash, but a wrong number on the tools block. +- **Estimate: 1-2 | Risk: Medium** — runtime-visible wrong number, needs a tool-use + interaction to catch. +- **Tests: clean** — fire a sequence of tool+usage messages, assert delta annotation + string. Catches the reset-timing bug. +1. + +--- + +### Step 5 — Split display components out of `AppLayout` + +**5a — Extract `StatusBar`** +- Move the 5 token/cost accumulators and the status line render logic. +- `AppLayout` holds `this.#statusBar`, calls `updateUsage` and `render`. +- **Estimate: 1 | Risk: Low** — pure state + string render. Visible immediately if wrong. +- **Tests: clean** — given usage sequence, assert totals and rendered string. +1. + +**5b — Extract `ConversationDisplay`** +- Move sealed blocks, active block, flush count, `transitionBlock`, `appendStreaming`, + `completeStreaming`, `appendToLastSealed`, render logic. +- The flush-to-scroll boundary is subtle — blocks flushed to scroll are permanently written. + Getting `#flushedCount` wrong causes double-rendering or missing content. +- **Estimate: 2 | Risk: Medium** — flush logic is the dangerous part, visible but confusing. +- **Tests: partial** — state logic yes; flush-to-scroll needs mock screen infrastructure. +1-2. + +**5c — Extract `ToolApprovalWidget`** +- Move pending tools list, selection, expand/collapse, keyboard handler, approval promise + queue (`#pendingApprovals`). +- The async coordination — resolve functions in an array, keyboard handler pops them — + must move together. Splitting this across two commits creates a broken state. +- **Estimate: 2 | Risk: Medium-High** — async approval flow, only caught during a + tool-use interaction. +- **Tests: valuable** — async approval flow, cancel flow, keyboard navigation. +2. + +**5d — Extract `CommandMode`** +- Move `#commandMode`, `#previewMode` state, `#handleCommandKey`, `#buildCommandRow`, + and the clipboard/attachment logic (`t`, `f`, `d`, `← →` commands). +- The clipboard reads are async; the attachment store interaction must move with it. +- **Estimate: 2 | Risk: Medium** — async clipboard flow, attachment state coordination. +- **Tests: partial** — command dispatch logic testable; clipboard reads need mocking. +1. + +**5e — `ScreenCoordinator` cleanup** +- By this point all logic has moved out. `AppLayout` becomes wiring + keyboard routing + + render assembly. +- **Estimate: 1 | Risk: Low** — routing logic, visible immediately if wrong. +- **Tests: marginal** — routing logic testable; screen output not. — + +--- + +## Summary + +| Step | Estimate | Risk | Tests (additional) | +|------|----------|------|-------------------| +| 1a Conversation split | 1 | Low | +1 | +| 1b History replay | 2 | Medium | +1 | +| 2 RequestBuilder | 1 | Very Low | +1 | +| 3a Editor state | 1 | Low | — | +| 3b Editor key handling | 2 | Medium-High | +2 | +| 3c Editor rendering | 1 | Medium | +1 | +| 4a MessageHandler stateless | 1 | Low | +1 | +| 4b MessageHandler stateful | 1-2 | Medium | +1 | +| 5a StatusBar | 1 | Low | +1 | +| 5b ConversationDisplay | 2 | Medium | +1-2 | +| 5c ToolApprovalWidget | 2 | Medium-High | +2 | +| 5d CommandMode | 2 | Medium | +1 | +| 5e ScreenCoordinator cleanup | 1 | Low | — | +| **Total** | **19-21** | | **+13-14** | + +Refactoring alone: ~19-21 units. With tests written at each step: ~32-35 units. + +The steps with the best test ROI (high value, catches real bugs): **1a, 2, 3b, 4b**. +Start there. The rest can follow.