diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 1e73e1f..ef1beee 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -66,32 +66,27 @@ Every session has three phases: start, work, end. ## Current State -Branch: `main` — clean. PR #182 open (`feature/architecture-refactor-plan`, docs only). +Branch: `feature/agent-message-handler-stateful` — PR #193 open (step 4b), auto-merge set. Active development is in **`apps/claude-sdk-cli/`** — a TUI terminal app built on `@shellicar/claude-sdk`. -**Architecture refactor planned** — see `.claude/plans/architecture-refactor.md` for the full -step-by-step plan with estimates and risk ratings. Next step: **1a** (split `Conversation` -from `ConversationStore`). Each substep ships independently; the CLI works at every commit. - -**Completed:** -- Full cursor-aware multi-line editor (`AppLayout.ts`) -- Clipboard text/file attachments via command mode -- `ConversationHistory.push(msg, {id?})` + `remove(id)` for tagged message pruning -- `IAnthropicAgent.injectContext/removeContext` public API -- `RunAgentQuery.thinking` + `pauseAfterCompact` + `systemPrompts` options -- `BetaMessageParam` used directly in public interface -- Ref tool + RefStore for large output ref-swapping -- Tool approval flow (auto-approve/deny/prompt) -- Compaction display with context high-water mark -- OAuth2 authentication (`TokenRefreshingAnthropic` subclass) -- System prompts hardcoded in `apps/claude-sdk-cli/src/systemPrompts.ts` -- `SdkQuerySummary` — query context block shown before each API call (ℹ️ query) - -**In-progress / next:** -- Architecture refactor step 1a — split `Conversation` from `ConversationStore` -- History replay in TUI (step 1b) — show prior turns on startup -- Vitest setup (prerequisite for unit tests, do alongside 1a) +**Architecture refactor in progress** — see `.claude/plans/architecture-refactor.md`. +Follows a State / Renderer / ScreenCoordinator (MVVM) pattern. Each substep ships independently. 
+ +**Completed refactor steps:** +- **1a** `Conversation` (pure data) split from `ConversationStore` (I/O) — PR #183 +- **1b** History replay into TUI on startup — PR #186 +- **2** `RequestBuilder` pure function extracted from `AgentRun` — PR #187 +- **3a** `EditorState` extracted from `AppLayout` (fields + `reset`) — PR #189 +- **3b** `EditorState.handleKey` — all editor key transitions moved out of `AppLayout` — PR #190 +- **3c** `renderEditor(state, cols): string[]` pure renderer extracted — PR #191 +- **4a** `AgentMessageHandler` stateless cases extracted from `runAgent.ts` — PR #192 +- **4b** `AgentMessageHandler` stateful cases moved in (`message_usage`, `tool_approval_request`, `tool_error`) — PR #193 (pending merge) + +**Next: step 5a** — extract `StatusState` + `renderStatus` from `AppLayout` +- Move the 5 token/cost accumulators to `StatusState` +- Move status line render logic to `renderStatus(state, cols): string` +- `AppLayout` holds `this.#statusState`, calls `renderStatus` in its render pass @@ -138,13 +133,19 @@ Full detail: `.claude/five-banana-pillars.md` | `AttachmentStore.ts` | `TextAttachment \| FileAttachment` union; SHA-256 dedup; 10 KB text cap; `addFile(path, kind, size?)` | | `clipboard.ts` | `readClipboardText()`; three-stage `readClipboardPath()` (pbpaste → VS Code code/file-list JXA → osascript furl); `looksLikePath`; `sanitiseFurlResult` | | `runAgent.ts` | Wires agent to layout: sets up tools, beta flags, event handlers | +| File | Role | +|------|------| +| `entry/main.ts` | Entry point: creates agent, layout, starts readline loop | +| `AppLayout.ts` | TUI: full cursor editor, streaming display, compaction blocks, tool approval, command mode, attachment chips | +| `AttachmentStore.ts` | `TextAttachment \| FileAttachment` union; SHA-256 dedup; 10 KB text cap; `addFile(path, kind, size?)` | +| `clipboard.ts` | `readClipboardText()`; three-stage `readClipboardPath()` (pbpaste → VS Code code/file-list JXA → osascript furl); 
`looksLikePath`; `sanitiseFurlResult` | +| `EditorState.ts` | Pure editor state + `handleKey(key): boolean` transitions. No rendering, no I/O. | +| `renderEditor.ts` | Pure `renderEditor(state: EditorState, cols: number): string[]` renderer. | +| `AgentMessageHandler.ts` | Maps all `SdkMessage` events → layout calls / state mutations. Extracted from `runAgent.ts`. | +| `runAgent.ts` | Wires agent to layout: sets up tools, beta flags, constructs handler, wires `port.on` | | `permissions.ts` | Tool auto-approve/deny rules | | `redact.ts` | Strips sensitive values from tool inputs before display | | `logger.ts` | Winston file logger (`claude-sdk-cli.log`) | - -### Key files in `packages/claude-sdk/src/` - -| File | Role | |------|------| | `public/interfaces.ts` | `IAnthropicAgent` abstract class (public contract) | | `public/types.ts` | `RunAgentQuery`, `SdkMessage` union, tool types | @@ -230,22 +231,16 @@ Opt-in via `shellicarMcp: true` config. Registers an in-process MCP server (`she 9. **No atomic session file writes** — `writeFileSync` is not atomic. Crash during write corrupts `.claude/cli-session`. - +- **MVVM architecture refactor** (2026-04-06): Three-layer model — State (pure data + transitions), Renderer (pure `(state, cols) → string[]`), ScreenCoordinator (owns screen, routes events, calls renderers). Pull-based: coordinator decides when to render. Plan in `.claude/plans/architecture-refactor.md`. Enables unit testing of state and render logic without terminal knowledge. +- **`previousPatchId` chaining for multi-patch edits** (2026-04-06): When making sequential edits to the same file, chain `PreviewEdit` calls using `previousPatchId`, then apply once with `EditFile`. Previewing without applying then moving to a second patch is the failure mode — only the second patch gets applied, first is silently dropped. esbuild and vitest don't catch this; it only surfaces at runtime. 
- **f command clipboard system** (2026-04-05): Three-stage `readClipboardPath()` — (1) pbpaste filtered by `looksLikePath`, (2) VS Code `code/file-list` JXA probe (file:// URI → POSIX path), (3) osascript `furl` filtered by `sanitiseFurlResult`. Injectable `readClipboardPathCore` for tests. `looksLikePath` is permissive (accepts bare-relative like `apps/foo/bar.ts`); `isLikelyPath` in AppLayout is strict (explicit prefixes only) and only used for the missing-chip case. `sanitiseFurlResult` rejects paths containing `:` (HFS artifacts). `f` handler is stat-first: if the file exists attach it directly; only apply `isLikelyPath` if stat fails. - **Clipboard text attachments** (2026-04-06): `ctrl+/` enters command mode; `t` reads clipboard via `pbpaste` and adds a `` block attachment; `d` removes selected chip; `← →` select chips. On `ctrl+enter` submit, attachments are folded into the prompt as `` XML blocks and cleared. - **ConversationHistory ID tagging** (2026-04-06): `push(msg, { id? })` tags messages for later removal. `remove(id)` splices the last item with matching ID. IDs are session-scoped (not persisted). Used by `IAnthropicAgent.injectContext/removeContext` for skills context management. - **IAnthropicAgent uses BetaMessageParam** (2026-04-06): `getHistory/loadHistory/injectContext` now use `BetaMessageParam` directly instead of `JsonObject` casts. `JsonObject`, `JsonValue`, `ContextMessage` types removed. `BetaMessageParam` re-exported from package index. - **thinking/pauseAfterCompact as RunAgentQuery options** (2026-04-06): Both default off. `thinking: true` adds `{ type: 'adaptive' }` to the API body. `pauseAfterCompact: true` wires into `compact_20260112.pause_after_compaction`. When `pauseAfterCompact: true` and compaction fires, the agent sends `done` with `stopReason: 'pause_turn'` — user sees the summary and resumes manually (intentional UX). - **Skills timing design issue** (2026-04-06): Documented in `docs/skills-design.md`. 
Calling `agent.injectContext()` from inside a tool handler merges the injected user message with the pending tool-results user message (consecutive merge policy). Resolution options documented; implementation deferred. -## Recent Decisions - -- **Structured command execution via in-process MCP** (#99) — replaced freeform Bash with a structured Exec tool served by an in-process MCP server. Glob-based auto-approve (`execAutoApprove`) with custom zero-dep glob matcher (no minimatch dependency). -- **Exec tool extracted to `@shellicar/mcp-exec`** — schema, executor, pipeline, validation rules, and ANSI stripping moved to a published package. CLI retains only `autoApprove.ts` (CLI-specific config concern). -- **ZWJ sanitisation in layout pipeline**: `sanitiseZwj` strips U+200D before `wrapLine` measures width. Terminals render ZWJ sequences as individual emojis; `string-width` assumes composed form. Stripping at the layout boundary removes the mismatch. -- **Monorepo workspace conversion**: CLI source moved to `packages/claude-cli/`. Root package is private workspace with turbo, syncpack, biome, lefthook. Turbo orchestrates build/test/type-check. syncpack enforces version consistency. `.packagename` file at root holds the active package name for scripts and pre-push hooks. -- **SDK bidirectional channel** (`packages/claude-sdk/`): New package wrapping the Anthropic API. Uses `MessagePort` for bidirectional consumer/SDK communication. Tool validation (existence + input schema) happens before approval requests are sent. Approval requests are sent in bulk; tools execute in approval-arrival order. -- **Screen utilities extracted to `claude-core`**: `sanitise`, `reflow` (wrapLine/rewrapFromSegments/computeLineSegments), `screen` (Screen interface + StdoutScreen), `status-line` (StatusLineBuilder), `viewport` (Viewport), `renderer` (Renderer) all moved from `claude-cli` to `claude-core`. `claude-cli` now imports from `@shellicar/claude-core/*`. 
`tsconfig.json` in claude-core requires `"types": ["node"]` for process globals with moduleResolution bundler. + diff --git a/.claude/sessions/2026-04-06.md b/.claude/sessions/2026-04-06.md index ea54212..04b8466 100644 --- a/.claude/sessions/2026-04-06.md +++ b/.claude/sessions/2026-04-06.md @@ -314,3 +314,105 @@ if (key.type !== 'ctrl+enter') return; - The editor rendering block in `AppLayout.render()` pushes: a divider, a blank line, then each editor line with prefix and cursor highlighting via `INVERSE_ON/OFF` - Dependencies to resolve: `buildDivider`, `BLOCK_PLAIN.prompt`, `EDITOR_PROMPT`, `CONTENT_INDENT` — check whether these are AppLayout-private or shareable - Result is a pure `string[]`, testable without ANSI stripping (just check the non-cursor lines contain the text, cursor line contains `INVERSE_ON`) + + +--- + +## Session continuation 5 (same day, later still) + +### Step 3c — `renderEditor` (PR #191, merged) + +New file `apps/claude-sdk-cli/src/renderEditor.ts`: +- Pure function `renderEditor(state: EditorState, cols: number): string[]` +- Returns only the editor text lines — no divider, no trailing blank line. `AppLayout` owns that chrome for all block types, so the renderer shouldn't add it. +- Handles prefix (`💬 ` for line 0, ` ` for subsequent lines), cursor highlighting via `INVERSE_ON`/`INVERSE_OFF`, end-of-line cursor space, and `wrapLine` for long lines. +- `EDITOR_PROMPT` constant removed from `AppLayout` — only consumer was the rendering loop, which moved here. + +`apps/claude-sdk-cli/test/renderEditor.spec.ts` — 11 tests covering: empty state cursor, prefix on line 0 only, cursor highlights mid-line character, cursor at EOL appends space inside inverse, multi-line indent prefix, line wrapping. + +Bananabot review: "Clean extraction, textbook renderer. YAGNI on shared constants." 
🍌 + +--- + +### Step 4a — `AgentMessageHandler` stateless cases (PR #192, merged) + +New file `apps/claude-sdk-cli/src/AgentMessageHandler.ts`: +- `constructor(layout: AppLayout, log: typeof logger)` +- `public handle(msg: SdkMessage): void` +- Handles: `query_summary`, `message_thinking`, `message_text`, `message_compaction_start`, `message_compaction`, `done`, `error` +- These are all the cases that require no accumulated state — just route to a layout call. + +`runAgent.ts` changes: +- `import { AgentMessageHandler }` added +- `const handler = new AgentMessageHandler(layout, logger)` instantiated at function top +- The switch retains explicit cases for `tool_approval_request`, `tool_error`, `message_usage` (all stateful — need `usageBeforeTools`, `lastUsage`, async approval flow) +- `default: handler.handle(msg)` catches everything else + +**Known temporary regression:** `message_compaction` no longer shows the `[compacted at X/Y (Z%)]` annotation. That annotation reads `lastUsage`, which is only set by `message_usage` — a 4b case. The omission is documented in a comment in `AgentMessageHandler.ts`. Compaction only fires at 150k tokens so the regression isn't visible in normal use. + +`apps/claude-sdk-cli/test/AgentMessageHandler.spec.ts` — 16 tests using `vi.fn()` mocks cast to `AppLayout`. + +#### Bug encountered and lesson learned + +During the 4a edits I previewed two separate patches to `runAgent.ts` — one adding the import+declaration, one removing the old cases and adding `default: handler.handle(msg)` — but applied only the second without applying the first. The result compiled fine (esbuild doesn't type-check) and tests didn't catch it (vitest doesn't import `runAgent.ts` directly), so the `ReferenceError: handler is not defined` only surfaced at runtime. + +Fix: a second commit adding the missing import + `const handler` declaration. 
+ +**Lesson:** When making sequential edits to the same file, either chain them with `previousPatchId` so they compose into a single `EditFile`, or apply each patch immediately before previewing the next. Preview-without-apply then move on is the failure mode. + +--- + +### State at end of session + +- Branch: `main`, clean, up to date (PR #192 squash-merged) +- **Next: step 4b** — move stateful message handling into `AgentMessageHandler` + - Add fields: `usageBeforeTools: SdkMessageUsage | null`, `lastUsage: SdkMessageUsage | null` + - Move `message_usage` handling in: delta token calculation, marginal cost via `calculateCost`, `appendToLastSealed('tools', ...)` annotation + - Move `tool_approval_request` in: snapshot `usageBeforeTools`, call async `toolApprovalRequest` + - Move `tool_error` in + - Constructor will need additional dependencies: `model`, `cacheTtl`, `cwd`, `store`, and a `respond` callback for posting `tool_approval_response` + - The reset timing for `usageBeforeTools = null` after a tool batch closes is the risky part — test that sequence explicitly + + +--- + +## Session continuation 6 (same day, even later) + +### Step 4b — `AgentMessageHandler` stateful cases (PR #193) + +Completes the `AgentMessageHandler` extraction. Everything that was still in `runAgent.ts`'s message switch moved in. 
+ +**What moved into `AgentMessageHandler.ts`:** +- Helper functions `fmtBytes`, `primaryArg`, `formatRefSummary`, `formatToolSummary` (were module-level in `runAgent.ts`) +- Private fields `#lastUsage: SdkMessageUsage | null` and `#usageBeforeTools: SdkMessageUsage | null` +- `message_usage` case: computes delta tokens and marginal cost, annotates the sealed tools block via `appendToLastSealed`, resets `usageBeforeTools` +- `tool_approval_request` case: snapshots `usageBeforeTools` on first tool of batch, fires `void this.#toolApprovalRequest(msg)` +- `tool_error` case: formats the error block +- `#toolApprovalRequest` private async method: permission check, approval prompt, `respond` callback, `addPendingTool`/`removePendingTool` + +**Constructor change:** adds `opts: AgentMessageHandlerOptions` — `model`, `cacheTtl`, `cwd`, `store`, `tools`, `respond`. The `respond` callback (`(requestId, approved) => void`) abstracts `port.postMessage` so the handler doesn't hold a reference to `port`. + +**Restores** the `message_compaction` `[compacted at X/Y (Z%)]` annotation that was temporarily dropped in 4a. + +**`runAgent.ts` after this:** 89 lines. Sets up tools, calls `agent.runAgent()`, constructs `respond` + handler, wires `port.on('message', (msg) => handler.handle(msg))`. No message routing logic lives there anymore. + +**Tests:** 27 total (16 unchanged + 11 new). New tests: +- `tool_error`: transitions to tools block, streams tool name and error message +- `message_usage` without tool batch: calls `updateUsage`, no annotation +- `message_usage` delta annotation: asserts `+500` in annotation string after a tool batch +- Reset timing sequence: two independent batches annotate separately with correct deltas +- No double-snapshot: second tool in same batch doesn't reset the snapshot +- `message_compaction` annotation: present when `lastUsage` known, absent when not + +All 167 tests pass. 
Manual testing confirmed: tool approval flow, delta annotation, two-tool batch annotates once, `tool_error` format all working. + +--- + +### State at end of session + +- Branch: `feature/agent-message-handler-stateful`, PR #193 open, auto-merge set +- **Next: step 5a** — extract `StatusState` + `renderStatus` from `AppLayout` + - Move the 5 token/cost accumulators to `StatusState` + - Move status line render logic to `renderStatus(state, cols): string` + - `AppLayout` holds `this.#statusState`, calls `renderStatus` in its render pass + - Tests: given a usage sequence, assert state totals and render output diff --git a/apps/claude-sdk-cli/src/AgentMessageHandler.ts b/apps/claude-sdk-cli/src/AgentMessageHandler.ts index afddfd1..29fff60 100644 --- a/apps/claude-sdk-cli/src/AgentMessageHandler.ts +++ b/apps/claude-sdk-cli/src/AgentMessageHandler.ts @@ -1,26 +1,115 @@ -import type { SdkMessage } from '@shellicar/claude-sdk'; -import type { AppLayout } from './AppLayout.js'; +import { relative } from 'node:path'; +import { type AnyToolDefinition, type CacheTtl, calculateCost, type SdkMessage, type SdkMessageUsage, type SdkToolApprovalRequest } from '@shellicar/claude-sdk'; +import type { RefStore } from '@shellicar/claude-sdk-tools/RefStore'; +import type { AppLayout, PendingTool } from './AppLayout.js'; import type { logger } from './logger.js'; +import { getPermission, PermissionAction } from './permissions.js'; + +// ---- helpers (moved from runAgent.ts) ------------------------------------ + +function fmtBytes(n: number): string { + if (n >= 1024 * 1024) { + return `${(n / 1024 / 1024).toFixed(1)}mb`; + } + if (n >= 1024) { + return `${(n / 1024).toFixed(1)}kb`; + } + return `${n}b`; +} + +function primaryArg(input: Record<string, unknown>, cwd: string): string | null { + for (const key of ['path', 'file']) { + if (typeof input[key] === 'string') { + return relative(cwd, input[key] as string) || (input[key] as string); + } + } + if (typeof input.pattern === 'string') { + return
input.pattern; + } + if (typeof input.description === 'string') { + return input.description; + } + return null; +} + +function formatRefSummary(input: Record<string, unknown>, store: RefStore): string { + const id = typeof input.id === 'string' ? input.id : ''; + if (!id) { + return 'Ref(?)'; + } + const hint = store.getHint(id) ?? id.slice(0, 8); + const content = store.get(id); + if (content === undefined) { + return `Ref(${id.slice(0, 8)}\u2026)`; + } + const sizeStr = fmtBytes(content.length); + const start = typeof input.start === 'number' ? input.start : 0; + const limit = typeof input.limit === 'number' ? input.limit : 1000; + const end = Math.min(start + limit, content.length); + return `Ref \u2190 ${hint} [${start}\u2013${end} / ${sizeStr}]`; +} + +function formatToolSummary(name: string, input: Record<string, unknown>, cwd: string, store: RefStore): string { + if (name === 'Ref') { + return formatRefSummary(input, store); + } + if (name === 'Pipe' && Array.isArray(input.steps)) { + const steps = (input.steps as Array<{ tool?: unknown; input?: unknown }>) + .map((s) => { + const tool = typeof s.tool === 'string' ? s.tool : '?'; + const stepInput = s.input != null && typeof s.input === 'object' ? (s.input as Record<string, unknown>) : {}; + const arg = primaryArg(stepInput, cwd); + return arg ? `${tool}(${arg})` : tool; + }) + .join(' | '); + return steps; + } + const arg = primaryArg(input, cwd); + return arg ? `${name}(${arg})` : name; +} + +// ---- types --------------------------------------------------------------- + +export interface AgentMessageHandlerOptions { + model: string; + cacheTtl: CacheTtl; + cwd: string; + store: RefStore; + tools: AnyToolDefinition[]; + respond: (requestId: string, approved: boolean) => void; +} + +// ---- class --------------------------------------------------------------- /** - * Handles the stateless SdkMessage cases: routes each message to the - * appropriate layout call. No accumulated state here.
+ * Handles all SdkMessage cases: routes each message to the appropriate + * layout call or state mutation. * - * Stateful cases (tool_approval_request, tool_error, message_usage) stay - * in runAgent.ts until step 4b, when usageBeforeTools tracking and the - * async tool approval flow move here too. - * - * NOTE: message_compaction currently omits the "compacted at X/Y (Z%)" - * context-usage annotation. That annotation reads lastUsage, which is - * set by message_usage — a 4b case. The annotation is restored in 4b. + * Stateless cases (query_summary, message_thinking, etc.) just delegate to + * layout. Stateful cases maintain usageBeforeTools / lastUsage to produce + * the per-tool-batch token-delta annotation on the sealed tools block. */ export class AgentMessageHandler { #layout: AppLayout; #logger: typeof logger; + #model: string; + #cacheTtl: CacheTtl; + #cwd: string; + #store: RefStore; + #tools: AnyToolDefinition[]; + #respond: (requestId: string, approved: boolean) => void; + #lastUsage: SdkMessageUsage | null = null; + #usageBeforeTools: SdkMessageUsage | null = null; - public constructor(layout: AppLayout, log: typeof logger) { + public constructor(layout: AppLayout, log: typeof logger, opts: AgentMessageHandlerOptions) { this.#layout = layout; this.#logger = log; + this.#model = opts.model; + this.#cacheTtl = opts.cacheTtl; + this.#cwd = opts.cwd; + this.#store = opts.store; + this.#tools = opts.tools; + this.#respond = opts.respond; } public handle(msg: SdkMessage): void { @@ -28,7 +117,7 @@ export class AgentMessageHandler { case 'query_summary': { const parts = [`${msg.systemPrompts} system`, `${msg.userMessages} user`, `${msg.assistantMessages} assistant`, ...(msg.thinkingBlocks > 0 ? 
[`${msg.thinkingBlocks} thinking`] : [])]; this.#layout.transitionBlock('meta'); - this.#layout.appendStreaming(parts.join(' · ')); + this.#layout.appendStreaming(parts.join(' \u00b7 ')); break; } case 'message_thinking': @@ -42,10 +131,56 @@ export class AgentMessageHandler { case 'message_compaction_start': this.#layout.transitionBlock('compaction'); break; - case 'message_compaction': + case 'message_compaction': { this.#layout.transitionBlock('compaction'); this.#layout.appendStreaming(msg.summary); + if (this.#lastUsage) { + const used = this.#lastUsage.inputTokens + this.#lastUsage.cacheCreationTokens + this.#lastUsage.cacheReadTokens; + const pct = ((used / this.#lastUsage.contextWindow) * 100).toFixed(1); + const fmt = (n: number) => (n >= 1000 ? `${(n / 1000).toFixed(1)}k` : String(n)); + this.#layout.appendStreaming(`\n\n[compacted at ${fmt(used)} / ${fmt(this.#lastUsage.contextWindow)} (${pct}%)]`); + } + break; + } + case 'tool_approval_request': + this.#layout.transitionBlock('tools'); + this.#layout.appendStreaming(`${formatToolSummary(msg.name, msg.input, this.#cwd, this.#store)}\n`); + if (!this.#usageBeforeTools) { + this.#usageBeforeTools = this.#lastUsage; + } + void this.#toolApprovalRequest(msg); + break; + case 'tool_error': + this.#layout.transitionBlock('tools'); + this.#layout.appendStreaming(`${msg.name} error\n\`\`\`json\n${JSON.stringify(msg.input, null, 2)}\n\`\`\`\n\n${msg.error}\n`); break; + case 'message_usage': { + this.#logger.debug('message_usage', { hasUsageBeforeTools: this.#usageBeforeTools !== null }); + if (this.#usageBeforeTools !== null) { + const prev = this.#usageBeforeTools; + const prevCtx = prev.inputTokens + prev.cacheCreationTokens + prev.cacheReadTokens; + const currCtx = msg.inputTokens + msg.cacheCreationTokens + msg.cacheReadTokens; + const delta = currCtx - prevCtx; + const sign = delta >= 0 ? 
'+' : ''; + const marginalCost = calculateCost( + { + inputTokens: Math.max(0, msg.inputTokens - prev.inputTokens), + cacheCreationTokens: Math.max(0, msg.cacheCreationTokens - prev.cacheCreationTokens), + cacheReadTokens: Math.max(0, msg.cacheReadTokens - prev.cacheReadTokens), + outputTokens: msg.outputTokens, + }, + this.#model, + this.#cacheTtl, + ); + const costStr = `$${marginalCost.toFixed(4)}`; + this.#logger.debug('tool_batch_tokens', { prevCtx, currCtx, delta, marginalCost }); + this.#layout.appendToLastSealed('tools', `[\u2191 ${sign}${delta.toLocaleString()} tokens \u00b7 ${costStr}]\n`); + this.#usageBeforeTools = null; + } + this.#lastUsage = msg; + this.#layout.updateUsage(msg); + break; + } case 'done': this.#logger.info('done', { stopReason: msg.stopReason }); if (msg.stopReason !== 'end_turn') { @@ -59,4 +194,29 @@ export class AgentMessageHandler { break; } } + + async #toolApprovalRequest(msg: SdkToolApprovalRequest): Promise<void> { + try { + this.#logger.info('tool_approval_request', { name: msg.name, input: msg.input }); + const pendingTool: PendingTool = { requestId: msg.requestId, name: msg.name, input: msg.input }; + this.#layout.addPendingTool(pendingTool); + const perm = getPermission({ name: msg.name, input: msg.input }, this.#tools, this.#cwd); + let approved: boolean; + if (perm === PermissionAction.Approve) { + this.#logger.info('Auto approving', { name: msg.name }); + approved = true; + } else if (perm === PermissionAction.Deny) { + this.#logger.info('Auto denying', { name: msg.name }); + approved = false; + } else { + approved = await this.#layout.requestApproval(); + } + this.#respond(msg.requestId, approved); + this.#layout.removePendingTool(msg.requestId); + } catch (err) { + this.#logger.error('Error', err); + this.#respond(msg.requestId, false); + this.#layout.removePendingTool(msg.requestId); + } + } } diff --git a/apps/claude-sdk-cli/src/runAgent.ts b/apps/claude-sdk-cli/src/runAgent.ts index 78723a2..f123a76 100644 ---
a/apps/claude-sdk-cli/src/runAgent.ts +++ b/apps/claude-sdk-cli/src/runAgent.ts @@ -1,5 +1,4 @@ -import { relative } from 'node:path'; -import { AnthropicBeta, type AnyToolDefinition, type CacheTtl, calculateCost, type IAnthropicAgent, type SdkMessage, type SdkMessageUsage, type SdkToolApprovalRequest } from '@shellicar/claude-sdk'; +import { AnthropicBeta, type AnyToolDefinition, type IAnthropicAgent, type SdkMessage } from '@shellicar/claude-sdk'; import { CreateFile } from '@shellicar/claude-sdk-tools/CreateFile'; import { DeleteDirectory } from '@shellicar/claude-sdk-tools/DeleteDirectory'; import { DeleteFile } from '@shellicar/claude-sdk-tools/DeleteFile'; @@ -17,73 +16,10 @@ import type { RefStore } from '@shellicar/claude-sdk-tools/RefStore'; import { SearchFiles } from '@shellicar/claude-sdk-tools/SearchFiles'; import { Tail } from '@shellicar/claude-sdk-tools/Tail'; import { AgentMessageHandler } from './AgentMessageHandler.js'; -import type { AppLayout, PendingTool } from './AppLayout.js'; +import type { AppLayout } from './AppLayout.js'; import { logger } from './logger.js'; -import { getPermission, PermissionAction } from './permissions.js'; import { systemPrompts } from './systemPrompts.js'; -function fmtBytes(n: number): string { - if (n >= 1024 * 1024) { - return `${(n / 1024 / 1024).toFixed(1)}mb`; - } - if (n >= 1024) { - return `${(n / 1024).toFixed(1)}kb`; - } - return `${n}b`; -} - -function primaryArg(input: Record<string, unknown>, cwd: string): string | null { - for (const key of ['path', 'file']) { - if (typeof input[key] === 'string') { - return relative(cwd, input[key] as string) || (input[key] as string); - } - } - if (typeof input.pattern === 'string') { - return input.pattern; - } - if (typeof input.description === 'string') { - return input.description; - } - return null; -} - -function formatRefSummary(input: Record<string, unknown>, store: RefStore): string { - const id = typeof input.id === 'string' ?
input.id : ''; - if (!id) { - return 'Ref(?)'; - } - const hint = store.getHint(id) ?? id.slice(0, 8); - const content = store.get(id); - if (content === undefined) { - return `Ref(${id.slice(0, 8)}…)`; - } - const sizeStr = fmtBytes(content.length); - // start and limit always have defaults now (0 and 1000) so always show the range - const start = typeof input.start === 'number' ? input.start : 0; - const limit = typeof input.limit === 'number' ? input.limit : 1000; - const end = Math.min(start + limit, content.length); - return `Ref ← ${hint} [${start}–${end} / ${sizeStr}]`; -} - -function formatToolSummary(name: string, input: Record<string, unknown>, cwd: string, store: RefStore): string { - if (name === 'Ref') { - return formatRefSummary(input, store); - } - if (name === 'Pipe' && Array.isArray(input.steps)) { - const steps = (input.steps as Array<{ tool?: unknown; input?: unknown }>) - .map((s) => { - const tool = typeof s.tool === 'string' ? s.tool : '?'; - const stepInput = s.input != null && typeof s.input === 'object' ? (s.input as Record<string, unknown>) : {}; - const arg = primaryArg(stepInput, cwd); - return arg ? `${tool}(${arg})` : tool; - }) - .join(' | '); - return steps; - } - const arg = primaryArg(input, cwd); - return arg ? `${name}(${arg})` : name; -} - export async function runAgent(agent: IAnthropicAgent, prompt: string, layout: AppLayout, store: RefStore): Promise { const pipeSource = [Find, ReadFile, Grep, Head, Tail, Range, SearchFiles]; const { tool: Ref, transformToolResult: refTransform } = createRef(store, 2_000); @@ -92,10 +28,8 @@ export async function runAgent(agent: IAnthropicAgent, prompt: string, layout: A const tools: AnyToolDefinition[] = [pipe, ...pipeSource, ...otherTools]; const cwd = process.cwd(); - let lastUsage: SdkMessageUsage | null = null; - /** Snapshot of usage at the start of the current tool batch; used to compute the token delta - * when the next message_usage arrives. Non-null while a batch is in-flight.
*/ - let usageBeforeTools: SdkMessageUsage | null = null; + const model = 'claude-sonnet-4-6'; + const cacheTtl = '5m' as const; const transformToolResult = (toolName: string, output: unknown): unknown => { const result = refTransform(toolName, output); @@ -106,13 +40,8 @@ export async function runAgent(agent: IAnthropicAgent, prompt: string, layout: A return result; }; - const handler = new AgentMessageHandler(layout, logger); - layout.startStreaming(prompt); - const model = 'claude-sonnet-4-6'; - const cacheTtl: CacheTtl = '5m'; - const { port, done } = agent.runAgent({ model, maxTokens: 32768, @@ -136,101 +65,13 @@ export async function runAgent(agent: IAnthropicAgent, prompt: string, layout: A }, }); - const toolApprovalRequest = async (msg: SdkToolApprovalRequest) => { - try { - logger.info('tool_approval_request', { name: msg.name, input: msg.input }); - - const pendingTool: PendingTool = { requestId: msg.requestId, name: msg.name, input: msg.input }; - layout.addPendingTool(pendingTool); - - const perm = getPermission({ name: msg.name, input: msg.input }, tools, cwd); - let approved: boolean; - if (perm === PermissionAction.Approve) { - logger.info('Auto approving', { name: msg.name }); - approved = true; - } else if (perm === PermissionAction.Deny) { - logger.info('Auto denying', { name: msg.name }); - approved = false; - } else { - approved = await layout.requestApproval(); - } - - port.postMessage({ type: 'tool_approval_response', requestId: msg.requestId, approved }); - layout.removePendingTool(msg.requestId); - } catch (err) { - logger.error('Error', err); - port.postMessage({ type: 'tool_approval_response', requestId: msg.requestId, approved: false }); - layout.removePendingTool(msg.requestId); - } + const respond = (requestId: string, approved: boolean) => { + port.postMessage({ type: 'tool_approval_response', requestId, approved }); }; - port.on('message', (msg: SdkMessage) => { - switch (msg.type) { - case 'query_summary': { - const parts = 
[`${msg.systemPrompts} system`, `${msg.userMessages} user`, `${msg.assistantMessages} assistant`, ...(msg.thinkingBlocks > 0 ? [`${msg.thinkingBlocks} thinking`] : [])]; - layout.transitionBlock('meta'); - layout.appendStreaming(parts.join(' · ')); - break; - } + const handler = new AgentMessageHandler(layout, logger, { model, cacheTtl, cwd, store, tools, respond }); - case 'message_thinking': - layout.transitionBlock('thinking'); - layout.appendStreaming(msg.text); - break; - case 'message_text': - layout.transitionBlock('response'); - layout.appendStreaming(msg.text); - break; - case 'tool_approval_request': - layout.transitionBlock('tools'); - layout.appendStreaming(`${formatToolSummary(msg.name, msg.input, cwd, store)}\n`); - // Snapshot usage at the start of the first tool in this batch so we can - // compute the per-batch turn cost when the next message_usage arrives. - if (!usageBeforeTools) { - usageBeforeTools = lastUsage; - } - toolApprovalRequest(msg); - break; - case 'tool_error': - layout.transitionBlock('tools'); - layout.appendStreaming(`${msg.name} error\n\`\`\`json\n${JSON.stringify(msg.input, null, 2)}\n\`\`\`\n\n${msg.error}\n`); - break; - case 'message_usage': { - // Annotate the (now-sealed) tools block with how many tokens this batch added to the - // context window: delta = (input+cacheCreate+cacheRead at N+1) - (same at N). - // This captures tool-result tokens + the assistant tool-call tokens that moved into - // the cache between turns. The running cost total is in the status bar. - logger.debug('message_usage', { hasUsageBeforeTools: usageBeforeTools !== null }); - if (usageBeforeTools !== null) { - const prevCtx = usageBeforeTools.inputTokens + usageBeforeTools.cacheCreationTokens + usageBeforeTools.cacheReadTokens; - const currCtx = msg.inputTokens + msg.cacheCreationTokens + msg.cacheReadTokens; - const delta = currCtx - prevCtx; - const sign = delta >= 0 ? 
'+' : ''; - // Marginal cost: price only the net-new tokens this batch added (delta per category) - // plus the output tokens Claude generated in response to those results. - const marginalCost = calculateCost( - { - inputTokens: Math.max(0, msg.inputTokens - usageBeforeTools.inputTokens), - cacheCreationTokens: Math.max(0, msg.cacheCreationTokens - usageBeforeTools.cacheCreationTokens), - cacheReadTokens: Math.max(0, msg.cacheReadTokens - usageBeforeTools.cacheReadTokens), - outputTokens: msg.outputTokens, - }, - model, - cacheTtl, - ); - const costStr = `$${marginalCost.toFixed(4)}`; - logger.debug('tool_batch_tokens', { prevCtx, currCtx, delta, marginalCost }); - layout.appendToLastSealed('tools', `[\u2191 ${sign}${delta.toLocaleString()} tokens \u00b7 ${costStr}]\n`); - usageBeforeTools = null; - } - lastUsage = msg; - layout.updateUsage(msg); - break; - } - default: - handler.handle(msg); - } - }); + port.on('message', (msg: SdkMessage) => handler.handle(msg)); layout.setCancelFn(() => port.postMessage({ type: 'cancel' })); diff --git a/apps/claude-sdk-cli/test/AgentMessageHandler.spec.ts b/apps/claude-sdk-cli/test/AgentMessageHandler.spec.ts index 13860e1..5018ac7 100644 --- a/apps/claude-sdk-cli/test/AgentMessageHandler.spec.ts +++ b/apps/claude-sdk-cli/test/AgentMessageHandler.spec.ts @@ -1,5 +1,5 @@ import { describe, expect, it, vi } from 'vitest'; -import { AgentMessageHandler } from '../src/AgentMessageHandler.js'; +import { AgentMessageHandler, type AgentMessageHandlerOptions } from '../src/AgentMessageHandler.js'; import type { AppLayout } from '../src/AppLayout.js'; import { logger } from '../src/logger.js'; @@ -11,11 +11,28 @@ function makeLayout() { return { transitionBlock: vi.fn(), appendStreaming: vi.fn(), + appendToLastSealed: vi.fn(), + updateUsage: vi.fn(), + addPendingTool: vi.fn(), + removePendingTool: vi.fn(), + requestApproval: vi.fn().mockResolvedValue(true), } as unknown as AppLayout; } -function makeHandler(layout: AppLayout) { - 
return new AgentMessageHandler(layout, logger); +function makeOpts(overrides: Partial = {}): AgentMessageHandlerOptions { + return { + model: 'claude-test', + cacheTtl: '5m', + cwd: '/test', + store: { get: vi.fn(), getHint: vi.fn() } as unknown as AgentMessageHandlerOptions['store'], + tools: [], + respond: vi.fn(), + ...overrides, + }; +} + +function makeHandler(layout: AppLayout, opts?: Partial) { + return new AgentMessageHandler(layout, logger, makeOpts(opts)); } // --------------------------------------------------------------------------- @@ -187,3 +204,145 @@ describe('AgentMessageHandler — error', () => { expect(actual).toBe(expected); }); }); + +// --------------------------------------------------------------------------- +// tool_error +// --------------------------------------------------------------------------- + +describe('AgentMessageHandler — tool_error', () => { + it('transitions to tools block', () => { + const layout = makeLayout(); + makeHandler(layout).handle({ type: 'tool_error', name: 'EditFile', input: { file: 'x.ts' }, error: 'oops' }); + const expected = 'tools'; + const actual = vi.mocked(layout.transitionBlock).mock.calls[0]?.[0]; + expect(actual).toBe(expected); + }); + + it('streams tool name in the error line', () => { + const layout = makeLayout(); + makeHandler(layout).handle({ type: 'tool_error', name: 'EditFile', input: { file: 'x.ts' }, error: 'boom' }); + const expected = true; + const actual = (vi.mocked(layout.appendStreaming).mock.calls[0]?.[0] ?? '').includes('EditFile error'); + expect(actual).toBe(expected); + }); + + it('includes the error message in the output', () => { + const layout = makeLayout(); + makeHandler(layout).handle({ type: 'tool_error', name: 'EditFile', input: { file: 'x.ts' }, error: 'bad things' }); + const expected = true; + const actual = (vi.mocked(layout.appendStreaming).mock.calls[0]?.[0] ?? 
'').includes('bad things'); + expect(actual).toBe(expected); + }); +}); + +// --------------------------------------------------------------------------- +// message_usage +// --------------------------------------------------------------------------- + +function makeUsage(inputTokens: number): { type: 'message_usage'; inputTokens: number; cacheCreationTokens: number; cacheReadTokens: number; outputTokens: number; costUsd: number; contextWindow: number } { + return { type: 'message_usage', inputTokens, cacheCreationTokens: 0, cacheReadTokens: 0, outputTokens: 100, costUsd: 0.001, contextWindow: 200_000 }; +} + +describe('AgentMessageHandler — message_usage without prior tools', () => { + it('calls updateUsage', () => { + const layout = makeLayout(); + makeHandler(layout).handle(makeUsage(1000)); + const expected = 1; + const actual = vi.mocked(layout.updateUsage).mock.calls.length; + expect(actual).toBe(expected); + }); + + it('does not annotate when no tool batch is open', () => { + const layout = makeLayout(); + makeHandler(layout).handle(makeUsage(1000)); + const expected = 0; + const actual = vi.mocked(layout.appendToLastSealed).mock.calls.length; + expect(actual).toBe(expected); + }); +}); + +describe('AgentMessageHandler — message_usage delta annotation', () => { + it('annotates the tools block with token delta after a tool batch', () => { + const layout = makeLayout(); + const handler = makeHandler(layout); + // Establish a baseline usage, then fire a tool that snapshots it + handler.handle(makeUsage(1000)); + handler.handle({ type: 'tool_approval_request', requestId: 'r1', name: 'Find', input: { path: '.' } }); + // Next usage shows 500 more input tokens + handler.handle(makeUsage(1500)); + const expected = true; + const annotation = vi.mocked(layout.appendToLastSealed).mock.calls[0]?.[1] ?? 
''; + const actual = annotation.includes('+500'); + expect(actual).toBe(expected); + }); + + it('resets usageBeforeTools after the annotation so second batch computes independently', () => { + const layout = makeLayout(); + const handler = makeHandler(layout); + handler.handle(makeUsage(1000)); + handler.handle({ type: 'tool_approval_request', requestId: 'r1', name: 'Find', input: { path: '.' } }); + handler.handle(makeUsage(1500)); + // Second tool batch + handler.handle({ type: 'tool_approval_request', requestId: 'r2', name: 'Find', input: { path: '.' } }); + handler.handle(makeUsage(1700)); + // appendToLastSealed should have been called twice, second annotation for +200 + const expected = 2; + const actual = vi.mocked(layout.appendToLastSealed).mock.calls.length; + expect(actual).toBe(expected); + }); + + it('second batch delta is computed from the post-first-batch usage, not the original', () => { + const layout = makeLayout(); + const handler = makeHandler(layout); + handler.handle(makeUsage(1000)); + handler.handle({ type: 'tool_approval_request', requestId: 'r1', name: 'Find', input: { path: '.' } }); + handler.handle(makeUsage(1500)); + handler.handle({ type: 'tool_approval_request', requestId: 'r2', name: 'Find', input: { path: '.' } }); + handler.handle(makeUsage(1700)); + const expected = true; + const secondAnnotation = vi.mocked(layout.appendToLastSealed).mock.calls[1]?.[1] ?? ''; + const actual = secondAnnotation.includes('+200'); + expect(actual).toBe(expected); + }); + + it('does not snapshot usageBeforeTools again if batch already open', () => { + const layout = makeLayout(); + const handler = makeHandler(layout); + handler.handle(makeUsage(1000)); + // Two tools in the same batch before usage arrives + handler.handle({ type: 'tool_approval_request', requestId: 'r1', name: 'Find', input: { path: '.' } }); + handler.handle({ type: 'tool_approval_request', requestId: 'r2', name: 'Find', input: { path: '.' 
} }); + handler.handle(makeUsage(1800)); + // Delta should be relative to the 1000 baseline, not the second tool + const expected = true; + const annotation = vi.mocked(layout.appendToLastSealed).mock.calls[0]?.[1] ?? ''; + const actual = annotation.includes('+800'); + expect(actual).toBe(expected); + }); +}); + +// --------------------------------------------------------------------------- +// message_compaction with lastUsage +// --------------------------------------------------------------------------- + +describe('AgentMessageHandler — message_compaction annotation', () => { + it('appends context-usage annotation when lastUsage is known', () => { + const layout = makeLayout(); + const handler = makeHandler(layout); + handler.handle(makeUsage(150_000)); + handler.handle({ type: 'message_compaction', summary: 'trimmed' }); + const calls = vi.mocked(layout.appendStreaming).mock.calls; + const expected = true; + const actual = calls.some((c) => (c[0] ?? '').includes('compacted at')); + expect(actual).toBe(expected); + }); + + it('omits annotation when lastUsage is not yet known', () => { + const layout = makeLayout(); + makeHandler(layout).handle({ type: 'message_compaction', summary: 'trimmed' }); + const calls = vi.mocked(layout.appendStreaming).mock.calls; + const expected = false; + const actual = calls.some((c) => (c[0] ?? '').includes('compacted at')); + expect(actual).toBe(expected); + }); +});
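
Reviewer note: the batch-snapshot behaviour these specs pin down (snapshot once per tool batch, annotate on the next `message_usage`, then reset) can be sketched in isolation as below. This is a minimal illustration only, assuming the same three context-token fields; `Usage` and `UsageTracker` are hypothetical names and do not appear in `AgentMessageHandler`, and the cost calculation is left out.

```typescript
// Hypothetical sketch: `Usage` and `UsageTracker` are illustrative names,
// not the types or classes used by AgentMessageHandler.
interface Usage {
  inputTokens: number;
  cacheCreationTokens: number;
  cacheReadTokens: number;
}

// Context size is the sum of the three input-side token counters.
const contextSize = (u: Usage): number => u.inputTokens + u.cacheCreationTokens + u.cacheReadTokens;

class UsageTracker {
  private lastUsage: Usage | null = null;
  private usageBeforeTools: Usage | null = null;

  /** Call on every tool_approval_request; only the first call in a batch takes a snapshot. */
  onToolRequest(): void {
    if (this.usageBeforeTools === null) {
      this.usageBeforeTools = this.lastUsage;
    }
  }

  /** Call on every message_usage; returns the signed token delta if a batch was open, else null. */
  onUsage(usage: Usage): string | null {
    let annotation: string | null = null;
    if (this.usageBeforeTools !== null) {
      const delta = contextSize(usage) - contextSize(this.usageBeforeTools);
      const sign = delta >= 0 ? '+' : '';
      annotation = `${sign}${delta}`;
      // Reset so the next batch snapshots from the usage recorded below.
      this.usageBeforeTools = null;
    }
    this.lastUsage = usage;
    return annotation;
  }
}
```

As in the removed `runAgent.ts` code, a tool request that arrives before any usage message leaves the snapshot empty, so no annotation is produced for that batch.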