From 4ecffa78b3e956d1f27055c73fe88adbb4f57863 Mon Sep 17 00:00:00 2001
From: Stephen Hellicar
Date: Mon, 6 Apr 2026 18:47:04 +1000
Subject: [PATCH] Restore lost philosophy from b67b1f6 and document the five banana pillars
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

b67b1f6 was committed to feature/sdk-tooling after PR #176 was
squash-merged. The wrap-up session content never made it into main — it
included the full philosophy discussion from the 2026-04-05 session: the
sandbox threat model, the composable tool philosophy, and the reasoning
behind claude-sandbox as the fifth banana pillar.

Restored to .claude/sessions/2026-04-05.md.

Also added:
- .claude/five-banana-pillars.md — the five pillars as a standalone
  reference (The Case, The Cage, The Mailroom, The Tower, The Pit) with
  the guiding question for each architectural decision
- .claude/CLAUDE.md — new vision section summarising the pillars before
  the architecture section, so the why is visible before the what
---
 .claude/CLAUDE.md              | 19 +++++++
 .claude/five-banana-pillars.md | 39 ++++++++++++++
 .claude/sessions/2026-04-05.md | 99 ++++++++++++++++++++++++++++++++++
 3 files changed, 157 insertions(+)
 create mode 100644 .claude/five-banana-pillars.md

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index 7d652b1..1e73e1f 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -94,6 +94,25 @@ from `ConversationStore`). Each substep ships independently; the CLI works at ev
 - Vitest setup (prerequisite for unit tests, do alongside 1a)
 
 
+
+## Why This SDK Exists — The Five Banana Pillars
+
+The official Anthropic SDK is a black box: you get a response, but the agent loop is opaque. `@shellicar/claude-sdk` makes the loop transparent, and that transparency is what enables everything else.
+
+| Pillar | What it needs from the SDK |
+|--------|---------------------------|
+| **The Case** (context management) | Own the messages array; expose push/remove; control what enters context |
+| **The Cage** (cost visibility) | Stream per-turn usage data so the consumer can track costs as they happen |
+| **The Mailroom** (orchestration) | Bidirectional MessageChannel protocol; every agent looks the same to an orchestrator |
+| **The Tower** (observability) | Emit events (tools, approvals, costs, errors); consumer slots in as approver via held-promise |
+| **The Pit** (sandbox) | Consumer-controlled tool pipeline: validate → approve → execute |
+
+If a design decision serves none of the pillars, it probably doesn't belong in the SDK.
+
+Full detail: `.claude/five-banana-pillars.md`
+
+
+
 
 
 ## Architecture
diff --git a/.claude/five-banana-pillars.md b/.claude/five-banana-pillars.md
new file mode 100644
index 0000000..f8d6e94
--- /dev/null
+++ b/.claude/five-banana-pillars.md
@@ -0,0 +1,39 @@
+# The Five Banana Pillars
+
+The SDK is the runtime that all five pillars bolt onto.
+
+**The core insight:** the official Anthropic SDK is a black box. `@shellicar/claude-sdk` makes the agent loop transparent — and that transparency is what enables everything else.
+
+Without owning the agent loop, you cannot manage context (The Case), track costs (The Cage), route messages (The Mailroom), emit events (The Tower), or control tool execution (The Pit). Every pillar requires visibility into what the loop is doing.
+
+---
+
+## The Five Pillars
+
+**The Case — Context Management**
+The SDK owns the messages array. It controls what enters context, manages compaction, and exposes push/remove for tagged pruning. The consumer saves, loads, and edits. Compaction is the consumer editing the array. Long-term: tiered context model — small results inline, old results pruned, important results stored for recall.
+
+**The Cage — Cost Visibility**
+The SDK streams usage data per turn: input tokens, output tokens, cache read/write. The consumer tracks and displays however they want. Without owning the loop, per-turn costs are invisible — you get a total at the end, nothing you can act on mid-session.
+
+**The Mailroom — Orchestration**
+The `MessageChannel` is the mailroom. Bidirectional SDK/consumer communication over a typed message protocol. Multi-agent orchestration: each agent exposes the same interface; the orchestrator speaks one protocol to all of them.
+
+**The Tower — Observability**
+The SDK emits events: tool calls, approvals, cost deltas, context usage, errors. The Tower slots in as the approver via the held-promise pattern — no SDK changes needed. Observability is a consumer concern; the SDK just emits faithfully.
+
+**The Pit — Sandbox**
+The SDK runs inside whatever environment the consumer provides. The tool pipeline (validate → approve → execute) is what makes the pit safe. The consumer controls which tools exist and whether each invocation is allowed.
+
+---
+
+## How the pillars guide architecture decisions
+
+When a design question comes up, run it against the pillars:
+- Does this decision keep the messages array transparent and editable? (The Case)
+- Does this decision make per-turn costs visible to the consumer? (The Cage)
+- Does this decision keep the message protocol clean and consistent? (The Mailroom)
+- Does this decision emit enough events for the consumer to observe what's happening? (The Tower)
+- Does this decision keep the consumer in control of tool execution? (The Pit)
+
+If a decision serves none of the pillars, it probably doesn't belong in the SDK.
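
Reviewer's sketch, not part of the patch: the pillars file describes the tool pipeline (validate → approve → execute) and the held-promise approver only in prose, so a minimal illustration may help. Every name below (`ToolCall`, `Tool`, `Approver`, `HeldApprover`, `runTool`) is hypothetical and assumed for illustration; none of it is taken from `@shellicar/claude-sdk`, whose real API may look quite different.

```typescript
// Hypothetical names throughout: this is an illustrative sketch, not the
// actual @shellicar/claude-sdk API.

type ToolCall = { name: string; input: unknown };
type ToolResult = { ok: true; output: string } | { ok: false; reason: string };

interface Tool {
  // Returns null when the input is acceptable, otherwise a description of the problem.
  validate(input: unknown): string | null;
  execute(input: unknown): Promise<string>;
}

type Approver = (call: ToolCall) => Promise<boolean>;

// Held-promise approver: every approval request parks a promise that the
// consumer settles later (for example from a UI prompt or a supervisor).
class HeldApprover {
  private pending = new Map<number, (approved: boolean) => void>();
  private nextId = 0;

  readonly approve: Approver = (_call) =>
    new Promise<boolean>((resolve) => {
      this.pending.set(this.nextId++, resolve);
    });

  // Consumer side: resolve the held request with the given id.
  settle(id: number, approved: boolean): void {
    const resolve = this.pending.get(id);
    if (resolve === undefined) throw new Error(`no pending approval ${id}`);
    this.pending.delete(id);
    resolve(approved);
  }
}

// validate -> approve -> execute; any stage can stop the call.
async function runTool(
  tools: Map<string, Tool>,
  approve: Approver,
  call: ToolCall,
): Promise<ToolResult> {
  const tool = tools.get(call.name);
  if (tool === undefined) return { ok: false, reason: `unknown tool: ${call.name}` };

  const problem = tool.validate(call.input);
  if (problem !== null) return { ok: false, reason: `invalid input: ${problem}` };

  if (!(await approve(call))) return { ok: false, reason: "denied by consumer" };

  return { ok: true, output: await tool.execute(call.input) };
}
```

In this sketch the loop never decides anything itself: the consumer supplies the tool map (which tools exist) and the approver (whether each invocation runs), and an observer could take the approver role simply by holding the promise until a human or policy calls `settle`.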
diff --git a/.claude/sessions/2026-04-05.md b/.claude/sessions/2026-04-05.md
index 6b81063..d88704b 100644
--- a/.claude/sessions/2026-04-05.md
+++ b/.claude/sessions/2026-04-05.md
@@ -323,3 +323,102 @@ The previous edit session had left `AppLayout.ts` in a broken state (parse error
 - **Skills system** — design in `docs/skills-design.md`; `ConversationHistory.remove(id)` primitive is in place
 - **Image attachments** — `pngpaste` + clipboard image detection (text-only first was the stated goal; now met)
 - **`IRefStore` interface extraction** — documented in CLAUDE.md, straightforward
+
+
+
+---
+
+# Session 2026-04-05 (wrap-up — PR merge, backlog triage, architecture discussion)
+
+## What was done
+
+### PR #176 merged
+
+Wrote a comprehensive PR body for `feature/sdk-tooling` (previously empty) covering all five areas: `claude-core` primitives, `claude-sdk-tools` suite, `claude-sdk` enhancements, the TUI app, and the clipboard system. Merged as squash commit `974e1c0` into main.
+
+### Issue backlog triage
+
+Reviewed all open issues.
+Closed 16 as superseded or complete:
+
+**Superseded by new app architecture** (old `claude-cli` zones/displayBuffer/add-dir):
+`#168`, `#167`, `#163`, `#159`, `#158`, `#129`, `#127`, `#120`, `#114`, `#113`, `#112`, `#110`, `#109`, `#87`
+
+**Complete in `claude-sdk-cli`:** `#93` (status line visible in command mode)
+
+**Not planned — single session by design:** `#95` (session browser)
+
+Remaining open issues of note:
+- `#179` alt/history view — the right replacement for all the old scroll/search issues; search should be part of it
+- `#178` system reminders for file modifications — between-turns mtime tracking is the high-value/low-complexity part; `fs.watch` during exec deferred
+- `#177` LSP validation — POC exists; fits PreviewEdit/EditFile pattern; advisory not blocking; baseline snapshot on session start
+- `#104`/`#101` Exec permission model — needed for development machine use, not sandbox (see philosophy below)
+- `#97` conversation history on startup — audit log data is there, just not surfaced
+- `#94` always show model name — small, should already be done or trivial
+- `#128` configurable keybinds — low priority, keybindings are working
+
+---
+
+## Philosophy — the important part
+
+This session had a long discussion that reframes the whole project direction. Writing it here because *why* matters more than *what*, and a future session reading only the commit log will miss it entirely.
+
+### Session logs should capture reasoning, not just decisions
+
+The key insight: if you write down decisions, a future session has to guess at the reasoning. If you write down the reasoning — the dead ends, the why-not-that, the philosophy behind a choice — a future session can actually think from it. The clipboard system is a good example: the log should say not just "three-stage probe" but *why* — that `looksLikePath` is permissive at stage 1 because restricting it pushes VS Code relative paths to a stage that silently mangles them via HFS coercion.
+That's the recoverable insight.
+
+### The sandbox changes the threat model entirely
+
+Months of work on permission rules, skill management, approval flows, context injection — all of it is a software approximation of physical isolation. Software approximations of physical isolation are always leaky. The sandbox provides isolation at the right layer.
+
+Inside a sandbox with an isolated environment and a cloned repo:
+- `rm -rf` just means re-clone
+- Credential leakage is scoped to short-lived injected credentials
+- Scope creep is contained to the repo
+- The residual threats are network exfiltration and prompt injection — real but a much smaller surface
+
+The permission model (`#101`, `#104`) still matters — but for *development machine use*, not for the sandbox. In the sandbox, the environment is the safety. On the development machine, the rules are the safety.
+
+### Skills and CLAUDE.md are environment workarounds
+
+The skills system, CLAUDE.md harness, context injection, session memory — all of these are answers to the question "how do I configure a general-purpose agent to behave correctly in many contexts." That's a hard question with no clean answer.
+
+The sandbox asks a different question: "what does this specific agent need to know to do this specific job?" That question is easy to answer. You compose the prompt fresh, put exactly the right reference material in the repo, and the agent has exactly what it needs. No skill loader, no CLAUDE.md harness, no timing problems with context injection.
+
+The fleet repos (`claude-fleet-shellicar`, `claude-fleet-eagers`, etc.) are already this — collections of job specs, templates, and context artifacts for specific domains. The orchestrator's job is composition and scheduling, not runtime configuration.
+
+### The composable tool philosophy
+
+`claude-sdk-cli` was built with a specific principle: the context window should contain *decisions and reasoning*, not raw data.
+Every tool design decision reflects this:
+- **Pipe** — compose Find/ReadFile/Grep into one call; only the final result enters context
+- **Ref** — large outputs stored outside context, addressable by handle; Claude pages through as needed
+- **walkAndRef** — automatically swaps large tool outputs for tokens at the SDK boundary
+- **ReadFile size guard** — refuses to load files that would dominate context; redirects to Head/Tail/Grep
+
+This is also why "code mode" (Cloudflare) and similar approaches are the right instinct — give Claude the *operation*, not the data. The query is 20 tokens, the answer is 5 tokens; the 50KB file never enters context. Pipe and Ref are the general form of that idea.
+
+The irony the industry hasn't fully reckoned with: most "agent problems" (context exhaustion, approval complexity, memory systems, RAG) are self-inflicted by the agent loop design. Atomic jobs + ephemeral sandboxes + fresh context removes most of them by construction. The hard problems that remain — prompt quality, job scoping, judge quality — are genuinely hard and human work. But they're the *right* hard problems.
+
+### The next major project: claude-sandbox (The Pit)
+
+Fifth banana pillar. The sandbox is the runtime environment that makes fire-and-forget jobs viable:
+- Isolated environment (clone, credentials via ENV, tear down after)
+- `@shellicar/mcp-exec` as the execution layer (already built)
+- `@shellicar/claude-sdk` as the agent loop (already built)
+- Structured permission model (`#101`) for the development machine case
+- Deterministic exit criteria evaluation (tests pass, schema validates, LSP clean)
+- Budget gate (cost cap, escalate on threshold)
+
+The `sandbox-claude.md` document in `claude-fleet` has the full architecture thinking including the compound probability math, the three quality surfaces, and the supervision layer design. Read it before building anything.
+
+## Current state
+
+`feature/sdk-tooling` merged to main as `974e1c0`. Clean.
+No active branch.
+
+## What's next
+
+- **`claude-sandbox`** — the Pit; isolated execution environment for fire-and-forget agent jobs
+- **`#177` LSP validation** — high value, POC exists, fits PreviewEdit/EditFile naturally
+- **`#178` system reminders** — between-turns mtime tracking in NodeFileSystem
+- **`#179` alt/history view** — block navigation + search for `claude-sdk-cli`
+- **`#101`/`#104` permission model** — for development machine; not sandbox
+- **`#94` model name in status line** — quick win