Make the SDK stateful: agent session pattern, durable config, filesystem removal#235
Merged
Make the SDK stateful: agent session pattern, durable config, filesystem removal#235
Conversation
sdk-shape.md is the authoritative plan for the SDK refactor. It leads with the two underlying problems that drove the rewrite (the SDK had no durable in-memory session concept, and the SDK was touching files), and lays out the building-block design that follows. It supersedes the six-step technical plan in issue #232. The session log is carried in its handoff state. Today's code-walk entries will land as separate commits on this branch.
Four additions, all derived from answers given in the File 1 walk of AnthropicClient.ts. None are new design; they record decisions that were discussed but not yet written down. 1. New principle: substitution happens through behavioural interfaces. Interfaces carry behavioural contracts, not just type signatures. LSP applies, so 'same shape' at the type level is not sufficient. 2. New principle: not every helper is a building block. The named blocks are the block-level identities; helpers inside a block are wiring, not blocks. 3. Client block description: names the OAuth flow as inside the block, and records credential storage as the one pragmatic exception to the no-filesystem rule (access to the API, not state from or to the API). 4. Closing section: notes that the superseded #232 framing still lives in code comments (starting with AnthropicClient.ts) and should be updated as the refactor touches each file.
Client and Request builder blocks both used 'header set' to describe what accompanies the body at call time. The phrase is ambiguous: it can mean literal HTTP headers, or call-level options including abort signal and timeout. The current AnthropicClient.stream() signature takes Anthropic.RequestOptions (the broader meaning), but the plan's wording did not reflect that. Tightened both block descriptions to 'request options (headers plus per-call concerns such as abort signal and timeout)' so the boundary is explicit.
The 2026-04-09 session logged one drift failure (importing outside concerns into the walk). This session surfaced a different one: treating the code as ground truth and asking whether the plan matches it, instead of treating the plan as the target and asking what the implementer needs to know to reach it. The symptom was a 'plan promises X / code delivers Y / match yes-or-no' table on Conversation.ts that turned every missing piece into a numbered question asking whether the refactor was in scope. The plan is the scope. There was no question. Captured the pattern at the top of the log so a compacted-me reads the rule before writing any more walk output. Also filed the work already done this session (three commits, four files walked) and the next files in the queue, so the state survives another compaction cleanly.
Walking MessageStream.ts, I produced a long list of 'gaps an implementer will fill' covering raw event taps, per-TTL cache split, usage.iterations, non-streamed entry points, and per-block lifecycle events. Every item on that list was new functionality the plan's Stream processor description names as belonging to the block. None of it was refactor work. The plan can be read two ways: 'block X is responsible for A, B, C, D' as an implementation requirement, or 'block X is the home A, B, C, D would live in if they existed' as a shape statement naming where future features go. Stephen confirmed the second reading is the correct one for this refactor. The refactor extracts the blocks and preserves current behaviour. Feature work is follow-up. Captured the rule alongside the existing 'plan is the target, not the code' rule at the top of the 2026-04-10 log so both drift patterns survive the next compaction. The two rules describe different failure modes of the same underlying plan-vs-code confusion and both need to be filtered out of walk output.
Stephen was angry (correctly) that the AGENT_SDK_PREFIX question was re-asked in this session, because it had been answered previously and the answer was never captured in writing. The question resurfaced after compaction and wasted his time. The root cause: I treated the session log as a place to record what happened, not as the durable memory for decisions. When an answer was given in conversation, I moved on instead of writing the answer down in the same turn. Compaction then ate the in-memory version and left only the unanswered question. Added two new discipline rules to the session log: 1. Answers get written into the log in the same turn they are given. PreviewEdit -> EditFile -> commit before proceeding. Not optional. 2. Questions to Stephen are in plain English. If a question cannot be phrased without quoting a line from sdk-shape.md or naming a principle by its title, the question is not ready to ask. Refined Q4 as a worked example: jargon version vs plain English version. Also recorded the resolutions to Q3, Q4, and Q5 in a new 'Resolved questions' section so the answers are permanently captured: - Q3: request builder maps inputs to what the HTTP request needs; the wording of its return shape is not a choice point. - Q4: consumer-facing options stay flat; grouping compaction fields into a single object is a separate change, not refactor scope. - Q5: AGENT_SDK_PREFIX is required by the API, not optional, not overridable, not a consumer policy decision. Stays hardcoded.
The question 'is AGENT_SDK_PREFIX a consumer policy decision or is it required?' has now come up twice: once in the 2026-04-09 session where it was captured as an open question and Stephen answered yes-update- the-plan and the plan was never updated; once again in this 2026-04-10 session where I re-asked it during the RequestBuilder.ts walk. The plan is not only what we want to change: it is also what could reasonably confuse someone reading it. A question that keeps resurfac- ing is a confusion point and belongs in the plan, even if the fact it documents is an API-level constraint rather than a design decision. Added a sentence to the Request builder block making the constraint explicit: the system prompt always begins with a fixed SDK identity prefix (AGENT_SDK_PREFIX); this is required by the Anthropic API; not a consumer policy decision, not optional, not overridable. Anyone reading the plan without a session log in the other hand now gets the answer one step away instead of re-asking.
The AGENT_SDK_PREFIX question has now been answered twice. The first time, in 2026-04-09, Stephen said yes update the plan and the plan was not updated. The second time, in this session, he was correctly angry and I finally did the plan edit (commit 0bdfa49). The surface failure was 'answers not recorded', which is covered by the first discipline rule. But there is a second failure underneath it: the plan itself was not treated as a place where confusion-prevention facts belong. When an answer is 'that is an API constraint', I assumed the answer did not merit a plan edit because it is not a design choice. It still belongs in the plan because the plan reader will otherwise ask the same question. Added a new discipline rule to the session log: the plan documents confusion points, not only change points. A question that keeps resurfacing is evidence of a confusion point and is sufficient reason to add a sentence to the plan, even if the fact being documented is an API-level constraint rather than a design decision. Also extended the Q5 resolution with the full history: when it was asked, when it was answered, when the plan was (finally) updated, and which plan-edit commit captured it. Future me reading this log gets the whole story, not just the one-line answer.
Files 5 (MessageStream) and 6 (RequestBuilder) were walked earlier in this session but never captured in the log. The Tool registry walk (defineTool.ts + the tool type definitions in public/types.ts) has just finished. Capture all three rounds now, while they are still in head, before the next context compaction eats them. Also catch the commits list up: 4d8668e, 949ad4b, 96f5ba1, 0bdfa49, eb8a2d3 had all landed but none were listed in the session log's commits section. Update the \"what is next\" section to reflect the remaining walk: AgentRun (single-file round, check in before starting), then AnthropicAgent, then auth, then the appendix work on sdk-shape.md.
Stephen caught that my previous Tool registry walk notes were muddled. I was framing the work as \"extract the runtime out of AgentRun and preserve current behaviour\", which buried the point: the refactor creates the Tool registry block. It does not exist today. There is no file, no class, no cohesive thing that owns tools. The runtime is spread across AgentRun and RequestBuilder with neither of them knowing that it is tool-registry work they are doing. The drift pattern 2 rule (\"refactor is not a feature build\") is about adding NEW capabilities to blocks, not about whether the block itself is new. Creating the block is the refactor. Extending the block is not. Also capture Stephen's concrete insight: toJSONSchema is called from RequestBuilder.ts on every request today. Once there is a registry, schema conversion happens once at registration and the cached result is handed to the request builder when it needs the wire schema. This is not an added capability; it is what falls out naturally when the registry exists. It also removes per-call work from a hot path that should be pure. The reframed notes now lead with \"the refactor creates the block\", list what exists today, list the responsibilities of the new block (preserving current behaviour), and list what is explicitly not in scope (per-operation approval, standalone public API, changing the consumer-facing tools array shape).
The Tool registry block's description previously said \"validates input against the schema\" with \"schema\" unqualified. A plan reader would reasonably assume that meant the Zod validation schema and not think about the wire JSON Schema at all. I fell into exactly that reading when writing the Tool registry walk notes: I left schema conversion in the request builder by default, because the plan did not say the registry owns it. Stephen corrected that with a concrete example: the request builder should not be calling toJSONSchema, because tool schemas do not change between requests and conversion is therefore per-tool work, not per-request work. That only works if the registry owns the schema in both its Zod form (for validation) and its JSON Schema form (for the wire), and converts at registration time. This is not a new capability. It is what naturally falls out once the Tool registry exists as a cohesive block instead of runtime code smeared across AgentRun.ts and RequestBuilder.ts. The plan edit makes that fact visible to a plan reader who does not have the session log in their other hand. Under the plan-is-confusion-prevention rule added earlier this session: if I got confused reading it, the next reader will too, so the clarification belongs in the plan, not just the walk notes.
AgentRun.ts is the untangling target: four plan blocks (query runner, turn runner, tool registry runtime, approval coordinator) are all tangled in one 270-line class. Three rounds of walking produced the findings in file 8 of the log, condensed here: class-field ownership after extraction, method-to-block mapping, ten implementer gotchas, and the after-extraction shape of each of the four blocks (what it holds, receives, returns, calls). The walk also generated four design decisions that have to be recorded now, not left as 'I would decide these' notes for the next session to guess at. Stephen's feedback earlier in this turn: writing 'I would decide X' is the same failure mode as Q5 (answer given, not recorded, re-asked later). The four decisions are: 1. Abort signal assembly stays at the turn runner (builder stays pure). 2. transformToolResult stays per-query input (not session durable). 3. Tool-not-found channel-send asymmetry is preserved (refactor-only). 4. No imperative cancel() on approval coordinator (refactor-only). Decisions 1 and 2 are genuine choices; both defensible alternatives are presented with pros and cons so the next reader can evaluate the call instead of having to trust it. Decisions 3 and 4 collapse under the refactor-is-not-a-feature-build rule. Also correct a misattribution in the file 7 (Tool registry) walk: I had written that the empty-tool-use retry at AgentRun.ts line 118 was tool-runtime work. That framing was wrong. The retry re-issues the whole turn's HTTP request, not an individual tool call, so it is a query runner concern. The file 7 bullet now says so, and the file 8 walk covers it in the right block. Also catch the commits list up with 93ecad7, 7ab4bfd, and 6bb05a4 (the three commits since the log was last updated). What-is-next updated: files 1-8 walked, remaining work is AnthropicAgent (short), auth files (confirmation), appendix behavioural-contract seeding, and the refactor playbook.
…rule Stephen asked me to verify the reasoning behind each decision is captured, not just the decisions themselves, before context compaction happens. Audit found two gaps. Gap 1: Decision 3 (preserve tool-not-found asymmetry) had the decision and the refactor-only rationale, but no capture of the candidate reason the asymmetry might exist in the first place. A future maintainer deciding whether to unify needs to know the hypothetical intent reading: missing tool is an SDK or consumer config bug the consumer cannot react to live (they wrote the tool array themselves); invalid input is a model behaviour the consumer may legitimately want to observe. That reasoning is now in the log as part of the decision 3 block, so a unification pass has material to evaluate against. Gap 2: The new discipline rule about not laundering decisions as questions was never committed. Stephen caught the failure pattern in this turn (four 'questions' in the AgentRun walk that were all decisions-with-recommendations), but the lesson was only in the commit history and the conversation, not in the discipline section of the session log. Adding it as 'Rule: do not launder a decision as a question' alongside the existing question-handling rules. The rule captures the failure mode, two signs of it, the test for whether a question is real, and a worked example from this session. Both fixes land together so a single commit precedes the carry-forward summary at ~73% context.
The misshapen Session block. Short walk because the untangling work lived in AgentRun, not here. Findings recorded: - The 14-field RunAgentQuery IS the "rebuild per call" problem the refactor exists to solve. Every field except messages is work the consumer redoes on every call even though almost none of it changes between calls. - Refactor mapping 11 durable / 3 per-query, derived from the plan Session and query runner blocks. Durable: model, thinking, maxTokens, systemPrompts, tools, betas, requireToolApproval, pauseAfterCompact, compactInputTokens, cacheTtl, cachedReminders. Per-query: messages, systemReminder, transformToolResult. - messages: string[] is "new messages this query", not the whole conversation. The Conversation lives on the Session. - The four history methods (getHistory, loadHistory, injectContext, removeContext) are Conversation-access stand-ins. Plan Session block exposes .conversation directly so all four vanish. Zero tests for them exist. - Conversation.load() is dead code the moment ConversationStore is deleted. Exactly one runtime caller. Three test call sites at Conversation.spec.ts lines 219, 228, 236 get removed or rewritten against push() as part of the delete. - historyFile leaves AnthropicAgentOptions entirely. Consumer owns the save/restore flow. SDK knows nothing about it.
Confirmation walk; the plan already captures the Client block as
owning auth. Findings recorded:
- 20 files under private/Auth plus TokenRefreshingAnthropic in
private/http. Most are tiny OAuth-ceremony helpers (PKCE, state,
callback server, schema, consts); per the "not every helper is a
building block" principle these are wiring, not substitution
surfaces.
- AnthropicAuth.getCredentials() is the single entry point: load,
login-if-missing, refresh-if-expired, save. login() has local
(HTTP callback on port 3001, opens browser) and manual (paste
code from stdin) modes.
- TokenRefreshingAnthropic overrides apiKeyAuth/bearerAuth to call
a token getter per request. The Anthropic SDK constructor
silently discards non-string auth values so the subclass
captures the getter before super() drops it. This mechanism is
what makes refresh invisible and the refactor preserves it.
- AnthropicAuth is currently exported as public API at
index.ts:31 despite living under private/Auth/. CLI wires it up
manually (main.ts lines 58-63): construct, eager getCredentials,
wrap as authToken closure, pass to createAnthropicAgent.
- Per plan Client block ownership, AnthropicAuth folds in as an
internal collaborator and stops being exported. Consumer
substitution happens at the Client block interface, not at an
Auth sub-interface. Folding is restructuring, not new
capability, so in refactor scope.
- Eager login UX must be preserved. Recommendation is an async
createSession factory (`const session = await createSession(...)`)
over an explicit session.login() method, because the async
factory matches the 95%-case-is-simple principle and hides the
step from consumers who do not care. Final call is the
implementer's.
- File I/O is exactly two functions (loadCredentials,
saveCredentials) at ~/.claude/.credentials.json. Matches the
plan's "one pragmatic exception" verbatim.
- Flagged-but-not-refactor-scope: saveCredentials assumes the
directory exists, execFile('open', url) is macOS/BSD-only,
credentials path is hardcoded, fetchProfile enrichment runs
only on first login, schema validation on both load and save,
two named errors (InvalidAuthorisationCodeError,
TokenExchangeFailedError), AnthropicClient.ts JSDoc at line 15
incorrectly claims to "own auth".
First draft of the Auth walk conflated two separate blocks. I wrote that `AnthropicAuth` "becomes an internal collaborator" of the Client block and stops being exported, and proposed an async `createSession` factory to preserve eager login at startup. Both are wrong. Stephen's correction: login has nothing to do with Session. Session holds a reference to Client; it does not construct Client or trigger Client's auth. Login is a Client-block concern the consumer triggers before Session exists. Reading the plan Client block sentences together, "Owns authentication" and "You hand it a token source" are compatible if the OAuth CODE lives in the Client block's namespace but the Client CLASS still takes a token source. The consumer builds the token source by calling the Client block's auth helper and hands the result to the Client. That is exactly what the CLI does today at main.ts lines 58-63. The refactor does not change the consumer-wired pattern. The async-factory recommendation was chasing a problem the plan does not pose. Why I got it wrong matters more than the correction. Two failures: 1. I read "Client block owns OAuth" as "Client class encapsulates OAuth construction" and started designing an API change. The plan's "hand it a token source" sentence rules that out but I paired the two sentences as tensions to resolve rather than reading them together. 2. I reached for Session as the home for the async factory because Session is the top-level "the thing I am talking to" object. But Session holds a reference to Client; the direction of ownership is Session -> Client, not the other way. Auth lives on the Client side of that edge and nothing Session does touches it. Recording the correction in full instead of silently editing so the next session can see what the wrong mental model looked like and what the right one looks like side by side.
The ten-file walk finished with no runtime flow written down. The plan has nouns but no verbs. Drafting a pseudocode end-to-end and mapping each step to a plan block gives a clean mapping for eight of the nine blocks. Session is the one that does not map: it has zero operations in the pseudocode. Stephen's two-point framing (add session consideration to the SDK, persist and hydrate only in the CLI) is the intent behind the plan's Problem 1 and Problem 2 and makes Session collapse to a state shape with no methods. The plan's Session block is wrong under this framing and has to be rewritten. This commit records the discovery so the next session has it in writing.
The old Session block described a class that bundles Client, Conversation, and durable config and offers query(input). Stephen's two-point framing makes that wrong: point 1 is "add session consideration (accumulation of messages) to the SDK"; point 2 is "persist and hydrate only in the CLI". Neither point asks for a Session class. Session is the name for the SDK's in-memory state across queries, and that state is exactly two things: the Conversation and the durable config. Both have homes elsewhere (Conversation is a block, durable config is usage options). There is no Session class to construct. Rewritten as Session (state, not a class) so a reader scanning the blocks list still finds it in the expected place but reads the correct definition. The entry explicitly says there is no SDK class called Session and no session.query(input) method. The state is held by whatever constructs the blocks; in the default assembly today that is the CLI. The entry also flags that the SDK's query entry point shape (free function, factory, bound helper) is the next design decision, because the mapping exercise made clear there IS something the CLI calls, but its shape is still open. Problem 1's phrasing (The SDK can have a session object) is updated in the same edit because leaving it alone would contradict the Session block rewrite. The replacement says The SDK holds session state in memory and names the two things (Conversation and durable config), which matches the block rewrite. The runtime flow discovery that led to this correction is recorded in commit c670d2b on the session log.
The earlier plan described an agent session concept as a class called Session that bundled Client, Conversation, and durable config and offered query(input). That description required the reader to already hold the agent session concept in their head. A fresh instance reading the plan cold reconstructed the wrong concept (framework-session, like HTTP or database sessions) to fit the class shape, and the error cost five hours of real-time back-and-forth to detect and fix. This rewrite is a structural response to that failure, not a patch on individual sections. Every fix is concrete: 1. The refactor's two goals go at the top of the document, verbatim, in plain English, before any other prose. A reader who reads only section 1 knows what the refactor does. 2. What an agent session is is now the second section, read before any block is described, and it includes both an IS list (one ongoing conversation with the model) and an IS NOT list (not HTTP, not database, not REPL, not a class, not a session id, not a session file). A fresh reader cannot pattern-match to the wrong domain without walking past an explicit denial of that pattern. 3. A glossary pins every domain term to a precise meaning and includes a words NOT to use without qualification ban list. Session without agent is banned. State meaning the agent session is banned. Bundles and wraps when describing blocks are banned, because those verbs primed the previous failure. Each ban has a replacement. 4. Every block description opens with For (what it is for) before Does (what it does) and Not (what it is not). A reader who understands the purpose can derive the methods; a reader who only has the methods will reverse-engineer a purpose that may not match, which is how the previous Session block misled its later readers. 5. The blocks list has no Session entry at all. Instead, the top of the blocks list has a There is no Session block callout that directs the reader to What an agent session is and explains that the agent session is a Conversation plus a durable config held by the consumer, not a class. 6. Non-class blocks (Turn runner, Query runner) are explicitly marked as named logic and not long-lived objects. 7. The Why this plan is written this way section names the five-hour failure concretely, explains the instance continuity problem (plans must carry concepts across instance boundaries because every instance that reads a plan is a fresh reader), and lists concrete rules with concrete failure modes they address. The closing note says: if a future instance is tempted to add an abstract warning about drift, add a concrete fix instead. Abstract warnings lost to concrete drift in the previous version of this plan and will lose again if they are all that is added. The old plan content survives in git history at the previous commit on this branch. This commit replaces it in full with a document whose goal is to carry the agent session concept across instance boundaries without requiring the reader to already hold it. The runtime flow discovery and the five-hour failure itself are recorded in the session log at .claude/sessions/2026-04-10.md in commits c670d2b and earlier. The session log carries the reasoning; the plan carries the outcome. The link between them is deliberate and stated in the plan's rules.
The old section defined an agent session as an object shape (a Conversation plus durable config). That framing invited the bundle reading: a Session is a thing that holds the blocks. The correct framing is that an agent session is a usage pattern of the SDK, where the SDK holds a Conversation across queries and mutates it as turns happen, and the caller does not rebuild the message list. Specific changes: 1. What an agent session IS is now one sentence: a usage pattern where the SDK holds a Conversation across queries and mutates it as turns happen. The motivation paragraph explains why the pattern matters (somebody has to maintain the message list; the pattern decides whether that somebody is the SDK or the caller). 2. The old object-graph sentence (that growing message list plus the settings that stay the same is the agent session) is gone. The settings are not part of the session; they are durable config the caller holds separately. 3. A new IS NOT bullet, Not a bundle of SDK blocks, addresses the specific error the earlier plan and its rewrite made: treating Client, Tool registry, Control channel, and so on as agent-session-scoped. They are SDK infrastructure that exists in the without-concept case too. 4. A new subsection, What code would change if the agent session concept were added or removed?, gives the operational definition. Three push sites in the turn runner. Everything else is unchanged. With the concept the SDK runs the turn loop and pushes messages; without it the caller runs one turn at a time and manages its own list. Tools still work in the without case. Approvals still work. Control channel still works. The concept is only about who maintains the message list and who owns the turn loop. This is a targeted edit, not a full rewrite. The remaining items from the previous proposal list are still pending and will be done in separate commits as each is confirmed.
Ten places in the plan used agent-session-scoped or session-scoped language to describe blocks that are not scoped to sessions at all. That framing conflated two distinct concerns: the agent session concept (whether the SDK mutates the Conversation across queries) and the stateful SDK (whether blocks are constructed once or reconstructed per call). The Client, Tool registry, Control channel, and Approval coordinator exist whether or not the session pattern is in use; their lifetime is the consumer's choice, not the session's. Changes in this commit: 1. Problem section. The 'second consequence of the same missing concept' paragraph that folded per-call reconstruction into the session problem is rewritten as a separate problem: the old SDK is stateless. It is related to the session concern because both touch the same code, but it is not a consequence of the missing session concept. A stateless SDK would be wasteful even in the one-shot case. 2. Principles section. The Per-agent-session construction principle is renamed Stateful SDK, not per-call reconstruction. The Client, Tool registry, Control channel, and Approval coordinator are described as stateful blocks, not session-scoped. The principle now explicitly notes that stateful SDK blocks would be valuable in the one-shot case. 3. What the consumer does. The paragraph claiming the Conversation plus durable config IS the agent session (bundle framing) is removed. Replaced with a paragraph that says the consumer holds all the blocks as long-lived fields across its SDK usage, none of them is the session, and the session pattern is what happens when the SDK mutates the Conversation (not a set of objects). 4. Running a query. The list item agent-session-scoped collaborators is replaced with blocks and config the consumer holds across queries. 5. What is not in the SDK. The per-query construction bullet is rewritten to refer to stateful blocks instead of agent-session-scoped objects. 6. Glossary. The Agent-session-scoped definition is replaced with Stateful block (a block that holds state and is constructed once, not scoped to agent sessions). The Agent session collaborators entry is removed entirely; the concept was wrong and does not need a replacement term. 7. Ban list. A new entry bans agent-session-scoped and session-scoped for describing blocks, pointing at the specific error this commit fixes. The State ban list entry is updated to remove the wrong replacement suggestion (the Conversation and durable config is not a valid substitute; that phrasing is the bundle framing in different words).
The old top section listed two bullets that were both about the agent session: adding the concept, and putting persistence in the consumer. The stateful SDK concern (blocks constructed once and reused instead of reconstructed per call) was not a top-level bullet at all, even though it is a distinct concern that touches the same code. This commit restructures the section into two concerns: 1. Add the agent session concept to the SDK. Folds the old bullet 2 (persist and hydrate in the consumer) in as a consequence, because persistence is a subpoint of the session concept. The bullet is written in usage-pattern language (the SDK mutates the Conversation on the consumer's behalf) rather than object-graph language. 2. Make the SDK stateful. New bullet. Names the specific blocks that become stateful. Explicitly notes that this concern is distinct from the session concept and would be valuable even in the one-shot case.
…result split, runner coordination Five block descriptions had specific issues that needed targeted fixes, plus three secondary cleanups for consistency. Client block. The old Does paragraph said the Client owns transport-identifying headers such as user-agent and SDK version with no explanation. The new Does paragraph names them as client-identifying headers (User-Agent, Anthropic SDK version headers, who is calling metadata) and explicitly contrasts them with anthropic-beta headers, which describe what features this specific request is using and are computed per request by the request builder. Stream processor block. The old Not said the processor hands assembled messages to the turn runner. That phrasing suggested the processor calls into the turn runner; it does not. The new wording makes the flow explicit: the turn runner calls the Client, hands the event stream to the processor, and reads the assembled message from the processor's final-message surface when it is ready. The stream processor is passive output; the turn runner is the driver. The Does paragraph now describes the processor's two output surfaces. Tool registry block. The old block said the registry formats the result as a tool_result block. A full tool_result needs a tool_use_id, which is a turn runner concern (only the turn runner sees the original tool_use block). The new split says the registry returns content blocks and the turn runner wraps them in the tool_result envelope with the corresponding tool_use_id. Both Does and Not are updated. For drops the for an agent session phrasing. Turn runner block. Old block said Not a long-lived object. Not a class with per-query instances. That was dismissive of what the turn runner actually does. The new Does paragraph lays out the 9-step sequence explicitly and names the local state the runner holds during a turn. The new Not paragraph explains what not constructed per turn means operationally: implemented as a function or a stateless method call, not an allocated object with a constructor. Query runner block. Same treatment. Does is a 5-step sequence. Not explains the same construction distinction. Secondary cleanups: Conversation Not drops the two halves of an agent session bundle framing. Approval coordinator Not changes agent session to Control channel for lifetime. Control channel Not drops the trailing one instance per agent session sentence. Note on process: the first PreviewEdit I composed for this commit used wrong line numbers and I caught the error in the diff before applying. The second attempt used correct line numbers verified against a fresh Range read.
Every block has a construction lifetime, but the block descriptions deliberately do not cover lifetimes; they cover responsibility (For / Does / Not). Lifetimes belong in their own section so the reader can look up the whole picture in one place instead of hunting across nine block descriptions. The new section sits between The blocks and What the consumer does. Four categories: 1. Process-lived. Client and Tool registry. Both hold expensive state (auth / connection pool, compiled schemas) that is worth sharing for the whole process. 2. Consumer-chosen. Conversation, durable config, Control channel, Approval coordinator. Their lifetime is up to the consumer. 3. Per-query. Abort controller and per-query input. Constructed when the query starts, discarded when it finishes. 4. Not a construct. Request builder, Turn runner, Query runner, Stream processor. Named logic called per invocation, not allocated instances. The section closes with the distinguishing rule: blocks that hold state worth reusing are constructed once by the consumer and reused, while blocks that are pure logic over other blocks are called per-invocation. Plus a reminder that per-query object churn in the old code was a symptom of the stateless SDK problem, not the missing session concept.
…into three The old section had two problems that conflated the agent session concept with SDK statelessness. Problem 1 was titled the SDK did not hold the agent session (bundle framing) and its body described the fifteen-field options rebuild, which is about statelessness, not about whether the SDK mutates the Conversation. The stateless SDK problem was demoted to an afterthought paragraph. Three distinct problems now: 1. The SDK did not mutate the Conversation. The division-of-labour issue: the caller was maintaining the message list when it is the SDK's job. 2. The SDK was stateless. The fifteen-field options description moves here. Per-call reconstruction cost. Stateful SDK is valuable independent of the session concept. 3. The SDK was reading and writing files for conversation data. Renamed from for the agent session and for agent session state.
The turn runner was doing multi-turn work. Its Does had nine steps including inspecting the assembled message for tool_use blocks, dispatching each tool via the registry and approval coordinator, constructing the tool_result message, and pushing it into the Conversation. None of that is one turn. A turn is one request-response cycle. Dispatching tools and feeding the result back creates the NEXT request, which is a new turn. That is the query runner's concern. The cleaner division: - Turn runner: one request-response cycle. Reads wire view, builds request, merges signal, streams, processes, pushes the assembled assistant message, returns the message and stop reason. Seven steps. No tool dispatch, no tool_result construction, no looping decision. - Query runner: owns the turn loop and tool dispatch between turns. After each turn runner call, if the stop reason is tool_use, dispatches each tool_use block (approval if required, registry execute, content blocks back, transform hook, tool_result wrapping with matching tool_use_id), assembles a user-role message carrying the tool_result blocks, pushes it to the Conversation, loops again. Two additional things Stephen asked about are pinned in this commit: 1. Stream processor subscription lifetime. The turn runner's step 5 now explicitly says the tap subscription is scoped to this turn: set up at step 5 and torn down at step 7 when the turn returns. The turn runner's local state list includes the tap subscription handle. This answers the .on lifetime question without adding a separate section. 2. Turn runner Not gets three new denials: does not dispatch tools (the query runner handles that between turns), does not construct tool_result messages, does not decide whether to loop. The query runner Does is now a nested 4a/4b/4c/4d loop so the reader can see the branch on stop reason and the tool dispatch flow clearly. Its local state adds the pending tool uses being dispatched. Process note: the first attempt at this commit used git commit -am which swept three unrelated WIP files into the commit. Stephen caught it, ran git reset --mixed HEAD~1 to undo, and I redid this commit with explicit git add. Never use -a in this workflow.
… tighten save/restore Four targeted fixes inside the What the consumer does section: 1. Setting up an agent session is renamed to Setting up the SDK. The setup is SDK-level, not session-level. A consumer making one-shot calls with no session pattern still runs every setup step. The previous heading wrongly implied the steps were for the session. 2. The Conversation step (step 3) drops the for a new agent session / for a restored agent session framing. The Conversation is just constructed empty. Restoration from saved messages (if any) is a push-loop, which is mentioned but not as session-specific language. 3. Running a query gains a paragraph that shows the turn runner in the flow. The previous version described the entry point inputs and the handle but never explained what actually happens inside a query. Now it says: the query runner pushes the user message, enters the turn loop, calls the turn runner per turn, and handles tool dispatch between turns if the stop reason is tool_use. This matches the runner responsibility split committed in cdf4e5d. 4. Saving and restoring an agent session is renamed to Saving and restoring the conversation. The only thing the consumer saves and restores for a conversation is the messages. The old text said the consumer reconstructs its durable config from its saved settings and reconstructs its Tool registry from its tool list, both of which are wrong: those are CLI state that gets reloaded at startup by running the Setting up steps, they are not saved data. The new text makes that explicit: the only thing that needs saving is the messages; the blocks are reconstructed from the consumer's own code. The circular sentence The restored agent session is functionally identical to the original because the state is identical is removed entirely because it said nothing. To save is simpler now: read out the messages, write them wherever. To restore is shorter: run the Setting up steps, push the saved messages into the Conversation at step 3, done. Process note: this commit used explicit git add on the single file, not git commit -a. The three unrelated WIP files stay in the working tree untouched.
…ribed once The runners and the stream processor were described as not constructed (named logic, called per invocation) and with subscriptions that were set up and torn down per turn or per query. Both wrong. The refactor's whole point is construct once, subscribe once. Changes in this commit: 1. Turn runner. Rewritten as a class constructed once by the consumer at setup. Constructor takes Client, Request builder, Stream processor, Tool registry, Approval coordinator as instance state. Reused via a run method. Step 5 no longer talks about subscribing to the processor's tap per turn; the processor's .on handlers are subscribed once at setup by the consumer, and the turn runner just calls process(rawIterable) and lets events fire on the existing handlers. Step 7 no longer tears anything down. Local state is method-local only. Not section says not constructed per turn, constructed once, reused. 2. Query runner. Same treatment. Constructed once, reused via run method. The query handle concept is removed entirely: step 1 and 2 that built the handle are gone. The consumer calls queryRunner.run(...) and awaits the return. Events fire on the already-subscribed .on handlers. Cancel goes through the Control channel. No handle, no per-query subscription, no per-query object. 3. Stream processor. Constructed once at setup. Exposes .on events (raw, deltas, lifecycle, final message) that the consumer subscribes to once. Same handlers fire for every stream the processor handles. Not subscribed and unsubscribed per stream. 4. Lifetimes section rewritten. Three categories now instead of four. The first is Constructed once at consumer setup and covers every block in the SDK plus the durable config. The second is Per-query and covers the abort controller and per-query input data (not blocks). The third is Pure function and covers the request builder alone. The old Not a construct category that swept up Turn runner, Query runner, and Stream processor is gone; they are all blocks constructed once like everything else. The distinguishing rule at the bottom is rewritten to state the rule directly: every block constructed once, reused for the life of the process; consumer subscribes once, never touches subscriptions again; per-query and per-turn subscription setup and teardown is the anti-pattern the refactor is removing. 5. Running a query section rewritten. No query entry point, no query handle, no per-query subscription. The consumer calls queryRunner.run(...) with the per-query input and awaits the return. Events flow through the .on handlers set up at setup. Cancel goes through the Control channel set up at setup. 6. Principles section: Observation and control are separate surfaces is rewritten to talk about .on subscriptions set up once at setup, not event taps on a query handle. 7. Two minor mentions of query handle fixed: the With the concept subsection of What an agent session is, and the Audit files bullet in What is not in the SDK. No new abstractions, no new API shapes. The refactor is not changing behaviour; it is only changing lifetimes (per-call to once-at-setup) and removing the drift surface of per-turn and per-query subscription dances. Whatever .on events exist in the current code remain; they become subscribe-once.
…t factory Two small fixes from your end-of-review pass. 1. Setting up step 3 (Conversation). The previous wording was overly detailed about pushing saved messages one by one. The point is simpler: the consumer owns the Conversation, the SDK mutates it during a query, and if the consumer has saved messages it sets the initial history at this point. Whether that is one push per message or a single set-history call is an implementation detail. 2. The What is not in the SDK bullet about a top-level class was too absolute. A convenience factory function like createDefaultAgent that constructs the default blocks and returns them does not bundle them into a class and is fine as a later addition. The bullet is updated to say that, and to mark the factory as out of scope for this refactor. The systemReminder removal from earlier this turn was reverted. The plan keeps systemReminder as before because the discussion concluded it has to stay: query runner tracks first-turn-only (loop state), turn runner takes it as a per-run argument, request builder does the wire-view injection after the cache marker.
… sections
Option 2 from the end-of-session discussion: surgical clean-up
that preserves what is still valid and excises what contradicts
the current plan.
1. Preamble blockquote added at the top. States that the plan
at .claude/plans/sdk-shape.md is authoritative, names what
this log still contains that is valid, and notes what was
excised and what was rewritten. Directs future readers to
use the log for code-map and discipline rules, and the plan
for design.
2. File 9 (AnthropicAgent.ts) walk rewritten in place. Raw code
observations preserved (current shape, 14-field
RunAgentQuery, history methods list, Conversation.load dead
code, historyFile removal). Wrong content replaced:
- Refactor mapping: now splits the 14 fields correctly
(durable config, Tool registry for tools, per-query input
on queryRunner.run) instead of the old 11 durable plus 3
per-query with tools in the wrong place.
- Four history methods: now call conversation.X directly,
not session.conversation.X, because there is no session
class.
- Per-query messages: pushed by the query runner, not the
turn runner, matching the runner responsibility split in
the current plan.
- createAnthropicAgent and IAnthropicAgent: the old proposal
of a createSession factory plus ISession interface
carrying query(input) is gone. New entry says the consumer
constructs the individual blocks directly and that a
convenience factory like createDefaultAgent is acceptable
but out of scope for this refactor.
3. Runtime flow discovery section entirely deleted. It contained
the pseudocode, the session-lived vs per-query vs
not-objects-at-all lifecycle split, the mapping of pseudocode
operations to plan blocks, and the initial two-point framing.
All of this is now covered by the plan itself, so keeping it
is duplication plus noise that misleads a future reader.
4. What is next section entirely deleted. It listed six stale
tasks, all either done or tracked differently now.
Not touched: the discipline rules, the raw observations from
file walks 1 through 8 and 10, the four design decisions from
the AgentRun walk, and the resolved questions Q3 Q4 Q5. All
still valid under the current plan.
Line count: log went from 1135 lines to around 820 lines. The
removed content is preserved in git history at the previous
commit.
The session log had stopped updating at the 6bb05a4 plan edit to the Tool registry block. Since then the session moved through the remaining file walks, the plan rewrites that landed the two-framing shape (agent session as usage pattern, stateful SDK), the session log surgical clean-up, playbook creation, and the first three phase 2 execution steps. The log now reflects all of that. Adds: a summary of the three arcs between 6bb05a4 and now (file walks, plan rewrites, clean-up + playbook), individual entries for the three phase 2 execution commits (StreamProcessor, ToolRegistry, TurnRunner), and a new Playbook execution progress subsection that tracks phase 2 step-by-step with the current state (steps 1-3 done, step 4 next, step 5 optional tidy-ups, phase 3 deferred to a later session). Updates: the Branch line in both Work completed and Working conventions, from docs/sdk-code-walk to feature/sdk-stateful-refactor (the branch was renamed mid-session once the plan settled and phase 2 execution began). Does not touch: the drift pattern rules, the discipline rules, the earlier commit entries, the file walk notes, the four AgentRun design decisions, or the Working conventions other than the Branch line. Those are all still accurate and rewriting them would be noise.
One query per `run` call. A query is one user ask turned into however many turns the model needs to answer it. Constructed once at consumer setup with its long-lived dependencies (turn runner, conversation, tool registry, approval state, control channel, durable config). Reused for every query with per-query input passed per call. Extracted from AgentRun.execute with exact behaviour parity across the four concerns that were tangled there: the turn loop, the tool dispatch between turns, the first-turn-only systemReminder lifecycle, and the cachedReminders injection on a fresh or post-compaction conversation. Key split from the current code: the tool registry API has been reshaped from a one-shot `execute(name, input, transform)` into a two-phase `resolve(name, input)` -> ready/not_found/invalid_input, with the ready branch carrying a `run(transform?)` closure. The closure captures the parsed input at resolve time so the handler is called with the already-parsed value. This matches the current AgentRun flow, which parses each tool_use input once up front in a resolve pass and threads the parsed data through the approval machinery to the handler. A `validate` method that returned the validation result alone would have forced a second safeParse after approval; the closure-based split avoids that. ToolRegistry tests are updated to cover both phases and include a dedicated single-parse regression test that counts safeParse invocations via a superRefine hook. The query runner owns: - cachedReminders injection on fresh or post-compaction conversations, pushing each reminder as a `<system-reminder>` block in front of the first user message. - Pushing the per-query user messages into the Conversation (the Conv merges adjacent user messages per the API's alternation rules). - The turn loop: calls ITurnRunner.run, resets systemReminder to undefined after the first turn, handles the stream-error case by sending `error` on the channel and returning, handles the empty tool_use retry (up to two attempts) before giving up with an error. - Tool dispatch: two-phase with the registry's resolve split. Phase 1 filters not_found and invalid_input into immediate tool_result errors, preserving Decision 3's asymmetry (not_found silent on the channel, invalid_input broadcasts `tool_error`). Phase 2 invokes the run closures, optionally gated on approval. With approval required, approval requests fire in parallel and closures run in the order approvals arrive (Promise.race). Without approval, closures run sequentially in the model's order. Both paths check `cancelled` between items. - query_summary, message_usage, and done channel sends matching the current AgentRun behaviour. The query runner does NOT close the channel. The channel is long-lived and owned by the consumer; closing it per query would break every subsequent query on the same SDK instance. AgentRun closes its per-run channel in its finally block; the query runner deliberately does not. Also adds ApprovalState.reset() so the shared instance can be reused across queries. AgentRun creates a fresh ApprovalState per run in its constructor and never needs this; QueryRunner holds a single instance and calls reset() at the start of each run to drop any cancelled state left over from a prior cancelled query. Stranded pending approvals from a cancelled query have already been resolved by handle(), so there is nothing else to reset. Adds `PerQueryInput` to public/types.ts (messages, optional systemReminder, optional transformToolResult, abortController) and `IQueryRunner` to public/interfaces.ts. Removes `ToolExecuteResult` in favour of `ToolResolveResult` plus `ToolRunResult`. Tests (14 new QueryRunner tests; 9 updated ToolRegistry tests): - Single-turn terminal exit: one turn, user+assistant history, done channel send; query_summary and message_usage emitted per turn. - Tool-use loop: handler runs with the already-parsed input, second turn includes the tool_result user message with matching tool_use_id. - Decision 3 asymmetry: tool_not_found is silent on the channel; tool_invalid_input broadcasts tool_error. - Handler error: broadcasts tool_error and emits is_error tool_result. - systemReminder lifecycle: present on first turn request body and query_summary, absent on second turn. - cachedReminders: injected on a fresh conversation, not injected on a populated one. - Approval required + approved: handler runs, second turn fires. - Approval required + rejected: handler does not run, tool_result carries the rejection reason. - Long-lived instance: two queries in sequence, channel stays open (close count stays at 0), Conversation accumulates across queries. - ApprovalState.reset: a cancelled flag from a prior query is cleared by QueryRunner.run's reset() call so a subsequent run proceeds. ToolRegistry tests include a single-parse regression test that counts safeParse invocations across resolve() and run() to ensure the parsed value is threaded through the closure without re-parsing. All 110 tests pass (96 previously + 14 new QueryRunner; the 9 ToolRegistry tests are updated in place, net +1 vs step 2's 8). See .claude/plans/sdk-refactor-playbook.md phase 2 step 4.
The previous JSDoc had three problems a new reader would trip over: 1. "owns auth" was misleading. The OAuth flow (token acquisition, login, credential storage) lives in private/Auth/. This class only holds the refresh loop via TokenRefreshingAnthropic. A reader who takes "owns auth" literally looks here for the OAuth machinery and finds nothing. 2. The "first step of the refactor series in issue #232" pointer is stale. The refactor is now driven by .claude/plans/sdk-shape.md and .claude/plans/sdk-refactor-playbook.md, not a GitHub issue. The early-session commit be2554a flagged this as a pointer-update item; this commit resolves it. 3. The Client block description in sdk-shape.md draws an explicit line between client-identifying headers (user-agent, SDK version, "who is calling") and feature beta headers (anthropic-beta, "what features this specific request is using"). The previous JSDoc did not mention this split, so a reader trying to trace where anthropic-beta comes from might start here and not find it. It comes from the request builder, computed per request from the durable config. The new JSDoc is structured as two bulleted lists: what the class owns (refresh timing, HTTP transport, client-identifying headers) and what it deliberately does not own (OAuth flow, feature beta headers, the request body, the abort signal), with a pointer at each not-owned item to the block that does own it. Pointing at the plan instead of the issue number gives the next reader a live reference. Drops the historical "extracted from AnthropicAgent" / "previous AnthropicMessageStreamer wrapper removed" paragraph. That content was accurate at the time but it is session history, not architecture: a new reader does not need to know how the class got to its current shape, only what the current shape is. The `extends IMessageStreamer` is already visible from the class signature. No behaviour change. 110 tests still pass.
Phase 2 landed across five commits (422556e, c0a1335, 3c77d99, 99997a7, 5bc5a7d) so the log needs the corresponding close-out. Adds a new Design decisions from phase 2 execution subsection with Decision 5: the tool registry API is split into resolve() plus a run() closure, not a single execute() method. This decision was forced by a behaviour-preservation constraint in step 4 that the step 2 execute() signature could not meet. The first draft of step 4 proposed dropping the pre-filter (so invalid tool_uses would go through approval before failing); this was identified as a behaviour change and rejected. The recorded options (validate method, shouldRun callback, closure-based split) are there so a future reader understands why the registry has this specific shape and does not try to simplify it back to a single execute(). Rewrites the Playbook execution progress subsection to reflect phase 2 complete: steps 4 and 5a done, 5b/5c deferred (5b cosmetic, 5c waits for the CLI restore flow in phase 3). The step 4 entry now links to Decision 5 for the registry API revision. The phase 3 heading now lists the five open decisions the next session has to resolve at the start: AnthropicAuth export status, Auth/ directory move, ApprovalState to ApprovalCoordinator rename, AgentChannel to ControlChannel rename, and Conversation.setHistory shape. Updates the test-count line to 110 passing across eight spec files, with per-file counts. Notes that AgentRun.spec.ts passes verbatim, which is the main signal phase 2 preserved behaviour on the old code path, and that both MessageStream.spec.ts and StreamProcessor.spec.ts are kept through phase 2 with the old one dropped in phase 3.
…ction Decision 1 (AnthropicAuth export status) was Stephen's correction that the question should not have been raised at all: the filter "is this necessary for the refactor?" would have caught it. Stays as-is. The remaining four all got yes: Auth directory move to Client/Auth, ApprovalState rename to ApprovalCoordinator, AgentChannel rename to ControlChannel, and a single setHistory method on Conversation. The setHistory method becomes the first phase 3 commit because step 6 (CLI swap) consumes it.
The CLI needs to restore a saved conversation into a fresh Conversation on startup. setHistory replaces the entire internal items array with the given messages, no merge logic applied (the saved state is already in valid alternating order), no id tags (those are session-scoped and not persisted). Clears any existing history first. Prerequisite for step 6 which wires the CLI to construct blocks directly instead of going through createAnthropicAgent. 7 new tests: populate empty, replace existing, no merge logic, push after setHistory, push merge with last user message from setHistory, cloneForRequest respects compaction blocks, id tags from before setHistory are cleared. Total: 34 Conversation tests.
The CLI no longer uses createAnthropicAgent or the 14-field RunAgentQuery options bag. Instead, main.ts constructs the individual blocks once at setup: AnthropicClient, Conversation, StreamProcessor, ToolRegistry, ApprovalState, AgentChannel, TurnRunner, QueryRunner, and a mutable DurableConfig object. Key wiring decisions: - Channel is long-lived, shared across queries. Cancel fans out via a mutable AbortController reference so the single channel callback can reach whatever controller is active for the current query. - StreamProcessor events are forwarded to the channel once at setup. This preserves the AgentMessageHandler receiving all events through one path (unchanged). - Per-query handler via a mutable reference on the channel consumer port listener. A fresh AgentMessageHandler is created before each query with the current model, avoiding listener add/remove timing issues between the channel.send and the promise resolution. - History persistence moves to the CLI. loadHistory reads JSONL at startup into conversation.setHistory. saveHistory writes after each query completes. The SDK no longer touches the filesystem for conversation data. - Tool definitions and the transformToolResult hook are constructed once at setup (they reference the long-lived RefStore) and reused across queries. - runAgent.ts becomes a thin per-query wrapper: sets the cancel function, calls queryRunner.run with per-query input, catches errors, clears cancel and completes streaming. The old AgentRun code path still exists and compiles but is no longer called. Steps 7 onward delete it.
AgentRun, AnthropicAgent, createAnthropicAgent, and IAnthropicAgent are no longer called after step 6 swapped the CLI to the new blocks. AgentRun.spec.ts is deleted because its assertions are covered by QueryRunner.spec.ts (14 tests) and TurnRunner.spec.ts (5 tests). index.ts drops the createAnthropicAgent and IAnthropicAgent exports. interfaces.ts drops the IAnthropicAgent abstract class. 109 tests remain across 7 spec files.
History persistence is now the CLI's responsibility. The CLI reads JSONL at startup into conversation.setHistory() and writes after each query -- no SDK-level store needed. - Remove Conversation.load() method (replaced by setHistory) - Remove Conversation.load tests (3 tests, 31 remain) - Remove historyFile from AnthropicAgentOptions - Delete ConversationStore.ts (file-backed wrapper around Conversation)
StreamProcessor is the long-lived replacement. MessageStream was a per-stream EventEmitter that allocated fresh state on every API call; StreamProcessor holds accumulated state across the conversation and emits the same events through the shared instance. No code imports the class any longer -- only its own spec file did. 101 tests remain across 6 spec files.
RunAgentQuery, RunAgentResult, and AnthropicAgentOptions belonged to the createAnthropicAgent bundle (deleted in step 7). Nothing imports them. - Remove three type definitions from public/types.ts - Remove MessagePort import (only used by RunAgentResult.port) - Remove all three from index.ts import and re-export - Update Ref.ts JSDoc to reference PerQueryInput instead of RunAgentQuery
The old name described the data shape (state); the new name describes the role (coordinating approval requests, responses, and cancellation across the query runner's tool dispatch).
The channel carries control messages (events, approvals, cancellation) between the SDK and the consumer. The old name tied it to the deleted agent bundle; the new name describes the communication pattern.
Auth modules are part of the Anthropic client layer, not general SDK infrastructure. Nesting under Client/ reflects the dependency direction: the client owns auth, not the other way around.
The AgentRun class was removed in Phase 2, but its name lingered in JSDoc comments and inline notes across 11 source and test files. Every reference now points to the actual component that owns the behaviour being described (QueryRunner, TurnRunner, ToolRegistry, etc.). grep -r AgentRun returns zero hits.
The handler was recreated every loop iteration to pick up a changed model. Now it holds a reference to the shared DurableConfig and reads model, cacheTtl, and tools from it directly. The respond callback was wrapping consumerPort.postMessage with no added value. The handler now holds the port and posts approval responses itself.
ControlChannel no longer takes an onMessage callback in its
constructor. It exposes channel.on('message', ...) so callers
listen directly, symmetric with consumerPort.on('message', ...).
The dead ControlChannelFactory is deleted.
AgentMessageHandler takes the consumer port directly instead of
a respond callback, and reads model/cacheTtl/tools from the
shared DurableConfig reference instead of copying them out.
Constructed once before the loop, no per-query recreation.
Session logs from 03-14 through 04-08 replaced with one-line summaries: all covered merged work with no live decisions. Logs 04-09 and 04-10 trimmed to recovery content only. Two pieces of reasoning moved from the (now-trimmed) session log into sdk-shape.md where they belong: - New principle: the plan describes shape, not a feature list. Block descriptions name where responsibilities would live; the refactor creates the shape but does not add missing capabilities. - Tool registry resolve/run closure split documented with reasoning: single-parse invariant across the approval gate. QueryRunner: replaced isFirst flag with [first, ...rest] destructuring.
The registry always constructs BetaTool (name + description + input_schema), but typed the return as BetaToolUnion — a 22-member union where most members lack those properties. TypeScript could not narrow the access, causing 6 type errors in the test file. Changed wireTools to return BetaTool[] in both ToolRegistry and IToolRegistry. Split the 4-assertion wireTools test into one assertion per test. 104 tests pass, type-check clean.
bananabot9000
approved these changes
Apr 10, 2026
Collaborator
bananabot9000
left a comment
There was a problem hiding this comment.
The refactor delivers on all three stated problems: SDK mutates the Conversation, SDK is stateful, SDK doesn't touch the filesystem for conversation data.
What works well:
- Plan documents (
sdk-shape.md,sdk-refactor-playbook.md) are the strongest planning artifacts in this repo. The instance-continuity post-mortem is a concrete fix for a concrete failure mode. - Clean block decomposition: QueryRunner/TurnRunner/StreamProcessor/ToolRegistry each own exactly one responsibility. No leaking.
- ToolRegistry resolve/run closure pattern preserves single-parse invariant across the approval gate.
- CLI wiring in main.ts is explicit and readable. 15-field options → 4-field per-query input.
- Net -1869 lines. AgentRun, AnthropicAgent, ConversationStore, MessageStream, createAnthropicAgent all deleted. Good deletions.
Observations (non-blocking):
DurableConfigmutated in-place in the main loop — works but the type doesn't signal mutability.toolslives on bothDurableConfigandToolRegistry— double source of truth worth noting.- QueryRunner constructor: 7 positional args, at the edge.
saveHistoryon every query — fine now, note for large contexts.
104 tests. Approve.
Co-Reviewed-By: BananaBot9000 🍌
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #232.
This branch implements the three-problem refactor described in
.claude/plans/sdk-shape.md:Problem 1: the SDK did not mutate the Conversation. The SDK now runs the agent session pattern. The consumer holds a Conversation; the SDK pushes user messages, assistant responses, and tool results into it as turns happen. The consumer no longer maintains the message list by hand.
Problem 2: the SDK was stateless. Client, ToolRegistry, StreamProcessor, TurnRunner, QueryRunner, ControlChannel, and ApprovalCoordinator are all constructed once and reused across queries. The 15-field
RunAgentQueryoptions object is gone. Per-query input is just the user message, optional system reminder, optional transform hook, and abort controller. Durable config (model, tools, cache settings, etc.) is held across queries and mutated in place.Problem 3: the SDK was reading and writing files.
ConversationStoredeleted.historyFileremoved from options. Nofscalls in the SDK except credential storage in the Client block. The CLI owns all persistence.What changed
StreamProcessor,TurnRunner,QueryRunner,ToolRegistry,ApprovalCoordinator,ControlChannel— each with a behavioural interfaceAgentChannel→ControlChannel,ApprovalState→ApprovalCoordinator,Auth/moved underClient/Auth/AgentRun,MessageStream,ConversationStore,AnthropicAgent(the old bundle class), dead typesapps/claude-sdk-cliconstructs blocks once at setup, callsqueryRunner.run()per query.claude/plans/sdk-shape.mdis the authoritative design document51 commits. 104 tests pass across 6 spec files. Build and type-check clean.