Skip to content

Make the SDK stateful: agent session pattern, durable config, filesystem removal#235

Merged
shellicar merged 55 commits intomainfrom
feature/sdk-stateful-refactor
Apr 10, 2026
Merged

Make the SDK stateful: agent session pattern, durable config, filesystem removal#235
shellicar merged 55 commits intomainfrom
feature/sdk-stateful-refactor

Conversation

@shellicar
Copy link
Copy Markdown
Owner

@shellicar shellicar commented Apr 10, 2026

Closes #232.

This branch implements the three-problem refactor described in .claude/plans/sdk-shape.md:

Problem 1: the SDK did not mutate the Conversation. The SDK now runs the agent session pattern. The consumer holds a Conversation; the SDK pushes user messages, assistant responses, and tool results into it as turns happen. The consumer no longer maintains the message list by hand.

Problem 2: the SDK was stateless. Client, ToolRegistry, StreamProcessor, TurnRunner, QueryRunner, ControlChannel, and ApprovalCoordinator are all constructed once and reused across queries. The 15-field RunAgentQuery options object is gone. Per-query input is just the user message, optional system reminder, optional transform hook, and abort controller. Durable config (model, tools, cache settings, etc.) is held across queries and mutated in place.

Problem 3: the SDK was reading and writing files. ConversationStore deleted. historyFile removed from options. No fs calls in the SDK except credential storage in the Client block. The CLI owns all persistence.

What changed

  • New blocks: StreamProcessor, TurnRunner, QueryRunner, ToolRegistry, ApprovalCoordinator, ControlChannel — each with a behavioural interface
  • Renamed: AgentChannelControlChannel, ApprovalStateApprovalCoordinator, Auth/ moved under Client/Auth/
  • Deleted: AgentRun, MessageStream, ConversationStore, AnthropicAgent (the old bundle class), dead types
  • CLI updated: apps/claude-sdk-cli constructs blocks once at setup, calls queryRunner.run() per query
  • Plan + session logs: .claude/plans/sdk-shape.md is the authoritative design document

51 commits. 104 tests pass across 6 spec files. Build and type-check clean.

sdk-shape.md is the authoritative plan for the SDK refactor. It leads
with the two underlying problems that drove the rewrite (the SDK had no
durable in-memory session concept, and the SDK was touching files), and
lays out the building-block design that follows. It supersedes the
six-step technical plan in issue #232.

The session log is carried in its handoff state. Today's code-walk
entries will land as separate commits on this branch.
Four additions, all derived from answers given in the File 1 walk of
AnthropicClient.ts. None are new design; they record decisions that
were discussed but not yet written down.

1. New principle: substitution happens through behavioural interfaces.
   Interfaces carry behavioural contracts, not just type signatures.
   LSP applies, so 'same shape' at the type level is not sufficient.

2. New principle: not every helper is a building block. The named
   blocks are the block-level identities; helpers inside a block are
   wiring, not blocks.

3. Client block description: names the OAuth flow as inside the block,
   and records credential storage as the one pragmatic exception to
   the no-filesystem rule (access to the API, not state from or to
   the API).

4. Closing section: notes that the superseded #232 framing still
   lives in code comments (starting with AnthropicClient.ts) and
   should be updated as the refactor touches each file.
Client and Request builder blocks both used 'header set' to describe
what accompanies the body at call time. The phrase is ambiguous: it
can mean literal HTTP headers, or call-level options including abort
signal and timeout. The current AnthropicClient.stream() signature
takes Anthropic.RequestOptions (the broader meaning), but the plan's
wording did not reflect that. Tightened both block descriptions to
'request options (headers plus per-call concerns such as abort signal
and timeout)' so the boundary is explicit.
The 2026-04-09 session logged one drift failure (importing outside
concerns into the walk). This session surfaced a different one:
treating the code as ground truth and asking whether the plan matches
it, instead of treating the plan as the target and asking what the
implementer needs to know to reach it.

The symptom was a 'plan promises X / code delivers Y / match yes-or-no'
table on Conversation.ts that turned every missing piece into a
numbered question asking whether the refactor was in scope. The plan
is the scope. There was no question.

Captured the pattern at the top of the log so a compacted-me reads the
rule before writing any more walk output. Also filed the work already
done this session (three commits, four files walked) and the next
files in the queue, so the state survives another compaction cleanly.
Walking MessageStream.ts, I produced a long list of 'gaps an
implementer will fill' covering raw event taps, per-TTL cache split,
usage.iterations, non-streamed entry points, and per-block lifecycle
events. Every item on that list was new functionality the plan's
Stream processor description names as belonging to the block. None of
it was refactor work.

The plan can be read two ways: 'block X is responsible for A, B, C,
D' as an implementation requirement, or 'block X is the home A, B, C,
D would live in if they existed' as a shape statement naming where
future features go. Stephen confirmed the second reading is the
correct one for this refactor. The refactor extracts the blocks and
preserves current behaviour. Feature work is follow-up.

Captured the rule alongside the existing 'plan is the target, not
the code' rule at the top of the 2026-04-10 log so both drift
patterns survive the next compaction. The two rules describe
different failure modes of the same underlying plan-vs-code
confusion and both need to be filtered out of walk output.
Stephen was angry (correctly) that the AGENT_SDK_PREFIX question was
re-asked in this session, because it had been answered previously and
the answer was never captured in writing. The question resurfaced after
compaction and wasted his time.

The root cause: I treated the session log as a place to record what
happened, not as the durable memory for decisions. When an answer was
given in conversation, I moved on instead of writing the answer down in
the same turn. Compaction then ate the in-memory version and left only
the unanswered question.

Added two new discipline rules to the session log:

1. Answers get written into the log in the same turn they are given.
   PreviewEdit -> EditFile -> commit before proceeding. Not optional.

2. Questions to Stephen are in plain English. If a question cannot be
   phrased without quoting a line from sdk-shape.md or naming a
   principle by its title, the question is not ready to ask. Refined
   Q4 as a worked example: jargon version vs plain English version.

Also recorded the resolutions to Q3, Q4, and Q5 in a new 'Resolved
questions' section so the answers are permanently captured:

- Q3: request builder maps inputs to what the HTTP request needs;
  the wording of its return shape is not a choice point.
- Q4: consumer-facing options stay flat; grouping compaction fields
  into a single object is a separate change, not refactor scope.
- Q5: AGENT_SDK_PREFIX is required by the API, not optional, not
  overridable, not a consumer policy decision. Stays hardcoded.
The question 'is AGENT_SDK_PREFIX a consumer policy decision or is it
required?' has now come up twice: once in the 2026-04-09 session where
it was captured as an open question and Stephen answered yes-update-
the-plan and the plan was never updated; once again in this 2026-04-10
session where I re-asked it during the RequestBuilder.ts walk.

The plan is not only what we want to change: it is also what could
reasonably confuse someone reading it. A question that keeps resurfac-
ing is a confusion point and belongs in the plan, even if the fact it
documents is an API-level constraint rather than a design decision.

Added a sentence to the Request builder block making the constraint
explicit: the system prompt always begins with a fixed SDK identity
prefix (AGENT_SDK_PREFIX); this is required by the Anthropic API; not
a consumer policy decision, not optional, not overridable. Anyone
reading the plan without a session log in the other hand now gets the
answer one step away instead of re-asking.
The AGENT_SDK_PREFIX question has now been answered twice. The first
time, in 2026-04-09, Stephen said yes update the plan and the plan was
not updated. The second time, in this session, he was correctly angry
and I finally did the plan edit (commit 0bdfa49).

The surface failure was 'answers not recorded', which is covered by the
first discipline rule. But there is a second failure underneath it: the
plan itself was not treated as a place where confusion-prevention facts
belong. When an answer is 'that is an API constraint', I assumed the
answer did not merit a plan edit because it is not a design choice. It
still belongs in the plan because the plan reader will otherwise ask
the same question.

Added a new discipline rule to the session log: the plan documents
confusion points, not only change points. A question that keeps
resurfacing is evidence of a confusion point and is sufficient reason
to add a sentence to the plan, even if the fact being documented is
an API-level constraint rather than a design decision.

Also extended the Q5 resolution with the full history: when it was
asked, when it was answered, when the plan was (finally) updated, and
which plan-edit commit captured it. Future me reading this log gets the
whole story, not just the one-line answer.
Files 5 (MessageStream) and 6 (RequestBuilder) were walked earlier in
this session but never captured in the log. The Tool registry walk
(defineTool.ts + the tool type definitions in public/types.ts) has
just finished. Capture all three rounds now, while they are still in
head, before the next context compaction eats them.

Also catch the commits list up: 4d8668e, 949ad4b, 96f5ba1, 0bdfa49,
eb8a2d3 had all landed but none were listed in the session log's
commits section.

Update the \"what is next\" section to reflect the remaining walk:
AgentRun (single-file round, check in before starting), then
AnthropicAgent, then auth, then the appendix work on sdk-shape.md.
Stephen caught that my previous Tool registry walk notes were muddled.
I was framing the work as \"extract the runtime out of AgentRun and
preserve current behaviour\", which buried the point: the refactor
creates the Tool registry block. It does not exist today. There is no
file, no class, no cohesive thing that owns tools. The runtime is
spread across AgentRun and RequestBuilder with neither of them knowing
that it is tool-registry work they are doing.

The drift pattern 2 rule (\"refactor is not a feature build\") is
about adding NEW capabilities to blocks, not about whether the block
itself is new. Creating the block is the refactor. Extending the block
is not.

Also capture Stephen's concrete insight: toJSONSchema is called from
RequestBuilder.ts on every request today. Once there is a registry,
schema conversion happens once at registration and the cached result
is handed to the request builder when it needs the wire schema. This
is not an added capability; it is what falls out naturally when the
registry exists. It also removes per-call work from a hot path that
should be pure.

The reframed notes now lead with \"the refactor creates the block\",
list what exists today, list the responsibilities of the new block
(preserving current behaviour), and list what is explicitly not in
scope (per-operation approval, standalone public API, changing the
consumer-facing tools array shape).
The Tool registry block's description previously said \"validates
input against the schema\" with \"schema\" unqualified. A plan reader
would reasonably assume that meant the Zod validation schema and not
think about the wire JSON Schema at all. I fell into exactly that
reading when writing the Tool registry walk notes: I left schema
conversion in the request builder by default, because the plan did
not say the registry owns it.

Stephen corrected that with a concrete example: the request builder
should not be calling toJSONSchema, because tool schemas do not change
between requests and conversion is therefore per-tool work, not
per-request work. That only works if the registry owns the schema
in both its Zod form (for validation) and its JSON Schema form (for
the wire), and converts at registration time.

This is not a new capability. It is what naturally falls out once the
Tool registry exists as a cohesive block instead of runtime code
smeared across AgentRun.ts and RequestBuilder.ts. The plan edit makes
that fact visible to a plan reader who does not have the session log
in their other hand.

Under the plan-is-confusion-prevention rule added earlier this session:
if I got confused reading it, the next reader will too, so the
clarification belongs in the plan, not just the walk notes.
AgentRun.ts is the untangling target: four plan blocks (query runner,
turn runner, tool registry runtime, approval coordinator) are all
tangled in one 270-line class. Three rounds of walking produced the
findings in file 8 of the log, condensed here: class-field ownership
after extraction, method-to-block mapping, ten implementer gotchas,
and the after-extraction shape of each of the four blocks (what it
holds, receives, returns, calls).

The walk also generated four design decisions that have to be
recorded now, not left as 'I would decide these' notes for the next
session to guess at. Stephen's feedback earlier in this turn: writing
'I would decide X' is the same failure mode as Q5 (answer given, not
recorded, re-asked later). The four decisions are:

1. Abort signal assembly stays at the turn runner (builder stays pure).
2. transformToolResult stays per-query input (not session durable).
3. Tool-not-found channel-send asymmetry is preserved (refactor-only).
4. No imperative cancel() on approval coordinator (refactor-only).

Decisions 1 and 2 are genuine choices; both defensible alternatives
are presented with pros and cons so the next reader can evaluate the
call instead of having to trust it. Decisions 3 and 4 collapse under
the refactor-is-not-a-feature-build rule.

Also correct a misattribution in the file 7 (Tool registry) walk: I
had written that the empty-tool-use retry at AgentRun.ts line 118 was
tool-runtime work. That framing was wrong. The retry re-issues the
whole turn's HTTP request, not an individual tool call, so it is a
query runner concern. The file 7 bullet now says so, and the file 8
walk covers it in the right block.

Also catch the commits list up with 93ecad7, 7ab4bfd, and 6bb05a4
(the three commits since the log was last updated).

What-is-next updated: files 1-8 walked, remaining work is AnthropicAgent
(short), auth files (confirmation), appendix behavioural-contract
seeding, and the refactor playbook.
…rule

Stephen asked me to verify the reasoning behind each decision is
captured, not just the decisions themselves, before context
compaction happens. Audit found two gaps.

Gap 1: Decision 3 (preserve tool-not-found asymmetry) had the
decision and the refactor-only rationale, but no capture of the
candidate reason the asymmetry might exist in the first place. A
future maintainer deciding whether to unify needs to know the
hypothetical intent reading: missing tool is an SDK or consumer
config bug the consumer cannot react to live (they wrote the tool
array themselves); invalid input is a model behaviour the consumer
may legitimately want to observe. That reasoning is now in the log
as part of the decision 3 block, so a unification pass has material
to evaluate against.

Gap 2: The new discipline rule about not laundering decisions as
questions was never committed. Stephen caught the failure pattern
in this turn (four 'questions' in the AgentRun walk that were all
decisions-with-recommendations), but the lesson was only in the
commit history and the conversation, not in the discipline section
of the session log. Adding it as 'Rule: do not launder a decision
as a question' alongside the existing question-handling rules. The
rule captures the failure mode, two signs of it, the test for
whether a question is real, and a worked example from this session.

Both fixes land together so a single commit precedes the carry-forward
summary at ~73% context.
The misshapen Session block. Short walk because the untangling work
lived in AgentRun, not here.

Findings recorded:

- The 14-field RunAgentQuery IS the "rebuild per call" problem the
  refactor exists to solve. Every field except messages is work the
  consumer redoes on every call even though almost none of it
  changes between calls.

- Refactor mapping 11 durable / 3 per-query, derived from the plan
  Session and query runner blocks. Durable: model, thinking,
  maxTokens, systemPrompts, tools, betas, requireToolApproval,
  pauseAfterCompact, compactInputTokens, cacheTtl, cachedReminders.
  Per-query: messages, systemReminder, transformToolResult.

- messages: string[] is "new messages this query", not the whole
  conversation. The Conversation lives on the Session.

- The four history methods (getHistory, loadHistory, injectContext,
  removeContext) are Conversation-access stand-ins. Plan Session
  block exposes .conversation directly so all four vanish. Zero
  tests for them exist.

- Conversation.load() is dead code the moment ConversationStore is
  deleted. Exactly one runtime caller. Three test call sites at
  Conversation.spec.ts lines 219, 228, 236 get removed or rewritten
  against push() as part of the delete.

- historyFile leaves AnthropicAgentOptions entirely. Consumer owns
  the save/restore flow. SDK knows nothing about it.
Confirmation walk; the plan already captures the Client block as
owning auth. Findings recorded:

- 20 files under private/Auth plus TokenRefreshingAnthropic in
  private/http. Most are tiny OAuth-ceremony helpers (PKCE, state,
  callback server, schema, consts); per the "not every helper is a
  building block" principle these are wiring, not substitution
  surfaces.

- AnthropicAuth.getCredentials() is the single entry point: load,
  login-if-missing, refresh-if-expired, save. login() has local
  (HTTP callback on port 3001, opens browser) and manual (paste
  code from stdin) modes.

- TokenRefreshingAnthropic overrides apiKeyAuth/bearerAuth to call
  a token getter per request. The Anthropic SDK constructor
  silently discards non-string auth values so the subclass
  captures the getter before super() drops it. This mechanism is
  what makes refresh invisible and the refactor preserves it.

- AnthropicAuth is currently exported as public API at
  index.ts:31 despite living under private/Auth/. CLI wires it up
  manually (main.ts lines 58-63): construct, eager getCredentials,
  wrap as authToken closure, pass to createAnthropicAgent.

- Per plan Client block ownership, AnthropicAuth folds in as an
  internal collaborator and stops being exported. Consumer
  substitution happens at the Client block interface, not at an
  Auth sub-interface. Folding is restructuring, not new
  capability, so in refactor scope.

- Eager login UX must be preserved. Recommendation is an async
  createSession factory (`const session = await createSession(...)`)
  over an explicit session.login() method, because the async
  factory matches the 95%-case-is-simple principle and hides the
  step from consumers who do not care. Final call is the
  implementer's.

- File I/O is exactly two functions (loadCredentials,
  saveCredentials) at ~/.claude/.credentials.json. Matches the
  plan's "one pragmatic exception" verbatim.

- Flagged-but-not-refactor-scope: saveCredentials assumes the
  directory exists, execFile('open', url) is macOS/BSD-only,
  credentials path is hardcoded, fetchProfile enrichment runs
  only on first login, schema validation on both load and save,
  two named errors (InvalidAuthorisationCodeError,
  TokenExchangeFailedError), AnthropicClient.ts JSDoc at line 15
  incorrectly claims to "own auth".
First draft of the Auth walk conflated two separate blocks. I
wrote that `AnthropicAuth` "becomes an internal collaborator" of
the Client block and stops being exported, and proposed an async
`createSession` factory to preserve eager login at startup. Both
are wrong.

Stephen's correction: login has nothing to do with Session.
Session holds a reference to Client; it does not construct Client
or trigger Client's auth. Login is a Client-block concern the
consumer triggers before Session exists.

Reading the plan Client block sentences together, "Owns
authentication" and "You hand it a token source" are compatible
if the OAuth CODE lives in the Client block's namespace but the
Client CLASS still takes a token source. The consumer builds the
token source by calling the Client block's auth helper and hands
the result to the Client. That is exactly what the CLI does
today at main.ts lines 58-63.

The refactor does not change the consumer-wired pattern. The
async-factory recommendation was chasing a problem the plan does
not pose.

Why I got it wrong matters more than the correction. Two
failures:

1. I read "Client block owns OAuth" as "Client class encapsulates
   OAuth construction" and started designing an API change. The
   plan's "hand it a token source" sentence rules that out but I
   paired the two sentences as tensions to resolve rather than
   reading them together.

2. I reached for Session as the home for the async factory
   because Session is the top-level "the thing I am talking to"
   object. But Session holds a reference to Client; the direction
   of ownership is Session -> Client, not the other way. Auth
   lives on the Client side of that edge and nothing Session does
   touches it.

Recording the correction in full instead of silently editing so
the next session can see what the wrong mental model looked like
and what the right one looks like side by side.
The ten-file walk finished with no runtime flow written down. The
plan has nouns but no verbs. Drafting a pseudocode end-to-end and
mapping each step to a plan block gives a clean mapping for eight
of the nine blocks. Session is the one that does not map: it has
zero operations in the pseudocode. Stephen's two-point framing
(add session consideration to the SDK, persist and hydrate only in
the CLI) is the intent behind the plan's Problem 1 and Problem 2
and makes Session collapse to a state shape with no methods. The
plan's Session block is wrong under this framing and has to be
rewritten. This commit records the discovery so the next session
has it in writing.
The old Session block described a class that bundles Client,
Conversation, and durable config and offers query(input).
Stephen's two-point framing makes that wrong: point 1 is "add
session consideration (accumulation of messages) to the SDK";
point 2 is "persist and hydrate only in the CLI". Neither point
asks for a Session class. Session is the name for the SDK's
in-memory state across queries, and that state is exactly two
things: the Conversation and the durable config. Both have homes
elsewhere (Conversation is a block, durable config is usage
options). There is no Session class to construct.

Rewritten as Session (state, not a class) so a reader scanning
the blocks list still finds it in the expected place but reads
the correct definition. The entry explicitly says there is no
SDK class called Session and no session.query(input) method. The
state is held by whatever constructs the blocks; in the default
assembly today that is the CLI. The entry also flags that the
SDK's query entry point shape (free function, factory, bound
helper) is the next design decision, because the mapping
exercise made clear there IS something the CLI calls, but its
shape is still open.

Problem 1's phrasing (The SDK can have a session object) is
updated in the same edit because leaving it alone would
contradict the Session block rewrite. The replacement says The
SDK holds session state in memory and names the two things
(Conversation and durable config), which matches the block
rewrite.

The runtime flow discovery that led to this correction is
recorded in commit c670d2b on the session log.
The earlier plan described an agent session concept as a class
called Session that bundled Client, Conversation, and durable
config and offered query(input). That description required the
reader to already hold the agent session concept in their head.
A fresh instance reading the plan cold reconstructed the wrong
concept (framework-session, like HTTP or database sessions) to
fit the class shape, and the error cost five hours of real-time
back-and-forth to detect and fix.

This rewrite is a structural response to that failure, not a
patch on individual sections. Every fix is concrete:

1. The refactor's two goals go at the top of the document,
   verbatim, in plain English, before any other prose. A reader
   who reads only section 1 knows what the refactor does.

2. What an agent session is is now the second section, read
   before any block is described, and it includes both an IS
   list (one ongoing conversation with the model) and an IS NOT
   list (not HTTP, not database, not REPL, not a class, not a
   session id, not a session file). A fresh reader cannot
   pattern-match to the wrong domain without walking past an
   explicit denial of that pattern.

3. A glossary pins every domain term to a precise meaning and
   includes a words NOT to use without qualification ban list.
   Session without agent is banned. State meaning the agent
   session is banned. Bundles and wraps when describing blocks
   are banned, because those verbs primed the previous failure.
   Each ban has a replacement.

4. Every block description opens with For (what it is for)
   before Does (what it does) and Not (what it is not). A
   reader who understands the purpose can derive the methods;
   a reader who only has the methods will reverse-engineer a
   purpose that may not match, which is how the previous Session
   block misled its later readers.

5. The blocks list has no Session entry at all. Instead, the
   top of the blocks list has a There is no Session block
   callout that directs the reader to What an agent session is
   and explains that the agent session is a Conversation plus a
   durable config held by the consumer, not a class.

6. Non-class blocks (Turn runner, Query runner) are explicitly
   marked as named logic and not long-lived objects.

7. The Why this plan is written this way section names the
   five-hour failure concretely, explains the instance
   continuity problem (plans must carry concepts across
   instance boundaries because every instance that reads a plan
   is a fresh reader), and lists concrete rules with concrete
   failure modes they address. The closing note says: if a
   future instance is tempted to add an abstract warning about
   drift, add a concrete fix instead. Abstract warnings lost to
   concrete drift in the previous version of this plan and
   will lose again if they are all that is added.

The old plan content survives in git history at the previous
commit on this branch. This commit replaces it in full with a
document whose goal is to carry the agent session concept
across instance boundaries without requiring the reader to
already hold it.

The runtime flow discovery and the five-hour failure itself
are recorded in the session log at .claude/sessions/2026-04-10.md
in commits c670d2b and earlier. The session log carries the
reasoning; the plan carries the outcome. The link between them
is deliberate and stated in the plan's rules.
The old section defined an agent session as an object shape (a
Conversation plus durable config). That framing invited the bundle
reading: a Session is a thing that holds the blocks. The correct
framing is that an agent session is a usage pattern of the SDK,
where the SDK holds a Conversation across queries and mutates it
as turns happen, and the caller does not rebuild the message list.

Specific changes:

1. What an agent session IS is now one sentence: a usage pattern
   where the SDK holds a Conversation across queries and mutates
   it as turns happen. The motivation paragraph explains why the
   pattern matters (somebody has to maintain the message list;
   the pattern decides whether that somebody is the SDK or the
   caller).

2. The old object-graph sentence (that growing message list
   plus the settings that stay the same is the agent session)
   is gone. The settings are not part of the session; they are
   durable config the caller holds separately.

3. A new IS NOT bullet, Not a bundle of SDK blocks, addresses
   the specific error the earlier plan and its rewrite made:
   treating Client, Tool registry, Control channel, and so on as
   agent-session-scoped. They are SDK infrastructure that exists
   in the without-concept case too.

4. A new subsection, What code would change if the agent session
   concept were added or removed?, gives the operational
   definition. Three push sites in the turn runner. Everything
   else is unchanged. With the concept the SDK runs the turn
   loop and pushes messages; without it the caller runs one turn
   at a time and manages its own list. Tools still work in the
   without case. Approvals still work. Control channel still
   works. The concept is only about who maintains the message
   list and who owns the turn loop.

This is a targeted edit, not a full rewrite. The remaining
items from the previous proposal list are still pending and
will be done in separate commits as each is confirmed.
Ten places in the plan used agent-session-scoped or session-scoped
language to describe blocks that are not scoped to sessions at all.
That framing conflated two distinct concerns: the agent session
concept (whether the SDK mutates the Conversation across queries)
and the stateful SDK (whether blocks are constructed once or
reconstructed per call). The Client, Tool registry, Control channel,
and Approval coordinator exist whether or not the session pattern
is in use; their lifetime is the consumer's choice, not the session's.

Changes in this commit:

1. Problem section. The 'second consequence of the same missing
   concept' paragraph that folded per-call reconstruction into the
   session problem is rewritten as a separate problem: the old SDK
   is stateless. It is related to the session concern because both
   touch the same code, but it is not a consequence of the missing
   session concept. A stateless SDK would be wasteful even in the
   one-shot case.

2. Principles section. The Per-agent-session construction principle
   is renamed Stateful SDK, not per-call reconstruction. The Client,
   Tool registry, Control channel, and Approval coordinator are
   described as stateful blocks, not session-scoped. The principle
   now explicitly notes that stateful SDK blocks would be valuable
   in the one-shot case.

3. What the consumer does. The paragraph claiming the Conversation
   plus durable config IS the agent session (bundle framing) is
   removed. Replaced with a paragraph that says the consumer holds
   all the blocks as long-lived fields across its SDK usage, none
   of them is the session, and the session pattern is what happens
   when the SDK mutates the Conversation (not a set of objects).

4. Running a query. The list item agent-session-scoped collaborators
   is replaced with blocks and config the consumer holds across
   queries.

5. What is not in the SDK. The per-query construction bullet is
   rewritten to refer to stateful blocks instead of
   agent-session-scoped objects.

6. Glossary. The Agent-session-scoped definition is replaced with
   Stateful block (a block that holds state and is constructed
   once, not scoped to agent sessions). The Agent session
   collaborators entry is removed entirely; the concept was wrong
   and does not need a replacement term.

7. Ban list. A new entry bans agent-session-scoped and
   session-scoped for describing blocks, pointing at the specific
   error this commit fixes. The State ban list entry is updated
   to remove the wrong replacement suggestion (the Conversation
   and durable config is not a valid substitute; that phrasing
   is the bundle framing in different words).
The old top section listed two bullets that were both about the
agent session: adding the concept, and putting persistence in the
consumer. The stateful SDK concern (blocks constructed once and
reused instead of reconstructed per call) was not a top-level
bullet at all, even though it is a distinct concern that touches
the same code.

This commit restructures the section into two concerns:

1. Add the agent session concept to the SDK. Folds the old bullet
   2 (persist and hydrate in the consumer) in as a consequence,
   because persistence is a subpoint of the session concept. The
   bullet is written in usage-pattern language (the SDK mutates
   the Conversation on the consumer's behalf) rather than
   object-graph language.

2. Make the SDK stateful. New bullet. Names the specific blocks
   that become stateful. Explicitly notes that this concern is
   distinct from the session concept and would be valuable even
   in the one-shot case.
…result split, runner coordination

Five block descriptions had specific issues that needed targeted
fixes, plus three secondary cleanups for consistency.

Client block. The old Does paragraph said the Client owns
transport-identifying headers such as user-agent and SDK version
with no explanation. The new Does paragraph names them as
client-identifying headers (User-Agent, Anthropic SDK version
headers, who is calling metadata) and explicitly contrasts them
with anthropic-beta headers, which describe what features this
specific request is using and are computed per request by the
request builder.

Stream processor block. The old Not said the processor hands
assembled messages to the turn runner. That phrasing suggested
the processor calls into the turn runner; it does not. The new
wording makes the flow explicit: the turn runner calls the
Client, hands the event stream to the processor, and reads the
assembled message from the processor's final-message surface
when it is ready. The stream processor is passive output; the
turn runner is the driver. The Does paragraph now describes the
processor's two output surfaces.

Tool registry block. The old block said the registry formats
the result as a tool_result block. A full tool_result needs a
tool_use_id, which is a turn runner concern (only the turn
runner sees the original tool_use block). The new split says
the registry returns content blocks and the turn runner wraps
them in the tool_result envelope with the corresponding
tool_use_id. Both Does and Not are updated. For drops the for
an agent session phrasing.

Turn runner block. Old block said Not a long-lived object. Not
a class with per-query instances. That was dismissive of what
the turn runner actually does. The new Does paragraph lays out
the 9-step sequence explicitly and names the local state the
runner holds during a turn. The new Not paragraph explains what
not constructed per turn means operationally: implemented as a
function or a stateless method call, not an allocated object
with a constructor.

Query runner block. Same treatment. Does is a 5-step sequence.
Not explains the same construction distinction.

Secondary cleanups: Conversation Not drops the two halves of an
agent session bundle framing. Approval coordinator Not changes
agent session to Control channel for lifetime. Control channel
Not drops the trailing one instance per agent session sentence.

Note on process: the first PreviewEdit I composed for this
commit used wrong line numbers and I caught the error in the
diff before applying. The second attempt used correct line
numbers verified against a fresh Range read.
Every block has a construction lifetime, but the block
descriptions deliberately do not cover lifetimes; they cover
responsibility (For / Does / Not). Lifetimes belong in their own
section so the reader can look up the whole picture in one place
instead of hunting across nine block descriptions.

The new section sits between The blocks and What the consumer
does. Four categories:

1. Process-lived. Client and Tool registry. Both hold expensive
   state (auth / connection pool, compiled schemas) that is
   worth sharing for the whole process.

2. Consumer-chosen. Conversation, durable config, Control
   channel, Approval coordinator. Their lifetime is up to the
   consumer.

3. Per-query. Abort controller and per-query input. Constructed
   when the query starts, discarded when it finishes.

4. Not a construct. Request builder, Turn runner, Query runner,
   Stream processor. Named logic called per invocation, not
   allocated instances.

The section closes with the distinguishing rule: blocks that
hold state worth reusing are constructed once by the consumer
and reused, while blocks that are pure logic over other blocks
are called per-invocation. Plus a reminder that per-query
object churn in the old code was a symptom of the stateless
SDK problem, not the missing session concept.
…into three

The old section had two problems that conflated the agent
session concept with SDK statelessness. Problem 1 was titled the
SDK did not hold the agent session (bundle framing) and its
body described the fifteen-field options rebuild, which is
about statelessness, not about whether the SDK mutates the
Conversation. The stateless SDK problem was demoted to an
afterthought paragraph.

Three distinct problems now:

1. The SDK did not mutate the Conversation. The
   division-of-labour issue: the caller was maintaining the
   message list when it is the SDK's job.

2. The SDK was stateless. The fifteen-field options
   description moves here. Per-call reconstruction cost.
   Stateful SDK is valuable independent of the session
   concept.

3. The SDK was reading and writing files for conversation
   data. Renamed from for the agent session and for agent
   session state.
The turn runner was doing multi-turn work. Its Does had nine
steps including inspecting the assembled message for tool_use
blocks, dispatching each tool via the registry and approval
coordinator, constructing the tool_result message, and pushing
it into the Conversation. None of that is one turn. A turn is
one request-response cycle. Dispatching tools and feeding the
result back creates the NEXT request, which is a new turn.
That is the query runner's concern.

The cleaner division:

- Turn runner: one request-response cycle. Reads wire view,
  builds request, merges signal, streams, processes, pushes
  the assembled assistant message, returns the message and
  stop reason. Seven steps. No tool dispatch, no tool_result
  construction, no looping decision.

- Query runner: owns the turn loop and tool dispatch between
  turns. After each turn runner call, if the stop reason is
  tool_use, dispatches each tool_use block (approval if
  required, registry execute, content blocks back, transform
  hook, tool_result wrapping with matching tool_use_id),
  assembles a user-role message carrying the tool_result
  blocks, pushes it to the Conversation, loops again.

Two additional things Stephen asked about are pinned in this
commit:

1. Stream processor subscription lifetime. The turn runner's
   step 5 now explicitly says the tap subscription is scoped
   to this turn: set up at step 5 and torn down at step 7 when
   the turn returns. The turn runner's local state list
   includes the tap subscription handle. This answers the .on
   lifetime question without adding a separate section.

2. Turn runner Not gets three new denials: does not dispatch
   tools (the query runner handles that between turns), does
   not construct tool_result messages, does not decide whether
   to loop.

The query runner Does is now a nested 4a/4b/4c/4d loop so the
reader can see the branch on stop reason and the tool dispatch
flow clearly. Its local state adds the pending tool uses being
dispatched.

Process note: the first attempt at this commit used git commit
-am which swept three unrelated WIP files into the commit.
Stephen caught it, ran git reset --mixed HEAD~1 to undo, and I
redid this commit with explicit git add. Never use -a in this
workflow.
… tighten save/restore

Four targeted fixes inside the What the consumer does section:

1. Setting up an agent session is renamed to Setting up the
   SDK. The setup is SDK-level, not session-level. A consumer
   making one-shot calls with no session pattern still runs
   every setup step. The previous heading wrongly implied the
   steps were for the session.

2. The Conversation step (step 3) drops the for a new agent
   session / for a restored agent session framing. The
   Conversation is just constructed empty. Restoration from
   saved messages (if any) is a push-loop, which is mentioned
   but not as session-specific language.

3. Running a query gains a paragraph that shows the turn
   runner in the flow. The previous version described the
   entry point inputs and the handle but never explained what
   actually happens inside a query. Now it says: the query
   runner pushes the user message, enters the turn loop, calls
   the turn runner per turn, and handles tool dispatch between
   turns if the stop reason is tool_use. This matches the
   runner responsibility split committed in cdf4e5d.

4. Saving and restoring an agent session is renamed to Saving
   and restoring the conversation. The only thing the consumer
   saves and restores for a conversation is the messages. The
   old text said the consumer reconstructs its durable config
   from its saved settings and reconstructs its Tool registry
   from its tool list, both of which are wrong: those are CLI
   state that gets reloaded at startup by running the Setting
   up steps, they are not saved data. The new text makes that
   explicit: the only thing that needs saving is the messages;
   the blocks are reconstructed from the consumer's own code.
   The circular sentence The restored agent session is
   functionally identical to the original because the state is
   identical is removed entirely because it said nothing.

To save is simpler now: read out the messages, write them
wherever. To restore is shorter: run the Setting up steps,
push the saved messages into the Conversation at step 3, done.

Process note: this commit used explicit git add on the single
file, not git commit -a. The three unrelated WIP files stay in
the working tree untouched.
…ribed once

The runners and the stream processor were described as not
constructed (named logic, called per invocation) and with
subscriptions that were set up and torn down per turn or per
query. Both wrong. The refactor's whole point is construct
once, subscribe once.

Changes in this commit:

1. Turn runner. Rewritten as a class constructed once by the
   consumer at setup. Constructor takes Client, Request
   builder, Stream processor, Tool registry, Approval
   coordinator as instance state. Reused via a run method.
   Step 5 no longer talks about subscribing to the processor's
   tap per turn; the processor's .on handlers are subscribed
   once at setup by the consumer, and the turn runner just
   calls process(rawIterable) and lets events fire on the
   existing handlers. Step 7 no longer tears anything down.
   Local state is method-local only. Not section says not
   constructed per turn, constructed once, reused.

2. Query runner. Same treatment. Constructed once, reused via
   run method. The query handle concept is removed entirely:
   step 1 and 2 that built the handle are gone. The consumer
   calls queryRunner.run(...) and awaits the return. Events
   fire on the already-subscribed .on handlers. Cancel goes
   through the Control channel. No handle, no per-query
   subscription, no per-query object.

3. Stream processor. Constructed once at setup. Exposes .on
   events (raw, deltas, lifecycle, final message) that the
   consumer subscribes to once. Same handlers fire for every
   stream the processor handles. Not subscribed and
   unsubscribed per stream.

4. Lifetimes section rewritten. Three categories now instead
   of four. The first is Constructed once at consumer setup
   and covers every block in the SDK plus the durable config.
   The second is Per-query and covers the abort controller and
   per-query input data (not blocks). The third is Pure
   function and covers the request builder alone. The old Not
   a construct category that swept up Turn runner, Query
   runner, and Stream processor is gone; they are all blocks
   constructed once like everything else. The distinguishing
   rule at the bottom is rewritten to state the rule directly:
   every block constructed once, reused for the life of the
   process; consumer subscribes once, never touches
   subscriptions again; per-query and per-turn subscription
   setup and teardown is the anti-pattern the refactor is
   removing.

5. Running a query section rewritten. No query entry point,
   no query handle, no per-query subscription. The consumer
   calls queryRunner.run(...) with the per-query input and
   awaits the return. Events flow through the .on handlers set
   up at setup. Cancel goes through the Control channel set up
   at setup.

6. Principles section: Observation and control are separate
   surfaces is rewritten to talk about .on subscriptions set
   up once at setup, not event taps on a query handle.

7. Two minor mentions of query handle fixed: the With the
   concept subsection of What an agent session is, and the
   Audit files bullet in What is not in the SDK.

No new abstractions, no new API shapes. The refactor is not
changing behaviour; it is only changing lifetimes (per-call to
once-at-setup) and removing the drift surface of per-turn and
per-query subscription dances. Whatever .on events exist in
the current code remain; they become subscribe-once.
…t factory

Two small fixes from your end-of-review pass.

1. Setting up step 3 (Conversation). The previous wording was
   overly detailed about pushing saved messages one by one. The
   point is simpler: the consumer owns the Conversation, the
   SDK mutates it during a query, and if the consumer has
   saved messages it sets the initial history at this point.
   Whether that is one push per message or a single set-history
   call is an implementation detail.

2. The What is not in the SDK bullet about a top-level class
   was too absolute. A convenience factory function like
   createDefaultAgent that constructs the default blocks and
   returns them does not bundle them into a class and is fine
   as a later addition. The bullet is updated to say that, and
   to mark the factory as out of scope for this refactor.

The systemReminder removal from earlier this turn was
reverted. The plan keeps systemReminder as before because the
discussion concluded it has to stay: query runner tracks
first-turn-only (loop state), turn runner takes it as a
per-run argument, request builder does the wire-view injection
after the cache marker.
… sections

Option 2 from the end-of-session discussion: surgical clean-up
that preserves what is still valid and excises what contradicts
the current plan.

1. Preamble blockquote added at the top. States that the plan
   at .claude/plans/sdk-shape.md is authoritative, names what
   this log still contains that is valid, and notes what was
   excised and what was rewritten. Directs future readers to
   use the log for code-map and discipline rules, and the plan
   for design.

2. File 9 (AnthropicAgent.ts) walk rewritten in place. Raw code
   observations preserved (current shape, 14-field
   RunAgentQuery, history methods list, Conversation.load dead
   code, historyFile removal). Wrong content replaced:
   - Refactor mapping: now splits the 14 fields correctly
     (durable config, Tool registry for tools, per-query input
     on queryRunner.run) instead of the old 11 durable plus 3
     per-query with tools in the wrong place.
   - Four history methods: now call conversation.X directly,
     not session.conversation.X, because there is no session
     class.
   - Per-query messages: pushed by the query runner, not the
     turn runner, matching the runner responsibility split in
     the current plan.
   - createAnthropicAgent and IAnthropicAgent: the old proposal
     of a createSession factory plus ISession interface
     carrying query(input) is gone. New entry says the consumer
     constructs the individual blocks directly and that a
     convenience factory like createDefaultAgent is acceptable
     but out of scope for this refactor.

3. Runtime flow discovery section entirely deleted. It contained
   the pseudocode, the session-lived vs per-query vs
   not-objects-at-all lifecycle split, the mapping of pseudocode
   operations to plan blocks, and the initial two-point framing.
   All of this is now covered by the plan itself, so keeping it
   is duplication plus noise that misleads a future reader.

4. What is next section entirely deleted. It listed six stale
   tasks, all either done or tracked differently now.

Not touched: the discipline rules, the raw observations from
file walks 1 through 8 and 10, the four design decisions from
the AgentRun walk, and the resolved questions Q3 Q4 Q5. All
still valid under the current plan.

Line count: log went from 1135 lines to around 820 lines. The
removed content is preserved in git history at the previous
commit.
The session log had stopped updating at the 6bb05a4 plan edit to the
Tool registry block. Since then the session moved through the remaining
file walks, the plan rewrites that landed the two-framing shape (agent
session as usage pattern, stateful SDK), the session log surgical
clean-up, playbook creation, and the first three phase 2 execution
steps. The log now reflects all of that.

Adds: a summary of the three arcs between 6bb05a4 and now (file walks,
plan rewrites, clean-up + playbook), individual entries for the three
phase 2 execution commits (StreamProcessor, ToolRegistry, TurnRunner),
and a new Playbook execution progress subsection that tracks phase 2
step-by-step with the current state (steps 1-3 done, step 4 next, step
5 optional tidy-ups, phase 3 deferred to a later session).

Updates: the Branch line in both Work completed and Working conventions,
from docs/sdk-code-walk to feature/sdk-stateful-refactor (the branch
was renamed mid-session once the plan settled and phase 2 execution
began).

Does not touch: the drift pattern rules, the discipline rules, the
earlier commit entries, the file walk notes, the four AgentRun design
decisions, or the Working conventions other than the Branch line.
Those are all still accurate and rewriting them would be noise.
One query per `run` call. A query is one user ask turned into however
many turns the model needs to answer it. Constructed once at consumer
setup with its long-lived dependencies (turn runner, conversation, tool
registry, approval state, control channel, durable config). Reused for
every query with per-query input passed per call.

Extracted from AgentRun.execute with exact behaviour parity across the
four concerns that were tangled there: the turn loop, the tool dispatch
between turns, the first-turn-only systemReminder lifecycle, and the
cachedReminders injection on a fresh or post-compaction conversation.

Key split from the current code: the tool registry API has been
reshaped from a one-shot `execute(name, input, transform)` into a
two-phase `resolve(name, input)` -> ready/not_found/invalid_input, with
the ready branch carrying a `run(transform?)` closure. The closure
captures the parsed input at resolve time so the handler is called with
the already-parsed value. This matches the current AgentRun flow,
which parses each tool_use input once up front in a resolve pass and
threads the parsed data through the approval machinery to the handler.
A `validate` method that returned the validation result alone would
have forced a second safeParse after approval; the closure-based
split avoids that. ToolRegistry tests are updated to cover both phases
and include a dedicated single-parse regression test that counts
safeParse invocations via a superRefine hook.

The query runner owns:
- cachedReminders injection on fresh or post-compaction conversations,
  pushing each reminder as a `<system-reminder>` block in front of the
  first user message.
- Pushing the per-query user messages into the Conversation (the Conv
  merges adjacent user messages per the API's alternation rules).
- The turn loop: calls ITurnRunner.run, resets systemReminder to
  undefined after the first turn, handles the stream-error case by
  sending `error` on the channel and returning, handles the empty
  tool_use retry (up to two attempts) before giving up with an error.
- Tool dispatch: two-phase with the registry's resolve split. Phase 1
  filters not_found and invalid_input into immediate tool_result
  errors, preserving Decision 3's asymmetry (not_found silent on the
  channel, invalid_input broadcasts `tool_error`). Phase 2 invokes
  the run closures, optionally gated on approval. With approval
  required, approval requests fire in parallel and closures run in
  the order approvals arrive (Promise.race). Without approval,
  closures run sequentially in the model's order. Both paths check
  `cancelled` between items.
- query_summary, message_usage, and done channel sends matching the
  current AgentRun behaviour.

The query runner does NOT close the channel. The channel is long-lived
and owned by the consumer; closing it per query would break every
subsequent query on the same SDK instance. AgentRun closes its per-run
channel in its finally block; the query runner deliberately does not.

Also adds ApprovalState.reset() so the shared instance can be reused
across queries. AgentRun creates a fresh ApprovalState per run in its
constructor and never needs this; QueryRunner holds a single instance
and calls reset() at the start of each run to drop any cancelled state
left over from a prior cancelled query. Stranded pending approvals
from a cancelled query have already been resolved by handle(), so
there is nothing else to reset.

Adds `PerQueryInput` to public/types.ts (messages, optional
systemReminder, optional transformToolResult, abortController) and
`IQueryRunner` to public/interfaces.ts. Removes `ToolExecuteResult`
in favour of `ToolResolveResult` plus `ToolRunResult`.

Tests (14 new QueryRunner tests; 9 updated ToolRegistry tests):

- Single-turn terminal exit: one turn, user+assistant history, done
  channel send; query_summary and message_usage emitted per turn.
- Tool-use loop: handler runs with the already-parsed input, second
  turn includes the tool_result user message with matching tool_use_id.
- Decision 3 asymmetry: tool_not_found is silent on the channel;
  tool_invalid_input broadcasts tool_error.
- Handler error: broadcasts tool_error and emits is_error tool_result.
- systemReminder lifecycle: present on first turn request body and
  query_summary, absent on second turn.
- cachedReminders: injected on a fresh conversation, not injected on
  a populated one.
- Approval required + approved: handler runs, second turn fires.
- Approval required + rejected: handler does not run, tool_result
  carries the rejection reason.
- Long-lived instance: two queries in sequence, channel stays open
  (close count stays at 0), Conversation accumulates across queries.
- ApprovalState.reset: a cancelled flag from a prior query is cleared
  by QueryRunner.run's reset() call so a subsequent run proceeds.

ToolRegistry tests include a single-parse regression test that counts
safeParse invocations across resolve() and run() to ensure the parsed
value is threaded through the closure without re-parsing.

All 110 tests pass (96 previously + 14 new QueryRunner; the 9
ToolRegistry tests are updated in place, net +1 vs step 2's 8). See
.claude/plans/sdk-refactor-playbook.md phase 2 step 4.
The previous JSDoc had three problems a new reader would trip over:

1. "owns auth" was misleading. The OAuth flow (token acquisition, login,
   credential storage) lives in private/Auth/. This class only holds the
   refresh loop via TokenRefreshingAnthropic. A reader who takes "owns
   auth" literally looks here for the OAuth machinery and finds nothing.

2. The "first step of the refactor series in issue #232" pointer is
   stale. The refactor is now driven by .claude/plans/sdk-shape.md and
   .claude/plans/sdk-refactor-playbook.md, not a GitHub issue. The
   early-session commit be2554a flagged this as a pointer-update item;
   this commit resolves it.

3. The Client block description in sdk-shape.md draws an explicit line
   between client-identifying headers (user-agent, SDK version, "who is
   calling") and feature beta headers (anthropic-beta, "what features
   this specific request is using"). The previous JSDoc did not mention
   this split, so a reader trying to trace where anthropic-beta comes
   from might start here and not find it. It comes from the request
   builder, computed per request from the durable config.

The new JSDoc is structured as two bulleted lists: what the class owns
(refresh timing, HTTP transport, client-identifying headers) and what
it deliberately does not own (OAuth flow, feature beta headers, the
request body, the abort signal), with a pointer at each not-owned item
to the block that does own it. Pointing at the plan instead of the
issue number gives the next reader a live reference.

Drops the historical "extracted from AnthropicAgent" / "previous
AnthropicMessageStreamer wrapper removed" paragraph. That content was
accurate at the time but it is session history, not architecture: a
new reader does not need to know how the class got to its current
shape, only what the current shape is. The `extends IMessageStreamer`
is already visible from the class signature.

No behaviour change. 110 tests still pass.
Phase 2 landed across five commits (422556e, c0a1335, 3c77d99, 99997a7,
5bc5a7d) so the log needs the corresponding close-out.

Adds a new Design decisions from phase 2 execution subsection with
Decision 5: the tool registry API is split into resolve() plus a run()
closure, not a single execute() method. This decision was forced by a
behaviour-preservation constraint in step 4 that the step 2 execute()
signature could not meet. The first draft of step 4 proposed dropping
the pre-filter (so invalid tool_uses would go through approval before
failing); this was identified as a behaviour change and rejected. The
recorded options (validate method, shouldRun callback, closure-based
split) are there so a future reader understands why the registry has
this specific shape and does not try to simplify it back to a single
execute().

Rewrites the Playbook execution progress subsection to reflect phase 2
complete: steps 4 and 5a done, 5b/5c deferred (5b cosmetic, 5c waits
for the CLI restore flow in phase 3). The step 4 entry now links to
Decision 5 for the registry API revision. The phase 3 heading now
lists the five open decisions the next session has to resolve at the
start: AnthropicAuth export status, Auth/ directory move, ApprovalState
to ApprovalCoordinator rename, AgentChannel to ControlChannel rename,
and Conversation.setHistory shape.

Updates the test-count line to 110 passing across eight spec files,
with per-file counts. Notes that AgentRun.spec.ts passes verbatim,
which is the main signal phase 2 preserved behaviour on the old code
path, and that both MessageStream.spec.ts and StreamProcessor.spec.ts
are kept through phase 2 with the old one dropped in phase 3.
…ction

Decision 1 (AnthropicAuth export status) was Stephen's correction that
the question should not have been raised at all: the filter "is this
necessary for the refactor?" would have caught it. Stays as-is. The
remaining four all got yes: Auth directory move to Client/Auth,
ApprovalState rename to ApprovalCoordinator, AgentChannel rename to
ControlChannel, and a single setHistory method on Conversation. The
setHistory method becomes the first phase 3 commit because step 6
(CLI swap) consumes it.
The CLI needs to restore a saved conversation into a fresh Conversation
on startup. setHistory replaces the entire internal items array with the
given messages, no merge logic applied (the saved state is already in
valid alternating order), no id tags (those are session-scoped and not
persisted). Clears any existing history first.

Prerequisite for step 6 which wires the CLI to construct blocks directly
instead of going through createAnthropicAgent.

7 new tests: populate empty, replace existing, no merge logic,
push after setHistory, push merge with last user message from
setHistory, cloneForRequest respects compaction blocks, id tags
from before setHistory are cleared. Total: 34 Conversation tests.
The CLI no longer uses createAnthropicAgent or the 14-field
RunAgentQuery options bag. Instead, main.ts constructs the individual
blocks once at setup: AnthropicClient, Conversation, StreamProcessor,
ToolRegistry, ApprovalState, AgentChannel, TurnRunner, QueryRunner,
and a mutable DurableConfig object.

Key wiring decisions:

- Channel is long-lived, shared across queries. Cancel fans out via a
  mutable AbortController reference so the single channel callback can
  reach whatever controller is active for the current query.

- StreamProcessor events are forwarded to the channel once at setup.
  This preserves the AgentMessageHandler receiving all events through
  one path (unchanged).

- Per-query handler via a mutable reference on the channel consumer
  port listener. A fresh AgentMessageHandler is created before each
  query with the current model, avoiding listener add/remove timing
  issues between the channel.send and the promise resolution.

- History persistence moves to the CLI. loadHistory reads JSONL at
  startup into conversation.setHistory. saveHistory writes after each
  query completes. The SDK no longer touches the filesystem for
  conversation data.

- Tool definitions and the transformToolResult hook are constructed
  once at setup (they reference the long-lived RefStore) and reused
  across queries.

- runAgent.ts becomes a thin per-query wrapper: sets the cancel
  function, calls queryRunner.run with per-query input, catches
  errors, clears cancel and completes streaming.

The old AgentRun code path still exists and compiles but is no longer
called. Steps 7 onward delete it.
AgentRun, AnthropicAgent, createAnthropicAgent, and IAnthropicAgent
are no longer called after step 6 swapped the CLI to the new blocks.
AgentRun.spec.ts is deleted because its assertions are covered by
QueryRunner.spec.ts (14 tests) and TurnRunner.spec.ts (5 tests).

index.ts drops the createAnthropicAgent and IAnthropicAgent exports.
interfaces.ts drops the IAnthropicAgent abstract class.

109 tests remain across 7 spec files.
History persistence is now the CLI's responsibility. The CLI reads
JSONL at startup into conversation.setHistory() and writes after
each query -- no SDK-level store needed.

- Remove Conversation.load() method (replaced by setHistory)
- Remove Conversation.load tests (3 tests, 31 remain)
- Remove historyFile from AnthropicAgentOptions
- Delete ConversationStore.ts (file-backed wrapper around Conversation)
StreamProcessor is the long-lived replacement. MessageStream was a
per-stream EventEmitter that allocated fresh state on every API call;
StreamProcessor holds accumulated state across the conversation and
emits the same events through the shared instance.

No code imports the class any longer -- only its own spec file did.
101 tests remain across 6 spec files.
RunAgentQuery, RunAgentResult, and AnthropicAgentOptions belonged
to the createAnthropicAgent bundle (deleted in step 7). Nothing
imports them.

- Remove three type definitions from public/types.ts
- Remove MessagePort import (only used by RunAgentResult.port)
- Remove all three from index.ts import and re-export
- Update Ref.ts JSDoc to reference PerQueryInput instead of RunAgentQuery
The old name described the data shape (state); the new name
describes the role (coordinating approval requests, responses,
and cancellation across the query runner's tool dispatch).
The channel carries control messages (events, approvals,
cancellation) between the SDK and the consumer. The old name
tied it to the deleted agent bundle; the new name describes
the communication pattern.
Auth modules are part of the Anthropic client layer, not
general SDK infrastructure. Nesting under Client/ reflects
the dependency direction: the client owns auth, not the other
way around.
The AgentRun class was removed in Phase 2, but its name lingered in
JSDoc comments and inline notes across 11 source and test files.
Every reference now points to the actual component that owns the
behaviour being described (QueryRunner, TurnRunner, ToolRegistry,
etc.). grep -r AgentRun returns zero hits.
The handler was recreated every loop iteration to pick up a changed
model. Now it holds a reference to the shared DurableConfig and
reads model, cacheTtl, and tools from it directly.

The respond callback was wrapping consumerPort.postMessage with no
added value. The handler now holds the port and posts approval
responses itself.
ControlChannel no longer takes an onMessage callback in its
constructor. It exposes channel.on('message', ...) so callers
listen directly, symmetric with consumerPort.on('message', ...).
The dead ControlChannelFactory is deleted.

AgentMessageHandler takes the consumer port directly instead of
a respond callback, and reads model/cacheTtl/tools from the
shared DurableConfig reference instead of copying them out.
Constructed once before the loop, no per-query recreation.
@shellicar shellicar added this to the 1.0 milestone Apr 10, 2026
@shellicar shellicar added enhancement New feature or request pkg: claude-sdk-cli The new SDK-based CLI pkg: claude-sdk The agent SDK labels Apr 10, 2026
@shellicar shellicar self-assigned this Apr 10, 2026
@shellicar shellicar requested a review from bananabot9000 April 10, 2026 10:12
Session logs from 03-14 through 04-08 replaced with one-line summaries:
all covered merged work with no live decisions. Logs 04-09 and 04-10
trimmed to recovery content only.

Two pieces of reasoning moved from the (now-trimmed) session log into
sdk-shape.md where they belong:
- New principle: the plan describes shape, not a feature list. Block
  descriptions name where responsibilities would live; the refactor
  creates the shape but does not add missing capabilities.
- Tool registry resolve/run closure split documented with reasoning:
  single-parse invariant across the approval gate.

QueryRunner: replaced isFirst flag with [first, ...rest] destructuring.
The registry always constructs BetaTool (name + description + input_schema),
but typed the return as BetaToolUnion — a 22-member union where most members
lack those properties. TypeScript could not narrow the access, causing 6
type errors in the test file.

Changed wireTools to return BetaTool[] in both ToolRegistry and IToolRegistry.
Split the 4-assertion wireTools test into one assertion per test. 104 tests
pass, type-check clean.
Copy link
Copy Markdown
Collaborator

@bananabot9000 bananabot9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactor delivers on all three stated problems: SDK mutates the Conversation, SDK is stateful, SDK doesn't touch the filesystem for conversation data.

What works well:

  • Plan documents (sdk-shape.md, sdk-refactor-playbook.md) are the strongest planning artifacts in this repo. The instance-continuity post-mortem is a concrete fix for a concrete failure mode.
  • Clean block decomposition: QueryRunner/TurnRunner/StreamProcessor/ToolRegistry each own exactly one responsibility. No leaking.
  • ToolRegistry resolve/run closure pattern preserves single-parse invariant across the approval gate.
  • CLI wiring in main.ts is explicit and readable. 15-field options → 4-field per-query input.
  • Net -1869 lines. AgentRun, AnthropicAgent, ConversationStore, MessageStream, createAnthropicAgent all deleted. Good deletions.

Observations (non-blocking):

  1. DurableConfig mutated in-place in the main loop — works but the type doesn't signal mutability.
  2. tools lives on both DurableConfig and ToolRegistry — double source of truth worth noting.
  3. QueryRunner constructor: 7 positional args, at the edge.
  4. saveHistory on every query — fine now, note for large contexts.

104 tests. Approve.

Co-Reviewed-By: BananaBot9000 🍌

@shellicar shellicar merged commit 6783e00 into main Apr 10, 2026
4 checks passed
@shellicar shellicar deleted the feature/sdk-stateful-refactor branch April 10, 2026 11:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request pkg: claude-sdk The agent SDK pkg: claude-sdk-cli The new SDK-based CLI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SDK lacks agent session pattern, is stateless per call, and touches the filesystem for conversation data

2 participants