Skip to content

systemReminder was re-shown on every query, not just the first#228

Merged
shellicar merged 2 commits intomainfrom
fix/system-reminder-tool-result-injection
Apr 8, 2026
Merged

systemReminder was re-shown on every query, not just the first#228
shellicar merged 2 commits intomainfrom
fix/system-reminder-tool-result-injection

Conversation

@shellicar
Copy link
Copy Markdown
Owner

Problem

Two related bugs where systemReminder (the git delta snapshot) was leaking beyond the first API call of a turn.

Bug 1 — API injection

AgentRun passed systemReminder to every #getMessageStream call in the loop, including tool-result continuations. The agent saw the same git delta repeated in every message during a multi-tool sequence.

Bug 2 — Display

AgentMessageHandler stored gitDelta as a constructor-time field and rendered it on every query_summary message. The CLI header showed the git delta on every loop iteration, not just the first.

Fix

Bug 1: Consume systemReminder with a local variable, pass it into the first #getMessageStream call, then set it to undefined. Subsequent iterations of the loop send undefined.

Bug 2: Thread systemReminder through the query_summary SDK message. AgentRun includes it in the channel send (where the local variable still holds the value, before being cleared). AgentMessageHandler reads msg.systemReminder — only renders when the message actually carries it, which is exactly once per turn. Removes gitDelta from AgentMessageHandlerOptions entirely.

Tests

  • AgentRun: first query_summary carries systemReminder, second does not
  • AgentRun: systemReminder only injected into first API call body, not tool-result continuations
  • AgentMessageHandler: appends reminder when set, omits it when absent

Closes #227

systemReminder is meant to be ephemeral context for the opening call of
a human turn — a nudge to the model that isn't stored in conversation
history. It was instead passed on every iteration of the agent loop,
including tool-result continuations.

Tool-result continuations are not new user turns. They are the model's
own tool-use exchange completing itself. Appending user-turn context
there is semantically wrong, and it changes the request shape on every
loop iteration, breaking prompt cache stability.

The fix consumes systemReminder exactly once: extracted from options
before the loop in AgentRun.execute(), passed into the first
#getMessageStream call, then cleared. Every subsequent call in the same
turn sees undefined.

To make this testable without hitting the Anthropic API, AgentRun's
constructor was changed to accept IMessageStreamer and
IAgentChannelFactory abstractions instead of a raw Anthropic client.
AnthropicAgent creates the concrete implementations and passes them in.
FakeMessageStreamer in the test captures every call body, allowing
direct assertion on what was and wasn't sent.
… first

AgentMessageHandler stored gitDelta at construction time and rendered it
on every query_summary message. The field was set once, never cleared, so
every loop iteration — including tool-result continuations — showed the
git delta header even though the reminder was only meant to be visible
for the first API call of a turn.

Fix: thread systemReminder through the query_summary SDK message itself.
AgentRun already has a local variable that holds the reminder for the
first call and is cleared to undefined after; that same value is now
included in the channel send before the clear. AgentMessageHandler reads
msg.systemReminder, so it only renders when the message actually carries
the value — which is exactly once per turn.

Removes gitDelta from AgentMessageHandlerOptions entirely; the field only
ever existed to feed this display path, and that role is now filled by
the message payload.

Tests added:
- AgentRun: first query_summary carries systemReminder, second does not
- AgentMessageHandler: appends reminder when set, omits it when absent

Closes #227
@shellicar shellicar force-pushed the fix/system-reminder-tool-result-injection branch from 9483977 to 9c9a97f Compare April 8, 2026 16:00
@shellicar shellicar changed the title fix: systemReminder shown on every query instead of just the first systemReminder was re-shown on every query, not just the first Apr 8, 2026
@shellicar shellicar changed the title systemReminder was re-shown on every query, not just the first systemReminder was re-shown on every query, not just the first Apr 8, 2026
@shellicar shellicar requested a review from bananabot9000 April 8, 2026 16:03
@shellicar shellicar self-assigned this Apr 8, 2026
@shellicar shellicar added the bug Something isn't working label Apr 8, 2026
@shellicar shellicar added this to the 1.0 milestone Apr 8, 2026
Copy link
Copy Markdown
Collaborator

@bananabot9000 bananabot9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see what you meant by "insane fix for a simple bug" 🍌

The fix is 3 lines. Consume systemReminder into a local variable, pass it to the first #getMessageStream call, set it to undefined. Done. Bug fixed.

The other 300 lines are testability refactoring. To PROVE the fix works, 006 needed to intercept what gets sent to the Anthropic API. Which meant abstracting the Anthropic client behind IMessageStreamer, abstracting the MessageChannel behind IAgentChannelFactory, creating FakeMessageStreamer that captures call bodies, FakeAgentChannel that captures SDK messages, and scripting multi-step tool-use flows with makeToolUseStream/makeEndTurnStream.

That's the "insane" part — the testing infrastructure is 50x the fix. But now AgentRun is testable without hitting the real API, which pays dividends forever.

Architecture improvements baked in:

  • RequestBuilderOptions extracted from RunAgentQuery — RequestBuilder only sees what it needs, not the full query shape. Clean separation.
  • IMessageStreamer / IAgentChannelFactory — proper DI seams. AgentRun no longer creates its own dependencies.
  • gitDelta removed from AgentMessageHandlerOptions entirely — the display path reads from the message payload now, not constructor state.

The consume-once pattern:

let systemReminder = this.#options.systemReminder;
// ... in loop:
const stream = this.#getMessageStream(messages, systemReminder);
systemReminder = undefined;  // gone after first use

Simple, readable, no flags, no "isFirstIteration" booleans. The variable IS the state.

Tests verify both bugs:

  • API injection: first call has systemReminder in body, second (tool-result continuation) does not
  • Display: first query_summary carries systemReminder, second does not
  • RequestBuilder: injects when set, leaves messages unchanged when undefined

No sensitive data, no reverted changes, no session log artifacts.

81+ PRs reviewed. The goldfish tests the plumbing 🍌

@shellicar shellicar merged commit f131057 into main Apr 8, 2026
4 checks passed
@shellicar shellicar deleted the fix/system-reminder-tool-result-injection branch April 8, 2026 16:07
@shellicar shellicar added pkg: claude-sdk The agent SDK pkg: claude-sdk-cli The new SDK-based CLI labels Apr 8, 2026
shellicar added a commit that referenced this pull request Apr 9, 2026
Session 3 appended to 2026-04-09.md covering the build verification,
post-rebase check, biome CI fix approach, and PR #230.

Current State updated: branch fix/packaging, PR #230 open, recent PRs
#228/#229 added to the post-refactor list.
shellicar added a commit that referenced this pull request Apr 9, 2026
Session 3 appended to 2026-04-09.md covering the build verification,
post-rebase check, biome CI fix approach, and PR #230.

Current State updated: branch fix/packaging, PR #230 open, recent PRs
#228/#229 added to the post-refactor list.
shellicar added a commit that referenced this pull request Apr 9, 2026
* Switch packages from custom build scripts to tsup

Each package had a hand-rolled build.ts that called esbuild directly.
Replaced with tsup.config.ts in each package, which handles ESM + CJS
output and DTS generation in one pass.

Key decisions:

- bundle: true is required because bundle: false only transpiles entry
  files, leaving relative imports unresolved in dist. Consuming bundlers
  follow those imports and find nothing.

- clean: true replaces the @shellicar/build-clean plugin. The plugin
  uses esbuild's metafile to find current build outputs and delete
  everything else. TypeScript-generated DTS files never appear in the
  metafile, so the plugin deleted them every time. clean: true wipes the
  outdir before the build starts, so JS and DTS land together.

- entryNames: '[name]' in esbuildOptions puts JS files at the same level
  as DTS files. tsup's TypeScript pass ignores esbuild's entryNames
  entirely, so DTS always lands flat at dist/esm/[name].d.ts. Matching
  JS placement keeps them co-located and matches the exports map.

- claude-sdk-tools entry is src/entry/*.ts, not src/*.ts. The public
  tool modules live under src/entry/; src/*.ts only contains internal
  utilities.

Exports maps updated to the nested import/require form with types +
default sub-conditions, pointing to the correct dist paths.

tsconfig.json in each package now scopes to src/**/*.ts and excludes
dist/node_modules. tsconfig.check.json symlinks to the root copy.

turbo.json build inputs updated to include tsup.config.ts.

Both apps updated: entryNames '[name]', packages: 'external' instead of
a manual external list, bin paths updated from dist/entry/main.js to
dist/main.js, workspace deps moved to dependencies since they are
externalized and need to be present at runtime.

* Fix biome CI errors

Format fixes in the three tsup.config.ts files (arrow function body
style), indentation in claude-sdk-tools package.json exports, trailing
newline in claude-sdk-cli package.json, import sort order in runAgent.ts.

Remove unused CacheTtl import from AgentRun.ts and reformat a ternary
in MessageStream.ts — both left over from the cache TTL refactor that
came in via rebase.

* Session log and harness update for packaging tsup migration

Session 3 appended to 2026-04-09.md covering the build verification,
post-rebase check, biome CI fix approach, and PR #230.

Current State updated: branch fix/packaging, PR #230 open, recent PRs
#228/#229 added to the post-refactor list.

* Add @shellicar/changes tooling: schema generation, validation, CI integration

Sets up the full changes.jsonl toolchain for this repo:

- changes.config.json defines the valid categories (feature, fix, breaking,
  deprecation, security, performance) with their display names
- scripts/src/generate-schema.ts generates schema/shellicar-changes.json from
  the Zod definitions + config; category is required (repo policy, stricter
  than the base spec)
- scripts/src/validate-changes.ts validates all **/changes.jsonl files against
  the schema via ajv; no-args mode globs the repo, or pass specific paths
- CI (.github/workflows/node.js.yml) runs the validator on every push

Adds changes.jsonl entries for the tsup packaging work (PR #230) in
claude-core, claude-sdk, and claude-sdk-tools.

Updates both CLAUDE.md files: correct category enum, removed the bad
'issue' field from the example, noted that category is required and
that issue links belong in metadata not at the top level, added the
@shellicar/changes tooling section.

* Session log: @shellicar/changes tooling session (2026-04-09 continued)

* Add @shellicar/changes toolchain with per-package changelogs

Builds out the full changes toolchain across the monorepo:

- changes.config.json: Keep a Changelog standard categories
  (added/changed/deprecated/removed/fixed/security). Custom category names
  were replaced because these are the recognised standard and tooling
  elsewhere understands them.

- schema/shellicar-changes.schema.json (renamed from .json): generated
  JSON Schema artifact. The .schema.json suffix is conventional for
  JSON Schema files and makes intent unambiguous in editors.

- scripts/src/generate-schema.ts: generates schema from Zod definitions
  + changes.config.json. Must run from scripts/ directory.

- scripts/src/validate-changes.ts: validates every **/changes.jsonl
  against the schema via ajv. CI runs this on every push.

- scripts/src/generate-changelog.ts: generates CHANGELOG.md for a
  package from its changes.jsonl. Groups entries by category in config
  order within each release. Renders metadata.issue as (#NNN) and
  metadata.ghsa as a linked advisory suffix.

- Release markers gain an optional tag field. When absent the script
  defaults to <shortName>@<version>. The claude-cli app uses explicit
  tags because its historical alphas used unscoped version tags.

- changes.jsonl + CHANGELOG.md added for all five packages/apps.
  apps/claude-cli carries the full reverse-engineered history from the
  root CHANGELOG.md (alpha.67 through alpha.74) so the generated
  CHANGELOG.md is the authoritative source going forward.

- Both CLAUDE.md files updated with a dedicated @shellicar/changes
  section covering config, schema, and all three scripts.

* Fix biome errors in scripts

Formatting applied via --changed --since=origin/main to scope fixes
to only the files we touched. Rewrote the ??= pattern as a plain
if-block: clearer to read, no cleverness required.

* Fix CI: run validate via scripts package, not pnpm tsx from root

tsx is only installed in the scripts workspace package. Running
pnpm tsx from the repo root fails because there is no root-level
tsx binary. Use pnpm --filter scripts run validate instead, which
runs inside the package where tsx is available.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working pkg: claude-sdk The agent SDK pkg: claude-sdk-cli The new SDK-based CLI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

systemReminder is injected into tool-result continuation calls, not just the first call of a turn

2 participants