Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
55 changes: 55 additions & 0 deletions .claude/sessions/2026-04-08.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
# Session 2026-04-08

## What was done

### Fix a broken test

`RequestBuilder.spec.ts` had a failing test: `custom system prompts are appended after the prefix`. The implementation joins custom system prompts into a single combined block (so there's one cache boundary instead of N). The test expected separate array entries. Fixed it to expect `['prefix', '\nsecond\n\nthird']`.

### Add cache_control to the last user message on every API call (PR #205)

The system prompts and tool definitions already had `cache_control`, but the conversation history did not. Without a cache boundary on the user message, the API re-reads the entire conversation on every turn, which means only the fixed prefix (system prompt + tools) benefits from caching.

**`addCacheControlToLastBlock(msg, cacheTtl)`** — attaches `cache_control: { type: 'ephemeral', ttl }` to the last non-thinking content block of a message:
- String content is promoted to a `BetaContentBlockParam[]` array, since `cache_control` has nowhere to live on a plain string
- `findLastIndex` skips `thinking` and `redacted_thinking` blocks — `BetaThinkingBlockParam` has no `cache_control` property; trying to spread one onto it is a TypeScript error
- Returns `msg` unchanged when all blocks are thinking blocks (`findLastIndex === -1`) or the resolved block is somehow null

**`withCachedLastUserMessage(messages, cacheTtl)`** — finds the last user message and applies `addCacheControlToLastBlock` without mutating the caller's array:
- Returns `messages` unchanged when no user messages exist
- Copies the array (`[...messages]`) and replaces the target element with the cached version

**Call site in `buildRequestParams`:** `withCachedLastUserMessage(messages, options.cacheTtl ?? CacheTtl.OneHour)`. The `?? CacheTtl.OneHour` default ensures the cache boundary is always set even when the caller doesn't specify a TTL.

### Tests

Added 9 tests covering all branches:
- Array content → `cache_control` added to the last block
- String content → promoted to array block with `cache_control`
- All-thinking blocks → `findLastIndex === -1`, returned unchanged
- No user messages → returned unchanged
- User message followed by assistant → user gets `cache_control`, assistant does not
- Multiple content blocks → `cache_control` on last block only, not first
- Non-mutation of input array

Also added `getContentCacheControl` helper in the test file to extract `cache_control` from a content block at a given message and block index without `!` assertions.

Tests compare against the full `{ type: 'ephemeral', ttl: CacheTtl.OneHour }` object rather than just the type string, since `ttl` is the field that controls how long the cache entry lives.

## Decisions

**`findLastIndex` over a hand-rolled loop:** The intent reads directly — find the last non-thinking block. A `for` loop going backwards would work but obscures the intent and requires more surface area to get wrong.

**`?? CacheTtl.OneHour` default at the call site:** The cache boundary should always exist on the user message — without it the conversation history is never cached. Making the default explicit at the call site keeps the two functions general (`undefined` TTL means "no TTL field") while ensuring production callers always get a cache hit.

**Non-mutation guarantee:** The caller passes its live messages array. Mutating it would change the caller's state, which violates the function's pure-function contract. The shallow copy (`[...messages]`) is enough since only one element is replaced.

**Thinking block exclusion:** `BetaThinkingBlockParam` does not have a `cache_control` property. The spread `{ ...block, cache_control }` compiles but produces an object whose type no longer matches the union. Skipping thinking blocks avoids the type error and is correct semantically — the API documents cache boundaries on text/tool blocks, not thinking blocks.

## Files changed

- `packages/claude-sdk/src/private/RequestBuilder.ts` — added `addCacheControlToLastBlock`, `withCachedLastUserMessage`; call site updated
- `packages/claude-sdk/test/RequestBuilder.spec.ts` — fixed broken test, added `getContentCacheControl` helper, 9 new message caching tests; added `CacheTtl` and `BetaMessageParam` imports
- `packages/claude-sdk/src/public/types.ts` — minor (no behaviour change)
- `packages/claude-sdk/src/private/AgentRun.ts` — minor (no behaviour change)
- `apps/claude-sdk-cli/src/systemPrompts.ts` — minor (no behaviour change)
10 changes: 5 additions & 5 deletions apps/claude-sdk-cli/package.json
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
{
"name": "@shellicar/claude-sdk-cli",
"version": "0.0.0",
"version": "1.0.0-alpha.4",
"private": false,
"description": "Interactive CLI for Claude AI built on the Anthropic SDK",
"license": "MIT",
Expand Down Expand Up @@ -42,15 +42,15 @@
"@types/node": "^25.5.2",
"esbuild": "^0.27.5",
"tsx": "^4.21.0",
"vitest": "^4.1.2"
},
"dependencies": {
"@anthropic-ai/sdk": "^0.82.0",
"vitest": "^4.1.2",
"@shellicar/claude-core": "workspace:^",
"@shellicar/claude-sdk": "workspace:^",
"@shellicar/claude-sdk-tools": "workspace:^",
"cli-highlight": "^2.1.11",
"winston": "^3.19.0",
"zod": "^4.3.6"
},
"dependencies": {
"@anthropic-ai/sdk": "^0.82.0"
}
}
7 changes: 4 additions & 3 deletions apps/claude-sdk-cli/src/runAgent.ts
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ import { systemPrompts } from './systemPrompts.js';

export async function runAgent(agent: IAnthropicAgent, prompt: string, layout: AppLayout, store: RefStore): Promise<void> {
const pipeSource = [Find, ReadFile, Grep, Head, Tail, Range, SearchFiles];
const { tool: Ref, transformToolResult: refTransform } = createRef(store, 2_000);
const { tool: Ref, transformToolResult: refTransform } = createRef(store, 20_000);
const otherTools = [PreviewEdit, EditFile, CreateFile, DeleteFile, DeleteDirectory, Exec, Ref];
const pipe = createPipe(pipeSource);
const tools: AnyToolDefinition[] = [pipe, ...pipeSource, ...otherTools];
Expand All @@ -44,9 +44,10 @@ export async function runAgent(agent: IAnthropicAgent, prompt: string, layout: A

const { port, done } = agent.runAgent({
model,
maxTokens: 32768,
maxTokens: 8000,
messages: [prompt],
systemPrompts,
cacheTtl: '1h',
transformToolResult,
pauseAfterCompact: true,
compactInputTokens: 150_000,
Expand All @@ -58,7 +59,7 @@ export async function runAgent(agent: IAnthropicAgent, prompt: string, layout: A
[AnthropicBeta.ClaudeCodeAuth]: true,
// [AnthropicBeta.InterleavedThinking]: true,
[AnthropicBeta.ContextManagement]: false,
[AnthropicBeta.PromptCachingScope]: true,
[AnthropicBeta.PromptCachingScope]: false,
// [AnthropicBeta.Effort]: true,
[AnthropicBeta.AdvancedToolUse]: true,
// [AnthropicBeta.TokenEfficientTools]: true,
Expand Down
19 changes: 9 additions & 10 deletions apps/claude-sdk-cli/src/systemPrompts.ts
Original file line number Diff line number Diff line change
Expand Up @@ -3,18 +3,17 @@
* Temporary / hardcoded until a proper configuration layer exists.
*/

const gpgSigning = `\
Every git commit in this repo is GPG-signed. The signing flows through Stephen's \
export const systemPrompts = [
`Every git commit in this repo is GPG-signed. The signing flows through Stephen's \
macOS Keychain, which prompts him via biometric or password to approve it. \
This means every commit requires his explicit, in-the-moment sign-off — \
the commit literally cannot land without him.

When making a commit, just run it. The keychain prompt is how Stephen approves it. \
Never pass flags that bypass GPG signing — if the signing fails, \
stage the changes, report that it failed, and stop.`;
stage the changes, report that it failed, and stop.`,

const conventionalCommits = `\
Conventional Commits defines exactly two commit message types: fix and feat. \
`Conventional Commits defines exactly two commit message types: fix and feat. \
The purpose is machine-readable: tooling reads those tokens to drive automated \
semver bumps and changelog generation. That is the entire point of the spec.

Expand All @@ -24,10 +23,9 @@ following its conventions would be adopting the form with none of the function.
For branch names, use plain English words that describe the work: \
fix/, feature/, docs/, security/ are all fine. \
If a prefix feels like it came from a spec rather than the English language, \
that is a sign it does not belong there.`;
that is a sign it does not belong there.`,

const selfNote = `\
The why matters more than the what. Anyone can read what happened; \
`The why matters more than the what. Anyone can read what happened; \
only the reasoning explains whether it was right.

Write reasoning as you go — not for documentation, but because articulating \
Expand All @@ -39,6 +37,7 @@ ground truth. Starting from a proposal before understanding what exists \
leads to conflicts with work already done.

Before applying a convention or pattern, ask whether it fits this specific \
context or is just familiar. Familiarity is not a reason.`;
context or is just familiar. Familiarity is not a reason.`,

export const systemPrompts: string[] = [gpgSigning, conventionalCommits, selfNote];
`When a tool call is rejected, treat it as the user saying "no" - not as a transient failure to retry. Do not attempt the same action again with minor variations.`,
];
2 changes: 1 addition & 1 deletion packages/claude-sdk/src/private/AgentRun.ts
Original file line number Diff line number Diff line change
Expand Up @@ -195,7 +195,7 @@ export class AgentRun {
pending.splice(index, 1);

if (!response.approved) {
const content = response.reason ?? 'Tool use rejected';
const content = response.reason ?? 'Rejected by user, do not reattempt';
this.#logger?.debug('tool_rejected', { name: toolUse.name, reason: content });
toolResults.push({ type: 'tool_result', tool_use_id: toolUse.id, is_error: true, content });
continue;
Expand Down
64 changes: 59 additions & 5 deletions packages/claude-sdk/src/private/RequestBuilder.ts
Original file line number Diff line number Diff line change
@@ -1,15 +1,56 @@
import type { Anthropic } from '@anthropic-ai/sdk';
import type { BetaMessageStreamParams } from '@anthropic-ai/sdk/resources/beta/messages.js';
import type { BetaCacheControlEphemeral, BetaClearThinking20251015Edit, BetaClearToolUses20250919Edit, BetaCompact20260112Edit, BetaContextManagementConfig, BetaToolUnion } from '@anthropic-ai/sdk/resources/beta.mjs';
import type { BetaCacheControlEphemeral, BetaClearThinking20251015Edit, BetaClearToolUses20250919Edit, BetaCompact20260112Edit, BetaContentBlockParam, BetaContextManagementConfig, BetaTextBlockParam, BetaToolUnion } from '@anthropic-ai/sdk/resources/beta.mjs';
import { AnthropicBeta } from '../public/enums';
import type { RunAgentQuery } from '../public/types';
import { CacheTtl, type RunAgentQuery } from '../public/types';
import { AGENT_SDK_PREFIX } from './consts';

export type RequestParams = {
body: BetaMessageStreamParams;
headers: { 'anthropic-beta': string };
};

function addCacheControlToLastBlock(msg: Anthropic.Beta.Messages.BetaMessageParam, cacheTtl: CacheTtl | undefined): Anthropic.Beta.Messages.BetaMessageParam {
const cache_control = { type: 'ephemeral' as const, ttl: cacheTtl };

if (typeof msg.content === 'string') {
const content: BetaContentBlockParam[] = [{ type: 'text', text: msg.content, cache_control }];
return { ...msg, content };
}

const content = [...msg.content];
const idx = content.findLastIndex((b) => b.type !== 'thinking' && b.type !== 'redacted_thinking');
if (idx === -1) {
return msg;
}

const block = content[idx];
if (block == null || block.type === 'thinking' || block.type === 'redacted_thinking') {
return msg;
}

content[idx] = { ...block, cache_control };
return { ...msg, content };
}

function withCachedLastUserMessage(messages: Anthropic.Beta.Messages.BetaMessageParam[], cacheTtl: CacheTtl | undefined): Anthropic.Beta.Messages.BetaMessageParam[] {
const idx = messages.findLastIndex((m) => m.role === 'user');
if (idx === -1) {
return messages;
}

const msg = messages[idx];
if (msg == null) {
return messages;
}

const cached = addCacheControlToLastBlock(msg, cacheTtl);

const result = [...messages];
result[idx] = cached;
return result;
}

/**
* Pure function — builds the Anthropic API request params from agent options
* and the current message list. No I/O, no client reference, no signal.
Expand Down Expand Up @@ -50,15 +91,28 @@ export function buildRequestParams(options: RunAgentQuery, messages: Anthropic.B
} satisfies BetaCompact20260112Edit);
}

const systemPrompts = [AGENT_SDK_PREFIX, ...(options.systemPrompts ?? [])];
const systemPrompts = [AGENT_SDK_PREFIX];
if (options.systemPrompts != null && options.systemPrompts.length > 0) {
systemPrompts.push(`\n${options.systemPrompts.join('\n\n')}`);
}

const messagesForBody = withCachedLastUserMessage(messages, options.cacheTtl ?? CacheTtl.OneHour);

const lastTool = tools[tools.length - 1];
if (lastTool != null) {
lastTool.cache_control = {
type: 'ephemeral',
ttl: options.cacheTtl,
};
}

const body: BetaMessageStreamParams = {
model: options.model,
max_tokens: options.maxTokens,
tools,
context_management,
system: systemPrompts.map((text) => ({ type: 'text', text })),
messages,
system: systemPrompts.map((text) => ({ type: 'text', text, cache_control: { type: 'ephemeral', ttl: options.cacheTtl } }) satisfies BetaTextBlockParam),
messages: messagesForBody,
stream: true,
} satisfies BetaMessageStreamParams;

Expand Down
5 changes: 4 additions & 1 deletion packages/claude-sdk/src/public/types.ts
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,10 @@ export type AnyToolDefinition = {

export type AnthropicBetaFlags = Partial<Record<AnthropicBeta, boolean>>;

export type CacheTtl = '5m' | '1h';
export enum CacheTtl {
FiveMinutes = '5m',
OneHour = '1h',
}

export type RunAgentQuery = {
model: Model;
Expand Down
Loading
Loading