
Add cache_control to last user message block on every API call #205

Merged

shellicar merged 3 commits into main from fix/cache-control-on-api-calls on Apr 7, 2026
Conversation

@shellicar (Owner)

The system prompts and tool definitions already had cache_control, but the conversation history did not. Without a cache boundary on the user message, the API re-reads the entire conversation on every turn — only the fixed prefix (system prompt + tools) benefits from caching.

What changed

addCacheControlToLastBlock(msg, cacheTtl) attaches cache_control: { type: 'ephemeral', ttl } to the last non-thinking content block of a message:

  • String content is promoted to a BetaContentBlockParam[] array so cache_control has somewhere to live
  • findLastIndex skips thinking and redacted_thinking blocks — BetaThinkingBlockParam has no cache_control property; spreading onto it is a TypeScript error
  • Returns msg unchanged when all blocks are thinking blocks or the resolved block is null

withCachedLastUserMessage(messages, cacheTtl) finds the last user message and applies the above without mutating the caller's array:

  • Returns messages unchanged when no user messages exist
  • Copies the array ([...messages]) and replaces the target element

Call site in buildRequestParams: uses options.cacheTtl ?? CacheTtl.OneHour so the cache boundary is always set even when the caller doesn't specify a TTL.
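The defaulting at the call site can be pictured roughly like this; CacheTtl's values here are illustrative assumptions, not the project's actual definition:

```typescript
// Illustrative stand-in for the project's CacheTtl constants; actual values may differ.
const CacheTtl = { FiveMinutes: '5m', OneHour: '1h' } as const;

type RequestOptions = { cacheTtl?: string };

// Defaulting with ?? means the cache boundary is always set,
// even when the caller passes no TTL at all.
function resolveCacheTtl(options: RequestOptions): string {
  return options.cacheTtl ?? CacheTtl.OneHour;
}
```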

Tests

9 new tests covering all branches: array content, string content (promoted), all-thinking blocks (unchanged), no user messages (unchanged), user followed by assistant (only user gets cache_control), multiple blocks (last only), non-mutation. Full { type, ttl } object comparisons throughout.

The Anthropic API supports prompt caching via cache_control on content
blocks. Previously this wasn't being set on the user message, so the
cache boundary was only on the system prompts and tool definitions.

withCachedLastUserMessage wraps the outgoing message list and attaches
cache_control to the last non-thinking block of the last user message
without mutating the caller's array. String content is promoted to a
text block array so the field has somewhere to live.

Thinking blocks are skipped because BetaThinkingBlockParam has no
cache_control property — the spread would be a type error.
@shellicar enabled auto-merge (squash) on April 7, 2026 at 12:41
@bananabot9000 (Collaborator) left a comment


Clean implementation of the caching fix. Three strategic cache breakpoints (system, last tool, last user message), immutable message handling, thinking-block exclusion. The withCachedLastUserMessage + addCacheControlToLastBlock pair is a huge improvement over the upstream nested ternary approach.

Tests are solid - 9 new tests covering all branches, getContentCacheControl helper avoids casts, proper expect(actual) pattern throughout.

One question: maxTokens: 8000 was mentioned as cutting off thinking - is this the final value?

One note: check the session log for anything sensitive before merge (PR #95 lesson).

🍌 Approved

@shellicar merged commit eaadf42 into main on Apr 7, 2026
4 checks passed
@shellicar deleted the fix/cache-control-on-api-calls branch on April 7, 2026 at 12:55