Add cache_control to last user message block on every API call#205
Merged
Conversation
The Anthropic API supports prompt caching via cache_control on content blocks. Previously this wasn't set on the user message, so the cache boundary covered only the system prompts and tool definitions. withCachedLastUserMessage wraps the outgoing message list and attaches cache_control to the last non-thinking block of the last user message, without mutating the caller's array. String content is promoted to a text-block array so the field has somewhere to live. Thinking blocks are skipped because BetaThinkingBlockParam has no cache_control property — spreading onto it would be a type error.
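A minimal self-contained sketch of the two helpers described above. The types here are simplified stand-ins for the Anthropic SDK's BetaMessageParam / BetaContentBlockParam, not the real definitions, and the helper bodies are a guess at the approach, not the PR's code:

```typescript
// Simplified stand-ins for the SDK's Beta* param types (assumption).
type CacheControl = { type: 'ephemeral'; ttl: '5m' | '1h' };

type ContentBlock =
  | { type: 'text'; text: string; cache_control?: CacheControl }
  | { type: 'thinking'; thinking: string }        // cannot carry cache_control
  | { type: 'redacted_thinking'; data: string };  // cannot carry cache_control

type MessageParam = {
  role: 'user' | 'assistant';
  content: string | ContentBlock[];
};

// Attach cache_control to the last non-thinking block of one message,
// returning a new object rather than mutating the input.
function addCacheControlToLastBlock(
  msg: MessageParam,
  ttl: CacheControl['ttl'],
): MessageParam {
  // Promote string content to a text-block array so the field has somewhere to live.
  const blocks: ContentBlock[] =
    typeof msg.content === 'string'
      ? [{ type: 'text', text: msg.content }]
      : [...msg.content];

  // Walk backwards, skipping thinking blocks (the PR uses findLastIndex;
  // a plain loop keeps this sketch compatible with older TS lib targets).
  for (let j = blocks.length - 1; j >= 0; j--) {
    const b = blocks[j];
    if (b.type === 'thinking' || b.type === 'redacted_thinking') continue;
    blocks[j] = { ...b, cache_control: { type: 'ephemeral', ttl } };
    return { ...msg, content: blocks };
  }
  return msg; // every block is a thinking block: leave unchanged
}

// Apply the above to the last user message without mutating the caller's array.
function withCachedLastUserMessage(
  messages: MessageParam[],
  ttl: CacheControl['ttl'],
): MessageParam[] {
  const idx = messages.map((m) => m.role).lastIndexOf('user');
  if (idx === -1) return messages; // no user messages: nothing to do

  const copy = [...messages];
  copy[idx] = addCacheControlToLastBlock(copy[idx], ttl);
  return copy;
}
```

Keeping the two concerns separate (one helper per message, one wrapper per message list) is what makes each branch — string promotion, all-thinking content, no user messages — independently testable.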
bananabot9000 (Collaborator) approved these changes on Apr 7, 2026 and left a comment:
Clean implementation of the caching fix. Three strategic cache breakpoints (system, last tool, last user message), immutable message handling, thinking-block exclusion. The withCachedLastUserMessage + addCacheControlToLastBlock pair is a huge improvement over the upstream nested ternary approach.
Tests are solid - 9 new tests covering all branches, getContentCacheControl helper avoids casts, proper expect(actual) pattern throughout.
One question: maxTokens: 8000 was mentioned as cutting off thinking - is this the final value?
One note: check the session log for anything sensitive before merge (PR #95 lesson).
🍌 Approved
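The getContentCacheControl test helper praised above might look something like the following — this is a hypothetical sketch of its intent (narrowing without `as` casts), not the PR's actual code, and the types are simplified stand-ins for the SDK's:

```typescript
// Simplified stand-ins for the SDK's Beta* param types (assumption).
type CacheControl = { type: 'ephemeral'; ttl: '5m' | '1h' };
type ContentBlock =
  | { type: 'text'; text: string; cache_control?: CacheControl }
  | { type: 'thinking'; thinking: string };
type MessageParam = { role: 'user' | 'assistant'; content: string | ContentBlock[] };

// Return the cache_control of a message's last content block (or undefined),
// using type narrowing instead of casts so test assertions stay type-safe.
function getContentCacheControl(msg: MessageParam): CacheControl | undefined {
  if (typeof msg.content === 'string' || msg.content.length === 0) return undefined;
  const last = msg.content[msg.content.length - 1];
  return last.type === 'text' ? last.cache_control : undefined;
}

// Usage in a test — full-object comparison, no casts:
// expect(getContentCacheControl(result[0])).toEqual({ type: 'ephemeral', ttl: '1h' });
```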
The system prompts and tool definitions already had cache_control, but the conversation history did not. Without a cache boundary on the user message, the API re-reads the entire conversation on every turn — only the fixed prefix (system prompt + tools) benefits from caching.

What changed

- addCacheControlToLastBlock(msg, cacheTtl) attaches cache_control: { type: 'ephemeral', ttl } to the last non-thinking content block of a message:
  - String content is promoted to a BetaContentBlockParam[] array so cache_control has somewhere to live
  - findLastIndex skips thinking and redacted_thinking blocks — BetaThinkingBlockParam has no cache_control property; spreading onto it is a TypeScript error
  - Returns msg unchanged when all blocks are thinking blocks or the resolved block is null
- withCachedLastUserMessage(messages, cacheTtl) finds the last user message and applies the above without mutating the caller's array:
  - Returns messages unchanged when no user messages exist
  - Copies the array ([...messages]) and replaces the target element
- Call site in buildRequestParams: uses options.cacheTtl ?? CacheTtl.OneHour so the cache boundary is always set even when the caller doesn't specify a TTL.

Tests

9 new tests covering all branches: array content, string content (promoted), all-thinking blocks (unchanged), no user messages (unchanged), user followed by assistant (only the user message gets cache_control), multiple blocks (last only), non-mutation. Full { type, ttl } object comparisons throughout.
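The TTL defaulting at the call site can be sketched as below. The CacheTtl member values ('5m' / '1h') and the RequestOptions shape are assumptions for illustration; the PR's actual enum and buildRequestParams signature may differ:

```typescript
// Assumed TTL constants — the API's ephemeral cache supports 5m and 1h TTLs.
const CacheTtl = { FiveMinutes: '5m', OneHour: '1h' } as const;
type CacheTtlValue = (typeof CacheTtl)[keyof typeof CacheTtl];

// Hypothetical slice of the options accepted by buildRequestParams.
interface RequestOptions {
  cacheTtl?: CacheTtlValue;
}

// Resolve a TTL on every request so the cache boundary is always set,
// even when the caller leaves options.cacheTtl undefined.
function resolveCacheTtl(options: RequestOptions): CacheTtlValue {
  return options.cacheTtl ?? CacheTtl.OneHour;
}
```

Using `??` rather than `||` matters only if falsy-but-valid values ever become possible; here it mainly signals "default only when absent".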