fix(session): include cache.write tokens in isOverflow() context calculation #12074

NamedIdentity wants to merge 1 commit into anomalyco:dev
Conversation
`isOverflow()` excludes `cache_creation_input_tokens` from its token count, causing compaction to trigger late for Anthropic models using prompt caching.

Anthropic partitions input tokens into three disjoint categories:

`total_input = input_tokens + cache_read_input_tokens + cache_creation_input_tokens`

The overflow check only summed two of the three, missing `cache.write` entirely. Debug data showed 24% reported vs 54% actual utilization at a single turn, with 62,900 `cache_creation_input_tokens` invisible to the overflow check.

This adds `input.tokens.cache.write` to the sum. No impact on non-caching providers (`cache.write` defaults to 0).
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue. Please:

See CONTRIBUTING.md for details.
The following comment was made by an LLM; it may be inaccurate: Found Related PR: PR #6562: fix(session): prevent context overflow by adding safety margin to compaction check. Why it's related: No duplicate PRs found addressing the exact same fix.
PR: fix(session): include cache.write tokens in isOverflow() context calculation
Relates to #10017
Relates to #10634
Title

fix(session): include cache.write tokens in isOverflow() context calculation

Summary

`isOverflow()` excludes `cache.write` (`cache_creation_input_tokens`) from its token count, causing compaction to trigger late — sometimes well past the intended threshold. This is a one-line fix adding `input.tokens.cache.write` to the sum on line 35 of `compaction.ts`.

Problem
When Anthropic models use prompt caching, the API response partitions total input tokens into three disjoint categories:

- `input_tokens`: input tokens neither read from nor written to the cache
- `cache_read_input_tokens`: input tokens read from an existing cache entry
- `cache_creation_input_tokens`: input tokens written to the cache
This is documented by Anthropic: these three fields are mutually exclusive partitions — a token counted in `cache_creation_input_tokens` is NOT also counted in `input_tokens`.

OpenCode correctly extracts all three into `MessageV2.Assistant.tokens`:

- `input` ← `input_tokens`
- `cache.read` ← `cache_read_input_tokens`
- `cache.write` ← `cache_creation_input_tokens`

However,
`isOverflow()` only sums two of the three input partitions:

This means `cache.write` tokens — which represent real input tokens occupying context space — are invisible to the overflow check. When a session has significant cache creation activity, `isOverflow()` systematically underreports context utilization and compaction triggers late.

Evidence
Debug log from a real session shows the magnitude of the discrepancy at a single turn (fields logged: `input_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`, `output_tokens`):

The 62,900 `cache_creation_input_tokens` were completely excluded from the overflow calculation. The function reported ~24% utilization when the actual context usage was ~54%.

This pattern compounds across turns. As the session progresses and the cache stabilizes (more reads, fewer writes), the gap narrows — but by then the session may already be well past the point where compaction should have triggered.
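As a rough consistency check on those numbers (the 200,000-token context window below is an assumption about the model in the log, not something the log states), the excluded cache-creation tokens alone account for roughly the reported gap:

```typescript
// Rough sanity check on the reported numbers. The 200k context window is
// an assumption (typical for recent Anthropic models), not from the log.
const contextWindow = 200_000;
const excludedCacheWrite = 62_900; // cache_creation_input_tokens from the debug log

// Percentage points of utilization hidden from isOverflow()
const hiddenPoints = (excludedCacheWrite / contextWindow) * 100;

console.log(hiddenPoints.toFixed(2)); // 31.45
```

That ~31-point blind spot is the same order as the gap between reported (~24%) and actual (~54%) utilization.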
Community Signals

Multiple community PRs have worked around symptoms consistent with late compaction:

These workarounds address the symptom (compaction triggers too late) without identifying the root cause (`cache.write` exclusion from the token count).
The Fix
One-line change — add `input.tokens.cache.write` to the token sum:

Why this is correct
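For illustration, a minimal sketch of the corrected sum. The `Tokens` shape mirrors `MessageV2.Assistant.tokens`, but the function signature and the 90% threshold here are assumptions, not the actual `compaction.ts` code:

```typescript
// Sketch only: shape mirrors MessageV2.Assistant.tokens; signature and
// threshold are assumed for illustration, not taken from compaction.ts.
interface Tokens {
  input: number;
  cache: { read: number; write: number };
}

// Input-side context usage: all three disjoint input partitions.
function inputContextTokens(t: Tokens): number {
  return t.input + t.cache.read + t.cache.write; // cache.write is the fix
}

function isOverflow(t: Tokens, contextLimit: number, threshold = 0.9): boolean {
  return inputContextTokens(t) > contextLimit * threshold;
}
```

With the debug-log figures, a session that previously looked far from the limit now correctly crosses the threshold once `cache.write` is counted.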
- **Anthropic's token partitioning is disjoint:** `input_tokens`, `cache_read_input_tokens`, and `cache_creation_input_tokens` are mutually exclusive. All three represent real tokens sent to the model that occupy context window space.
- **The type system already has the data:** `MessageV2.Assistant.tokens.cache.write` is defined and populated. The `StepFinishPart` schema also tracks `cache.write`. The data flows correctly through the system — it's just not used in the overflow check.
- **No impact on non-caching providers:** For providers that don't support prompt caching, `cache.write` is `0`, so the sum is unchanged.
- **Reasoning tokens are intentionally excluded:** Note that `tokens.reasoning` is also not included in the count. This is correct — reasoning tokens are billed separately and don't occupy the input context window. The current code correctly excludes reasoning but incorrectly excludes `cache.write`.

Testing Notes
`cache.write` values alongside existing token logging can help verify the fix is working as expected.

Risk Assessment
Low risk. This is a purely additive fix to arithmetic that was already intended to sum all input tokens. It makes `isOverflow()` consistent with the token partitioning model that the rest of the codebase already implements correctly.

The primary behavioral change is that compaction will trigger earlier for Anthropic sessions — which is the correct behavior. Sessions that previously ran past the overflow threshold due to excluded `cache.write` tokens will now trigger compaction at the intended point.
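To support the testing note above, here is a hypothetical debug helper for surfacing `cache.write` alongside the other token fields. None of these names exist in OpenCode; this only illustrates the kind of logging that would make the discrepancy visible:

```typescript
// Hypothetical debug helper: logs cache.write next to the other token
// fields and the resulting utilization. All names here are invented for
// illustration; nothing is taken from the OpenCode source.
interface Tokens {
  input: number;
  cache: { read: number; write: number };
}

function formatTokenUsage(t: Tokens, contextLimit: number): string {
  const used = t.input + t.cache.read + t.cache.write;
  const pct = ((used / contextLimit) * 100).toFixed(1);
  return `input=${t.input} cache.read=${t.cache.read} cache.write=${t.cache.write} used=${used}/${contextLimit} (${pct}%)`;
}

// Example:
// formatTokenUsage({ input: 10, cache: { read: 20, write: 30 } }, 120)
// → "input=10 cache.read=20 cache.write=30 used=60/120 (50.0%)"
```

Comparing this output against the pre-fix `isOverflow()` percentage at each turn would show the gap closing once `cache.write` is included.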