
fix(session): include cache.write tokens in isOverflow() context calculation #12074

Open

NamedIdentity wants to merge 1 commit into anomalyco:dev from NamedIdentity:fix/isoverflow-cache-write

Conversation

@NamedIdentity

PR: fix(session): include cache.write tokens in isOverflow() context calculation

Relates to #10017
Relates to #10634

Title

fix(session): include cache.write tokens in isOverflow() context calculation

Summary

isOverflow() excludes cache.write (cache_creation_input_tokens) from its token count, causing compaction to trigger late — sometimes well past the intended threshold. This is a one-line fix adding input.tokens.cache.write to the sum on line 35 of compaction.ts.

Problem

When Anthropic models use prompt caching, the API response partitions total input tokens into three disjoint categories:

total_input = input_tokens + cache_read_input_tokens + cache_creation_input_tokens

This is documented by Anthropic: these three fields are mutually exclusive partitions — a token counted in cache_creation_input_tokens is NOT also counted in input_tokens.

OpenCode correctly extracts all three into MessageV2.Assistant.tokens, as sketched after this list:

  • input ← input_tokens
  • cache.read ← cache_read_input_tokens
  • cache.write ← cache_creation_input_tokens
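
To make the mapping concrete, here is a minimal sketch of that extraction, assuming the Anthropic usage field names above. The AnthropicUsage and Tokens interfaces and the toTokens helper are hypothetical illustrations, not OpenCode's actual code:

// hypothetical sketch of the field mapping above, not OpenCode's extraction code
interface AnthropicUsage {
  input_tokens: number
  cache_read_input_tokens?: number
  cache_creation_input_tokens?: number
  output_tokens: number
}

interface Tokens {
  input: number
  output: number
  cache: { read: number; write: number }
}

function toTokens(usage: AnthropicUsage): Tokens {
  return {
    input: usage.input_tokens,
    output: usage.output_tokens,
    cache: {
      // absent fields (non-caching providers) default to 0
      read: usage.cache_read_input_tokens ?? 0,
      write: usage.cache_creation_input_tokens ?? 0,
    },
  }
}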

However, isOverflow() only sums two of the three input partitions:

// compaction.ts line 35 (current)
const count = input.tokens.input + input.tokens.cache.read + input.tokens.output

This means cache.write tokens — which represent real input tokens occupying context space — are invisible to the overflow check. When a session has significant cache creation activity, isOverflow() systematically underreports context utilization and compaction triggers late.

Evidence

Debug log from a real session shows the magnitude of the discrepancy at a single turn:

Metric                          Value
----------------------------    ---------------------------------------------------
input_tokens                    6,321
cache_read_input_tokens         44,539
cache_creation_input_tokens     62,900
output_tokens                   2,746
isOverflow() count (current)    53,606  (input + cache.read + output)
Actual token count              116,506 (input + cache.read + cache.write + output)
Reported utilization            ~24% of usable capacity
Actual utilization              ~54% of usable capacity

The 62,900 cache_creation_input_tokens were completely excluded from the overflow calculation. The function reported ~24% utilization when the actual context usage was ~54%.
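
The gap follows directly from the table:

current count: 6,321 + 44,539 + 2,746          =  53,606
fixed count:   6,321 + 44,539 + 62,900 + 2,746 = 116,506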

This pattern compounds across turns. As the session progresses and cache stabilizes (more reads, fewer writes), the gap narrows — but by then the session may already be well past the point where compaction should have triggered.

Community Signals

Multiple community PRs have worked around symptoms consistent with late compaction.

These workarounds address the symptom (compaction triggers too late) without identifying the root cause (cache.write exclusion from the token count).

The Fix

One-line change — add input.tokens.cache.write to the token sum:

// packages/opencode/src/session/compaction.ts, line 35
- const count = input.tokens.input + input.tokens.cache.read + input.tokens.output
+ const count = input.tokens.input + input.tokens.cache.read + input.tokens.cache.write + input.tokens.output
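
For context, here is a sketch of how the corrected check might read in place. The function signature and threshold comparison are assumptions for illustration (reusing the hypothetical Tokens shape from the sketch above); only the count expression mirrors the actual diff:

// illustrative sketch: signature and threshold logic are assumed, not from compaction.ts
function isOverflow(input: { tokens: Tokens }, contextLimit: number, outputReserve: number): boolean {
  const count =
    input.tokens.input +
    input.tokens.cache.read +
    input.tokens.cache.write + // the fix: cache.write now counted
    input.tokens.output
  return count > contextLimit - outputReserve
}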

Why this is correct

  1. Anthropic's token partitioning is disjoint: input_tokens, cache_read_input_tokens, and cache_creation_input_tokens are mutually exclusive. All three represent real tokens sent to the model that occupy context window space.

  2. The type system already has the data: MessageV2.Assistant.tokens.cache.write is defined and populated. The StepFinishPart schema also tracks cache.write. The data flows correctly through the system — it's just not used in the overflow check.

  3. No impact on non-caching providers: For providers that don't support prompt caching, cache.write is 0, so the sum is unchanged.

  4. Reasoning tokens are intentionally excluded: Note that tokens.reasoning is also not included in the count. This is correct — reasoning tokens are billed separately and don't occupy the input context window. The current code correctly excludes reasoning but incorrectly excludes cache.write.

Testing Notes

  • Anthropic models with prompt caching: Verify compaction triggers at appropriate context utilization. Sessions should no longer run significantly past the intended overflow threshold before compaction kicks in.
  • Non-caching providers: Verify no behavioral change (cache.write = 0, sum unchanged); a test sketch follows this list.
  • TUI context display: The TUI percentage display uses a different calculation path and is not affected by this change.
  • Debug logging: Adding temporary logging of cache.write values alongside existing token logging can help verify the fix is working as expected.
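
To make the first two checks concrete, here is a test sketch, assuming a Bun-style test runner and using the numbers from the Evidence table. It inlines the corrected sum rather than calling OpenCode's real isOverflow():

// hedged sketch: assumes bun:test; inlines the corrected sum for illustration
import { expect, test } from "bun:test"

test("cache.write counts toward the overflow total", () => {
  const t = { input: 6_321, output: 2_746, cache: { read: 44_539, write: 62_900 } }
  const count = t.input + t.cache.read + t.cache.write + t.output
  expect(count).toBe(116_506) // was 53_606 with cache.write excluded
})

test("non-caching providers see no change", () => {
  const t = { input: 1_000, output: 200, cache: { read: 0, write: 0 } }
  const count = t.input + t.cache.read + t.cache.write + t.output
  expect(count).toBe(1_200) // identical to the old sum when cache.write = 0
})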

Risk Assessment

Low risk. This is a pure additive fix to arithmetic that was already intended to sum all input tokens. It makes isOverflow() consistent with the token partitioning model that the rest of the codebase already implements correctly.

The primary behavioral change is that compaction will trigger earlier for Anthropic sessions — which is the correct behavior. Sessions that previously ran past the overflow threshold due to excluded cache.write tokens will now trigger compaction at the intended point.

fix(session): include cache.write tokens in isOverflow() context calculation

isOverflow() excludes cache_creation_input_tokens from its token count,
causing compaction to trigger late for Anthropic models using prompt caching.

Anthropic partitions input tokens into three disjoint categories:
  total_input = input_tokens + cache_read_input_tokens + cache_creation_input_tokens

The overflow check only summed two of three, missing cache.write entirely.
Debug data showed 24% reported vs 54% actual utilization at a single turn,
with 62,900 cache_creation_input_tokens invisible to the overflow check.

This adds input.tokens.cache.write to the sum. No impact on non-caching
providers (cache.write defaults to 0).

github-actions bot commented Feb 4, 2026

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.


github-actions bot commented Feb 4, 2026

The following comment was made by an LLM; it may be inaccurate:

Found Related PR

PR #6562: fix(session): prevent context overflow by adding safety margin to compaction check

Why it's related:
Your PR #12074 identifies the root cause of late compaction triggers — the exclusion of cache.write tokens from the isOverflow() calculation. PR #6562 addresses the symptom of this same issue by adding a configurable safety buffer to lower the compaction trigger point. Once your fix is merged, PR #6562's workaround may become unnecessary or need adjustment, as compaction will trigger earlier automatically when cache.write tokens are properly counted.

No duplicate PRs found addressing the exact same fix.
