fix: correct token double-counting for Anthropic and Bedrock providers#2861

Merged: tusharmath merged 3 commits into main from fix/token-counting-bugs on Apr 6, 2026

@amitksingh1490 amitksingh1490 commented Apr 6, 2026

Summary

Fix token double-counting for Anthropic providers and the incorrect prompt_tokens mapping for Bedrock.

Context

  1. Anthropic double-counting: Anthropic streams usage as cumulative values across message_start and message_delta events (ref). Our code was using accumulate() (sum) to combine them, producing 1 + N instead of the correct N for output tokens.

  2. Bedrock prompt_tokens: prompt_tokens was set to total_tokens instead of input_tokens, inflating the reported input count by including output tokens.
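
The arithmetic behind the first bug can be illustrated with a minimal sketch. The function names and sample counts are hypothetical and not taken from the forge codebase; each slice element stands for the cumulative output_tokens value carried by one streaming event.

```rust
/// Combines per-event output-token counts by summing them.
/// This mirrors the buggy behaviour: cumulative values are counted more than once.
fn combine_by_sum(cumulative_counts: &[u64]) -> u64 {
    cumulative_counts.iter().sum()
}

/// Combines per-event output-token counts by keeping the largest one.
/// This suits cumulative values, where the latest count already includes earlier ones.
fn combine_by_max(cumulative_counts: &[u64]) -> u64 {
    cumulative_counts.iter().copied().max().unwrap_or(0)
}
```

If message_start reports 1 and the final message_delta reports 42, `combine_by_sum(&[1, 42])` yields 43 while `combine_by_max(&[1, 42])` yields 42.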

Changes

Bug 1 – Anthropic cumulative usage (6 providers affected)

  • crates/forge_domain/src/message.rs: Added Usage::merge() which uses max() per token field instead of +. Cost is still summed since cost events are additive.
  • crates/forge_domain/src/context.rs: Added TokenCount::max() for comparing two token counts by inner value while preserving Actual/Approx semantics.
  • crates/forge_domain/src/result_stream_ext.rs: Changed partial-usage branch from accumulate() to merge(). Also fixed cost-only events to sum costs instead of replacing.

Affected providers: anthropic, claude_code, anthropic_compatible, vertex_ai_anthropic, minimax, alibaba_coding.

Bug 2 – Bedrock prompt_tokens

  • crates/forge_repo/src/provider/bedrock.rs: Changed u.total_tokens to u.input_tokens.

Affected providers: all Bedrock models.
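
A rough sketch of the corrected mapping follows. Both structs are simplified, hypothetical stand-ins (the real types in bedrock.rs and forge_domain are likely to differ, and `completion_tokens` is an assumed field name); only the `prompt_tokens` line reflects the change described above.

```rust
/// Hypothetical stand-in for the usage block returned by Bedrock.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BedrockUsage {
    input_tokens: u64,
    output_tokens: u64,
    total_tokens: u64,
}

/// Hypothetical stand-in for the provider-agnostic usage record.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Usage {
    prompt_tokens: u64,
    completion_tokens: u64,
    total_tokens: u64,
}

/// Maps a Bedrock usage block onto the provider-agnostic record.
fn to_usage(u: BedrockUsage) -> Usage {
    Usage {
        // Before the fix this field was populated from `u.total_tokens`,
        // which also includes the output tokens.
        prompt_tokens: u.input_tokens,
        completion_tokens: u.output_tokens,
        total_tokens: u.total_tokens,
    }
}
```

For a response with 100 input and 20 output tokens, the old mapping would have reported 120 prompt tokens; the sketch reports 100.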

Key Implementation Details

The distinction between accumulate and merge:

| Method | Strategy | Use case |
| --- | --- | --- |
| accumulate() | Sum all fields | Session-level totals across independent requests |
| merge() | Max per token field, sum cost | Combining partial streaming events within one response |
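
The two strategies can be contrasted on a reduced struct. This is a sketch under stated assumptions, not the code from message.rs: the `Usage` fields, the `Option<f64>` cost, and the `sum_cost` helper are hypothetical simplifications.

```rust
/// Hypothetical, reduced usage record; the real `Usage` has more fields.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Usage {
    prompt_tokens: u64,
    completion_tokens: u64,
    cost: Option<f64>,
}

/// Adds two optional costs, treating a missing cost as "nothing to add".
fn sum_cost(a: Option<f64>, b: Option<f64>) -> Option<f64> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x + y),
        (x, None) => x,
        (None, y) => y,
    }
}

impl Usage {
    /// Sums every field: for totals across independent requests.
    fn accumulate(self, other: Usage) -> Usage {
        Usage {
            prompt_tokens: self.prompt_tokens + other.prompt_tokens,
            completion_tokens: self.completion_tokens + other.completion_tokens,
            cost: sum_cost(self.cost, other.cost),
        }
    }

    /// Takes the max of each token field and sums cost: for cumulative
    /// partial events that belong to a single streamed response.
    fn merge(self, other: Usage) -> Usage {
        Usage {
            prompt_tokens: self.prompt_tokens.max(other.prompt_tokens),
            completion_tokens: self.completion_tokens.max(other.completion_tokens),
            cost: sum_cost(self.cost, other.cost),
        }
    }
}
```

Applying `merge` to a message_start-style record (100 prompt, 1 completion) and a message_delta-style record (100 prompt, 42 completion) keeps 100 and 42, whereas `accumulate` would report 200 and 43.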

Documentation references

Testing

All existing tests updated + new tests added:

  • test_into_full_anthropic_streaming_usage_merge – covers real Anthropic pattern where message_start has output_tokens=1
  • test_into_full_anthropic_streaming_usage_merge_zero_output – covers Vertex AI pattern where message_start has output_tokens=0
  • test_usage_merge_anthropic_cumulative – unit test for merge logic
  • test_usage_merge_preserves_costs – verifies cost summation in merge
cargo test -p forge_domain --lib
cargo test -p forge_app --lib
cargo test -p forge_repo --lib

All 1,381 lib tests pass.

Bug 1 - Anthropic double-counting:
Anthropic streams usage as CUMULATIVE values across message_start and
message_delta events. The code was using accumulate (sum) to combine them,
causing output_tokens to be over-counted (1 + N instead of N) when
message_start includes output_tokens=1.

Fix: Introduced Usage::merge() which uses max() instead of sum for token
fields. This correctly handles cumulative values - the larger value wins.
ref: https://platform.claude.com/docs/en/build-with-claude/streaming#event-types

Affected providers: anthropic, claude_code, anthropic_compatible,
vertex_ai_anthropic, minimax, alibaba_coding, opencode_zen (claude-* models)

Bug 2 - Bedrock prompt_tokens:
Bedrock was setting prompt_tokens to total_tokens instead of input_tokens,
inflating the reported input token count by including output tokens.

Fix: Changed u.total_tokens to u.input_tokens.

Affected providers: All bedrock models.

Also fixed: cost-only events now properly accumulate costs (sum) instead of
replacing them.

Co-Authored-By: ForgeCode <noreply@forgecode.dev>
@github-actions bot added the "type: fix" label (Iterations on existing features or infrastructure.) on Apr 6, 2026
Comment thread crates/forge_domain/src/context.rs
The previous implementation used Deref comparison which returned the
original variant unchanged. When Actual(200) was compared with Approx(100),
it returned Actual(200) - violating the documented contract that the result
should be Approx if either input is Approx.

Now uses explicit match to ensure Approx propagation matches documentation.

Co-Authored-By: ForgeCode <noreply@forgecode.dev>
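
The Approx-propagation contract described in this commit could look roughly like the sketch below. The variant names Actual and Approx come from the PR text; the `usize` payload, the `value` helper, and the exact method signature are assumptions rather than the code in context.rs.

```rust
/// Simplified token count; the inner integer type is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenCount {
    Actual(usize),
    Approx(usize),
}

impl TokenCount {
    /// Returns the inner count regardless of variant (hypothetical helper).
    fn value(self) -> usize {
        match self {
            TokenCount::Actual(n) | TokenCount::Approx(n) => n,
        }
    }

    /// Returns the larger count. The result is `Actual` only when both
    /// inputs are `Actual`; any `Approx` input makes the result `Approx`.
    fn max(self, other: TokenCount) -> TokenCount {
        let larger = self.value().max(other.value());
        match (self, other) {
            (TokenCount::Actual(_), TokenCount::Actual(_)) => TokenCount::Actual(larger),
            _ => TokenCount::Approx(larger),
        }
    }
}
```

Under this sketch, `TokenCount::Actual(200).max(TokenCount::Approx(100))` gives `Approx(200)` rather than `Actual(200)`, matching the documented contract.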
@tusharmath tusharmath merged commit caf374e into main Apr 6, 2026
14 checks passed
@tusharmath tusharmath deleted the fix/token-counting-bugs branch April 6, 2026 05:22