feat: add StreamChunk::Usage variant for streaming token usage #96
Open
nazq wants to merge 1 commit into graniet:main from
Conversation
Pull request overview
This PR adds streaming token usage support by introducing a new StreamChunk::Usage(Usage) variant. This enables consumers to receive token usage data (including cache hits/misses) during streaming operations, which was previously only available in non-streaming responses.
Key Changes:
- Added `StreamChunk::Usage(Usage)` variant to expose token usage during streaming
- Implemented usage parsing from both Anthropic `message_delta` events and OpenAI final chunks
- Usage is consistently emitted immediately before the `Done` chunk
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `src/chat/mod.rs` | Added new `StreamChunk::Usage(Usage)` enum variant for streaming token usage |
| `src/backends/anthropic.rs` | Added `usage` field to `AnthropicStreamResponse`, implemented `convert_anthropic_usage()` helper, updated parser to return `Vec<StreamChunk>` and emit usage before `Done` |
| `src/providers/openai_compatible.rs` | Added `usage` field to `OpenAIToolStreamChunk`, implemented logic to handle both inline and separate chunk usage patterns |
| `tests/test_backends.rs` | Updated integration test to handle new `Usage` variant in match statement |
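For orientation, the variant addition in `src/chat/mod.rs` can be sketched as below. The surrounding variants and the exact `Usage` field names are assumptions for illustration, not the crate's real definitions; the PR only states that input, output, and cache-read token counts are exposed.

```rust
/// Sketch of the token-usage payload; real field names in the crate
/// may differ from these assumed ones.
#[derive(Debug, Clone, PartialEq)]
pub struct Usage {
    pub input_tokens: u32,
    pub output_tokens: u32,
    pub cache_read_tokens: u32,
}

/// Illustrative StreamChunk enum showing where the new variant sits.
#[derive(Debug, PartialEq)]
pub enum StreamChunk {
    Text(String),  // pre-existing text-delta variant (illustrative)
    Usage(Usage),  // new: emitted once, immediately before Done
    Done,
}
```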
Emits Usage chunk from both Anthropic and OpenAI streaming responses:
- Anthropic: extracts usage from `message_delta` event
- OpenAI: extracts usage from final chunk (requires `stream_options.include_usage`)
- Usage is emitted immediately before `Done` chunk
- Includes cache token support via `prompt_tokens_details`
Summary

- Added `StreamChunk::Usage(Usage)` variant to expose token usage data during streaming
- Anthropic: usage extracted from `message_delta` events
- OpenAI: usage extracted from the final chunk (when `stream_options.include_usage` is set)
- Usage is emitted immediately before `Done` for predictable consumer patterns

Motivation
When using `chat_stream_with_tools`, we need a way to get the token usage data (`input_tokens`, `output_tokens`, `cache_read_tokens`, etc.) that's available in the non-streaming `ChatResponse::usage()`. Both Anthropic and OpenAI include usage in their final streaming events, but this data wasn't being exposed.

Changes
- `src/chat/mod.rs`: added `StreamChunk::Usage(Usage)` variant
- `src/backends/anthropic.rs`: added `usage` field to `AnthropicStreamResponse`, created `convert_anthropic_usage()` helper, updated parser to emit `Usage` before `Done`
- `src/providers/openai_compatible.rs`: added `usage` field to `OpenAIToolStreamChunk`, updated parser to emit `Usage` (handles both inline and separate chunk cases)
- `tests/test_backends.rs`: updated integration test to handle the new `Usage` variant

API Behavior
Anthropic:

- `message_delta` events include a cumulative `usage` field
- The parser emits `StreamChunk::Usage` immediately before `StreamChunk::Done`

OpenAI:

- The final chunk carries `usage` when `stream_options.include_usage: true` (already configured)
- Usage may arrive inline with the `finish_reason` chunk, or in a separate empty-choices chunk

Usage Example
Test Plan