
feat: LE-374 token usage tracking for LLM and Agent components #11891

Merged
viktoravelino merged 52 commits into release-1.9.0 from feature/le-374 on Apr 1, 2026

@viktoravelino
Collaborator

@viktoravelino viktoravelino commented Feb 24, 2026

Jira: LE-374

Summary

  • Add end-to-end token usage tracking (input/output/total) for LLM and Agent components across all supported providers (OpenAI, Anthropic, Ollama, and any provider that follows the LangChain usage_metadata standard)
  • Display token counts on node badges (with Coins icon and tooltip breakdown) and in chat messages (alongside run duration)
  • Introduce a thread-safe TokenUsageCallbackHandler that accumulates tokens across the multiple LLM calls agents make per run
  • Enable token usage in streaming mode for OpenAI and Anthropic via stream_usage=True
  • Accumulate serial LLM token usage on chat messages so the total reflects all upstream LLM calls

Details

Backend

Token capture:

  • TokenUsageCallbackHandler — a new BaseCallbackHandler that accumulates tokens across agent LLM calls. Supports 4 extraction strategies: llm_output["token_usage"] (OpenAI legacy), usage_metadata (LangChain standard / Ollama), response_metadata["token_usage"] (OpenAI via LC), response_metadata["usage"] (Anthropic)
  • LCModelComponent.extract_usage() — extended to check message.usage_metadata first (LangChain standard), fixing Ollama support. Falls back to OpenAI and Anthropic response_metadata formats
  • Centralized token usage extraction into a shared token_usage schema module
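To make the four extraction strategies and the per-run accumulation concrete, here is a minimal, dependency-free Python sketch. It is not the PR's TokenUsageCallbackHandler: plain dicts stand in for LangChain's LLMResult and AIMessage, and Usage, pick_usage, and UsageAccumulator are hypothetical names. It tries the strategies in the order listed above, which may not match the real handler's precedence (the description notes that LCModelComponent.extract_usage() checks usage_metadata first). The key names (prompt_tokens/completion_tokens for OpenAI, input_tokens/output_tokens for usage_metadata and Anthropic) follow those providers' public formats.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Usage:
    """Hypothetical stand-in for the PR's Usage model (not the real class)."""

    input_tokens: int = 0
    output_tokens: int = 0
    total_tokens: int = 0


def _usage_from(data: dict, input_key: str, output_key: str) -> Usage:
    # Missing or None counts are treated as 0; total is derived when absent.
    inp = data.get(input_key) or 0
    out = data.get(output_key) or 0
    return Usage(inp, out, data.get("total_tokens") or inp + out)


def pick_usage(llm_output: Optional[dict], message: Optional[dict]) -> Optional[Usage]:
    """Try the four strategies in the order the PR description lists them."""
    llm_output = llm_output or {}
    message = message or {}
    response_metadata = message.get("response_metadata") or {}
    if llm_output.get("token_usage"):  # 1. OpenAI legacy llm_output
        return _usage_from(llm_output["token_usage"], "prompt_tokens", "completion_tokens")
    if message.get("usage_metadata"):  # 2. LangChain standard / Ollama
        return _usage_from(message["usage_metadata"], "input_tokens", "output_tokens")
    if response_metadata.get("token_usage"):  # 3. OpenAI via LangChain
        return _usage_from(response_metadata["token_usage"], "prompt_tokens", "completion_tokens")
    if response_metadata.get("usage"):  # 4. Anthropic
        return _usage_from(response_metadata["usage"], "input_tokens", "output_tokens")
    return None


class UsageAccumulator:
    """Sums usage over every LLM call in one agent run; a lock guards the running total."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._total: Optional[Usage] = None

    def add(self, usage: Optional[Usage]) -> None:
        if usage is None:
            return
        with self._lock:
            prev = self._total or Usage()
            self._total = Usage(
                prev.input_tokens + usage.input_tokens,
                prev.output_tokens + usage.output_tokens,
                prev.total_tokens + usage.total_tokens,
            )

    def get_usage(self) -> Optional[Usage]:
        with self._lock:
            return self._total


# Two LLM calls in one agent run: one OpenAI-style, one Anthropic-style.
acc = UsageAccumulator()
acc.add(pick_usage({"token_usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}}, None))
acc.add(pick_usage(None, {"response_metadata": {"usage": {"input_tokens": 7, "output_tokens": 3}}}))
print(acc.get_usage())  # Usage(input_tokens=17, output_tokens=8, total_tokens=25)
```

A lock around the running total is the simplest way to keep concurrent callbacks from losing updates; building a new frozen Usage on each add also means callers of get_usage() never observe a half-updated value.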

Pipeline integration:

  • agent.py — wires TokenUsageCallbackHandler into agent execution callbacks, stores usage on Message.properties.usage after run completes
  • model.py — stores _token_usage on the component for both streaming and non-streaming paths; _handle_stream now propagates the AIMessage back so usage can be extracted even when not connected to a chat output
  • vertex/base.py — extracts _token_usage from custom component during finalize_build(), accumulates upstream token usage for output vertices
  • schema.py / schemas.py — carries token_usage through ResultData → ResultDataResponse → API response with validation
  • chat_output.py — accumulates token usage from all upstream LLM vertices into the chat message
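The upstream accumulation in vertex/base.py and chat_output.py amounts to walking the graph backwards from the output vertex and summing each upstream vertex's usage once. Below is a minimal Python sketch, assuming a plain predecessor map and per-vertex (input, output, total) tuples; accumulate_upstream_usage and the vertex IDs are hypothetical, and the real implementation works on Vertex objects rather than dicts. The token numbers are illustrative, mirroring the serial-LLM flow in Test 7 (Chat Input → OpenAI → OpenAI → Chat Output):

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# (input_tokens, output_tokens, total_tokens); a plain tuple keeps the sketch self-contained.
Counts = Tuple[int, int, int]


def accumulate_upstream_usage(
    vertex_id: str,
    predecessors: Dict[str, List[str]],
    usage_by_vertex: Dict[str, Counts],
) -> Optional[Counts]:
    """Sum the usage of every vertex upstream of vertex_id, counting each vertex once."""
    seen = {vertex_id}
    queue = deque(predecessors.get(vertex_id, []))
    totals = [0, 0, 0]
    found = False
    while queue:
        current = queue.popleft()
        if current in seen:  # a vertex reachable by two paths is counted only once
            continue
        seen.add(current)
        usage = usage_by_vertex.get(current)
        if usage is not None:
            totals = [a + b for a, b in zip(totals, usage)]
            found = True
        queue.extend(predecessors.get(current, []))
    return tuple(totals) if found else None


predecessors = {"ChatOutput": ["OpenAI-2"], "OpenAI-2": ["OpenAI-1"], "OpenAI-1": ["ChatInput"]}
usage = {"OpenAI-1": (120, 40, 160), "OpenAI-2": (200, 80, 280)}
print(accumulate_upstream_usage("ChatOutput", predecessors, usage))  # (320, 120, 440)
```

The seen set is what provides deduplication: in a diamond-shaped flow where two branches share one upstream LLM, that LLM's tokens are counted once rather than twice.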

Streaming token usage:

  • openai_chat_model.py — sets stream_usage=True on ChatOpenAI so the API includes token counts in streaming chunks
  • anthropic.py — sets stream_usage=True explicitly on ChatAnthropic (was default, now explicit for safety)
  • component.py — new _accumulate_usage() method handles providers that split usage across multiple streaming chunks (e.g., Anthropic sends input_tokens on message_start and output_tokens on message_delta); both sync and async iterator paths now accumulate instead of overwriting
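Why accumulate rather than overwrite: when a provider reports input and output tokens in different streaming chunks, keeping only the last chunk's usage drops the earlier counts. A minimal Python sketch under stated assumptions: chunks are plain dicts with an optional usage_metadata entry, the token numbers are made up, and accumulate_usage here is only named after the PR's helper, not copied from it (usage_from_chunk and stream_total are hypothetical).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0
    total_tokens: int = 0


def usage_from_chunk(chunk: dict) -> Optional[Usage]:
    """Read usage_metadata from a chunk (a plain dict here); most content chunks carry none."""
    meta = chunk.get("usage_metadata")
    if not meta:
        return None
    inp = meta.get("input_tokens") or 0
    out = meta.get("output_tokens") or 0
    return Usage(inp, out, meta.get("total_tokens") or inp + out)


def accumulate_usage(current: Optional[Usage], new: Optional[Usage]) -> Optional[Usage]:
    """Add new to current field by field; None on either side is a no-op."""
    if new is None:
        return current
    if current is None:
        return new
    return Usage(
        current.input_tokens + new.input_tokens,
        current.output_tokens + new.output_tokens,
        current.total_tokens + new.total_tokens,
    )


def stream_total(chunks: Iterable[dict]) -> Optional[Usage]:
    total: Optional[Usage] = None
    for chunk in chunks:
        total = accumulate_usage(total, usage_from_chunk(chunk))
    return total


# Illustrative Anthropic-like stream: input tokens arrive first, output tokens at the end.
chunks = [
    {"content": "", "usage_metadata": {"input_tokens": 25, "output_tokens": 0}},
    {"content": "Hello"},
    {"content": " world", "usage_metadata": {"input_tokens": 0, "output_tokens": 40}},
]
print(stream_total(chunks))  # Usage(input_tokens=25, output_tokens=40, total_tokens=65)
```

Overwriting with the last chunk's usage would report 0 input tokens here. For a provider that sends usage on a single chunk, summing gives the same result as overwriting, so accumulating is safe in both cases.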

Frontend

Types & utilities:

  • UsageType added to PropertiesType in chat types
  • token_usage added to VertexDataTypeAPI
  • formatTokenCount() utility — formats counts as "500", "1.5K", "2.5M"
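formatTokenCount() itself is TypeScript; to keep the examples here in one language, this Python sketch reproduces the same rule using the three sample outputs from the description ("500", "1.5K", "2.5M"). Rounding to one decimal place, dropping a trailing ".0" (so 1000 becomes "1K"), and promoting values that round up to 1000K into M are assumptions; the real utility's precision and thresholds may differ. format_token_count is a hypothetical name.

```python
def format_token_count(count: int) -> str:
    """Format a non-negative token count as '500', '1.5K', or '2.5M' (sketch)."""
    if count < 1_000:
        return str(count)
    value, suffix = count / 1_000, "K"
    if round(value, 1) >= 1_000:  # e.g. 999_960 would otherwise print as "1000K"
        value, suffix = count / 1_000_000, "M"
    # One decimal place, then strip a trailing ".0" so 1000 shows as "1K".
    text = f"{value:.1f}".rstrip("0").rstrip(".")
    return text + suffix


print(format_token_count(500))        # 500
print(format_token_count(1_500))      # 1.5K
print(format_token_count(2_500_000))  # 2.5M
```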

Node UI:

  • Duration badge shows Coins icon + token count + | + duration (e.g., 🪙 2.5K | 3.2s)
  • Tooltip shows input/output token breakdown with Coins icons
  • Dark tooltip background for better contrast

Chat messages:

  • Status header shows Finished on the left, 🪙 2.5K | 3.2s on the right
  • Token display uses font-mono text-accent-emerald-foreground styling

Test plan

Automated tests

  • Backend: cd src/lfx && uv sync && uv run pytest tests/unit/base/agents/test_token_callback.py tests/unit/graph/test_token_usage_accumulation.py tests/unit/custom/custom_component/test_accumulate_usage.py tests/unit/schema/test_token_usage.py -v
  • Frontend: cd src/frontend && npx vitest run src/utils/__tests__/format-token-count.test.ts

Manual E2E testing

Test 1 — Language Model (non-streaming):

  • Create a flow: Chat Input → OpenAI → Chat Output
  • Disable streaming on the OpenAI component
  • Run the flow — verify node badge shows 🪙 <count> | <time> and tooltip shows input/output breakdown
  • Verify chat message shows Finished on the left, 🪙 <count> | <time> on the right

Test 2 — Language Model (streaming):

  • Same flow with streaming enabled — verify same token display behavior

Test 3 — Agent component:

  • Create a flow: Chat Input → Agent (with tools) → Chat Output
  • Run with a message that triggers tool use — verify accumulated tokens across all LLM calls

Test 4 — Agent without Chat Output:

  • Create a flow: Chat Input → Agent (no Chat Output connected)
  • Run the flow — should complete without errors, node badge shows tokens

Test 5 — Ollama provider:

  • Create a flow with Ollama model — verify token counts appear (uses usage_metadata path)

Test 6 — Anthropic provider:

  • Create a flow with Anthropic (streaming and non-streaming) — verify correct input + output token breakdown

Test 7 — Serial LLMs:

  • Create a flow: Chat Input → OpenAI → OpenAI → Chat Output
  • Run and verify chat message shows accumulated total from both LLMs

Summary by CodeRabbit

  • New Features

    • Token usage tracking with formatted display (K/M notation) in chat messages and node status
    • Accumulated token metrics from upstream components visible throughout the flow
  • Tests

    • Added comprehensive unit tests for token usage extraction, accumulation, and callback handling

@coderabbitai
Contributor

coderabbitai Bot commented Feb 24, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 47c5d588-8dc7-47d0-8350-49bdb01d0bc4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR introduces comprehensive token usage tracking throughout Langflow. It adds token_usage fields to API response schemas, implements token extraction and accumulation from LLM calls via callbacks and streaming, updates all starter project ChatOutput templates to propagate upstream token usage, and adds frontend components to display token metrics in chat messages and node status displays.

Changes

Cohort / File(s) Summary
Backend API Schema & Response Types
src/backend/base/langflow/api/v1/schemas.py
Added `token_usage: dict
Frontend Type Definitions
src/frontend/src/types/chat/index.ts, src/frontend/src/types/api/index.ts
Introduced new UsageType with optional input_tokens, output_tokens, total_tokens fields; extended PropertiesType and VertexDataTypeAPI with optional usage field.
Token Usage Extraction & Accumulation
src/lfx/src/lfx/schema/token_usage.py
New module implementing unified token usage extraction from LangChain messages and chunks with multi-strategy support (usage_metadata, OpenAI, Anthropic formats); provides accumulation logic across chunks with null-safe defaults.
Token Usage in Agent Flow
src/lfx/src/lfx/base/agents/token_callback.py, src/lfx/src/lfx/base/agents/agent.py
New TokenUsageCallbackHandler class for thread-safe accumulation of LLM token usage; integrated into agent run flow to extract and store usage on result and stored messages.
Model Component Updates
src/lfx/src/lfx/components/openai/openai_chat_model.py, src/lfx/src/lfx/components/anthropic/anthropic.py, src/lfx/src/lfx/base/models/model.py, src/backend/tests/unit/components/languagemodels/test_openai_model.py
Added stream_usage=True parameter to ChatOpenAI and ChatAnthropic constructors; updated extract_usage return type from `dict
Vertex & Graph Token Usage Aggregation
src/lfx/src/lfx/graph/vertex/base.py, src/lfx/src/lfx/graph/vertex/vertex_types.py, src/lfx/src/lfx/graph/schema.py
Added methods to traverse upstream vertices, accumulate their token usage, and extract component usage; integrated token usage into ResultData output in finalize_build.
ChatOutput Token Usage Propagation
src/lfx/src/lfx/components/input_output/chat_output.py
Enhanced message_response to accumulate upstream token usage and assign to message properties; updates stored message and emits event when usage data present.
Custom Component Token Usage Handling
src/lfx/src/lfx/custom/custom_component/component.py
Updated streaming paths to use Usage type directly instead of dict; refactored usage extraction to use centralized helpers; added _token_usage attribute; changed _stream_message and _handle_async_iterator return types to include `Usage
Starter Project ChatOutput Templates
src/backend/base/langflow/initial_setup/starter_projects/*.json (28 files)
Updated ChatOutput component metadata hashes and/or code implementations across all starter projects (Basic Prompt Chaining, Basic Prompting, Blog Writer, Document Q&A, Financial Report Parser, Instagram Copywriter, Price Deal Finder, Pokédex Agent, Portfolio Website Code Generator, Research Agent, etc.) to support upstream token usage accumulation and message property enrichment.
Frontend UI Components
src/frontend/src/CustomNodes/GenericNode/components/NodeStatus/components/build-status-display.tsx, src/frontend/src/CustomNodes/GenericNode/components/NodeStatus/index.tsx
Added TokenUsageDisplay component to render input/output token counts with Coins icon; updated node status tooltip to conditionally display formatted token counts alongside duration; adjusted typography and layout.
Frontend Chat Message Display
src/frontend/src/components/core/playgroundComponent/chat-view/chat-messages/components/bot-message.tsx
Added computed formattedTokenCount from chat properties; displays token count with Coins icon alongside elapsed time in finished message state when data available.
Token Count Formatting Utility
src/frontend/src/utils/format-token-count.ts, src/frontend/src/utils/__tests__/format-token-count.test.ts
New utility function to format token counts as human-readable strings (K for thousands, M for millions) with decimal precision; includes comprehensive test suite with edge cases and formatting rules.
Test Coverage
src/lfx/tests/unit/base/agents/test_token_callback.py, src/lfx/tests/unit/custom/custom_component/test_accumulate_usage.py, src/lfx/tests/unit/graph/test_token_usage_accumulation.py, src/lfx/tests/unit/schema/test_token_usage.py
Comprehensive test suites for token callback handler (multi-strategy extraction, thread-safety, accumulation), usage accumulation logic, upstream vertex traversal and deduplication, and usage extraction from multiple LLM provider formats.
Metadata & Configuration
.gitignore, .secrets.baseline, src/lfx/src/lfx/_assets/component_index.json, src/lfx/src/lfx/_assets/stable_hash_history.json
Added CLAUDE.local.md to .gitignore; updated stable hash history entries for AnthropicModel, ChatOutput, and OpenAIModel; updated component index metadata.

Sequence Diagram

sequenceDiagram
    actor User
    participant FrontendChat as Frontend Chat
    participant Agent as Agent
    participant LLMModel as LLM Model
    participant TokenCallback as Token Callback
    participant ChatOutput as ChatOutput
    participant MessageStore as Message Store
    participant FrontendDisplay as Frontend Display

    User->>FrontendChat: Send message
    FrontendChat->>Agent: run_agent()
    
    Agent->>TokenCallback: Create handler & register
    Agent->>LLMModel: Invoke LLM with stream_usage=True
    
    LLMModel-->>LLMModel: Stream tokens
    LLMModel->>TokenCallback: on_llm_end(response)
    TokenCallback->>TokenCallback: Extract usage (multi-strategy)
    TokenCallback->>TokenCallback: Accumulate tokens
    
    Agent->>ChatOutput: Process message
    ChatOutput->>TokenCallback: get_usage()
    TokenCallback-->>ChatOutput: Return accumulated Usage
    
    ChatOutput->>ChatOutput: _accumulate_upstream_token_usage()
    ChatOutput->>ChatOutput: Assign to message.properties.usage
    ChatOutput->>MessageStore: Store message with usage
    
    Agent->>Agent: Retrieve usage from handler
    Agent->>Agent: Assign to result.properties.usage
    Agent-->>FrontendChat: Return result with token_usage
    
    FrontendChat->>FrontendDisplay: Display token count
    FrontendDisplay-->>User: Show tokens (K/M formatted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Suggested labels

refactor

Suggested reviewers

  • erichare
  • HzaRashid
  • Adam-Aghili
  • ogabrielluiz
🚥 Pre-merge checks | ✅ 5 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 46.15% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test File Naming And Structure ⚠️ Warning Frontend test file lacks explicit purpose comments for each test case as required by coding guidelines and flagged in review comments. Add JSDoc or inline comments above each it() block in format-token-count.test.ts explaining test purpose, scenario, and expected outcome.
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding token usage tracking for LLM and Agent components, which aligns with the comprehensive set of backend and frontend changes throughout the pull request.
Test Coverage For New Implementations ✅ Passed PR includes comprehensive test coverage for all new implementations: backend test files cover token usage schema, callback handler, accumulation logic, and graph operations; frontend tests verify formatTokenCount utility across multiple scenarios; updated OpenAI model tests validate stream_usage parameter.
Test Quality And Coverage ✅ Passed PR includes comprehensive test coverage across 5 new backend test files with 199-292 lines each using proper pytest async patterns, parametrized tests, fixtures, and detailed assertions validating token accumulation logic, thread-safety, edge cases, and API integrations. Frontend includes 44-line format test covering boundary conditions. Modified backend tests verify stream_usage parameter correctness.
Excessive Mock Usage Warning ✅ Passed The new test suite demonstrates excellent mock usage discipline with only 5 mock instances across 742 total lines of tests, representing less than 1% mock density. Real implementations are used for core logic testing.


@github-actions
Contributor

github-actions Bot commented Feb 24, 2026

Frontend Unit Test Coverage Report

Coverage Summary

Lines: 26% | Statements: 26.71% (27095/101434) | Branches: 63.43% (3368/5309) | Functions: 28.96% (643/2220)

Unit Test Results

Tests: 2795 | Skipped: 0 💤 | Failures: 0 ❌ | Errors: 0 🔥 | Time: 4m 2s ⏱️

@codecov

codecov Bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 59.13706% with 161 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.43%. Comparing base (ccc7ffa) to head (f143d30).
⚠️ Report is 9 commits behind head on release-1.9.0.

Files with missing lines Patch % Lines
...chat-view/chat-messages/components/bot-message.tsx 27.94% 49 Missing ⚠️
...nts/NodeStatus/components/build-status-display.tsx 11.36% 39 Missing ⚠️
...mNodes/GenericNode/components/NodeStatus/index.tsx 20.00% 24 Missing ⚠️
src/lfx/src/lfx/base/models/model.py 15.38% 11 Missing ⚠️
.../chat-view/chat-messages/hooks/use-chat-history.ts 41.17% 10 Missing ⚠️
src/frontend/src/types/chat/index.ts 0.00% 8 Missing ⚠️
src/lfx/src/lfx/schema/token_usage.py 92.10% 1 Missing and 5 partials ⚠️
...c/lfx/src/lfx/custom/custom_component/component.py 63.63% 4 Missing ⚠️
...l/components/chatView/chatMessage/chat-message.tsx 0.00% 2 Missing ⚠️
src/frontend/src/types/api/index.ts 0.00% 2 Missing ⚠️
... and 4 more

❌ Your project status has failed because the head coverage (45.49%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files


@@                Coverage Diff                @@
##           release-1.9.0   #11891      +/-   ##
=================================================
+ Coverage          48.24%   48.43%   +0.18%     
=================================================
  Files               1869     1874       +5     
  Lines             163692   164154     +462     
  Branches           22596    24005    +1409     
=================================================
+ Hits               78975    79504     +529     
+ Misses             83826    83741      -85     
- Partials             891      909      +18     
Flag Coverage Δ
backend 53.90% <100.00%> (-0.03%) ⬇️
frontend 47.83% <33.65%> (+0.19%) ⬆️
lfx 45.49% <86.33%> (+0.41%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
.../base/langflow/services/tracing/native_callback.py 88.97% <100.00%> (-2.16%) ⬇️
src/frontend/src/utils/format-token-count.ts 100.00% <100.00%> (ø)
src/lfx/src/lfx/base/agents/token_callback.py 100.00% <100.00%> (ø)
src/lfx/src/lfx/graph/schema.py 75.00% <100.00%> (+1.00%) ⬆️
src/lfx/src/lfx/graph/vertex/vertex_types.py 43.58% <100.00%> (+0.20%) ⬆️
src/lfx/src/lfx/base/agents/agent.py 38.50% <90.00%> (+10.81%) ⬆️
src/lfx/src/lfx/graph/vertex/base.py 64.56% <97.56%> (+2.71%) ⬆️
...l/components/chatView/chatMessage/chat-message.tsx 0.00% <0.00%> (ø)
src/frontend/src/types/api/index.ts 0.00% <0.00%> (ø)
src/frontend/src/types/messages/index.ts 0.00% <0.00%> (ø)
... and 9 more

... and 164 files with indirect coverage changes

@viktoravelino viktoravelino self-assigned this Feb 24, 2026
Track input/output/total tokens across LLM providers (OpenAI, Anthropic,
Ollama) and display them on both node badges and chat messages.

Backend: thread-safe callback handler for agent token accumulation,
usage_metadata extraction for Ollama/LangChain standard, pipeline
integration from component through vertex to API response.

Frontend: token count formatting utility, Coins icon badge on nodes
with tooltip breakdown, chat message status with token display.
viktoravelino and others added 20 commits March 2, 2026 14:14
Add upstream token usage accumulation so chat messages display the
total tokens from all LLMs in the pipeline, not just the last one.
Output vertex node badges hide token counts since the accumulated
total is shown on the chat message instead.

Enable stream_usage=True on OpenAI and Anthropic model constructors so
the API includes token counts in streaming chunks.

Fix _handle_stream to propagate the AIMessage back to _get_chat_result
when not connected to a chat output, so usage can be extracted from the
invoke fallback path.

Accumulate usage across multiple streaming chunks instead of overwriting,
since Anthropic splits input/output tokens across separate events.

Extract duplicated token usage logic from Component, LCModelComponent,
TokenUsageCallbackHandler, and Vertex into a shared lfx.schema.token_usage
module. Replace loose dict typing with the existing Usage Pydantic model
throughout the token tracking pipeline. Declare _token_usage on Component
__init__ instead of dynamically injecting it.
@viktoravelino viktoravelino changed the title from "Feature: le-374" to "feat: token usage tracking for LLM and Agent components" Mar 3, 2026
@viktoravelino viktoravelino changed the title from "feat: token usage tracking for LLM and Agent components" to "feat: LE-374 token usage tracking for LLM and Agent components" Mar 3, 2026
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Mar 3, 2026
@github-actions github-actions Bot added the lgtm This PR has been approved by a maintainer label Mar 26, 2026
Collaborator

@Adam-Aghili Adam-Aghili left a comment

LGTM. The bug I was seeing is also reproducible on release-1.9.0.

Adds the 4 recommended test scenarios identified in Cristhianzl's review
of PR #11891 (token usage tracking):

- TestStreamingTokenAccumulation: verifies extract_usage_from_chunk() +
  accumulate_usage() correctly accumulates across multiple streaming chunks
  (OpenAI, Anthropic, and usage_metadata formats)
- TestChatOutputTokenUsageAccumulation: verifies message_response() sets
  upstream token usage on the message and updates the stored message when
  applicable
- TestAgentTokenCallbackWiring: verifies TokenUsageCallbackHandler is wired
  into run_agent() callbacks and its result is stored on _token_usage
- TestResultDataResponseTokenUsageValidator: verifies the field_validator
  converts Usage Pydantic models to dicts and passes through None/dict values
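The last test class above describes a small validator contract: a Usage model becomes a dict, while None and dicts pass through unchanged. The following is a dependency-free Python sketch of that contract. It duck-types on model_dump() (the Pydantic v2 serialization method) instead of using an actual Pydantic field_validator, and normalize_token_usage and FakeUsage are hypothetical names. Raising TypeError for any other type is an assumption; the real validator may defer to Pydantic's own type checking.

```python
from typing import Any, Optional


def normalize_token_usage(value: Any) -> Optional[dict]:
    """Pass None and dicts through; convert model-like objects to dicts (sketch)."""
    if value is None or isinstance(value, dict):
        return value
    dump = getattr(value, "model_dump", None)  # Pydantic v2 models expose model_dump()
    if callable(dump):
        return dump()
    raise TypeError(f"unsupported token_usage value: {type(value).__name__}")


class FakeUsage:
    """Stand-in for a Pydantic Usage model so the sketch runs without Pydantic."""

    def __init__(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens = input_tokens
        self.output_tokens = output_tokens

    def model_dump(self) -> dict:
        return {
            "input_tokens": self.input_tokens,
            "output_tokens": self.output_tokens,
            "total_tokens": self.input_tokens + self.output_tokens,
        }


print(normalize_token_usage(FakeUsage(3, 4)))
# {'input_tokens': 3, 'output_tokens': 4, 'total_tokens': 7}
```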
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Mar 31, 2026
…feature/le-374

# Conflicts:
#	.secrets.baseline
#	src/lfx/src/lfx/_assets/component_index.json
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Mar 31, 2026
@github-actions github-actions Bot removed the enhancement New feature or request label Mar 31, 2026
autofix-ci Bot and others added 6 commits March 31, 2026 18:17
…er row

Position the EditMessageButton toolbar using `bottom-full` instead of `-top-4` so it always sits fully above the message container. This prevents the button bar from overlapping the 'Finished in' usage/time row in bot messages.
…feature/le-374

# Conflicts:
#	.secrets.baseline
… feature/le-374

# Conflicts:
#	src/lfx/src/lfx/_assets/component_index.json
…ground

- Wrap "Finished in" stat in a ShadTooltip showing last run time, duration, input/output token breakdown
- Fix node status success background color from bg-success-background to bg-zinc-700

Labels

enhancement New feature or request lgtm This PR has been approved by a maintainer

3 participants