feat: LE-374 token usage tracking for LLM and Agent components by viktoravelino · Pull Request #11891 · langflow-ai/langflow

viktoravelino · 2026-02-24T22:48:10Z

Jira: LE-374

Summary

Add end-to-end token usage tracking (input/output/total) for LLM and Agent components across all supported providers (OpenAI, Anthropic, Ollama, and any provider that follows the LangChain usage_metadata standard)
Display token counts on node badges (with Coins icon and tooltip breakdown) and in chat messages (alongside run duration)
Introduce a thread-safe TokenUsageCallbackHandler that accumulates tokens across the multiple LLM calls agents make per run
Enable token usage in streaming mode for OpenAI and Anthropic via stream_usage=True
Accumulate serial LLM token usage on chat messages so the total reflects all upstream LLM calls

Details

Backend

Token capture:

TokenUsageCallbackHandler — a new BaseCallbackHandler that accumulates tokens across agent LLM calls. Supports 4 extraction strategies: llm_output["token_usage"] (OpenAI legacy), usage_metadata (LangChain standard / Ollama), response_metadata["token_usage"] (OpenAI via LC), response_metadata["usage"] (Anthropic)
LCModelComponent.extract_usage() — extended to check message.usage_metadata first (LangChain standard), fixing Ollama support. Falls back to OpenAI and Anthropic response_metadata formats
Centralized token usage extraction into a shared token_usage schema module
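
The capture logic above can be illustrated with a minimal sketch. It is not the code from this PR: it deliberately does not subclass LangChain's BaseCallbackHandler so that it runs without LangChain installed, and the names `UsageTotals`, `extract_counts`, and `TokenAccumulator` are hypothetical. The order in which the four sources are tried is also an assumption based on the order they are listed in.

```python
import threading
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTotals:
    """Hypothetical stand-in for the Usage model mentioned in this PR."""

    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def extract_counts(llm_output, message):
    """Return (input, output) counts from the first source that has data."""
    legacy = (llm_output or {}).get("token_usage")
    if legacy:  # 1. OpenAI legacy: llm_output["token_usage"]
        return legacy.get("prompt_tokens") or 0, legacy.get("completion_tokens") or 0
    usage_metadata = getattr(message, "usage_metadata", None)
    if usage_metadata:  # 2. LangChain standard (also used for Ollama)
        return usage_metadata.get("input_tokens") or 0, usage_metadata.get("output_tokens") or 0
    response_metadata = getattr(message, "response_metadata", None) or {}
    if response_metadata.get("token_usage"):  # 3. OpenAI via LangChain
        usage = response_metadata["token_usage"]
        return usage.get("prompt_tokens") or 0, usage.get("completion_tokens") or 0
    if response_metadata.get("usage"):  # 4. Anthropic
        usage = response_metadata["usage"]
        return usage.get("input_tokens") or 0, usage.get("output_tokens") or 0
    return 0, 0


class TokenAccumulator:
    """Sums counts over many LLM calls; the lock guards concurrent callbacks."""

    def __init__(self):
        self._lock = threading.Lock()
        self._totals = UsageTotals()

    def on_llm_end(self, llm_output=None, message=None):
        input_tokens, output_tokens = extract_counts(llm_output, message)
        with self._lock:
            self._totals.input_tokens += input_tokens
            self._totals.output_tokens += output_tokens

    def get_usage(self):
        with self._lock:  # return a copy so callers cannot mutate shared state
            return UsageTotals(self._totals.input_tokens, self._totals.output_tokens)


# One simulated agent run with four LLM calls, one per extraction strategy.
handler = TokenAccumulator()
handler.on_llm_end(llm_output={"token_usage": {"prompt_tokens": 10, "completion_tokens": 5}})
handler.on_llm_end(message=SimpleNamespace(
    usage_metadata={"input_tokens": 7, "output_tokens": 3}, response_metadata={}))
handler.on_llm_end(message=SimpleNamespace(
    usage_metadata=None, response_metadata={"token_usage": {"prompt_tokens": 4, "completion_tokens": 6}}))
handler.on_llm_end(message=SimpleNamespace(
    usage_metadata=None, response_metadata={"usage": {"input_tokens": 2, "output_tokens": 1}}))
usage = handler.get_usage()
```

The lock matters because an agent may trigger callbacks from more than one thread; without it, two concurrent `+=` updates could lose a count.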

Pipeline integration:

agent.py — wires TokenUsageCallbackHandler into agent execution callbacks, stores usage on Message.properties.usage after run completes
model.py — stores _token_usage on the component for both streaming and non-streaming paths; _handle_stream now propagates the AIMessage back so usage can be extracted even when not connected to a chat output
vertex/base.py — extracts _token_usage from custom component during finalize_build(), accumulates upstream token usage for output vertices
schema.py / schemas.py — carries token_usage through ResultData → ResultDataResponse → API response with validation
chat_output.py — accumulates token usage from all upstream LLM vertices into the chat message
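
The upstream accumulation for output vertices can be sketched as a graph walk that visits each upstream vertex once. This is an illustrative sketch only: the function name `accumulate_upstream_usage`, the `predecessors` mapping, and the plain-dict usage records are assumptions, not the actual Vertex API.

```python
def accumulate_upstream_usage(vertex_id, predecessors, usage_by_vertex):
    """Sum token usage over every vertex upstream of vertex_id.

    predecessors maps a vertex id to the ids feeding into it (hypothetical shape).
    usage_by_vertex maps a vertex id to its usage dict, if it recorded one.
    """
    totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    visited = set()
    stack = list(predecessors.get(vertex_id, []))
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # a vertex reachable by two paths is counted once
        visited.add(current)
        usage = usage_by_vertex.get(current)
        if usage:
            for key in totals:
                totals[key] += usage.get(key) or 0
        stack.extend(predecessors.get(current, []))
    return totals


# Chat Input -> llm_a -> llm_b -> Chat Output, with llm_a also wired to Chat Output.
predecessors = {
    "chat_output": ["llm_b", "llm_a"],
    "llm_b": ["llm_a"],
    "llm_a": ["chat_input"],
}
usage_by_vertex = {
    "llm_a": {"input_tokens": 100, "output_tokens": 20, "total_tokens": 120},
    "llm_b": {"input_tokens": 150, "output_tokens": 30, "total_tokens": 180},
}
totals = accumulate_upstream_usage("chat_output", predecessors, usage_by_vertex)
```

The `visited` set is what keeps `llm_a` from being double counted even though it is reachable from the output vertex by two paths.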

Streaming token usage:

openai_chat_model.py — sets stream_usage=True on ChatOpenAI so the API includes token counts in streaming chunks
anthropic.py — sets stream_usage=True explicitly on ChatAnthropic (was default, now explicit for safety)
component.py — new _accumulate_usage() method handles providers that split usage across multiple streaming chunks (e.g., Anthropic sends input_tokens on message_start and output_tokens on message_delta); both sync and async iterator paths now accumulate instead of overwriting
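
To show why accumulating matters for split usage, here is a small sketch that compares overwriting with accumulating across streaming chunks. The `ChunkUsage` type and `accumulate_usage` helper are hypothetical stand-ins written for this example, and the token values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChunkUsage:
    """Hypothetical per-chunk usage; a field is None when the chunk omits it."""

    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None


def _add(a, b):
    # Keep None only when neither side reported a value.
    if a is None and b is None:
        return None
    return (a or 0) + (b or 0)


def accumulate_usage(current, new):
    """Merge usage from a new chunk into the running total."""
    if new is None:
        return current
    if current is None:
        return new
    return ChunkUsage(
        input_tokens=_add(current.input_tokens, new.input_tokens),
        output_tokens=_add(current.output_tokens, new.output_tokens),
    )


# Simulated stream: input tokens arrive first, output tokens arrive last,
# and the content chunks in between carry no usage at all.
chunk_usages = [ChunkUsage(input_tokens=25), None, None, ChunkUsage(output_tokens=40)]

overwritten = None
accumulated = None
for chunk_usage in chunk_usages:
    if chunk_usage is not None:
        overwritten = chunk_usage  # old behavior: last chunk wins
    accumulated = accumulate_usage(accumulated, chunk_usage)
```

With overwriting, the input count from the first event is lost; with accumulation, both halves survive to the end of the stream.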

Frontend

Types & utilities:

UsageType added to PropertiesType in chat types
token_usage added to VertexDataTypeAPI
formatTokenCount() utility — formats counts as "500", "1.5K", "2.5M"
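
The formatting rule can be sketched as follows. The real utility is TypeScript; this Python version only mirrors the three examples given above, and both the name `format_token_count` and the handling of round values (dropping a trailing ".0") are assumptions.

```python
def format_token_count(count: int) -> str:
    """Format a token count compactly, e.g. 500, 1.5K, 2.5M."""
    if count < 1_000:
        return str(count)
    if count < 1_000_000:
        scaled, suffix = count / 1_000, "K"
    else:
        scaled, suffix = count / 1_000_000, "M"
    text = f"{scaled:.1f}"
    if text.endswith(".0"):  # assumption: show 1K rather than 1.0K
        text = text[:-2]
    return text + suffix


examples = [format_token_count(n) for n in (500, 1_500, 2_500_000)]
```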

Node UI:

Duration badge shows Coins icon + token count + | + duration (e.g., 🪙 2.5K | 3.2s)
Tooltip shows input/output token breakdown with Coins icons
Dark tooltip background for better contrast

Chat messages:

Status header shows Finished on the left, 🪙 2.5K | 3.2s on the right
Token display uses font-mono text-accent-emerald-foreground styling

Test plan

Automated tests

Backend: cd src/lfx && uv sync && uv run pytest tests/unit/base/agents/test_token_callback.py tests/unit/graph/test_token_usage_accumulation.py tests/unit/custom/custom_component/test_accumulate_usage.py tests/unit/schema/test_token_usage.py -v
Frontend: cd src/frontend && npx vitest run src/utils/__tests__/format-token-count.test.ts

Manual E2E testing

Test 1 — Language Model (non-streaming):

Create a flow: Chat Input → OpenAI → Chat Output
Disable streaming on the OpenAI component
Run the flow — verify node badge shows 🪙 <count> | <time> and tooltip shows input/output breakdown
Verify chat message shows Finished on the left, 🪙 <count> | <time> on the right

Test 2 — Language Model (streaming):

Same flow with streaming enabled — verify same token display behavior

Test 3 — Agent component:

Create a flow: Chat Input → Agent (with tools) → Chat Output
Run with a message that triggers tool use — verify accumulated tokens across all LLM calls

Test 4 — Agent without Chat Output:

Create a flow: Chat Input → Agent (no Chat Output connected)
Run the flow — should complete without errors, node badge shows tokens

Test 5 — Ollama provider:

Create a flow with Ollama model — verify token counts appear (uses usage_metadata path)

Test 6 — Anthropic provider:

Create a flow with Anthropic (streaming and non-streaming) — verify correct input + output token breakdown

Test 7 — Serial LLMs:

Create a flow: Chat Input → OpenAI → OpenAI → Chat Output
Run and verify chat message shows accumulated total from both LLMs

Summary by CodeRabbit

New Features
- Token usage tracking with formatted display (K/M notation) in chat messages and node status
- Accumulated token metrics from upstream components visible throughout the flow
Tests
- Added comprehensive unit tests for token usage extraction, accumulation, and callback handling

coderabbitai · 2026-02-24T22:48:20Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

This PR introduces comprehensive token usage tracking throughout Langflow. It adds token_usage fields to API response schemas, implements token extraction and accumulation from LLM calls via callbacks and streaming, updates all starter project ChatOutput templates to propagate upstream token usage, and adds frontend components to display token metrics in chat messages and node status displays.

Changes

Cohort / File(s)	Summary
Backend API Schema & Response Types `src/backend/base/langflow/api/v1/schemas.py`	Added `token_usage: dict
Frontend Type Definitions `src/frontend/src/types/chat/index.ts`, `src/frontend/src/types/api/index.ts`	Introduced new `UsageType` with optional `input_tokens`, `output_tokens`, `total_tokens` fields; extended `PropertiesType` and `VertexDataTypeAPI` with optional `usage` field.
Token Usage Extraction & Accumulation `src/lfx/src/lfx/schema/token_usage.py`	New module implementing unified token usage extraction from LangChain messages and chunks with multi-strategy support (usage_metadata, OpenAI, Anthropic formats); provides accumulation logic across chunks with null-safe defaults.
Token Usage in Agent Flow `src/lfx/src/lfx/base/agents/token_callback.py`, `src/lfx/src/lfx/base/agents/agent.py`	New `TokenUsageCallbackHandler` class for thread-safe accumulation of LLM token usage; integrated into agent run flow to extract and store usage on result and stored messages.
Model Component Updates `src/lfx/src/lfx/components/openai/openai_chat_model.py`, `src/lfx/src/lfx/components/anthropic/anthropic.py`, `src/lfx/src/lfx/base/models/model.py`, `src/backend/tests/unit/components/languagemodels/test_openai_model.py`	Added `stream_usage=True` parameter to ChatOpenAI and ChatAnthropic constructors; updated `extract_usage` return type from `dict
Vertex & Graph Token Usage Aggregation `src/lfx/src/lfx/graph/vertex/base.py`, `src/lfx/src/lfx/graph/vertex/vertex_types.py`, `src/lfx/src/lfx/graph/schema.py`	Added methods to traverse upstream vertices, accumulate their token usage, and extract component usage; integrated token usage into `ResultData` output in `finalize_build`.
ChatOutput Token Usage Propagation `src/lfx/src/lfx/components/input_output/chat_output.py`	Enhanced message_response to accumulate upstream token usage and assign to message properties; updates stored message and emits event when usage data present.
Custom Component Token Usage Handling `src/lfx/src/lfx/custom/custom_component/component.py`	Updated streaming paths to use `Usage` type directly instead of dict; refactored usage extraction to use centralized helpers; added `_token_usage` attribute; changed `_stream_message` and `_handle_async_iterator` return types to include `Usage
Starter Project ChatOutput Templates `src/backend/base/langflow/initial_setup/starter_projects/*.json` (28 files)	Updated ChatOutput component metadata hashes and/or code implementations across all starter projects (Basic Prompt Chaining, Basic Prompting, Blog Writer, Document Q&A, Financial Report Parser, Instagram Copywriter, Price Deal Finder, Pokédex Agent, Portfolio Website Code Generator, Research Agent, etc.) to support upstream token usage accumulation and message property enrichment.
Frontend UI Components `src/frontend/src/CustomNodes/GenericNode/components/NodeStatus/components/build-status-display.tsx`, `src/frontend/src/CustomNodes/GenericNode/components/NodeStatus/index.tsx`	Added `TokenUsageDisplay` component to render input/output token counts with Coins icon; updated node status tooltip to conditionally display formatted token counts alongside duration; adjusted typography and layout.
Frontend Chat Message Display `src/frontend/src/components/core/playgroundComponent/chat-view/chat-messages/components/bot-message.tsx`	Added computed `formattedTokenCount` from chat properties; displays token count with Coins icon alongside elapsed time in finished message state when data available.
Token Count Formatting Utility `src/frontend/src/utils/format-token-count.ts`, `src/frontend/src/utils/__tests__/format-token-count.test.ts`	New utility function to format token counts as human-readable strings (K for thousands, M for millions) with decimal precision; includes comprehensive test suite with edge cases and formatting rules.
Test Coverage `src/lfx/tests/unit/base/agents/test_token_callback.py`, `src/lfx/tests/unit/custom/custom_component/test_accumulate_usage.py`, `src/lfx/tests/unit/graph/test_token_usage_accumulation.py`, `src/lfx/tests/unit/schema/test_token_usage.py`	Comprehensive test suites for token callback handler (multi-strategy extraction, thread-safety, accumulation), usage accumulation logic, upstream vertex traversal and deduplication, and usage extraction from multiple LLM provider formats.
Metadata & Configuration `.gitignore`, `.secrets.baseline`, `src/lfx/src/lfx/_assets/component_index.json`, `src/lfx/src/lfx/_assets/stable_hash_history.json`	Added CLAUDE.local.md to .gitignore; updated stable hash history entries for AnthropicModel, ChatOutput, and OpenAIModel; updated component index metadata.

Sequence Diagram

sequenceDiagram
    actor User
    participant FrontendChat as Frontend Chat
    participant Agent as Agent
    participant LLMModel as LLM Model
    participant TokenCallback as Token Callback
    participant ChatOutput as ChatOutput
    participant MessageStore as Message Store
    participant FrontendDisplay as Frontend Display

    User->>FrontendChat: Send message
    FrontendChat->>Agent: run_agent()
    
    Agent->>TokenCallback: Create handler & register
    Agent->>LLMModel: Invoke LLM with stream_usage=True
    
    LLMModel-->>LLMModel: Stream tokens
    LLMModel->>TokenCallback: on_llm_end(response)
    TokenCallback->>TokenCallback: Extract usage (multi-strategy)
    TokenCallback->>TokenCallback: Accumulate tokens
    
    Agent->>ChatOutput: Process message
    ChatOutput->>TokenCallback: get_usage()
    TokenCallback-->>ChatOutput: Return accumulated Usage
    
    ChatOutput->>ChatOutput: _accumulate_upstream_token_usage()
    ChatOutput->>ChatOutput: Assign to message.properties.usage
    ChatOutput->>MessageStore: Store message with usage
    
    Agent->>Agent: Retrieve usage from handler
    Agent->>Agent: Assign to result.properties.usage
    Agent-->>FrontendChat: Return result with token_usage
    
    FrontendChat->>FrontendDisplay: Display token count
    FrontendDisplay-->>User: Show tokens (K/M formatted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Possibly related PRs

fix: avoid updating Message if ChatOutput is connected to ChatInput #10586: Modifies ChatOutput.message_response in src/lfx/.../chat_output.py with similar token usage propagation logic.
deps: upgrade altk #10804: Updates ChatOutput implementations across starter project JSONs, sharing the same component modernization goals.
fix: Image upload for Gemini/Anthropic #10867: Extends ChatOutput message_response with session_id preservation and related message handling, intersecting with token usage feature.

Suggested labels

refactor

Suggested reviewers

erichare
HzaRashid
Adam-Aghili
ogabrielluiz

🚥 Pre-merge checks | ✅ 5 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 46.15% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test File Naming And Structure	⚠️ Warning	Frontend test file lacks explicit purpose comments for each test case as required by coding guidelines and flagged in review comments.	Add JSDoc or inline comments above each it() block in format-token-count.test.ts explaining test purpose, scenario, and expected outcome.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: adding token usage tracking for LLM and Agent components, which aligns with the comprehensive set of backend and frontend changes throughout the pull request.
Test Coverage For New Implementations	✅ Passed	PR includes comprehensive test coverage for all new implementations: backend test files cover token usage schema, callback handler, accumulation logic, and graph operations; frontend tests verify formatTokenCount utility across multiple scenarios; updated OpenAI model tests validate stream_usage parameter.
Test Quality And Coverage	✅ Passed	PR includes comprehensive test coverage across 5 new backend test files with 199-292 lines each using proper pytest async patterns, parametrized tests, fixtures, and detailed assertions validating token accumulation logic, thread-safety, edge cases, and API integrations. Frontend includes 44-line format test covering boundary conditions. Modified backend tests verify stream_usage parameter correctness.
Excessive Mock Usage Warning	✅ Passed	The new test suite demonstrates excellent mock usage discipline with only 5 mock instances across 742 total lines of tests, representing less than 1% mock density. Real implementations are used for core logic testing.

github-actions · 2026-02-24T22:50:24Z

Frontend Unit Test Coverage Report

Coverage Summary

Lines	Statements	Branches	Functions
	26.71% (27095/101434)	63.43% (3368/5309)	28.96% (643/2220)

Unit Test Results

Tests	Skipped	Failures	Errors	Time
2795	0 💤	0 ❌	0 🔥	4m 2s ⏱️

codecov · 2026-02-24T23:08:08Z

Codecov Report

❌ Patch coverage is 59.13706% with 161 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.43%. Comparing base (ccc7ffa) to head (f143d30).
⚠️ Report is 9 commits behind head on release-1.9.0.

Files with missing lines	Patch %	Lines
...chat-view/chat-messages/components/bot-message.tsx	27.94%	49 Missing ⚠️
...nts/NodeStatus/components/build-status-display.tsx	11.36%	39 Missing ⚠️
...mNodes/GenericNode/components/NodeStatus/index.tsx	20.00%	24 Missing ⚠️
src/lfx/src/lfx/base/models/model.py	15.38%	11 Missing ⚠️
.../chat-view/chat-messages/hooks/use-chat-history.ts	41.17%	10 Missing ⚠️
src/frontend/src/types/chat/index.ts	0.00%	8 Missing ⚠️
src/lfx/src/lfx/schema/token_usage.py	92.10%	1 Missing and 5 partials ⚠️
...c/lfx/src/lfx/custom/custom_component/component.py	63.63%	4 Missing ⚠️
...l/components/chatView/chatMessage/chat-message.tsx	0.00%	2 Missing ⚠️
src/frontend/src/types/api/index.ts	0.00%	2 Missing ⚠️
... and 4 more

❌ Your project status has failed because the head coverage (45.49%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@                Coverage Diff                @@
##           release-1.9.0   #11891      +/-   ##
=================================================
+ Coverage          48.24%   48.43%   +0.18%     
=================================================
  Files               1869     1874       +5     
  Lines             163692   164154     +462     
  Branches           22596    24005    +1409     
=================================================
+ Hits               78975    79504     +529     
+ Misses             83826    83741      -85     
- Partials             891      909      +18

Flag	Coverage Δ
backend	`53.90% <100.00%> (-0.03%)`	⬇️
frontend	`47.83% <33.65%> (+0.19%)`	⬆️
lfx	`45.49% <86.33%> (+0.41%)`	⬆️

Files with missing lines	Coverage Δ
.../base/langflow/services/tracing/native_callback.py	`88.97% <100.00%> (-2.16%)`	⬇️
src/frontend/src/utils/format-token-count.ts	`100.00% <100.00%> (ø)`
src/lfx/src/lfx/base/agents/token_callback.py	`100.00% <100.00%> (ø)`
src/lfx/src/lfx/graph/schema.py	`75.00% <100.00%> (+1.00%)`	⬆️
src/lfx/src/lfx/graph/vertex/vertex_types.py	`43.58% <100.00%> (+0.20%)`	⬆️
src/lfx/src/lfx/base/agents/agent.py	`38.50% <90.00%> (+10.81%)`	⬆️
src/lfx/src/lfx/graph/vertex/base.py	`64.56% <97.56%> (+2.71%)`	⬆️
...l/components/chatView/chatMessage/chat-message.tsx	`0.00% <0.00%> (ø)`
src/frontend/src/types/api/index.ts	`0.00% <0.00%> (ø)`
src/frontend/src/types/messages/index.ts	`0.00% <0.00%> (ø)`
... and 9 more

... and 164 files with indirect coverage changes

Track input/output/total tokens across LLM providers (OpenAI, Anthropic, Ollama) and display them on both node badges and chat messages. Backend: thread-safe callback handler for agent token accumulation, usage_metadata extraction for Ollama/LangChain standard, pipeline integration from component through vertex to API response. Frontend: token count formatting utility, Coins icon badge on nodes with tooltip breakdown, chat message status with token display.

Add upstream token usage accumulation so chat messages display the total tokens from all LLMs in the pipeline, not just the last one. Output vertex node badges hide token counts since the accumulated total is shown on the chat message instead.

Enable stream_usage=True on OpenAI and Anthropic model constructors so the API includes token counts in streaming chunks. Fix _handle_stream to propagate the AIMessage back to _get_chat_result when not connected to a chat output, so usage can be extracted from the invoke fallback path. Accumulate usage across multiple streaming chunks instead of overwriting, since Anthropic splits input/output tokens across separate events.

Extract duplicated token usage logic from Component, LCModelComponent, TokenUsageCallbackHandler, and Vertex into a shared lfx.schema.token_usage module. Replace loose dict typing with the existing Usage Pydantic model throughout the token tracking pipeline. Declare _token_usage on Component __init__ instead of dynamically injecting it.

… feature/le-374

…e-374

… feature/le-374

Adam-Aghili

LGTM. The bug I was seeing is also reproducible on release-1.9.0.

Adds the 4 recommended test scenarios identified in Cristhianzl's review of PR #11891 (token usage tracking):
- TestStreamingTokenAccumulation: verifies extract_usage_from_chunk() + accumulate_usage() correctly accumulates across multiple streaming chunks (OpenAI, Anthropic, and usage_metadata formats)
- TestChatOutputTokenUsageAccumulation: verifies message_response() sets upstream token usage on the message and updates the stored message when applicable
- TestAgentTokenCallbackWiring: verifies TokenUsageCallbackHandler is wired into run_agent() callbacks and its result is stored on _token_usage
- TestResultDataResponseTokenUsageValidator: verifies the field_validator converts Usage Pydantic models to dicts and passes through None/dict values

…feature/le-374 # Conflicts: # .secrets.baseline # src/lfx/src/lfx/_assets/component_index.json

…feature/le-374

… feature/le-374

This reverts commit c618b12.

…er row

Position the EditMessageButton toolbar using `bottom-full` instead of `-top-4` so it always sits fully above the message container. This prevents the button bar from overlapping the 'Finished in' usage/time row in bot messages.

…feature/le-374 # Conflicts: # .secrets.baseline

… feature/le-374 # Conflicts: # src/lfx/src/lfx/_assets/component_index.json

…ground
- Wrap "Finished in" stat in a ShadTooltip showing last run time, duration, input/output token breakdown
- Fix node status success background color from bg-success-background to bg-zinc-700

viktoravelino self-assigned this Feb 24, 2026

viktoravelino force-pushed the feature/le-374 branch from ad87a6d to 94090a5 Compare March 2, 2026 15:04

viktoravelino and others added 20 commits March 2, 2026 14:14

chore: add CLAUDE.local.md to .gitignore

14bec1b

chore: update starter project templates for token usage tracking

0fd72a4

[autofix.ci] apply automated fixes

f60bab3

[autofix.ci] apply automated fixes (attempt 2/3)

3a7820f

Merge branch 'feature/le-374' of github.com:langflow-ai/langflow into…

13ff6ff

… feature/le-374

Merge branch 'main' of github.com:langflow-ai/langflow into feature/l…

5d92a67

…e-374

[autofix.ci] apply automated fixes

e80c91a

[autofix.ci] apply automated fixes (attempt 2/3)

9866f36

Merge branch 'main' into feature/le-374

392179e

Merge branch 'main' of github.com:langflow-ai/langflow into feature/l…

28c5314

…e-374

Merge branch 'feature/le-374' of github.com:langflow-ai/langflow into…

55505e1

… feature/le-374

[autofix.ci] apply automated fixes

18eb2b2

feat: add validation for token_usage field in ResultDataResponse

3ce5ed8

Merge branch 'feature/le-374' of github.com:langflow-ai/langflow into…

78574d3

… feature/le-374

feat: enable stream_usage in OpenAI model tests

62284d5

Merge branch 'main' into feature/le-374

1cc6767

[autofix.ci] apply automated fixes

5f8a602

viktoravelino changed the title ~~Feature: le-374~~ feat: token usage tracking for LLM and Agent components Mar 3, 2026

viktoravelino changed the title ~~feat: token usage tracking for LLM and Agent components~~ feat: LE-374 token usage tracking for LLM and Agent components Mar 3, 2026

github-actions Bot added the enhancement (New feature or request) label and removed the enhancement (New feature or request) label Mar 3, 2026

Cristhianzl approved these changes Mar 26, 2026

github-actions Bot added the lgtm (This PR has been approved by a maintainer) label Mar 26, 2026

Adam-Aghili approved these changes Mar 26, 2026
