feat: LE-374 token usage tracking for LLM and Agent components #11891
Walkthrough
This PR introduces comprehensive token usage tracking throughout Langflow.
Sequence Diagram
sequenceDiagram
actor User
participant FrontendChat as Frontend Chat
participant Agent as Agent
participant LLMModel as LLM Model
participant TokenCallback as Token Callback
participant ChatOutput as ChatOutput
participant MessageStore as Message Store
participant FrontendDisplay as Frontend Display
User->>FrontendChat: Send message
FrontendChat->>Agent: run_agent()
Agent->>TokenCallback: Create handler & register
Agent->>LLMModel: Invoke LLM with stream_usage=True
LLMModel-->>LLMModel: Stream tokens
LLMModel->>TokenCallback: on_llm_end(response)
TokenCallback->>TokenCallback: Extract usage (multi-strategy)
TokenCallback->>TokenCallback: Accumulate tokens
Agent->>ChatOutput: Process message
ChatOutput->>TokenCallback: get_usage()
TokenCallback-->>ChatOutput: Return accumulated Usage
ChatOutput->>ChatOutput: _accumulate_upstream_token_usage()
ChatOutput->>ChatOutput: Assign to message.properties.usage
ChatOutput->>MessageStore: Store message with usage
Agent->>Agent: Retrieve usage from handler
Agent->>Agent: Assign to result.properties.usage
Agent-->>FrontendChat: Return result with token_usage
FrontendChat->>FrontendDisplay: Display token count
FrontendDisplay-->>User: Show tokens (K/M formatted)
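The callback steps in the diagram (extract usage, accumulate, then `get_usage()`) can be sketched roughly as below. This is a minimal illustration, not the PR's code: `Usage`, `extract_usage`, and `TokenAccumulator` are hypothetical stand-ins, the lookup order is an assumption based on the four strategies listed in the PR description, and `on_llm_end` takes simplified arguments instead of LangChain's `LLMResult`.

```python
import threading
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional


@dataclass
class Usage:
    """Hypothetical stand-in for the Usage model mentioned in the PR."""
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def extract_usage(llm_output: Optional[dict], message: Any) -> Optional[Usage]:
    """Try each location in turn (order assumed); None if no usage is found."""
    # 1. llm_output["token_usage"] (OpenAI legacy)
    legacy = (llm_output or {}).get("token_usage")
    if legacy:
        return Usage(legacy.get("prompt_tokens", 0), legacy.get("completion_tokens", 0))
    # 2. usage_metadata (LangChain standard / Ollama)
    meta = getattr(message, "usage_metadata", None)
    if meta:
        return Usage(meta.get("input_tokens", 0), meta.get("output_tokens", 0))
    response_meta = getattr(message, "response_metadata", None) or {}
    # 3. response_metadata["token_usage"] (OpenAI via LangChain)
    via_lc = response_meta.get("token_usage")
    if via_lc:
        return Usage(via_lc.get("prompt_tokens", 0), via_lc.get("completion_tokens", 0))
    # 4. response_metadata["usage"] (Anthropic)
    anthropic = response_meta.get("usage")
    if anthropic:
        return Usage(anthropic.get("input_tokens", 0), anthropic.get("output_tokens", 0))
    return None


class TokenAccumulator:
    """Sums usage over the several LLM calls an agent makes in one run."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # LLM callbacks may fire from several threads
        self._usage = Usage()

    def on_llm_end(self, llm_output: Optional[dict], message: Any) -> None:
        usage = extract_usage(llm_output, message)
        if usage is None:
            return
        with self._lock:
            self._usage.input_tokens += usage.input_tokens
            self._usage.output_tokens += usage.output_tokens

    def get_usage(self) -> Optional[Usage]:
        with self._lock:
            if self._usage.total_tokens == 0:
                return None
            # Return a copy so callers cannot mutate the running total.
            return Usage(self._usage.input_tokens, self._usage.output_tokens)


# Two LLM calls in one agent run, each reporting usage in a different place.
handler = TokenAccumulator()
handler.on_llm_end({"token_usage": {"prompt_tokens": 10, "completion_tokens": 5}}, None)
handler.on_llm_end(
    None,
    SimpleNamespace(usage_metadata={"input_tokens": 7, "output_tokens": 3}, response_metadata={}),
)
total = handler.get_usage()
```

Returning `None` when nothing was recorded lets a caller distinguish "no usage reported" from a genuine zero; whether the PR makes the same choice is not stated in the description.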
Codecov Report
❌ Your project status has failed because the head coverage (45.49%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.
@@ Coverage Diff @@
## release-1.9.0 #11891 +/- ##
=================================================
+ Coverage 48.24% 48.43% +0.18%
=================================================
Files 1869 1874 +5
Lines 163692 164154 +462
Branches 22596 24005 +1409
=================================================
+ Hits 78975 79504 +529
+ Misses 83826 83741 -85
- Partials 891 909 +18
Track input/output/total tokens across LLM providers (OpenAI, Anthropic, Ollama) and display them on both node badges and chat messages. Backend: thread-safe callback handler for agent token accumulation, usage_metadata extraction for Ollama/LangChain standard, pipeline integration from component through vertex to API response. Frontend: token count formatting utility, Coins icon badge on nodes with tooltip breakdown, chat message status with token display.
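The K/M formatting mentioned above can be illustrated with a short sketch. The PR's utility is the TypeScript function `formatTokenCount()`; the Python function below is a hypothetical rendering of the same idea, written only to reproduce the three examples given in the PR description (`"500"`, `"1.5K"`, `"2.5M"`). How the real utility handles round values such as 2000 or values just below a unit boundary is not stated, so the one-decimal behaviour here is an assumption.

```python
def format_token_count(count: int) -> str:
    """Format a token count compactly (illustrative, not the PR's code)."""
    if count >= 1_000_000:
        # Millions, one decimal place (assumed rounding).
        return f"{count / 1_000_000:.1f}M"
    if count >= 1_000:
        # Thousands, one decimal place (assumed rounding).
        return f"{count / 1_000:.1f}K"
    # Small counts are shown as-is.
    return str(count)


examples = [format_token_count(n) for n in (500, 1_500, 2_500_000)]
```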
ad87a6d to 94090a5
Add upstream token usage accumulation so chat messages display the total tokens from all LLMs in the pipeline, not just the last one. Output vertex node badges hide token counts since the accumulated total is shown on the chat message instead.
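A minimal sketch of the upstream accumulation idea, assuming the graph is available as a mapping from each vertex to its direct predecessors. The function name, the `Usage` dataclass, and the vertex IDs are all hypothetical; the PR's actual traversal lives in the vertex and chat output code and may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Usage:
    """Hypothetical stand-in for the Usage model mentioned in the PR."""
    input_tokens: int = 0
    output_tokens: int = 0


def accumulate_upstream_usage(
    output_id: str,
    predecessors: Dict[str, List[str]],
    usage_by_vertex: Dict[str, Usage],
) -> Optional[Usage]:
    """Sum the usage of every vertex upstream of output_id, counting each once."""
    seen = set()
    stack = list(predecessors.get(output_id, []))
    total_in = total_out = 0
    found = False
    while stack:
        vertex_id = stack.pop()
        if vertex_id in seen:
            continue  # a vertex reachable by two paths is only counted once
        seen.add(vertex_id)
        usage = usage_by_vertex.get(vertex_id)
        if usage is not None:
            total_in += usage.input_tokens
            total_out += usage.output_tokens
            found = True
        stack.extend(predecessors.get(vertex_id, []))
    return Usage(total_in, total_out) if found else None


# Chat Input -> OpenAI -> OpenAI -> Chat Output (two LLMs in series)
predecessors = {"chat_output": ["llm_2"], "llm_2": ["llm_1"], "llm_1": ["chat_input"]}
usage_by_vertex = {"llm_1": Usage(100, 20), "llm_2": Usage(130, 40)}
total = accumulate_upstream_usage("chat_output", predecessors, usage_by_vertex)
```

With this shape, the chat message carries the sum for both LLMs (230 input, 60 output in the example) instead of only the last one's usage.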
Enable stream_usage=True on OpenAI and Anthropic model constructors so the API includes token counts in streaming chunks. Fix _handle_stream to propagate the AIMessage back to _get_chat_result when not connected to a chat output, so usage can be extracted from the invoke fallback path. Accumulate usage across multiple streaming chunks instead of overwriting, since Anthropic splits input/output tokens across separate events.
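The difference between accumulating and overwriting can be shown with a small sketch. It assumes each streaming chunk may carry a partial usage dict with `input_tokens` and `output_tokens` keys and that partial values should be summed; the function name mirrors the `accumulate_usage()` helper named in the tests, but this body is an illustration, not the PR's implementation.

```python
from typing import Dict, Optional


def accumulate_usage(
    running: Optional[Dict[str, int]],
    chunk_usage: Optional[Dict[str, int]],
) -> Optional[Dict[str, int]]:
    """Merge one chunk's usage into the running total (illustrative)."""
    if not chunk_usage:
        # Most chunks carry no usage; keep whatever has been collected so far.
        return running
    if running is None:
        running = {"input_tokens": 0, "output_tokens": 0}
    return {
        "input_tokens": running["input_tokens"] + chunk_usage.get("input_tokens", 0),
        "output_tokens": running["output_tokens"] + chunk_usage.get("output_tokens", 0),
    }


# Usage split across events, as described for Anthropic:
# input tokens arrive first, output tokens arrive on a later event.
chunks = [
    {"input_tokens": 12, "output_tokens": 0},
    None,
    {"input_tokens": 0, "output_tokens": 34},
]
total = None
for chunk_usage in chunks:
    total = accumulate_usage(total, chunk_usage)
```

Overwriting on each chunk would leave only the last event's values here (zero input tokens), which is the failure mode the commit message describes.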
Extract duplicated token usage logic from Component, LCModelComponent, TokenUsageCallbackHandler, and Vertex into a shared lfx.schema.token_usage module. Replace loose dict typing with the existing Usage Pydantic model throughout the token tracking pipeline. Declare _token_usage on Component __init__ instead of dynamically injecting it.
Adam-Aghili left a comment
LGTM. The bug I was seeing is also reproducible on release-1.9.0.
Adds the 4 recommended test scenarios identified in Cristhianzl's review of PR #11891 (token usage tracking):
- TestStreamingTokenAccumulation: verifies extract_usage_from_chunk() + accumulate_usage() correctly accumulates across multiple streaming chunks (OpenAI, Anthropic, and usage_metadata formats)
- TestChatOutputTokenUsageAccumulation: verifies message_response() sets upstream token usage on the message and updates the stored message when applicable
- TestAgentTokenCallbackWiring: verifies TokenUsageCallbackHandler is wired into run_agent() callbacks and its result is stored on _token_usage
- TestResultDataResponseTokenUsageValidator: verifies the field_validator converts Usage Pydantic models to dicts and passes through None/dict values
…feature/le-374 # Conflicts: # .secrets.baseline # src/lfx/src/lfx/_assets/component_index.json
This reverts commit c618b12.
…er row
Position the EditMessageButton toolbar using `bottom-full` instead of `-top-4` so it always sits fully above the message container. This prevents the button bar from overlapping the 'Finished in' usage/time row in bot messages.
…feature/le-374 # Conflicts: # .secrets.baseline
… feature/le-374 # Conflicts: # src/lfx/src/lfx/_assets/component_index.json
…ground
- Wrap "Finished in" stat in a ShadTooltip showing last run time, duration, input/output token breakdown
- Fix node status success background color from bg-success-background to bg-zinc-700
Jira: LE-374
Summary
- … `usage_metadata` standard)
- … `TokenUsageCallbackHandler` that accumulates tokens across the multiple LLM calls agents make per run
- … `stream_usage=True`
Details
Backend
Token capture:
- `TokenUsageCallbackHandler`: a new `BaseCallbackHandler` that accumulates tokens across agent LLM calls. Supports 4 extraction strategies: `llm_output["token_usage"]` (OpenAI legacy), `usage_metadata` (LangChain standard / Ollama), `response_metadata["token_usage"]` (OpenAI via LC), `response_metadata["usage"]` (Anthropic)
- `LCModelComponent.extract_usage()`: extended to check `message.usage_metadata` first (LangChain standard), fixing Ollama support. Falls back to OpenAI and Anthropic `response_metadata` formats
- … `token_usage` schema module
Pipeline integration:
- `agent.py`: wires `TokenUsageCallbackHandler` into agent execution callbacks, stores usage on `Message.properties.usage` after run completes
- `model.py`: stores `_token_usage` on the component for both streaming and non-streaming paths; `_handle_stream` now propagates the `AIMessage` back so usage can be extracted even when not connected to a chat output
- `vertex/base.py`: extracts `_token_usage` from custom component during `finalize_build()`, accumulates upstream token usage for output vertices
- `schema.py` / `schemas.py`: carries `token_usage` through `ResultData` → `ResultDataResponse` → API response with validation
- `chat_output.py`: accumulates token usage from all upstream LLM vertices into the chat message
Streaming token usage:
- `openai_chat_model.py`: sets `stream_usage=True` on `ChatOpenAI` so the API includes token counts in streaming chunks
- `anthropic.py`: sets `stream_usage=True` explicitly on `ChatAnthropic` (was default, now explicit for safety)
- `component.py`: new `_accumulate_usage()` method handles providers that split usage across multiple streaming chunks (e.g., Anthropic sends `input_tokens` on `message_start` and `output_tokens` on `message_delta`); both sync and async iterator paths now accumulate instead of overwriting
Frontend
Types & utilities:
- `UsageType` added to `PropertiesType` in chat types
- `token_usage` added to `VertexDataTypeAPI`
- `formatTokenCount()` utility: formats counts as `"500"`, `"1.5K"`, `"2.5M"`
Node UI:
- … `|` + duration (e.g., `🪙 2.5K | 3.2s`)
Chat messages:
- … `Finished` on the left, `🪙 2.5K | 3.2s` on the right
- … `font-mono text-accent-emerald-foreground` styling
Test plan
Automated tests
- `cd src/lfx && uv sync && uv run pytest tests/unit/base/agents/test_token_callback.py tests/unit/graph/test_token_usage_accumulation.py tests/unit/custom/custom_component/test_accumulate_usage.py tests/unit/schema/test_token_usage.py -v`
- `cd src/frontend && npx vitest run src/utils/__tests__/format-token-count.test.ts`
Manual E2E testing
Test 1 - Language Model (non-streaming):
- `Chat Input → OpenAI → Chat Output`
- … `🪙 <count> | <time>` and tooltip shows input/output breakdown
- … `Finished` on the left, `🪙 <count> | <time>` on the right
Test 2 - Language Model (streaming):
Test 3 - Agent component:
- `Chat Input → Agent (with tools) → Chat Output`
Test 4 - Agent without Chat Output:
- `Chat Input → Agent` (no Chat Output connected)
Test 5 - Ollama provider:
- … `usage_metadata` path)
Test 6 - Anthropic provider:
Test 7 - Serial LLMs:
- `Chat Input → OpenAI → OpenAI → Chat Output`