feat: display llm usage data #784
LF PR approved, awaiting merge!
edwinjosechittilappilly
left a comment
LGTM,
I believe we can merge this PR; the functionality will be enabled once the usage data is available in the Responses API in LF after the LF upgrade.
Pull request overview
Adds end-to-end support for exposing and displaying LLM token usage (from response.completed / Responses API usage payloads) across streaming, persisted chat history, and UI rendering.
Changes:
- Backend: capture usage from streamed response.completed events, persist it on assistant messages, and expose it via the v1 chat GET endpoint.
- Frontend: introduce a TokenUsage type, capture usage from streaming events, and render token usage in assistant messages.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/api/v1/chat.py | Extends v1 conversation response mapping to include per-message token usage when available. |
| src/agent.py | Captures usage from response.completed during streaming; persists usage into message response_data; adds logging. |
| frontend/hooks/useChatStreaming.ts | Captures response.completed usage during streaming and attaches it to the final assistant message. |
| frontend/app/chat/page.tsx | Extracts usage from historical response_data, passes usage into AssistantMessage, and sets usage on non-streaming results. |
| frontend/app/chat/_types/types.ts | Adds TokenUsage and Message.usage typing. |
| frontend/app/chat/_components/token-usage.tsx | New UI component to display token usage. |
| frontend/app/chat/_components/assistant-message.tsx | Renders TokenUsage for completed (non-streaming) assistant messages. |

    if (msg.response_data && typeof msg.response_data === "object") {
      const responseData =
        typeof msg.response_data === "string"
          ? JSON.parse(msg.response_data)
          : msg.response_data;
      if (responseData.usage) {
        message.usage = responseData.usage;
      }
    }

The current guard if (msg.response_data && typeof msg.response_data === "object") makes the subsequent typeof msg.response_data === "string" ? JSON.parse(...) branch unreachable, so usage will never be extracted when response_data is actually a string. Also, JSON.parse here can throw and break conversation loading if response_data is non-JSON. Consider widening the guard to accept string | object and wrapping parsing in a try/catch (or a small safe-parse helper) before reading .usage.

    })
    }
    # Include token usage if available (from Responses API)
    usage = msg.get("response_data", {}).get("usage") if isinstance(msg.get("response_data"), dict) else None

response_data from Langflow/history may be serialized as a JSON string (the frontend already treats it as possibly-string). This code only extracts usage when response_data is a dict, so usage will be silently omitted for string payloads. Consider normalizing response_data once (e.g., parse JSON strings when possible) and then reading usage from the normalized object; also avoid calling msg.get("response_data") multiple times in the same expression for clarity.

Suggested change:

    - usage = msg.get("response_data", {}).get("usage") if isinstance(msg.get("response_data"), dict) else None
    + response_data = msg.get("response_data")
    + if isinstance(response_data, str):
    +     try:
    +         response_data = json.loads(response_data)
    +     except Exception:
    +         # If parsing fails, discard response_data (usage will be omitted)
    +         response_data = None
    + usage = response_data.get("usage") if isinstance(response_data, dict) else None

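
The normalization suggested in this comment could also be factored into a small helper that is called once per message. The sketch below is illustrative only: `extract_usage` is a hypothetical name, not a function in this PR, and it assumes `response_data` is a dict, a JSON string, or absent.

```python
import json
from typing import Any, Optional


def extract_usage(msg: dict) -> Optional[dict]:
    """Return the usage dict stored under msg["response_data"], or None.

    Hypothetical helper: accepts response_data as a dict or a JSON string.
    """
    response_data: Any = msg.get("response_data")

    # History may hand back response_data as a serialized JSON string.
    if isinstance(response_data, str):
        try:
            response_data = json.loads(response_data)
        except json.JSONDecodeError:
            # Not valid JSON: treat usage as unavailable rather than failing.
            return None

    if not isinstance(response_data, dict):
        return None

    usage = response_data.get("usage")
    return usage if isinstance(usage, dict) else None
```

Reading `response_data` once and narrowing its type step by step keeps the call site to a single line, for example `usage = extract_usage(msg)`.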

    # Detect response.completed event and log usage
    if isinstance(chunk_data, dict) and chunk_data.get("type") == "response.completed":
        response_data = chunk_data.get("response", {})
        usage = response_data.get("usage")
        if usage:
            logger.info(
                "Stream usage data",
                input_tokens=usage.get("input_tokens"),
                output_tokens=usage.get("output_tokens"),
                total_tokens=usage.get("total_tokens"),
            )

Logging token usage at info level for every streamed response.completed event can generate a lot of log volume in production and may be inconsistent with nearby per-chunk logging (which is debug). Consider lowering this to debug (or gating behind a feature flag / sampling) to reduce operational noise while still allowing investigation when needed.
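
A minimal sketch of the debug-level variant follows. It uses the standard library `logging` module as a stand-in, because the keyword-argument call in the hunk suggests the project uses a structured logger whose exact API is not shown here. `log_stream_usage` and the logger name `agent.stream` are hypothetical.

```python
import logging
from typing import Optional

logger = logging.getLogger("agent.stream")  # hypothetical logger name


def log_stream_usage(chunk_data: object, log: logging.Logger = logger) -> Optional[dict]:
    """Log usage from a response.completed chunk at DEBUG and return it.

    Returns None when the chunk is not a response.completed event or
    carries no usage payload.
    """
    if not isinstance(chunk_data, dict) or chunk_data.get("type") != "response.completed":
        return None

    response = chunk_data.get("response")
    usage = response.get("usage") if isinstance(response, dict) else None
    if not isinstance(usage, dict):
        return None

    # DEBUG keeps production logs quiet; lazy %-formatting means the
    # message is only built when DEBUG is actually enabled.
    log.debug(
        "Stream usage data: input_tokens=%s output_tokens=%s total_tokens=%s",
        usage.get("input_tokens"),
        usage.get("output_tokens"),
        usage.get("total_tokens"),
    )
    return usage
```

With this shape, raising the logger level to INFO in production silences the per-response line without touching the code path that captures usage.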

        # Capture usage from response.completed event
        if chunk_data.get("type") == "response.completed":
            response_obj = chunk_data.get("response", {})
            usage_data = response_obj.get("usage")
    except:
        pass

The bare except: pass here will also swallow asyncio.CancelledError and any unexpected decoding/parsing errors, which can break cooperative cancellation and make stream issues extremely hard to diagnose. Prefer except Exception as e with at least a debug log, and let CancelledError propagate.
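
The difference can be reproduced in isolation. The sketch below is not the PR's streaming code; the reader functions and the `asyncio.sleep` call standing in for network I/O are assumptions made for illustration. It relies on `asyncio.CancelledError` deriving from `BaseException`, which holds on Python 3.8 and later.

```python
import asyncio
import json
import logging

logger = logging.getLogger("agent.stream")  # hypothetical logger name


async def read_with_bare_except(chunks, delay):
    """Mimics the reviewed pattern: a bare except around each chunk."""
    parsed = []
    for raw in chunks:
        try:
            await asyncio.sleep(delay)  # stand-in for awaiting the next chunk
            parsed.append(json.loads(raw))
        except:  # also catches asyncio.CancelledError
            pass
    return parsed


async def read_with_except_exception(chunks, delay):
    """Catches parse errors only; cancellation propagates to the caller."""
    parsed = []
    for raw in chunks:
        try:
            await asyncio.sleep(delay)
            parsed.append(json.loads(raw))
        except Exception as exc:
            logger.debug("Failed to parse chunk: %s", exc)
    return parsed


async def cancelled_cleanly(reader) -> bool:
    """Start a reader, cancel it mid-await, report whether it stopped."""
    task = asyncio.create_task(reader(['{"a": 1}', "not json"], 0.05))
    await asyncio.sleep(0.01)  # let the task reach its first await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return True  # the task honoured the cancellation
    return False  # the task swallowed it and ran to completion
```

Running `asyncio.run(cancelled_cleanly(read_with_bare_except))` returns `False`, because the loop keeps going after the cancellation is swallowed, while the `except Exception` variant returns `True`.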

    except:
        pass

Same issue as the other stream: except: pass will swallow asyncio.CancelledError and hide JSON decoding problems, which can lead to stuck/cancel-ignoring requests and makes debugging difficult. Prefer catching Exception (and logging) while allowing cancellation to propagate.

Suggested change:

    - except:
    -     pass
    + except Exception as e:
    +     logger.warning(f"Failed to parse langflow chunk: {e}")

Depends on an updated langflow responses endpoint:
langflow-ai/langflow#11302