Consolidate text extraction into a single source of truth #79

krisztianfekete merged 2 commits into main from
Conversation
Pull request overview
This PR consolidates user/assistant text extraction logic by routing converters through shared “extract from span attributes” helpers, aiming to reduce duplicated parsing across ADK and GenAI trace formats.
Changes:
- Updated GenAI converter to use `extract_user_text_from_attrs()`/`extract_agent_response_from_attrs()` instead of inline message parsing.
- Updated ADK converter to prefer the shared extraction helpers before falling back to direct `llm_request`/`llm_response` parsing.
- Adjusted shared extraction to prefer the last assistant message in `gen_ai.output.messages`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/agentevals/genai_converter.py` | Replaces inline user/assistant message parsing with shared extraction helpers. |
| `src/agentevals/extraction.py` | Tweaks agent-response extraction to scan output messages in reverse (prefer latest). |
| `src/agentevals/converter.py` | Uses shared extraction helpers for ADK conversion prior to legacy parsing fallbacks. |
```python
def _extract_user_content(first_call_llm: Span) -> genai_types.Content:
    """Extract user input from the first call_llm span's llm_request tag."""
    text = extract_user_text_from_attrs(first_call_llm.tags)
    if text:
        return genai_types.Content(
            role="user",
            parts=[genai_types.Part(text=text)],
        )
```
The docstring says user input is extracted from the first call_llm span’s llm_request tag, but the function now first uses extract_user_text_from_attrs() (which may read either ADK llm_request or gen_ai.input.messages). Please update the docstring to reflect the new “span attributes / shared extractor” behavior to avoid misleading future readers.
```diff
 messages_raw = attrs.get(OTEL_GENAI_OUTPUT_MESSAGES)
 if messages_raw:
     messages = parse_json_attr(messages_raw, "gen_ai.output.messages")
     if isinstance(messages, list):
-        for msg in messages:
+        for msg in reversed(messages):
             if isinstance(msg, dict) and msg.get("role") in ASSISTANT_ROLES:
                 text = extract_text_from_message(msg)
                 if text:
```
extract_agent_response_from_attrs() now iterates over gen_ai.output.messages in reverse order (preferring the last assistant/model message). There isn’t a unit test covering the multi-message case in tests/test_extraction.py, so this behavior change could regress unnoticed. Please add a test asserting the last assistant message with text is selected when multiple assistant messages are present.
```python
text = extract_user_text_from_attrs(llm_span.tags)
if text:
    return text
raise ValueError(f"LLM span {llm_span.span_id}: no user message found in span attributes")
```
The raised ValueError here is now fairly generic (“no user message found in span attributes”), but extraction checks multiple possible sources (e.g., gen_ai.input.messages and ADK llm_request). Consider including the specific attribute keys checked (or at least gen_ai.input.messages) in the error message to make trace/debugging failures more actionable.
Suggested change:

```diff
-raise ValueError(f"LLM span {llm_span.span_id}: no user message found in span attributes")
+raise ValueError(
+    f"LLM span {llm_span.span_id}: no user message found in span attributes; "
+    f"expected user message in attribute {OTEL_GENAI_INPUT_MESSAGES!r}"
+)
```
As title says.