Consolidate duplicated tool call extractions#80
Conversation
There was a problem hiding this comment.
Pull request overview
This PR consolidates duplicated tool-call/tool-result extraction logic by introducing span-level wrapper helpers in extraction.py and migrating the existing converters to use the shared extraction path.
Changes:
- Added
extract_tool_call_from_span()/extract_tool_result_from_span()convenience wrappers to reuse the existing attribute-based extractors. - Updated
src/agentevals/genai_converter.pyto use shared extraction for tool calls/results and reduce local parsing logic. - Updated
src/agentevals/converter.pyto use the shared extraction helpers for tool spans.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/agentevals/genai_converter.py | Replaces inline tool call/result parsing with shared span-level extractors and keeps a local dedup strategy. |
| src/agentevals/extraction.py | Adds span-level wrapper functions delegating to existing attribute-based tool extraction. |
| src/agentevals/converter.py | Switches ADK tool span extraction to use the shared span-level extractors. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Use the real semconv tool_call_id for dedup (None when not present on | ||
| # the span), rather than the shared function's span_id/"unknown" fallback. | ||
| real_tool_call_id = tool_span.get_tag(OTEL_GENAI_TOOL_CALL_ID) | ||
|
|
||
| args_raw = tool_span.get_tag(OTEL_GENAI_TOOL_CALL_ARGUMENTS, "{}") | ||
| args = parse_json_attr(args_raw, "gen_ai.tool.call.arguments") | ||
| if not isinstance(args, dict): | ||
| args = {} | ||
|
|
||
| if not args: | ||
| input_msgs_raw = tool_span.get_tag(OTEL_GENAI_INPUT_MESSAGES) | ||
| if input_msgs_raw: | ||
| args, _ = extract_tool_call_args_from_messages(input_msgs_raw, tool_name) | ||
|
|
||
| tc = _ToolCall(name=tool_name, args=args, id=tool_call_id) | ||
| if tool_call_id: | ||
| tool_calls_by_id[tool_call_id] = tc | ||
| tc = _ToolCall(name=tool_call["name"], args=tool_call["args"], id=real_tool_call_id) | ||
| if real_tool_call_id: | ||
| tool_calls_by_id[real_tool_call_id] = tc | ||
| else: |
There was a problem hiding this comment.
real_tool_call_id is taken only from the semconv tag and then used for both the _ToolCall.id and response IDs. This drops the tool_call["id"] value returned by extract_tool_call_from_span(), which can include a real tool_call_id extracted from gen_ai.input.messages when the span tag is missing. In that case you’ll end up with (a) a no-id tool call from the tool span and (b) an id’d tool call from the LLM message, preventing dedup/linking. Consider using a single dedup_id that prefers the span tag when present, else falls back to tool_call["id"] when it’s not a placeholder, and use that consistently for both _ToolCall.id and _ToolResponse.id.
| tool_result = extract_tool_result_from_span(tool_span) | ||
| if tool_result: | ||
| logger.debug(f"Tool {tool_call['name']} result: {str(tool_result['response'])[:100]}") | ||
| tool_responses.append( | ||
| _ToolResponse( | ||
| name=tool_name, | ||
| response=result_data, | ||
| id=tool_call_id, | ||
| name=tool_call["name"], | ||
| response=tool_result["response"], | ||
| id=real_tool_call_id, | ||
| ) | ||
| ) |
There was a problem hiding this comment.
Switching to extract_tool_result_from_span() changes behavior for empty JSON tool results. extract_tool_result_from_attrs() currently returns None when the parsed response is an empty dict (because it checks if parsed:), so a tool span with gen_ai.tool.call.result == "{}" (or an empty dict response) will now produce no _ToolResponse, whereas the previous code would still record an empty response. Consider adjusting the shared extractor to treat the presence of the raw attribute as a result even if the parsed dict is empty, or otherwise ensure empty-but-present results aren’t dropped here.
| tool_result = extract_tool_result_from_span(tool_span) | ||
| fr = None | ||
| if response_raw: | ||
| response_data = parse_json(response_raw) | ||
| # Response format varies: MCP uses {"content": [...], "isError": false}, | ||
| # other tools return flat dicts. We pass through as-is. | ||
| if tool_result: | ||
| fr = genai_types.FunctionResponse( | ||
| name=tool_name, | ||
| response=response_data if response_data else {}, | ||
| id=tool_call_id, | ||
| name=tool_call["name"], | ||
| response=tool_result["response"], | ||
| id=tool_call["id"], | ||
| ) |
There was a problem hiding this comment.
extract_tool_result_from_span() can currently return None for an empty-but-present tool response (e.g. raw attribute value "{}"), because extract_tool_result_from_attrs() treats an empty parsed dict as falsy. This means _extract_from_tool_span() may omit a FunctionResponse that would previously have been emitted whenever gcp.vertex.agent.tool_response was present. Consider fixing the shared extractor (preferred) or handling the empty-dict case so ADK conversions don’t silently drop tool responses.
As title says.