Consolidate duplicated tool call extractions by krisztianfekete · Pull Request #80 · agentevals-dev/agentevals

krisztianfekete · 2026-03-31T11:00:14Z

As title says.

Copilot

Pull request overview

This PR consolidates duplicated tool-call/tool-result extraction logic by introducing span-level wrapper helpers in extraction.py and migrating the existing converters to use the shared extraction path.

Changes:

Added extract_tool_call_from_span() / extract_tool_result_from_span() convenience wrappers to reuse the existing attribute-based extractors.
Updated src/agentevals/genai_converter.py to use shared extraction for tool calls/results and reduce local parsing logic.
Updated src/agentevals/converter.py to use the shared extraction helpers for tool spans.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
src/agentevals/genai_converter.py	Replaces inline tool call/result parsing with shared span-level extractors and keeps a local dedup strategy.
src/agentevals/extraction.py	Adds span-level wrapper functions delegating to existing attribute-based tool extraction.
src/agentevals/converter.py	Switches ADK tool span extraction to use the shared span-level extractors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-31T11:05:46Z

+        # Use the real semconv tool_call_id for dedup (None when not present on
+        # the span), rather than the shared function's span_id/"unknown" fallback.
+        real_tool_call_id = tool_span.get_tag(OTEL_GENAI_TOOL_CALL_ID)

-        args_raw = tool_span.get_tag(OTEL_GENAI_TOOL_CALL_ARGUMENTS, "{}")
-        args = parse_json_attr(args_raw, "gen_ai.tool.call.arguments")
-        if not isinstance(args, dict):
-            args = {}
-
-        if not args:
-            input_msgs_raw = tool_span.get_tag(OTEL_GENAI_INPUT_MESSAGES)
-            if input_msgs_raw:
-                args, _ = extract_tool_call_args_from_messages(input_msgs_raw, tool_name)
-
-        tc = _ToolCall(name=tool_name, args=args, id=tool_call_id)
-        if tool_call_id:
-            tool_calls_by_id[tool_call_id] = tc
+        tc = _ToolCall(name=tool_call["name"], args=tool_call["args"], id=real_tool_call_id)
+        if real_tool_call_id:
+            tool_calls_by_id[real_tool_call_id] = tc
        else:


real_tool_call_id is taken only from the semconv tag and then used for both the _ToolCall.id and response IDs. This drops the tool_call["id"] value returned by extract_tool_call_from_span(), which can include a real tool_call_id extracted from gen_ai.input.messages when the span tag is missing. In that case you’ll end up with (a) a no-id tool call from the tool span and (b) an id’d tool call from the LLM message, preventing dedup/linking. Consider using a single dedup_id that prefers the span tag when present, else falls back to tool_call["id"] when it’s not a placeholder, and use that consistently for both _ToolCall.id and _ToolResponse.id.

Copilot · 2026-03-31T11:05:46Z

+        tool_result = extract_tool_result_from_span(tool_span)
+        if tool_result:
+            logger.debug(f"Tool {tool_call['name']} result: {str(tool_result['response'])[:100]}")
            tool_responses.append(
                _ToolResponse(
-                    name=tool_name,
-                    response=result_data,
-                    id=tool_call_id,
+                    name=tool_call["name"],
+                    response=tool_result["response"],
+                    id=real_tool_call_id,
                )
            )


Switching to extract_tool_result_from_span() changes behavior for empty JSON tool results. extract_tool_result_from_attrs() currently returns None when the parsed response is an empty dict (because it checks if parsed:), so a tool span with gen_ai.tool.call.result == "{}" (or an empty dict response) will now produce no _ToolResponse, whereas the previous code would still record an empty response. Consider adjusting the shared extractor to treat the presence of the raw attribute as a result even if the parsed dict is empty, or otherwise ensure empty-but-present results aren’t dropped here.

Copilot · 2026-03-31T11:05:46Z

+    tool_result = extract_tool_result_from_span(tool_span)
    fr = None
-    if response_raw:
-        response_data = parse_json(response_raw)
-        # Response format varies: MCP uses {"content": [...], "isError": false},
-        # other tools return flat dicts. We pass through as-is.
+    if tool_result:
        fr = genai_types.FunctionResponse(
-            name=tool_name,
-            response=response_data if response_data else {},
-            id=tool_call_id,
+            name=tool_call["name"],
+            response=tool_result["response"],
+            id=tool_call["id"],
        )


extract_tool_result_from_span() can currently return None for an empty-but-present tool response (e.g. raw attribute value "{}"), because extract_tool_result_from_attrs() treats an empty parsed dict as falsy. This means _extract_from_tool_span() may omit a FunctionResponse that would previously have been emitted whenever gcp.vertex.agent.tool_response was present. Consider fixing the shared extractor (preferred) or handling the empty-dict case so ADK conversions don’t silently drop tool responses.

consolidate duplicated tool call extractions

e2f1333

krisztianfekete requested a review from Copilot March 31, 2026 11:00

Copilot started reviewing on behalf of krisztianfekete March 31, 2026 11:00 View session

Copilot AI reviewed Mar 31, 2026

View reviewed changes

address review comments

6109dc5

krisztianfekete merged commit b3446cf into main Mar 31, 2026
4 checks passed

krisztianfekete deleted the chore/cleanup-tool-extraction branch March 31, 2026 11:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consolidate duplicated tool call extractions#80

Consolidate duplicated tool call extractions#80
krisztianfekete merged 2 commits intomainfrom
chore/cleanup-tool-extraction

krisztianfekete commented Mar 31, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

krisztianfekete commented Mar 31, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants