Skip to content

Consolidate duplicated tool call extractions#80

Merged
krisztianfekete merged 2 commits intomainfrom
chore/cleanup-tool-extraction
Mar 31, 2026
Merged

Consolidate duplicated tool call extractions#80
krisztianfekete merged 2 commits intomainfrom
chore/cleanup-tool-extraction

Conversation

@krisztianfekete
Copy link
Copy Markdown
Contributor

As title says.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR consolidates duplicated tool-call/tool-result extraction logic by introducing span-level wrapper helpers in extraction.py and migrating the existing converters to use the shared extraction path.

Changes:

  • Added extract_tool_call_from_span() / extract_tool_result_from_span() convenience wrappers to reuse the existing attribute-based extractors.
  • Updated src/agentevals/genai_converter.py to use shared extraction for tool calls/results and reduce local parsing logic.
  • Updated src/agentevals/converter.py to use the shared extraction helpers for tool spans.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/agentevals/genai_converter.py Replaces inline tool call/result parsing with shared span-level extractors and keeps a local dedup strategy.
src/agentevals/extraction.py Adds span-level wrapper functions delegating to existing attribute-based tool extraction.
src/agentevals/converter.py Switches ADK tool span extraction to use the shared span-level extractors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/agentevals/genai_converter.py Outdated
Comment on lines 390 to 397
# Use the real semconv tool_call_id for dedup (None when not present on
# the span), rather than the shared function's span_id/"unknown" fallback.
real_tool_call_id = tool_span.get_tag(OTEL_GENAI_TOOL_CALL_ID)

args_raw = tool_span.get_tag(OTEL_GENAI_TOOL_CALL_ARGUMENTS, "{}")
args = parse_json_attr(args_raw, "gen_ai.tool.call.arguments")
if not isinstance(args, dict):
args = {}

if not args:
input_msgs_raw = tool_span.get_tag(OTEL_GENAI_INPUT_MESSAGES)
if input_msgs_raw:
args, _ = extract_tool_call_args_from_messages(input_msgs_raw, tool_name)

tc = _ToolCall(name=tool_name, args=args, id=tool_call_id)
if tool_call_id:
tool_calls_by_id[tool_call_id] = tc
tc = _ToolCall(name=tool_call["name"], args=tool_call["args"], id=real_tool_call_id)
if real_tool_call_id:
tool_calls_by_id[real_tool_call_id] = tc
else:
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

real_tool_call_id is taken only from the semconv tag and then used for both the _ToolCall.id and response IDs. This drops the tool_call["id"] value returned by extract_tool_call_from_span(), which can include a real tool_call_id extracted from gen_ai.input.messages when the span tag is missing. In that case you’ll end up with (a) a no-id tool call from the tool span and (b) an id’d tool call from the LLM message, preventing dedup/linking. Consider using a single dedup_id that prefers the span tag when present, else falls back to tool_call["id"] when it’s not a placeholder, and use that consistently for both _ToolCall.id and _ToolResponse.id.

Copilot uses AI. Check for mistakes.
Comment on lines +400 to 409
tool_result = extract_tool_result_from_span(tool_span)
if tool_result:
logger.debug(f"Tool {tool_call['name']} result: {str(tool_result['response'])[:100]}")
tool_responses.append(
_ToolResponse(
name=tool_name,
response=result_data,
id=tool_call_id,
name=tool_call["name"],
response=tool_result["response"],
id=real_tool_call_id,
)
)
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching to extract_tool_result_from_span() changes behavior for empty JSON tool results. extract_tool_result_from_attrs() currently returns None when the parsed response is an empty dict (because it checks if parsed:), so a tool span with gen_ai.tool.call.result == "{}" (or an empty dict response) will now produce no _ToolResponse, whereas the previous code would still record an empty response. Consider adjusting the shared extractor to treat the presence of the raw attribute as a result even if the parsed dict is empty, or otherwise ensure empty-but-present results aren’t dropped here.

Copilot uses AI. Check for mistakes.
Comment on lines +235 to 242
tool_result = extract_tool_result_from_span(tool_span)
fr = None
if response_raw:
response_data = parse_json(response_raw)
# Response format varies: MCP uses {"content": [...], "isError": false},
# other tools return flat dicts. We pass through as-is.
if tool_result:
fr = genai_types.FunctionResponse(
name=tool_name,
response=response_data if response_data else {},
id=tool_call_id,
name=tool_call["name"],
response=tool_result["response"],
id=tool_call["id"],
)
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract_tool_result_from_span() can currently return None for an empty-but-present tool response (e.g. raw attribute value "{}"), because extract_tool_result_from_attrs() treats an empty parsed dict as falsy. This means _extract_from_tool_span() may omit a FunctionResponse that would previously have been emitted whenever gcp.vertex.agent.tool_response was present. Consider fixing the shared extractor (preferred) or handling the empty-dict case so ADK conversions don’t silently drop tool responses.

Copilot uses AI. Check for mistakes.
@krisztianfekete krisztianfekete merged commit b3446cf into main Mar 31, 2026
4 checks passed
@krisztianfekete krisztianfekete deleted the chore/cleanup-tool-extraction branch March 31, 2026 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants