Expose all tool_trajectory_avg_score match types #89

Merged
krisztianfekete merged 5 commits into agentevals-dev:main from ossama-ferjani:Expose-all-tool_trajectory_avg_score-match-types on Apr 1, 2026

@ossama-ferjani

Fixes #84: expose all tool_trajectory_avg_score match types (EXACT, IN_ORDER, and ANY_ORDER).
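
For readers unfamiliar with the three match types, here is a minimal sketch of how they are commonly understood to differ. It is not the ADK implementation: `MatchType` and `trajectory_matches` are hypothetical stand-ins, and the comparison uses tool names only, whereas a real evaluator may also compare arguments.

```python
from enum import Enum


class MatchType(Enum):
    """Hypothetical stand-in for the three match types named in this PR."""
    EXACT = "EXACT"
    IN_ORDER = "IN_ORDER"
    ANY_ORDER = "ANY_ORDER"


def trajectory_matches(expected: list[str], actual: list[str], match_type: MatchType) -> bool:
    """Return True if the actual tool-call names satisfy the expected ones."""
    if match_type is MatchType.EXACT:
        # Same calls, same order, nothing extra or missing.
        return actual == expected
    if match_type is MatchType.IN_ORDER:
        # Expected calls must appear as a subsequence of the actual calls.
        remaining = iter(actual)
        return all(call in remaining for call in expected)
    # ANY_ORDER: every expected call must appear; order ignored, extras allowed.
    pool = list(actual)
    for call in expected:
        if call not in pool:
            return False
        pool.remove(call)
    return True


expected = ["search", "fetch"]
actual = ["fetch", "log", "search"]
results = {m.name: trajectory_matches(expected, actual, m) for m in MatchType}
```

With these inputs only ANY_ORDER passes, because both expected calls are present but reversed and separated by an extra call.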

Contributor

@krisztianfekete krisztianfekete left a comment

Thanks for the PR, added some comments!
Can you also add tests please?

Comment thread samples/eval_set_multi_tool.json Outdated
Contributor

Can we remove the synthetic JSON files from samples and add tests instead?

Author

Fixed, please check the test format!

Comment thread src/agentevals/builtin_metrics.py Outdated
    if match_type
    else ToolTrajectoryCriterion.MatchType.EXACT
)
criterion = ToolTrajectoryCriterion(threshold=effective_threshold, matchType=_match)
Contributor

Let's keep this consistent with all other ADK objects in this file that are constructed with snake_case kwargs.

Author

Fixed.
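
A minimal sketch of the pattern discussed in this thread: resolve the optional string to an enum member, default to EXACT, and construct the criterion with a snake_case keyword. `MatchType`, `Criterion`, and `build_criterion` are hypothetical stand-ins so the example runs without ADK installed; the real `ToolTrajectoryCriterion` may differ in its fields and validation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MatchType(Enum):
    """Hypothetical stand-in for ToolTrajectoryCriterion.MatchType."""
    EXACT = 0
    IN_ORDER = 1
    ANY_ORDER = 2


@dataclass
class Criterion:
    """Hypothetical stand-in for ToolTrajectoryCriterion."""
    threshold: float
    match_type: MatchType = MatchType.EXACT


def build_criterion(threshold: float, match_type: Optional[str] = None) -> Criterion:
    # Look the member up by name; fall back to EXACT when nothing was passed.
    resolved = MatchType[match_type.upper()] if match_type else MatchType.EXACT
    # snake_case keyword arguments, consistent with the rest of the file.
    return Criterion(threshold=threshold, match_type=resolved)


default_criterion = build_criterion(0.5)
in_order_criterion = build_criterion(0.5, "in_order")
```

Looking the member up by name raises `KeyError` for an unknown string, so callers would still want to validate input before it reaches this point.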

Comment thread src/agentevals/cli.py
)
@click.option(
    "--trajectory-match-type",
    type=click.Choice(["EXACT", "IN_ORDER", "ANY_ORDER"], case_sensitive=False),
Contributor

Can you please perform the same validation at every layer of the codebase where this can be a problem?
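
One way to get the same validation at every layer is a single shared helper that the CLI, API, and runner all call. The sketch below is an assumption about how that could look, not code from this PR: `ALLOWED_MATCH_TYPES` and `normalize_match_type` are hypothetical names, and the case-insensitive handling mirrors the `case_sensitive=False` option in the snippet above.

```python
from typing import Optional

# Single source of truth for the accepted values (hypothetical name).
ALLOWED_MATCH_TYPES = ("EXACT", "IN_ORDER", "ANY_ORDER")


def normalize_match_type(value: Optional[str]) -> Optional[str]:
    """Return the canonical upper-case name, or None when no value was given."""
    if value is None:
        return None
    candidate = value.strip().upper()
    if candidate not in ALLOWED_MATCH_TYPES:
        raise ValueError(
            f"Invalid trajectory match type {value!r}; "
            f"expected one of {', '.join(ALLOWED_MATCH_TYPES)}"
        )
    return candidate


normalized = normalize_match_type("any_order")
```

Because every entry point goes through the same function, an invalid value fails with the same message regardless of where it entered.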

Comment thread src/agentevals/runner.py Outdated
  threshold: float | None,
- eval_semaphore: asyncio.Semaphore,
+ trajectory_match_type: str | None = None,
+ eval_semaphore: asyncio.Semaphore = None,
Contributor

Can we add trajectory_match_type: str | None = None after the existing optional parameters instead of reordering or changing the signature?

Author

Done! Let's keep the order.

Contributor

Now we don't have to change the semaphore anymore.
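
The point of this thread can be sketched as follows: appending the new optional parameter at the end leaves existing positional call sites untouched, and the semaphore stays a required argument. `run_eval` is a hypothetical function name and its body is a placeholder; only the parameter names come from the diff above.

```python
import asyncio
from typing import Optional


async def run_eval(
    threshold: Optional[float],
    eval_semaphore: asyncio.Semaphore,
    trajectory_match_type: Optional[str] = None,  # new, appended last
) -> str:
    # Placeholder body: report which match type would be used.
    async with eval_semaphore:
        return trajectory_match_type or "EXACT"


async def main() -> tuple[str, str]:
    semaphore = asyncio.Semaphore(2)
    # An existing positional call site keeps working unchanged.
    old_style = await run_eval(0.8, semaphore)
    # New call sites opt in by keyword.
    new_style = await run_eval(0.8, semaphore, trajectory_match_type="IN_ORDER")
    return old_style, new_style


old_style, new_style = asyncio.run(main())
```

Inserting the parameter before `eval_semaphore` would have forced the semaphore to take a default value, which is the signature change the review asked to avoid.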

Comment thread ui/src/context/TraceProvider.tsx Outdated
metrics: state.selectedMetrics,
judgeModel: state.judgeModel,
threshold: state.threshold,
trajectoryMatchType: state.trajectoryMatchType !== 'EXACT' ? state.trajectoryMatchType : undefined,
Contributor

Let's always send this so that it doesn't break if we change the default on the backend side.

Author

OK, fixed. This makes sense!
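
The risk raised here can be shown with a small sketch, written in Python for consistency with the backend snippets even though the client code is TypeScript. All function names are hypothetical. It assumes the backend falls back to its own default when the field is absent, and then imagines that default changing from EXACT to IN_ORDER.

```python
def resolve_on_backend(payload: dict, backend_default: str) -> str:
    """Backend side: use the client's value if present, else the server default."""
    return payload.get("trajectoryMatchType", backend_default)


def payload_omitting_default(selected: str) -> dict:
    # Pattern from the outdated diff: drop the field when it equals 'EXACT'.
    return {} if selected == "EXACT" else {"trajectoryMatchType": selected}


def payload_always_sending(selected: str) -> dict:
    # Pattern requested in review: always include the user's selection.
    return {"trajectoryMatchType": selected}


# Suppose the backend default later changes from EXACT to IN_ORDER.
drifted = resolve_on_backend(payload_omitting_default("EXACT"), backend_default="IN_ORDER")
stable = resolve_on_backend(payload_always_sending("EXACT"), backend_default="IN_ORDER")
```

In the first case a user who explicitly chose EXACT silently gets IN_ORDER; in the second the choice survives the change of default.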

@ossama-ferjani
Author

@krisztianfekete please let me know if there are other fixes I can do here!

Comment thread src/agentevals/api/streaming_routes.py Outdated
eval_set_id: str
metrics: list[str] = ["tool_trajectory_avg_score"]
judge_model: str = "gemini-2.5-flash"
trajectory_match_type: str | None = None
Contributor

Can we use a Literal type here?
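
A sketch of what the `Literal` suggestion could look like. The request class in `streaming_routes.py` is not shown in full, so `EvalRequest` below is a hypothetical stdlib dataclass stand-in. If the real model is a Pydantic `BaseModel` (an assumption), the `Literal` annotation alone would typically reject other values at parse time; a plain dataclass needs the explicit check shown here.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

# Type alias restricting the field to the three supported values.
TrajectoryMatchType = Literal["EXACT", "IN_ORDER", "ANY_ORDER"]


@dataclass
class EvalRequest:
    """Hypothetical stand-in for the request model in streaming_routes.py."""
    eval_set_id: str
    trajectory_match_type: Optional[TrajectoryMatchType] = None

    def __post_init__(self) -> None:
        # Dataclasses do not enforce annotations, so check explicitly.
        allowed = get_args(TrajectoryMatchType)
        value = self.trajectory_match_type
        if value is not None and value not in allowed:
            raise ValueError(f"trajectory_match_type must be one of {allowed}")


request = EvalRequest(eval_set_id="demo", trajectory_match_type="IN_ORDER")
```

Compared with `str | None`, the `Literal` alias documents the accepted values in the type itself and lets static type checkers flag typos at call sites.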

@krisztianfekete krisztianfekete merged commit 22bbedf into agentevals-dev:main Apr 1, 2026
4 checks passed
Development

Successfully merging this pull request may close these issues.

Expose all tool_trajectory_avg_score match types

3 participants