Improve trace loading logic #123

Merged — krisztianfekete merged 3 commits into main on Apr 17, 2026
Conversation
Pull request overview
This PR expands the evaluation pipeline to support running evaluations on pre-loaded Trace objects and accepting OTLP traces via JSON request bodies, reducing reliance on file I/O and multipart uploads.
Changes:
- Introduces `EvalParams` (evaluation-only config) and refactors the runner to evaluate from in-memory traces via `run_evaluation_from_traces()`.
- Adds `OtlpJsonLoader.load_from_dict()` and extends OTLP attribute extraction to handle flat and nested dict attribute formats, with dot-notation flattening (see the sketch after this list).
- Adds new API endpoints `POST /evaluate/json` and `POST /evaluate/json/stream` to evaluate OTLP traces provided as JSON.
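For illustration, a minimal sketch of the dot-notation flattening described above. The helper name and standalone shape are assumptions; the PR's actual `_extract_attributes()` also handles the standard OTLP attribute arrays.

```python
# Illustrative sketch only: recursively flatten nested dicts into
# dot-notation keys, the behavior the review describes for dict attributes.
def flatten_attributes(attrs: dict, prefix: str = "") -> dict:
    flat: dict = {}
    for key, value in attrs.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_attributes(value, dotted))  # recurse into nested dicts
        else:
            flat[dotted] = value  # leaf value keeps its dotted path as the key
    return flat

# {"gen_ai": {"system": "openai"}} -> {"gen_ai.system": "openai"}
```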
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/agentevals/runner.py | Refactors evaluation to support pre-loaded traces; keeps the file-based API delegating to the new core path. |
| src/agentevals/loader/otlp.py | Adds dict-based OTLP loading and more flexible attribute extraction (including nested dict flattening). |
| src/agentevals/config.py | Splits evaluation parameters into `EvalParams` and makes `EvalRunConfig` inherit from it; enables camelCase aliases (see the sketch after this table). |
| src/agentevals/api/routes.py | Adds JSON-body evaluation endpoints (sync + SSE) that bypass multipart uploads. |
| src/agentevals/api/models.py | Adds an `EvaluateJsonRequest` model wiring JSON traces + config + an optional eval set. |
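For context, a minimal sketch of how camelCase aliases are typically enabled in Pydantic v2. The exact wiring in `config.py` isn't visible in this view, so treat this as an assumption about the mechanism, not the PR's code.

```python
# Sketch: to_camel generates camelCase aliases for each field, and
# populate_by_name lets callers use the snake_case names as well.
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel

class EvalParams(BaseModel):
    model_config = ConfigDict(alias_generator=to_camel, populate_by_name=True)

    judge_model: str | None = None

EvalParams(judgeModel="gpt-4")   # accepted via the camelCase alias
EvalParams(judge_model="gpt-4")  # accepted via the snake_case field name
```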
Comments suppressed due to low confidence (1)
src/agentevals/config.py:130
- The JSON endpoints now rely on `EvalParams` validation, but `threshold` and `metrics` no longer get the input checks that `/evaluate` applies (non-empty metrics list, threshold in [0, 1]). Consider moving those validations into `EvalParams` (field constraints/validators) so file-based and JSON-based evaluation behave consistently and return 4xx validation errors instead of failing later during evaluation. A sketch of such constraints follows the quoted fields below.
metrics: list[str] = Field(
default_factory=lambda: ["tool_trajectory_avg_score"],
description="List of built-in metric names to evaluate.",
)
custom_evaluators: list[CustomEvaluatorDef] = Field(
default_factory=list,
description="Custom evaluator definitions.",
)
judge_model: str | None = Field(
default=None,
description="LLM model for judge-based metrics.",
)
threshold: float | None = Field(
default=None,
description="Score threshold for pass/fail.",
)
Force-pushed from b4d3f32 to 93e3e52
peterj reviewed on Apr 17, 2026
peterj approved these changes on Apr 17, 2026
This PR adds the ability to evaluate traces without file uploads.
- `EvalParams` base class in `config.py` that pulls out the evaluation-only settings (metrics, judge model, threshold, etc.) from `EvalRunConfig`. The existing config inherits from it, so nothing breaks. Accepts both camelCase and snake_case from API consumers.
- `run_evaluation_from_traces()` that takes pre-loaded `Trace` objects and an `EvalParams` directly, skipping all file I/O. `run_evaluation()` now delegates to it under the hood.
- `OtlpJsonLoader.load_from_dict()` for parsing OTLP JSON from a dict instead of a file path. Also made `_extract_attributes()` smarter: it now handles flat dicts and nested dicts (auto-flattened to dot notation) alongside the standard OTLP attribute arrays.
- `POST /evaluate/json` and `POST /evaluate/json/stream` that accept traces as a JSON body instead of multipart form uploads. Same SSE event format as the existing streaming endpoint; an illustrative call is sketched below.
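As a usage illustration, a hedged example of calling the new sync endpoint with `requests`. The host, port, and payload field names are assumptions loosely based on the `EvaluateJsonRequest` model named in the review table, not a documented contract.

```python
# Hypothetical request against POST /evaluate/json; field names are
# guessed from EvaluateJsonRequest (traces + config + optional eval set).
import requests

resp = requests.post(
    "http://localhost:8000/evaluate/json",  # assumed host/port
    json={
        "traces": {"resourceSpans": []},  # an OTLP JSON export goes here
        "config": {"metrics": ["tool_trajectory_avg_score"], "threshold": 0.8},
    },
)
resp.raise_for_status()
print(resp.json())
```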