
Improve trace loading logic #123

Merged
krisztianfekete merged 3 commits into main from feature/trace-loader-extensions
Apr 17, 2026

Conversation

@krisztianfekete
Contributor

This PR adds the ability to evaluate traces without file uploads.

  • Adds an EvalParams base class in config.py that holds the evaluation-only settings (metrics, judge model, threshold, etc.) previously defined on EvalRunConfig. EvalRunConfig now inherits from it, so nothing breaks. Field names are accepted in both camelCase and snake_case for API consumers.
  • Adds run_evaluation_from_traces(), which takes pre-loaded Trace objects and an EvalParams directly, skipping all file I/O. run_evaluation() now delegates to it under the hood.
  • Adds OtlpJsonLoader.load_from_dict() for parsing OTLP JSON from a dict instead of a file path. Also extends _extract_attributes(): it now handles flat dicts and nested dicts (auto-flattened to dot-notation) alongside the standard OTLP attribute arrays.
  • Adds two new endpoints, POST /evaluate/json and POST /evaluate/json/stream, that accept traces as a JSON body instead of multipart form uploads. The streaming endpoint uses the same SSE event format as the existing streaming endpoint.
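
As a rough illustration of the three attribute shapes described above, here is a minimal standalone sketch. It is not the code from this PR: the function names `flatten` and `extract_attributes` are hypothetical, typed OTLP values are taken as-is without coercion, and array or key-value-list values are not unpacked.

```python
from typing import Any


def flatten(nested: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dot-notation keys (hypothetical helper)."""
    flat: dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


def extract_attributes(raw: Any) -> dict[str, Any]:
    """Accept an OTLP attribute array, a flat dict, or a nested dict."""
    if isinstance(raw, list):
        # Standard OTLP JSON shape:
        # [{"key": "...", "value": {"stringValue": "..."}}, ...]
        attrs: dict[str, Any] = {}
        for item in raw:
            value = item.get("value", {})
            # Take the single typed entry (stringValue, intValue, ...) as-is.
            attrs[item["key"]] = next(iter(value.values()), None)
        return attrs
    if isinstance(raw, dict):
        # Flat dicts pass through unchanged; nested dicts are flattened.
        return flatten(raw)
    return {}


print(extract_attributes({"gen_ai": {"request": {"model": "m"}}}))
```

Under these assumptions, a key that is already dotted in a flat dict and the same path expressed as a nested dict end up as the same output key, which is the property the description relies on.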

@krisztianfekete krisztianfekete changed the title from "Improve trace load logic" to "Improve trace loading logic" Apr 16, 2026
@krisztianfekete krisztianfekete requested a review from Copilot April 16, 2026 14:42

Copilot AI left a comment


Pull request overview

This PR expands the evaluation pipeline to support running evaluations on pre-loaded Trace objects and accepting OTLP traces via JSON request bodies, reducing reliance on file I/O and multipart uploads.

Changes:

  • Introduces EvalParams (evaluation-only config) and refactors the runner to evaluate from in-memory traces via run_evaluation_from_traces().
  • Adds OtlpJsonLoader.load_from_dict() and extends OTLP attribute extraction to handle flat/nested dict attribute formats (with dot-notation flattening).
  • Adds new API endpoints POST /evaluate/json and POST /evaluate/json/stream to evaluate OTLP traces provided as JSON.
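
The refactor summarized above (a file-based entry point that delegates to an in-memory core) can be sketched roughly as follows. Everything here is a simplified stand-in: the `Trace` and `EvalParams` dataclasses, the module-level `load_from_dict`, the function signatures, and the placeholder "evaluation" result are assumptions for illustration, not the project's actual definitions.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class Trace:  # hypothetical stand-in for the project's Trace type
    trace_id: str
    spans: list = field(default_factory=list)


@dataclass
class EvalParams:  # hypothetical, reduced to two fields
    metrics: list = field(default_factory=lambda: ["tool_trajectory_avg_score"])
    threshold: Optional[float] = None


def load_from_dict(payload: dict) -> list[Trace]:
    """Group spans from an OTLP-style dict by traceId (simplified)."""
    traces: dict[str, Trace] = {}
    for resource_span in payload.get("resourceSpans", []):
        for scope_span in resource_span.get("scopeSpans", []):
            for span in scope_span.get("spans", []):
                trace = traces.setdefault(span["traceId"], Trace(span["traceId"]))
                trace.spans.append(span)
    return list(traces.values())


def run_evaluation_from_traces(traces: list[Trace], params: EvalParams) -> dict:
    """Core path with no file I/O; the result is only a placeholder."""
    return {
        "traces": [t.trace_id for t in traces],
        "metrics": list(params.metrics),
    }


def run_evaluation(path: str, params: EvalParams) -> dict:
    """File-based entry point: read the file, then delegate to the core path."""
    payload = json.loads(Path(path).read_text())
    return run_evaluation_from_traces(load_from_dict(payload), params)
```

With this shape, a JSON endpoint can call `load_from_dict` on the request body and go straight to the core function, while existing file-based callers keep their entry point.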

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/agentevals/runner.py | Refactors evaluation to support pre-loaded traces; keeps file-based API delegating to the new core path. |
| src/agentevals/loader/otlp.py | Adds dict-based OTLP loading and more flexible attribute extraction (including nested dict flattening). |
| src/agentevals/config.py | Splits evaluation parameters into EvalParams and makes EvalRunConfig inherit from it; enables camelCase aliases. |
| src/agentevals/api/routes.py | Adds JSON-body evaluation endpoints (sync + SSE) that bypass multipart uploads. |
| src/agentevals/api/models.py | Adds EvaluateJsonRequest request model wiring JSON traces + config + optional eval set. |
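
To make the camelCase alias behavior noted for config.py concrete, here is a dependency-free sketch of the idea: incoming keys are normalized to snake_case so that both spellings resolve to the same field. The PR itself presumably relies on the model library's aliasing rather than anything like this; `to_snake_case` and `normalize_keys` are hypothetical names.

```python
import re

# Zero-width match before every uppercase letter except at the start.
_CAMEL_BOUNDARY = re.compile(r"(?<!^)(?=[A-Z])")


def to_snake_case(name: str) -> str:
    """Convert camelCase to snake_case; snake_case input is unchanged."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def normalize_keys(payload: dict) -> dict:
    """Normalize top-level keys so either spelling maps to one field name."""
    return {to_snake_case(key): value for key, value in payload.items()}


print(normalize_keys({"judgeModel": "some-model", "threshold": 0.5}))
```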
Comments suppressed due to low confidence (1)

src/agentevals/config.py:130

  • The JSON endpoints now rely on EvalParams validation, but threshold and metrics no longer get the input checks that /evaluate applies (non-empty metrics list, threshold in [0,1]). Consider moving those validations into EvalParams (field constraints/validators) so file-based and JSON-based evaluation behave consistently and return 4xx validation errors instead of failing later during evaluation.
    metrics: list[str] = Field(
        default_factory=lambda: ["tool_trajectory_avg_score"],
        description="List of built-in metric names to evaluate.",
    )

    custom_evaluators: list[CustomEvaluatorDef] = Field(
        default_factory=list,
        description="Custom evaluator definitions.",
    )

    judge_model: str | None = Field(
        default=None,
        description="LLM model for judge-based metrics.",
    )

    threshold: float | None = Field(
        default=None,
        description="Score threshold for pass/fail.",
    )
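
One way to picture the suggested checks (non-empty metrics, threshold in [0,1]) living on the shared parameter class is the sketch below. It uses a plain dataclass instead of the Pydantic model shown above, so it only illustrates where the validation would sit; `EvalParamsSketch` is a hypothetical name, and in the real model the same rules would more likely be expressed as field constraints or validators.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EvalParamsSketch:  # hypothetical, not the project's EvalParams
    metrics: list = field(default_factory=lambda: ["tool_trajectory_avg_score"])
    threshold: Optional[float] = None

    def __post_init__(self) -> None:
        # Reject an empty metrics list at construction time.
        if not self.metrics:
            raise ValueError("metrics must contain at least one metric name")
        # A threshold is optional, but if given it must lie in [0, 1].
        if self.threshold is not None and not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be within [0, 1]")
```

Because the checks run when the object is built, every entry point that constructs the parameters (file-based or JSON-based) fails at the same place and can map the error to a 4xx response.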


Comment thread src/agentevals/loader/otlp.py
Comment thread src/agentevals/config.py
Comment thread src/agentevals/api/routes.py Outdated
Comment thread src/agentevals/api/models.py
Comment thread src/agentevals/api/routes.py Outdated
@krisztianfekete krisztianfekete force-pushed the feature/trace-loader-extensions branch from b4d3f32 to 93e3e52 April 16, 2026 15:00
@krisztianfekete krisztianfekete requested a review from peterj April 16, 2026 15:00
@krisztianfekete krisztianfekete marked this pull request as ready for review April 16, 2026 15:00
Comment thread src/agentevals/api/routes.py
Comment thread src/agentevals/api/routes.py
Comment thread src/agentevals/api/routes.py Outdated
Comment thread src/agentevals/api/routes.py
Comment thread src/agentevals/api/routes.py
@krisztianfekete krisztianfekete requested a review from peterj April 17, 2026 07:13
@krisztianfekete krisztianfekete merged commit 572321b into main Apr 17, 2026
4 checks passed
@krisztianfekete krisztianfekete deleted the feature/trace-loader-extensions branch April 17, 2026 09:41

3 participants