
feature: add evaluator metadata support to V2 API#393

Open

uros-ivetic wants to merge 10 commits into `main` from `uros/epd-2243-allow-v2-to-receive-and-display-custom-metadata`

Conversation

@uros-ivetic

@uros-ivetic uros-ivetic commented Oct 6, 2025

Greptile Overview

Updated On: 2025-10-06 08:44:20 UTC

Summary

This PR adds support for custom evaluator metadata in the V2 testing API, enabling evaluators to pass structured data, beyond scores and results, that can be displayed in the testing interface. The changes focus on two key files in the V2 testing infrastructure:

The modification extends the existing evaluation data flow pattern by adding a new evaluator_id_to_metadata parameter that maps evaluator IDs to their custom metadata dictionaries. In api.py, the send_create_result function signature is updated to accept this metadata parameter and include it in the JSON payload sent to the API endpoint. The run_manager.py file is updated to extract metadata from EvaluationWithId objects through the _evaluations_to_maps method, following the same pattern used for results, reasons, and scores.
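The per-property extraction pattern described above can be sketched roughly as follows. The `EvaluationWithId` field names and the helper below are assumptions inferred from this summary, not the actual SDK code:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class EvaluationWithId:
    # Hypothetical shape inferred from the PR summary
    evaluator_id: str
    score: float
    metadata: Optional[Dict[str, Any]] = None


def evaluations_to_maps(
    evals: List[EvaluationWithId],
) -> Tuple[Dict[str, float], Dict[str, Dict[str, Any]]]:
    """Build one dict per evaluation property, keyed by evaluator ID."""
    id_to_score: Dict[str, float] = {}
    id_to_metadata: Dict[str, Dict[str, Any]] = {}
    for e in evals:
        id_to_score[e.evaluator_id] = e.score
        if e.metadata is not None:
            id_to_metadata[e.evaluator_id] = e.metadata
    return id_to_score, id_to_metadata


scores, metadata = evaluations_to_maps([
    EvaluationWithId("accuracy", 0.9, {"notes": "exact match"}),
    EvaluationWithId("latency", 0.7),
])
```

Evaluators without metadata are simply absent from the metadata map, mirroring how the summary says results, reasons, and scores are handled.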

This change integrates seamlessly with the existing codebase architecture, where the V2 API sends all evaluation data in a single composite request. The implementation maintains consistency with how other evaluation properties are handled - creating separate dictionaries for each property type and passing them through the established data flow pipeline. The metadata feature enables evaluators to provide additional context, debugging information, or custom visualization data that enhances the testing interface's capabilities.
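As a rough illustration of the single composite request, the metadata map would travel alongside the other per-evaluator dictionaries in one JSON body. The field names here are illustrative guesses, not the actual API contract in `api.py`:

```python
import json
from typing import Any, Dict


def build_create_result_payload(
    run_id: str,
    evaluator_id_to_score: Dict[str, float],
    evaluator_id_to_metadata: Dict[str, Dict[str, Any]],
) -> str:
    # Illustrative payload shape; the real field names live in api.py
    body = {
        "runId": run_id,
        "evaluatorIdToScore": evaluator_id_to_score,
        "evaluatorIdToMetadata": evaluator_id_to_metadata,
    }
    return json.dumps(body)


payload = build_create_result_payload(
    "run_123",
    {"accuracy": 0.9},
    {"accuracy": {"threshold": 0.8}},
)
```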

Important Files Changed

| Filename | Score | Overview |
| --- | --- | --- |
| `autoblocks/_impl/testing/v2/api.py` | 5/5 | Adds `evaluator_id_to_metadata` parameter to `send_create_result` function for API integration |
| `autoblocks/_impl/testing/v2/run_manager.py` | 4/5 | Extends `_evaluations_to_maps` method to extract and pass metadata from evaluation objects |

Confidence score: 4/5

  • This PR is safe to merge with low risk, as it adds new functionality without modifying existing logic
  • Score reflects a clean implementation that follows established patterns, though adding a required parameter implies coordinated updates elsewhere
  • Ensure all callers of `send_create_result` are updated to pass the new required parameter
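Since the review flags the new parameter as required, a caller that is not updated would fail at call time rather than silently drop metadata. A minimal sketch of that contract, using a simplified stand-in for the real function in `api.py` (the keyword-only signature here is an assumption):

```python
from typing import Any, Dict


def send_create_result(
    *,
    run_id: str,
    evaluator_id_to_metadata: Dict[str, Dict[str, Any]],
) -> None:
    # Simplified stand-in for the real function in api.py;
    # a required keyword-only parameter makes missing updates fail loudly.
    ...


# Callers with no metadata must still pass an empty dict explicitly:
send_create_result(run_id="run_123", evaluator_id_to_metadata={})
```

An alternative design would give the parameter a default of `{}` (or `None`) to avoid breaking existing callers; making it required instead forces every call site to be updated in the same change.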

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant RunManager
    participant GlobalState
    participant EvaluationSystem
    participant API

    User->>RunManager: "create RunManager(app_slug, environment, run_message)"
    RunManager->>RunManager: "initialize run_id with cuid_generator()"

    User->>RunManager: "start()"
    RunManager->>GlobalState: "init()"
    RunManager->>RunManager: "set started_at timestamp"

    User->>RunManager: "add_result(test_case, output, duration_ms, evaluators)"
    RunManager->>RunManager: "create TestCaseContext"
    RunManager->>EvaluationSystem: "_compute_evaluations(test_case_ctx, output, evaluators)"
    EvaluationSystem->>EvaluationSystem: "run evaluators with concurrency control"
    EvaluationSystem-->>RunManager: "return List[EvaluationWithId]"
    RunManager->>RunManager: "_evaluations_to_maps(evals)"
    RunManager->>RunManager: "serialize input/output to JSON"
    RunManager->>API: "send_create_result(..., evaluator_id_to_metadata, ...)"
    API->>API: "post_to_api(/testing/results)"
    API-->>RunManager: "return response with executionId"
    RunManager-->>User: "return execution_id"

    User->>RunManager: "end()"
    RunManager->>RunManager: "set ended_at timestamp"
    RunManager->>RunManager: "enable can_create_human_review"

    User->>RunManager: "create_human_review(name, assignees, rubric_id)"
    RunManager->>API: "send_create_human_review_job(...)"
    API->>API: "post_to_api(/apps/{app_slug}/human-review/jobs)"
    API-->>RunManager: "return success"
    RunManager-->>User: "complete"
```


@greptile-apps greptile-apps bot left a comment


2 files reviewed, no comments


@josorio-autoblocks josorio-autoblocks changed the title from "feature: add evaluator metadata support to V2 API for custom data displa" to "feature: add evaluator metadata support to V2 API" on Oct 15, 2025
