feature: add evaluator metadata support to V2 API #393
Open
uros-ivetic wants to merge 10 commits into main from
Conversation
Greptile Overview
Updated On: 2025-10-06 08:44:20 UTC
Summary
This PR adds support for custom evaluator metadata in the V2 testing API, enabling evaluators to pass structured data beyond scores and results that can be displayed in the testing interface. The changes focus on two key files in the V2 testing infrastructure.

The modification extends the existing evaluation data flow pattern by adding a new `evaluator_id_to_metadata` parameter that maps evaluator IDs to their custom metadata dictionaries. In `api.py`, the `send_create_result` function signature is updated to accept this metadata parameter and include it in the JSON payload sent to the API endpoint. The `run_manager.py` file is updated to extract metadata from `EvaluationWithId` objects through the `_evaluations_to_maps` method, following the same pattern used for results, reasons, and scores.

This change integrates seamlessly with the existing codebase architecture, where the V2 API sends all evaluation data in a single composite request. The implementation maintains consistency with how other evaluation properties are handled, creating a separate dictionary for each property type and passing them through the established data flow pipeline. The metadata feature lets evaluators provide additional context, debugging information, or custom visualization data that enhances the testing interface's capabilities.
Important Files Changed
Changed Files: `api.py`, `run_manager.py`
Confidence score: 4/5
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant RunManager
    participant GlobalState
    participant EvaluationSystem
    participant API
    User->>RunManager: "create RunManager(app_slug, environment, run_message)"
    RunManager->>RunManager: "initialize run_id with cuid_generator()"
    User->>RunManager: "start()"
    RunManager->>GlobalState: "init()"
    RunManager->>RunManager: "set started_at timestamp"
    User->>RunManager: "add_result(test_case, output, duration_ms, evaluators)"
    RunManager->>RunManager: "create TestCaseContext"
    RunManager->>EvaluationSystem: "_compute_evaluations(test_case_ctx, output, evaluators)"
    EvaluationSystem->>EvaluationSystem: "run evaluators with concurrency control"
    EvaluationSystem-->>RunManager: "return List[EvaluationWithId]"
    RunManager->>RunManager: "_evaluations_to_maps(evals)"
    RunManager->>RunManager: "serialize input/output to JSON"
    RunManager->>API: "send_create_result(..., evaluator_id_to_metadata, ...)"
    API->>API: "post_to_api(/testing/results)"
    API-->>RunManager: "return response with executionId"
    RunManager-->>User: "return execution_id"
    User->>RunManager: "end()"
    RunManager->>RunManager: "set ended_at timestamp"
    RunManager->>RunManager: "enable can_create_human_review"
    User->>RunManager: "create_human_review(name, assignees, rubric_id)"
    RunManager->>API: "send_create_human_review_job(...)"
    API->>API: "post_to_api(/apps/{app_slug}/human-review/jobs)"
    API-->>RunManager: "return success"
    RunManager-->>User: "complete"
```
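The `send_create_result` step in the diagram, where the new metadata map rides alongside the existing ones in a single composite payload, might look roughly like this. The parameter list and JSON key casing are assumptions for illustration; only the `evaluator_id_to_metadata` parameter itself is taken from the PR description.

```python
import json
from typing import Any, Dict

def send_create_result(
    run_id: str,
    test_case_id: str,
    output: str,
    duration_ms: int,
    evaluator_id_to_score: Dict[str, float],
    evaluator_id_to_result: Dict[str, str],
    evaluator_id_to_reason: Dict[str, str],
    evaluator_id_to_metadata: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Compose the single composite V2 payload for /testing/results.
    The new metadata map is simply one more entry next to the others."""
    payload = {
        "runId": run_id,
        "testCaseId": test_case_id,
        "output": output,
        "durationMs": duration_ms,
        "evaluatorIdToScore": evaluator_id_to_score,
        "evaluatorIdToResult": evaluator_id_to_result,
        "evaluatorIdToReason": evaluator_id_to_reason,
        "evaluatorIdToMetadata": evaluator_id_to_metadata,
    }
    # In the real SDK this would be posted via post_to_api("/testing/results");
    # here we just verify it is JSON-serializable and return it.
    json.dumps(payload)
    return payload
```

Because the metadata values are plain dictionaries, evaluators can attach arbitrary structured context (debugging details, visualization hints) without changing the request shape.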