Testbench Operator #21

@fmallmann

Description

Test Workflows in the Testbench should be generated from an "Experiment" Custom Resource Definition (CRD). This requires a new Testbench Operator, analogous to the agent-runtime-operator, that generates Test Workflows and their corresponding triggers from Experiment Custom Resources (CRs).

Configuration Parameters
The Experiment CRD should make the following parameters configurable:

name: A unique name for the experiment.
dataset: The URL to the dataset used for evaluation.
metrics: A list of metric names (e.g., "faithfulness", "answer_relevancy") to be used by the evaluation script.
llm_as_a_judge_model: The model used as "LLM as a Judge" for RAGAS evaluations.
agent: A reference to the deployed agent under test (e.g., via its deployment name).
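
To make the parameter list concrete, here is a minimal sketch of how an Experiment could be modeled and rendered as a CR manifest. The API group `testbench.example.com/v1alpha1`, the placement of `name` under `metadata`, and the validation rules are assumptions for illustration only; the actual schema is still to be specified in this issue.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical API group/version; the real one is not defined yet.
API_VERSION = "testbench.example.com/v1alpha1"


@dataclass
class ExperimentSpec:
    """The five parameters proposed for the Experiment CRD."""
    name: str
    dataset: str
    metrics: list[str]
    llm_as_a_judge_model: str
    agent: str

    def validate(self) -> None:
        """Basic sanity checks (assumed rules, not part of the issue)."""
        if not self.name:
            raise ValueError("name must not be empty")
        parsed = urlparse(self.dataset)
        if not parsed.scheme or not parsed.netloc:
            raise ValueError("dataset must be an absolute URL")
        if not self.metrics:
            raise ValueError("at least one metric is required")
        if not self.llm_as_a_judge_model:
            raise ValueError("llm_as_a_judge_model must not be empty")
        if not self.agent:
            raise ValueError("agent must not be empty")

    def to_manifest(self) -> dict:
        """Render the spec as a Kubernetes-style manifest dictionary."""
        self.validate()
        return {
            "apiVersion": API_VERSION,
            "kind": "Experiment",
            # Assumption: the experiment name maps to metadata.name.
            "metadata": {"name": self.name},
            "spec": {
                "dataset": self.dataset,
                "metrics": list(self.metrics),
                "llm_as_a_judge_model": self.llm_as_a_judge_model,
                "agent": self.agent,
            },
        }
```

Calling `ExperimentSpec(...).to_manifest()` yields a dictionary that could be serialized to YAML and applied as an Experiment CR once the CRD exists.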

Tasks
Specify the Experiment CRD: Define the schema and structure of the new resource.
Implement the Operator: Develop the logic to transform an Experiment CR into a functional Test Workflow.
Trigger Creation: The operator must create a trigger that automatically executes the Test Workflow whenever the deployment of the referenced agent changes.
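
The operator and trigger tasks above could be sketched as a pure function that maps one Experiment to the two resources it should own. Everything structural here is hypothetical: the `TestWorkflow` and `TestTrigger` kinds, their fields, the environment variable names, and the API group are placeholders, not the schema of any existing tool. A real operator would additionally create or update these objects in the cluster.

```python
# Hypothetical API group/version used for all generated resources.
API_VERSION = "testbench.example.com/v1alpha1"


def build_workflow(experiment: dict) -> dict:
    """Derive a Test Workflow description from an Experiment CR."""
    name = experiment["metadata"]["name"]
    spec = experiment["spec"]
    return {
        "apiVersion": API_VERSION,
        "kind": "TestWorkflow",  # placeholder kind
        "metadata": {"name": f"{name}-workflow"},
        "spec": {
            # Assumed: parameters reach the evaluation script as env vars.
            "env": {
                "DATASET_URL": spec["dataset"],
                "METRICS": ",".join(spec["metrics"]),
                "JUDGE_MODEL": spec["llm_as_a_judge_model"],
                "AGENT": spec["agent"],
            },
        },
    }


def build_trigger(experiment: dict, workflow: dict) -> dict:
    """Derive a trigger that runs the workflow when the agent's
    deployment changes."""
    name = experiment["metadata"]["name"]
    return {
        "apiVersion": API_VERSION,
        "kind": "TestTrigger",  # placeholder kind
        "metadata": {"name": f"{name}-trigger"},
        "spec": {
            # Watch the Deployment referenced by the Experiment.
            "watch": {
                "kind": "Deployment",
                "name": experiment["spec"]["agent"],
                "event": "modified",
            },
            # Run the workflow generated for the same Experiment.
            "run": {
                "kind": workflow["kind"],
                "name": workflow["metadata"]["name"],
            },
        },
    }


def reconcile(experiment: dict) -> list[dict]:
    """Return the desired resources for one Experiment CR."""
    workflow = build_workflow(experiment)
    trigger = build_trigger(experiment, workflow)
    return [workflow, trigger]
```

Keeping `reconcile` free of side effects makes the Experiment-to-resources mapping easy to unit test before any cluster interaction is added.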
