Description
Test Workflows should be created in the Testbench from resources of a new "Experiment" Custom Resource Definition (CRD). To achieve this, a new Testbench Operator needs to be developed, analogous to the agent-runtime-operator, that generates Test Workflows and their corresponding triggers from Experiment Custom Resources (CRs).
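As in other Kubernetes operators, the core of the Testbench Operator would likely be an idempotent reconcile step. It computes the objects an Experiment should own, then creates or updates only those that differ from what already exists. The Go sketch below illustrates this under stated assumptions. `Object` and `Store` are hypothetical stand-ins for Kubernetes objects and the API server; a real operator would typically use a Kubernetes client such as controller-runtime's. Deleting stale children and setting owner references are left out.

```go
package main

import (
	"fmt"
	"reflect"
)

// Object is a simplified, hypothetical stand-in for a Kubernetes object.
type Object struct {
	Kind string
	Name string
	Spec map[string]any
}

// Store is a hypothetical in-memory replacement for the API server,
// keyed by "Kind/Name".
type Store map[string]Object

func key(o Object) string { return o.Kind + "/" + o.Name }

// Reconcile writes each desired object into the store unless an identical
// copy is already there, and returns the keys it created or updated.
// Running it twice with the same input is a no-op the second time.
// Desired objects are expected to be freshly built on every call, since the
// store keeps references to their Spec maps.
func Reconcile(store Store, desired []Object) []string {
	var changed []string
	for _, d := range desired {
		k := key(d)
		if existing, ok := store[k]; ok && reflect.DeepEqual(existing, d) {
			continue // already up to date
		}
		store[k] = d
		changed = append(changed, k)
	}
	return changed
}

func main() {
	store := Store{}
	// Hypothetical children of an Experiment named "exp-a".
	desired := []Object{
		{Kind: "TestWorkflow", Name: "exp-a", Spec: map[string]any{"dataset": "https://example.com/data.csv"}},
		{Kind: "TestTrigger", Name: "exp-a-trigger", Spec: map[string]any{"event": "modified"}},
	}
	fmt.Println(Reconcile(store, desired)) // [TestWorkflow/exp-a TestTrigger/exp-a-trigger]
	fmt.Println(Reconcile(store, desired)) // []
}
```

Because the reconcile step only compares desired and actual state, it can safely be re-run on every event, which is what lets an operator recover from missed or duplicated notifications.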
Configuration Parameters
The Experiment CRD should make the following parameters configurable:
name: A unique name for the experiment.
dataset: The URL to the dataset used for evaluation.
metrics: A list of metric names (e.g., "faithfulness", "answer_relevancy") to be used by the evaluation script.
llm_as_a_judge_model: The model used as "LLM as a Judge" for RAGAS evaluations.
agent: A reference to the deployed agent under test (e.g., via its deployment name).
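One way to express these parameters is as Go types from which a CRD schema could be generated (for example with kubebuilder markers, not shown). The sketch below is an assumption, not a settled schema. The JSON field names follow the spelling used above, while `AgentRef` and its optional `Namespace` field are hypothetical. `Validate` shows the kind of checks a CRD schema or validating webhook might enforce.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// AgentRef is a hypothetical reference to the agent under test by its
// Deployment name. The Namespace field is an assumption.
type AgentRef struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace,omitempty"`
}

// ExperimentSpec mirrors the configuration parameters listed in the issue.
type ExperimentSpec struct {
	Name             string   `json:"name"`
	Dataset          string   `json:"dataset"`
	Metrics          []string `json:"metrics"`
	LLMAsAJudgeModel string   `json:"llm_as_a_judge_model"`
	Agent            AgentRef `json:"agent"`
}

// Validate checks that required fields are set and that the dataset is an
// absolute URL. All problems are reported together via errors.Join.
func (s ExperimentSpec) Validate() error {
	var errs []error
	if strings.TrimSpace(s.Name) == "" {
		errs = append(errs, errors.New("name is required"))
	}
	if u, err := url.Parse(s.Dataset); err != nil || !u.IsAbs() || u.Host == "" {
		errs = append(errs, fmt.Errorf("dataset must be an absolute URL, got %q", s.Dataset))
	}
	if len(s.Metrics) == 0 {
		errs = append(errs, errors.New("metrics must list at least one metric"))
	}
	if s.LLMAsAJudgeModel == "" {
		errs = append(errs, errors.New("llm_as_a_judge_model is required"))
	}
	if s.Agent.Name == "" {
		errs = append(errs, errors.New("agent.name is required"))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	spec := ExperimentSpec{
		Name:             "faithfulness-baseline",
		Dataset:          "https://example.com/eval/dataset.csv", // hypothetical URL
		Metrics:          []string{"faithfulness", "answer_relevancy"},
		LLMAsAJudgeModel: "judge-model",                   // hypothetical model name
		Agent:            AgentRef{Name: "weather-agent"}, // hypothetical Deployment
	}
	fmt.Println(spec.Validate())                    // <nil>
	fmt.Println(ExperimentSpec{}.Validate() != nil) // true
}
```

Kubernetes API conventions generally favor camelCase field names (e.g. `llmAsAJudgeModel`), and the experiment's `name` could simply be the CR's `metadata.name`. Both points are worth settling when the CRD is specified.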
Tasks
Specify the Experiment CRD: Define the schema and structure of the new resource.
Implement the Operator: Develop the logic to transform an Experiment CR into a functional Test Workflow.
Create the Trigger: The operator must create a trigger that automatically runs the Test Workflow whenever the referenced agent's deployment changes.
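Assuming the Testbench runs on Testkube, whose TestWorkflow and TestTrigger resources match the terminology used here, the operator's two tasks could be sketched as pure functions that render an Experiment into the desired objects. This is a hypothetical sketch. The field layout is modeled on Testkube's `testworkflows.testkube.io/v1` TestWorkflow and `tests.testkube.io/v1` TestTrigger resources and should be checked against the installed CRD versions. The evaluation image and the environment variable names are invented for illustration. The agent's Deployment is assumed to live in the same namespace as the Experiment.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Experiment is a flattened, trimmed-down view of an Experiment CR.
type Experiment struct {
	Name             string
	Namespace        string
	Dataset          string
	Metrics          []string
	LLMAsAJudgeModel string
	AgentDeployment  string // name of the agent's Deployment (assumed same namespace)
}

// evalImage is a hypothetical container image that runs the RAGAS evaluation.
const evalImage = "example.org/testbench/ragas-eval:latest"

// BuildTestWorkflow renders the Experiment as a Testkube-style TestWorkflow.
// Parameters reach the evaluation container as environment variables whose
// names are assumptions, not an agreed contract.
func BuildTestWorkflow(e Experiment) map[string]any {
	return map[string]any{
		"apiVersion": "testworkflows.testkube.io/v1",
		"kind":       "TestWorkflow",
		"metadata":   map[string]any{"name": e.Name, "namespace": e.Namespace},
		"spec": map[string]any{
			"steps": []any{map[string]any{
				"name": "evaluate",
				"run": map[string]any{
					"image": evalImage,
					"env": []any{
						map[string]any{"name": "DATASET_URL", "value": e.Dataset},
						map[string]any{"name": "METRICS", "value": strings.Join(e.Metrics, ",")},
						map[string]any{"name": "LLM_AS_A_JUDGE_MODEL", "value": e.LLMAsAJudgeModel},
					},
				},
			}},
		},
	}
}

// BuildTestTrigger renders a Testkube-style TestTrigger that runs the
// workflow above whenever the agent's Deployment is modified.
func BuildTestTrigger(e Experiment) map[string]any {
	return map[string]any{
		"apiVersion": "tests.testkube.io/v1",
		"kind":       "TestTrigger",
		"metadata":   map[string]any{"name": e.Name + "-trigger", "namespace": e.Namespace},
		"spec": map[string]any{
			"resource":         "deployment",
			"resourceSelector": map[string]any{"name": e.AgentDeployment, "namespace": e.Namespace},
			"event":            "modified",
			"action":           "run",
			"execution":        "testworkflow",
			"testSelector":     map[string]any{"name": e.Name, "namespace": e.Namespace},
		},
	}
}

func main() {
	exp := Experiment{
		Name:             "faithfulness-baseline",
		Namespace:        "testbench",
		Dataset:          "https://example.com/eval/dataset.csv",
		Metrics:          []string{"faithfulness", "answer_relevancy"},
		LLMAsAJudgeModel: "judge-model",
		AgentDeployment:  "weather-agent",
	}
	// Print both objects as JSON manifests.
	for _, obj := range []map[string]any{BuildTestWorkflow(exp), BuildTestTrigger(exp)} {
		out, err := json.MarshalIndent(obj, "", "  ")
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

Keeping the rendering pure makes it straightforward to unit-test and to call from a reconcile loop. The `modified` event is chosen because the task asks for a run whenever the agent's deployment changes.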