Operator: Scaffold Go project with Operator SDK and define Experiment CRD types by Copilot · Pull Request #32 · agentic-layer/testbench

Copilot · 2026-02-25T12:37:47Z

Bootstrap the testbench operator with a complete Experiment CRD schema to enable Kubernetes-native agent evaluation workflows.

CRD Structure

ExperimentSpec defines evaluation configuration:

agentRef: Target agent (name, namespace)
dataset: S3/URL source for test data (mutually exclusive with inline scenarios)
scenarios: Inline test scenarios with steps, metrics, and reference data
llmAsAJudgeModel: Model for metric evaluation (e.g., gemini-2.5-flash-lite)
defaultThreshold: Global pass/fail threshold (0.0-1.0, default: 0.9)
trigger: Execution policy (enabled, event, concurrencyPolicy)

ExperimentStatus tracks reconciliation state:

conditions: Standard Kubernetes conditions
generatedResources: Created TestWorkflow resources
lastExecution: Testkube execution metadata (ID, number, timestamps, status)
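
As a rough illustration of how the spec fields listed above might map to Go types, here is a minimal sketch that uses only the standard library. The field names and JSON keys follow the list above. The pointer and `omitempty` choices, the `string` type for `event`, and the `sampleSpec` helper are assumptions for illustration and are not taken from the PR. `json.RawMessage` stands in for the scenario types, which are elided here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AgentRef identifies the agent under evaluation.
type AgentRef struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace,omitempty"`
}

// S3Source points at an object inside a bucket.
type S3Source struct {
	Bucket string `json:"bucket"`
	Key    string `json:"key"`
}

// DatasetSource selects where the test data comes from.
type DatasetSource struct {
	S3  *S3Source `json:"s3,omitempty"`
	URL string    `json:"url,omitempty"`
}

// TriggerSpec describes the execution policy.
// The string type of Event is an assumption.
type TriggerSpec struct {
	Enabled           bool   `json:"enabled"`
	Event             string `json:"event,omitempty"`
	ConcurrencyPolicy string `json:"concurrencyPolicy,omitempty"`
}

// ExperimentSpec mirrors the fields listed in the PR description.
// Optionality (pointers, omitempty) is assumed, not confirmed by the PR.
type ExperimentSpec struct {
	AgentRef         AgentRef          `json:"agentRef"`
	Dataset          *DatasetSource    `json:"dataset,omitempty"`
	Scenarios        []json.RawMessage `json:"scenarios,omitempty"` // Scenario type elided
	LLMAsAJudgeModel string            `json:"llmAsAJudgeModel,omitempty"`
	DefaultThreshold *float64          `json:"defaultThreshold,omitempty"`
	Trigger          *TriggerSpec      `json:"trigger,omitempty"`
}

// sampleSpec is a hypothetical helper that builds a spec matching the
// sample CR shown in this PR description.
func sampleSpec() ExperimentSpec {
	threshold := 0.9
	return ExperimentSpec{
		AgentRef:         AgentRef{Name: "weather-agent", Namespace: "sample-agents"},
		Dataset:          &DatasetSource{S3: &S3Source{Bucket: "testbench", Key: "dataset.csv"}},
		LLMAsAJudgeModel: "gemini-2.5-flash-lite",
		DefaultThreshold: &threshold,
		Trigger:          &TriggerSpec{Enabled: true, ConcurrencyPolicy: "Forbid"},
	}
}

func main() {
	out, err := json.MarshalIndent(sampleSpec(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

In the actual operator the status side would presumably use `metav1.Condition` from `k8s.io/apimachinery` for `conditions`; that is left out here to keep the sketch free of external dependencies.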

Validation

CEL rule: Dataset and scenarios mutually exclusive via !(has(self.dataset) && has(self.scenarios))
Numeric constraints: Threshold fields validated to [0.0, 1.0] range
Enum constraints: ConcurrencyPolicy limited to Allow, Forbid, Replace
Array constraints: Scenario steps require minItems: 1
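
The four rules above could be expressed with kubebuilder markers roughly as sketched below. The marker names are standard controller-gen markers, but their exact placement in this PR is an assumption, as are the trimmed-down types and the `validate` function. `validate` is a plain-Go mirror of the rules for illustration only; in a cluster the API server enforces them from the generated CRD schema.

```go
package main

import (
	"errors"
	"fmt"
)

// Step is a placeholder for one scenario step (fields elided).
type Step struct{}

// Scenario holds inline test steps.
type Scenario struct {
	// +kubebuilder:validation:MinItems=1
	Steps []Step `json:"steps"`
}

// TriggerSpec carries the execution policy.
type TriggerSpec struct {
	// +kubebuilder:validation:Enum=Allow;Forbid;Replace
	ConcurrencyPolicy string `json:"concurrencyPolicy,omitempty"`
}

// ExperimentSpec is trimmed to the validated fields.
// +kubebuilder:validation:XValidation:rule="!(has(self.dataset) && has(self.scenarios))",message="dataset and scenarios are mutually exclusive"
type ExperimentSpec struct {
	Dataset   *string    `json:"dataset,omitempty"` // stand-in for the dataset source type
	Scenarios []Scenario `json:"scenarios,omitempty"`
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:validation:Maximum=1
	DefaultThreshold float64     `json:"defaultThreshold,omitempty"`
	Trigger          TriggerSpec `json:"trigger,omitempty"`
}

// validate mirrors the markers in plain Go (hypothetical helper).
// A nil check approximates CEL's has(): a field counts as present when set.
func validate(s ExperimentSpec) error {
	if s.Dataset != nil && s.Scenarios != nil {
		return errors.New("dataset and scenarios are mutually exclusive")
	}
	if s.DefaultThreshold < 0 || s.DefaultThreshold > 1 {
		return fmt.Errorf("defaultThreshold %v is outside [0.0, 1.0]", s.DefaultThreshold)
	}
	switch s.Trigger.ConcurrencyPolicy {
	case "", "Allow", "Forbid", "Replace":
		// Empty is treated as unset here; that is an assumption.
	default:
		return fmt.Errorf("concurrencyPolicy %q is not one of Allow, Forbid, Replace",
			s.Trigger.ConcurrencyPolicy)
	}
	for i, sc := range s.Scenarios {
		if len(sc.Steps) < 1 {
			return fmt.Errorf("scenarios[%d].steps needs at least one item", i)
		}
	}
	return nil
}

func main() {
	ds := "s3://testbench/dataset.csv" // illustrative value only

	// Setting both dataset and scenarios violates the CEL rule.
	both := ExperimentSpec{
		Dataset:          &ds,
		Scenarios:        []Scenario{{Steps: []Step{{}}}},
		DefaultThreshold: 0.9,
	}
	fmt.Println("both set:", validate(both))

	// Only a dataset, with an allowed policy, passes.
	ok := ExperimentSpec{
		Dataset:          &ds,
		DefaultThreshold: 0.9,
		Trigger:          TriggerSpec{ConcurrencyPolicy: "Forbid"},
	}
	fmt.Println("dataset only:", validate(ok))
}
```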

Technical Notes

Uses runtime.RawExtension for flexible JSON fields (ToolCall args, Step customValues, Metric parameters)
Requires crd:allowDangerousTypes=true in Makefile for float64 threshold fields
Generated CRD: config/crd/bases/testbench.agentic-layer.ai_experiments.yaml
Sample CR demonstrates S3 dataset source pattern

apiVersion: testbench.agentic-layer.ai/v1alpha1
kind: Experiment
spec:
  agentRef:
    name: weather-agent
    namespace: sample-agents
  llmAsAJudgeModel: gemini-2.5-flash-lite
  defaultThreshold: 0.9
  dataset:
    s3:
      bucket: testbench
      key: dataset.csv
  trigger:
    enabled: true
    concurrencyPolicy: Forbid

Original prompt

This section details the original issue you should resolve

<issue_title>Operator: Scaffold Go project with Operator SDK and define Experiment CRD types</issue_title>
<issue_description>Parent Issue
Part of #27

Description

Bootstrap the Go operator project using Operator SDK and define the Experiment CRD API types.

Scope

Initialize Go module with operator-sdk init (domain agentic-layer.ai, group testbench)

Create the Experiment API type (v1alpha1) with operator-sdk create api

Define all Go structs:

ExperimentSpec: agentRef, dataset, llmAsAJudgeModel, defaultThreshold, scenarios, trigger

AgentRef: name, namespace

DatasetSource: s3 (bucket, key), url

Scenario, Step, Metric, Reference, ToolCall structs

TriggerSpec: enabled, event, concurrencyPolicy

Define ExperimentStatus struct with conditions, generatedResources, lastExecution

Add kubebuilder validation markers:

defaultThreshold and metric threshold validated to 0-1 range

dataset and scenarios mutually exclusive (CEL validation rule)

Required fields marked appropriately

Generate CRD manifests with make manifests

Basic Makefile targets: build, manifests, generate, docker-build

Acceptance Criteria

operator/ directory with working Go module

api/v1alpha1/experiment_types.go with all struct definitions

CRD YAML generated in operator/config/crd/bases/

OpenAPI validation for threshold ranges and mutual exclusivity

go build ./... passes

make manifests generates valid CRD

References

agent-runtime-operator for project structure reference

Issue #27 (Testbench Operator: Reconcile Experiment CRD to TestWorkflows), CRD specification section</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes Operator: Scaffold Go project with Operator SDK and define Experiment CRD types #28

- Initialize Go module with operator-sdk (domain: agentic-layer.ai)
- Create Experiment API v1alpha1 with all required types
- Define ExperimentSpec with agentRef, dataset, llmAsAJudgeModel, defaultThreshold, scenarios, trigger
- Define supporting types: AgentRef, DatasetSource, S3Source, Scenario, Step, Metric, Reference, ToolCall, TriggerSpec
- Define ExperimentStatus with conditions, generatedResources, lastExecution
- Add kubebuilder validation markers:
  - Threshold ranges (0.0-1.0) on defaultThreshold and metric threshold
  - MinItems validation on scenario steps
  - Enum validation on concurrencyPolicy (Allow, Forbid, Replace)
  - CEL validation rule for dataset/scenarios mutual exclusivity
- Generate CRD manifests with make manifests
- Update sample CR with realistic example
- Verify go build ./... passes successfully

Co-authored-by: fmallmann <30110193+fmallmann@users.noreply.github.com>

Initial plan

09c1dec

Copilot AI assigned Copilot and fmallmann Feb 25, 2026

Copilot started work on behalf of fmallmann February 25, 2026 12:38

Copilot AI changed the title ~~[WIP] Add initial Go project scaffolding with Experiment CRD types~~ Operator: Scaffold Go project with Operator SDK and define Experiment CRD types Feb 25, 2026

Copilot finished work on behalf of fmallmann February 25, 2026 12:46

Copilot AI requested a review from fmallmann February 25, 2026 12:46

Copilot wants to merge 2 commits into main from copilot/scaffold-go-operator-project
