agentevals-dev · krisztianfekete · Apr 1, 2026 · Apr 1, 2026
diff --git a/examples/README.md b/examples/README.md
@@ -102,6 +102,12 @@ Detection checks for `gen_ai.request.model` / `gen_ai.input.messages` (GenAI sem
 
 The zero-code and SDK examples implement the same toy agent (dice rolling + prime checking) so you can compare the two approaches directly.
 
+## Kubernetes
+
+| Example | Description |
+|---------|-------------|
+| [kubernetes/](./kubernetes/) | Deploy agentevals with kagent on Kubernetes, using an OTel Collector as a gRPC to HTTP bridge. Includes a walkthrough for comparing two kagent agents (different models) and evaluating them with tool trajectory and response match scores. |
+
 ## Advanced: GenAI Semantic Convention Patterns
 
 > [!TIP]

diff --git a/examples/kubernetes/README.md b/examples/kubernetes/README.md
@@ -0,0 +1,238 @@
+# Kubernetes: Evaluating kagent with agentevals
+
+Run agentevals alongside [kagent](https://github.com/kagent-dev/kagent) on Kubernetes to evaluate AI agent conversations in real time. This example deploys three components:
+
+1. **agentevals** receives OTLP traces over HTTP and serves the evaluation UI
+2. **OTel Collector** bridges the protocol gap: kagent exports traces via gRPC, but agentevals only supports OTLP/HTTP today, so the Collector converts gRPC to HTTP
+3. **kagent** provides Kubernetes-native AI agents with built-in OTel instrumentation (gRPC export only)
+
+```
+kagent (gRPC :4317) --> OTel Collector --> agentevals (HTTP :4318)
+                                              |
+                                         UI on :8001
+```
+
+## Prerequisites
+
+- A running Kubernetes cluster (kind, minikube, EKS, GKE, etc.)
+- `helm` v3 installed
+- `kubectl` configured for your cluster
+- An OpenAI API key (`OPENAI_API_KEY`)
+
+## Deploy
+
+### 1. agentevals
+
+```bash
+helm install agentevals ./charts/agentevals \
+  --set tag=0.6.3
+```
+
+This creates a single pod exposing:
+
+| Port | Purpose |
+|------|---------|
+| 8001 | Web UI and API |
+| 4318 | OTLP HTTP receiver (traces and logs) |
+| 8080 | MCP (Streamable HTTP) |
+
+### 2. OTel Collector (gRPC to HTTP bridge)
+
+kagent exports traces over gRPC (port 4317), but agentevals accepts OTLP over HTTP (port 4318). The OTel Collector bridges the two protocols.
+
+```bash
+helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
+helm repo update
+
+helm upgrade --install otel-collector open-telemetry/opentelemetry-collector \
+  --namespace kagent --create-namespace \
+  --set mode=deployment \
+  --set replicaCount=1 \
+  --set image.repository=otel/opentelemetry-collector \
+  --set ports.otlp.enabled=true \
+  --set ports.otlp-http.enabled=false \
+  --set config.exporters.otlphttp.endpoint="http://agentevals.default.svc.cluster.local:4318" \
+  --set config.exporters.otlphttp.compression="none" \
+  --set config.service.pipelines.traces.receivers[0]=otlp \
+  --set config.service.pipelines.traces.exporters[0]=otlphttp \
+  --set config.service.pipelines.logs.receivers[0]=otlp \
+  --set config.service.pipelines.logs.exporters[0]=otlphttp
+```
+
+> **Note:** If you deployed agentevals in a namespace other than `default`, update the `endpoint` value accordingly: `http://agentevals.<namespace>.svc.cluster.local:4318`.
+
+### 3. kagent
+
+Install the CRDs first, then the kagent operator with OTel tracing enabled:
+
+```bash
+helm install kagent-crds oci://ghcr.io/kagent-dev/kagent/helm/kagent-crds \
+  --namespace kagent \
+  --create-namespace
+
+helm upgrade --install kagent oci://ghcr.io/kagent-dev/kagent/helm/kagent \
+  --namespace kagent \
+  --set providers.default=openAI \
+  --set providers.openAI.apiKey=$OPENAI_API_KEY \
+  --set agents.kgateway-agent.enabled=false \
+  --set agents.istio-agent.enabled=false \
+  --set agents.promql-agent.enabled=false \
+  --set agents.observability-agent.enabled=false \
+  --set agents.argo-rollouts-agent.enabled=false \
+  --set agents.cilium-policy-agent.enabled=false \
+  --set agents.cilium-manager-agent.enabled=false \
+  --set agents.cilium-debug-agent.enabled=false \
+  --set otel.tracing.enabled=true \
+  --set otel.tracing.exporter.otlp.endpoint="otel-collector-opentelemetry-collector.kagent.svc.cluster.local:4317" \
+  --set otel.tracing.exporter.otlp.insecure=true
+```
+
+This installs kagent with only the default Helm agent (`helm-agent`) and the K8s troubleshooter enabled, and points its OTel exporter at the Collector.
+
+### Verify the deployment
+
+```bash
+kubectl get pods -A -l 'app.kubernetes.io/name in (agentevals, kagent, opentelemetry-collector)'
+```
+
+All pods should be `Running` before continuing.
+
+## Walkthrough: Comparing models with kagent and agentevals
+
+This walkthrough shows how to evaluate two kagent agents side by side: the default Helm agent running `gpt-4.1-mini` and a new agent running `gpt-5`. You will chat with both agents, watch their traces stream into agentevals, select the better session as the evaluation baseline, and score both on tool trajectory and response match.
+
+### Step 1. Access the UIs
+
+Port-forward both services to your local machine:
+
+```bash
+# Terminal 1: agentevals UI
+kubectl port-forward svc/agentevals 8001:8001
+
+# Terminal 2: kagent UI
+kubectl port-forward -n kagent svc/kagent 8083:8083
+```
+
+Open **http://localhost:8083** for the kagent UI and **http://localhost:8001** for the agentevals UI.
+
+### Step 2. Create a GPT-5 agent
+
+kagent ships with a default `helm-agent` configured to use `gpt-4.1-mini`. Create a second agent that uses `gpt-5` so you can compare the two.
+
+**Option A: via the kagent UI**
+
+1. Open http://localhost:8083
+2. Navigate to the Agents page
+3. Click **Create Agent**
+4. Copy the configuration from the existing `helm-agent` (same system prompt, same tools)
+5. Change the model to `gpt-5`
+6. Name it `helm-agent-gpt5`
+7. Save
+
+**Option B: via a CRD**
+
+Apply the following manifest (adjust the system prompt if needed):
+
+```yaml
+apiVersion: kagent.dev/v1alpha1
+kind: Agent
+metadata:
+  name: helm-agent-gpt5
+  namespace: kagent
+spec:
+  description: "Helm agent (GPT-5) for model comparison"
+  modelConfig:
+    model: gpt-5
+    apiKeySecretRef:
+      name: kagent-openai
+      key: OPENAI_API_KEY
+  systemPrompt: |
+    You are a Kubernetes Helm expert. You help users manage Helm charts,
+    releases, and repositories. Use your tools to inspect and manage
+    Helm resources in the cluster.
+  tools:
+    - name: helm-list
+    - name: helm-status
+    - name: helm-get-values
+    - name: helm-history
+```
+
+```bash
+kubectl apply -f helm-agent-gpt5.yaml
+```
+
+### Step 3. Open agentevals Live view
+
+1. Go to http://localhost:8001
+2. Click **Live** in the sidebar to open the live streaming view
+3. Leave this tab open. Sessions will appear as traces arrive.
+
+### Step 4. Chat with both agents
+
+Switch to the kagent UI (http://localhost:8083) and have the same conversation with each agent. For example:
+
+**With `helm-agent` (gpt-4.1-mini):**
+
+1. Select `helm-agent` from the agent list
+2. Start a new conversation
+3. Ask: *"List all Helm releases across all namespaces and tell me which ones have pending upgrades"*
+4. Follow up: *"Show me the values for the agentevals release"*
+
+**With `helm-agent-gpt5` (gpt-5):**
+
+1. Select `helm-agent-gpt5` from the agent list
+2. Start a new conversation
+3. Ask the same questions in the same order
+
+### Step 5. Watch traces in agentevals
+
+Switch back to the agentevals Live view at http://localhost:8001. You will see two sessions appear, one for each conversation. Each session shows:
+
+- **Status** transitioning from ACTIVE to COMPLETED as the conversation ends
+- **Span count** incrementing in real time as the agent makes LLM calls and tool invocations
+- **Model name** visible in the session metadata
+
+### Step 6. Select the GPT-5 session as the eval set
+
+Once both sessions are complete:
+
+1. Click on the `helm-agent-gpt5` session card to open its trace details
+2. Review the conversation: check that it called the right tools and produced correct responses
+3. Click **Use as Eval Set** to mark this session as the evaluation baseline
+4. Give it a name like `helm-agent-comparison`
+
+This captures the GPT-5 session's tool trajectory and final responses as the golden reference.
+
+### Step 7. Evaluate both sessions
+
+1. Go back to the sessions list
+2. Select both sessions (the `gpt-4.1-mini` session and the `gpt-5` session)
+3. Click **Evaluate**
+4. Select the `helm-agent-comparison` eval set
+5. Choose the metrics:
+   - **tool_trajectory_avg_score**: Did the agent call the correct tools in the correct order?
+   - **response_match_score**: Did the agent produce responses consistent with the golden reference?
+6. Run the evaluation
+
+### What to look for
+
+| Metric | What it tells you |
+|--------|------------------|
+| `tool_trajectory_avg_score` | Whether the agent followed the expected sequence of Helm tool calls (`helm-list`, then `helm-get-values`). A score of 1.0 means it matched exactly. |
+| `response_match_score` | How closely the agent's final answers matched the GPT-5 baseline. Useful for catching regressions when switching to a cheaper model. |
+
+Compare the two sessions in the results table:
+
+- **Token usage**: The session metadata includes total token counts. If `gpt-5` consumed fewer tokens while achieving the same trajectory score, it may be the better choice for this use case.
+- **Tool trajectory**: If one agent called extra tools or skipped expected ones, the trajectory score reflects that.
+- **Response quality**: A lower response match score on the `gpt-4.1-mini` session highlights where the cheaper model diverged from the GPT-5 baseline.
+
+## Cleanup
+
+```bash
+helm uninstall kagent -n kagent
+helm uninstall kagent-crds -n kagent
+helm uninstall otel-collector -n kagent
+helm uninstall agentevals
+kubectl delete namespace kagent
+```