25 changes: 24 additions & 1 deletion DEVELOPMENT.md
@@ -23,10 +23,33 @@ make dev-frontend # start Vite dev server (port 5173) with HMR
make dev-bundle # build UI, serve full bundled experience at port 8001 via uv run
```

Standard development uses `dev-backend` + `dev-frontend` in separate terminals. The Vite dev server proxies nothing the frontend calls the backend at `http://localhost:8001` directly via CORS.
Standard development uses `dev-backend` + `dev-frontend` in separate terminals. The Vite dev server proxies nothing; the frontend calls the backend at `http://localhost:8001` directly via CORS.

`dev-bundle` is useful for testing the bundled UI experience without building a wheel. It copies `ui/dist` into the source tree temporarily and cleans up when the server exits.

### Postgres backend (optional, for `/api/runs`)

The default in-memory backend keeps `make dev-backend` zero-config. To exercise the async run pipeline locally, bring up a Postgres instance alongside the app:

```bash
make pg-up # start postgres:17-alpine in a docker container (port 5432, ephemeral via --rm)
make migrate # apply the agentevals schema
make dev-backend-pg # pg-up + migrate + serve --dev with backend=postgres wired up
make pg-down # stop the container; data is discarded with --rm
```

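Override any default on the command line, e.g. `PG_PORT=5433 make pg-up`. Pass the same overrides to `migrate` and `dev-backend-pg`, since `PG_DSN` is derived from them. The `migrate` target is idempotent (a second invocation is a no-op).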
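
As a rough illustration of how the overrides feed into the DSN, the sketch below mirrors the Makefile's `PG_*` defaults in plain shell. The `pg_dsn` function is hypothetical and exists only for this example; the Makefile itself composes `PG_DSN` with `?=` assignments.

```bash
# Hypothetical helper (not part of the repo): rebuilds the DSN the same way
# the Makefile's PG_DSN does. ${VAR-default} applies the default only when
# VAR is unset, which is close to Make's `?=` behaviour.
pg_dsn() {
  printf 'postgresql://%s:%s@localhost:%s/%s\n' \
    "${PG_USER-agentevals}" \
    "${PG_PASSWORD-agentevals}" \
    "${PG_PORT-5432}" \
    "${PG_DATABASE-agentevals}"
}

# Command substitution runs in a subshell, so the override does not leak.
echo "default:  $(pg_dsn)"
echo "override: $(PG_PORT=5433 pg_dsn)"
```

Because every target derives the DSN from the same variables, an override given to `pg-up` alone would leave `migrate` pointing at the default port.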

Once running, submit a run with:

```bash
curl -X POST http://localhost:8001/api/runs \
-H 'content-type: application/json' \
-d '{"spec": {"approach": "trace_replay", "target": {"kind": "inline", "inline": {...}}, "evalConfig": {"metrics": ["tool_trajectory_avg_score"]}}}'
```

Then poll `GET /api/runs/{runId}` and `GET /api/runs/{runId}/results`. Without the Postgres backend (`AGENTEVALS_STORAGE_BACKEND=postgres`, which `make dev-backend-pg` sets), the `/api/runs` endpoints return 503 with a hint pointing at the env var.
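
When polling from a script, it can help to tell the 503 "backend not enabled" case apart from other failures. The sketch below is one way to do that. Only the 503 behaviour comes from the description above; the function names, the `BASE_URL` variable, and the handling of other status codes are assumptions made for illustration.

```bash
# Map the HTTP status of a /api/runs request to a short diagnosis.
# Only the 503 branch reflects documented behaviour; the rest are assumptions.
describe_runs_status() {
  case "$1" in
    503) echo "runs API disabled: set AGENTEVALS_STORAGE_BACKEND=postgres" ;;
    2??) echo "ok" ;;
    000) echo "no HTTP response: is the dev server running?" ;;  # curl prints 000 when it gets no response
    *)   echo "unexpected HTTP status $1" ;;
  esac
}

# Fetch only the status code for a run; the run ID is passed as $1.
# BASE_URL is a hypothetical override that defaults to the dev server address.
run_http_status() {
  curl -s -o /dev/null -w '%{http_code}' "${BASE_URL:-http://localhost:8001}/api/runs/$1"
}

# Usage, with the ID of a submitted run in RUN_ID:
#   describe_runs_status "$(run_http_status "$RUN_ID")"
```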

### Building

```bash
2 changes: 1 addition & 1 deletion Dockerfile
@@ -24,7 +24,7 @@ COPY src ./src

COPY --from=ui /build/ui/dist ./src/agentevals/_static

RUN uv sync --frozen --no-dev --extra live \
RUN uv sync --frozen --no-dev --extra live --extra postgres \
&& groupadd --gid 1000 app \
&& useradd --uid 1000 --gid app --home-dir /app --no-log-init app \
&& chown -R app:app /app
33 changes: 32 additions & 1 deletion Makefile
@@ -15,7 +15,14 @@ HELM_CHART_DIR ?= charts/agentevals
HELM_CHART_OCI_URL ?= $(HELM_REPO)/helm
HELM_CHART_VERSION ?= $(VERSION)

.PHONY: build build-bundle build-docker build-ui release clean dev-backend dev-frontend dev-bundle test test-unit test-integration test-e2e helm-lint helm-template helm-test helm-cleanup helm-package helm-publish
.PHONY: build build-bundle build-docker build-ui release clean dev-backend dev-backend-pg dev-frontend dev-bundle pg-up pg-down migrate test test-unit test-integration test-e2e helm-lint helm-template helm-test helm-cleanup helm-package helm-publish

PG_CONTAINER ?= agentevals-pg
PG_PORT ?= 5432
PG_USER ?= agentevals
PG_PASSWORD ?= agentevals
PG_DATABASE ?= agentevals
PG_DSN ?= postgresql://$(PG_USER):$(PG_PASSWORD)@localhost:$(PG_PORT)/$(PG_DATABASE)

build:
uv build
@@ -53,6 +60,30 @@ release: clean build-ui
dev-backend:
uv run agentevals serve --dev

pg-up:
@if [ -z "$$(docker ps -q -f name=^/$(PG_CONTAINER)$$)" ]; then \
docker run -d --rm --name $(PG_CONTAINER) \
-e POSTGRES_USER=$(PG_USER) \
-e POSTGRES_PASSWORD=$(PG_PASSWORD) \
-e POSTGRES_DB=$(PG_DATABASE) \
-p $(PG_PORT):5432 postgres:17-alpine; \
else \
echo "container $(PG_CONTAINER) already running"; \
fi
@until docker exec $(PG_CONTAINER) pg_isready -U $(PG_USER) >/dev/null 2>&1; do sleep 1; done
@echo "Postgres ready at $(PG_DSN)"

pg-down:
-docker stop $(PG_CONTAINER)

migrate:
AGENTEVALS_DATABASE_URL=$(PG_DSN) uv run agentevals migrate up

dev-backend-pg: pg-up migrate
AGENTEVALS_STORAGE_BACKEND=postgres \
AGENTEVALS_DATABASE_URL=$(PG_DSN) \
uv run agentevals serve --dev

dev-frontend:
cd ui && npm run dev

18 changes: 18 additions & 0 deletions README.md
@@ -286,6 +286,24 @@ The source for the chart lives in [`charts/agentevals/`](charts/agentevals/) if

See the [Kubernetes example](examples/kubernetes/README.md) for an end-to-end walkthrough deploying agentevals alongside kagent and an OTel Collector on Kubernetes.

#### Postgres backend (`/api/runs`)

By default the chart deploys agentevals with an in-memory backend; runs and results are not persisted. To enable the async `POST /api/runs` pipeline with durable Postgres-backed state:

```bash
# Bundled Postgres (dev / evaluation only):
helm install agentevals oci://ghcr.io/agentevals-dev/agentevals/helm/agentevals \
--set storage.backend=postgres \
--set database.postgres.bundled.enabled=true

# Or supply an external Postgres DSN:
helm install agentevals oci://ghcr.io/agentevals-dev/agentevals/helm/agentevals \
--set storage.backend=postgres \
--set database.postgres.url='postgresql://user:pass@host:5432/dbname'
```

When `storage.backend=postgres` is set, the app applies any pending schema migrations on startup (advisory-lock protected, safe across replicas) and starts an in-process worker that processes the run queue. Without it, the `/api/runs` endpoints return 503 with a hint pointing at the storage-backend env var (`AGENTEVALS_STORAGE_BACKEND`, which the chart sets on the app container).
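
The chart's deployment template picks exactly one connection source when `storage.backend=postgres`: `database.postgres.urlFile` wins over `database.postgres.url`, which wins over the bundled instance, and rendering fails if none is set. The sketch below restates that precedence in shell. `db_url_source` is a hypothetical helper written for illustration; it is not part of the chart or the CLI.

```bash
# Hypothetical helper mirroring the if / else-if chain in
# charts/agentevals/templates/deployment.yaml.
# Arguments: $1 = urlFile, $2 = url, $3 = bundled.enabled ("true" or "false").
db_url_source() {
  if [ -n "$1" ]; then
    echo "AGENTEVALS_DATABASE_URL_FILE=$1"
  elif [ -n "$2" ]; then
    echo "AGENTEVALS_DATABASE_URL=$2"
  elif [ "$3" = "true" ]; then
    echo "AGENTEVALS_DATABASE_URL=(DSN for the bundled Postgres Service)"
  else
    echo "storage.backend=postgres requires database.postgres.url, database.postgres.urlFile, or database.postgres.bundled.enabled=true" >&2
    return 1
  fi
}

# An external DSN takes priority even when the bundled instance is enabled.
db_url_source "" "postgresql://user:pass@host:5432/dbname" "true"
```

This matches the guards on the bundled Postgres manifests, which render only when neither `url` nor `urlFile` is set.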

## MCP Server

Exposes evaluation tools to MCP clients. A `.mcp.json` at the project root lets Claude Code pick it up automatically.
33 changes: 33 additions & 0 deletions charts/agentevals/templates/_helpers.tpl
@@ -48,10 +48,43 @@ app.kubernetes.io/name: {{ include "agentevals.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{- /*
Selector labels scoped to the main app Pod and its Service. Carries the
``app.kubernetes.io/component: agentevals`` discriminator so the agentevals
Service does not also match the bundled Postgres Pod (which carries
``app.kubernetes.io/component: database`` instead).
*/ -}}
{{- define "agentevals.app.selectorLabels" -}}
{{ include "agentevals.selectorLabels" . }}
app.kubernetes.io/component: agentevals
{{- end }}

{{- define "agentevals.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "agentevals.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}

{{/*
Service name for the bundled Postgres instance.
*/}}
{{- define "agentevals.postgresqlServiceName" -}}
{{- printf "%s-postgresql" (include "agentevals.fullname" .) -}}
{{- end -}}

{{/*
Bundled Postgres image reference (registry/repository/name:tag).
*/}}
{{- define "agentevals.postgresql.image" -}}
{{- $pg := .Values.database.postgres.bundled -}}
{{- printf "%s/%s/%s:%s" $pg.image.registry $pg.image.repository $pg.image.name $pg.image.tag -}}
{{- end -}}

{{/*
Secret name holding POSTGRES_PASSWORD for the bundled Postgres instance.
*/}}
{{- define "agentevals.passwordSecretName" -}}
{{- printf "%s-postgresql" (include "agentevals.fullname" .) -}}
{{- end -}}
27 changes: 25 additions & 2 deletions charts/agentevals/templates/deployment.yaml
@@ -9,15 +9,15 @@ spec:
replicas: {{ .Values.replicaCount }}
selector:
matchLabels:
{{- include "agentevals.selectorLabels" . | nindent 6 }}
{{- include "agentevals.app.selectorLabels" . | nindent 6 }}
template:
metadata:
{{- with .Values.podAnnotations }}
annotations:
{{- toYaml . | nindent 8 }}
{{- end }}
labels:
{{- include "agentevals.selectorLabels" . | nindent 8 }}
{{- include "agentevals.app.selectorLabels" . | nindent 8 }}
{{- with .Values.podLabels }}
{{- toYaml . | nindent 8 }}
{{- end }}
@@ -65,6 +65,29 @@ spec:
- name: HOME
value: "/tmp/agentevals-home"
{{- end }}
{{- if eq .Values.storage.backend "postgres" }}
- name: AGENTEVALS_STORAGE_BACKEND
value: "postgres"
- name: AGENTEVALS_DATABASE_SCHEMA
value: {{ .Values.database.postgres.schema | quote }}
{{- if .Values.database.postgres.urlFile }}
- name: AGENTEVALS_DATABASE_URL_FILE
value: {{ .Values.database.postgres.urlFile | quote }}
{{- else if .Values.database.postgres.url }}
- name: AGENTEVALS_DATABASE_URL
value: {{ .Values.database.postgres.url | quote }}
{{- else if .Values.database.postgres.bundled.enabled }}
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: {{ include "agentevals.passwordSecretName" . }}
key: POSTGRES_PASSWORD
- name: AGENTEVALS_DATABASE_URL
value: {{ printf "postgresql://agentevals:$(POSTGRES_PASSWORD)@%s.%s.svc.cluster.local:5432/agentevals?sslmode=disable" (include "agentevals.postgresqlServiceName" .) (include "agentevals.namespace" .) | quote }}
{{- else }}
{{ fail "storage.backend=postgres requires database.postgres.url, database.postgres.urlFile, or database.postgres.bundled.enabled=true" }}
{{- end }}
{{- end }}
{{- with .Values.env }}
{{- toYaml . | nindent 12 }}
{{- end }}
13 changes: 13 additions & 0 deletions charts/agentevals/templates/postgresql-secret.yaml
@@ -0,0 +1,13 @@
{{- if and (eq .Values.storage.backend "postgres") .Values.database.postgres.bundled.enabled (not .Values.database.postgres.url) (not .Values.database.postgres.urlFile) }}
apiVersion: v1
kind: Secret
metadata:
name: {{ include "agentevals.passwordSecretName" . }}
namespace: {{ include "agentevals.namespace" . }}
labels:
{{- include "agentevals.labels" . | nindent 4 }}
app.kubernetes.io/component: database
type: Opaque
data:
POSTGRES_PASSWORD: {{ "agentevals" | b64enc | quote }}
{{- end }}
142 changes: 142 additions & 0 deletions charts/agentevals/templates/postgresql.yaml
@@ -0,0 +1,142 @@
{{- if and (eq .Values.storage.backend "postgres") .Values.database.postgres.bundled.enabled (not .Values.database.postgres.url) (not .Values.database.postgres.urlFile) }}
{{- $pg := .Values.database.postgres.bundled }}
{{- $fullname := include "agentevals.postgresqlServiceName" . }}
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: {{ $fullname }}
namespace: {{ include "agentevals.namespace" . }}
labels:
{{- include "agentevals.labels" . | nindent 4 }}
app.kubernetes.io/component: database
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: {{ $fullname }}
namespace: {{ include "agentevals.namespace" . }}
labels:
{{- include "agentevals.labels" . | nindent 4 }}
app.kubernetes.io/component: database
spec:
accessModes:
- ReadWriteOnce
{{- if $pg.storageClassName }}
storageClassName: {{ $pg.storageClassName | quote }}
{{- end }}
resources:
requests:
storage: {{ $pg.storage | quote }}
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ $fullname }}
namespace: {{ include "agentevals.namespace" . }}
labels:
{{- include "agentevals.labels" . | nindent 4 }}
app.kubernetes.io/component: database
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
{{- include "agentevals.selectorLabels" . | nindent 6 }}
app.kubernetes.io/component: database
template:
metadata:
labels:
{{- include "agentevals.selectorLabels" . | nindent 8 }}
app.kubernetes.io/component: database
spec:
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
serviceAccountName: {{ $fullname }}
securityContext:
fsGroup: 999
runAsUser: 999
runAsGroup: 999
runAsNonRoot: true
containers:
- name: postgresql
image: {{ include "agentevals.postgresql.image" . }}
imagePullPolicy: {{ $pg.image.pullPolicy }}
securityContext:
allowPrivilegeEscalation: false
ports:
- name: postgresql
containerPort: 5432
protocol: TCP
env:
- name: POSTGRES_DB
value: "agentevals"
- name: POSTGRES_USER
value: "agentevals"
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: {{ include "agentevals.passwordSecretName" . }}
key: POSTGRES_PASSWORD
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
livenessProbe:
exec:
command:
- pg_isready
- -U
- agentevals
- -d
- agentevals
initialDelaySeconds: 20
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 6
successThreshold: 1
readinessProbe:
exec:
command:
- pg_isready
- -U
- agentevals
- -d
- agentevals
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
successThreshold: 1
{{- with $pg.resources }}
resources:
{{- toYaml . | nindent 12 }}
{{- end }}
volumeMounts:
- name: data
mountPath: /var/lib/postgresql/data
volumes:
- name: data
persistentVolumeClaim:
claimName: {{ $fullname }}
---
apiVersion: v1
kind: Service
metadata:
name: {{ $fullname }}
namespace: {{ include "agentevals.namespace" . }}
labels:
{{- include "agentevals.labels" . | nindent 4 }}
app.kubernetes.io/component: database
spec:
type: ClusterIP
ports:
- name: postgresql
port: 5432
targetPort: postgresql
protocol: TCP
selector:
{{- include "agentevals.selectorLabels" . | nindent 4 }}
app.kubernetes.io/component: database
{{- end }}
2 changes: 1 addition & 1 deletion charts/agentevals/templates/service.yaml
@@ -25,4 +25,4 @@ spec:
targetPort: mcp
protocol: TCP
selector:
{{- include "agentevals.selectorLabels" . | nindent 4 }}
{{- include "agentevals.app.selectorLabels" . | nindent 4 }}