Testbench

Kubernetes-native agent evaluation system that executes test datasets via A2A protocol, evaluates responses using pluggable metrics (RAGAS by default), and publishes scores via OpenTelemetry.
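To illustrate what "pluggable metrics" can mean in practice, here is a minimal sketch. It is not Testbench's actual API: the `Sample` type, the `METRICS` registry, and both metric functions are hypothetical stand-ins, and the real default metrics (RAGAS) are LLM-based rather than the simple string comparisons used here.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical record for one evaluated dataset row (not Testbench's real type).
@dataclass(frozen=True)
class Sample:
    question: str
    expected: str
    response: str


# A metric is any callable that maps a sample to a score in [0, 1].
Metric = Callable[[Sample], float]


def exact_match(sample: Sample) -> float:
    """Score 1.0 when response and expected answer match, ignoring case."""
    same = sample.response.strip().lower() == sample.expected.strip().lower()
    return 1.0 if same else 0.0


def token_recall(sample: Sample) -> float:
    """Fraction of expected tokens that also appear in the response."""
    expected = set(sample.expected.lower().split())
    if not expected:
        return 0.0
    found = expected & set(sample.response.lower().split())
    return len(found) / len(expected)


# Registry: adding a metric means adding one entry here.
METRICS: dict[str, Metric] = {
    "exact_match": exact_match,
    "token_recall": token_recall,
}


def evaluate(samples: list[Sample], metric_names: list[str]) -> dict[str, float]:
    """Return the mean score per selected metric."""
    if not samples:
        raise ValueError("at least one sample is required")
    return {
        name: sum(METRICS[name](s) for s in samples) / len(samples)
        for name in metric_names
    }
```

The design point is that the evaluation loop only depends on the callable signature, so metrics can be swapped without touching the loop.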

Documentation

Full documentation is available at docs.agentic-layer.ai.

Prerequisites

Python 3.12+ and uv
Kubernetes cluster (e.g. kind) with Tilt
Testkube CLI
GOOGLE_API_KEY for LLM-as-a-judge evaluation via Gemini models
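
The prerequisites above can be checked with a short script before starting. This is a sketch under assumptions: the executable names in `REQUIRED_TOOLS` (in particular `kubectl-testkube` for the Testkube kubectl plugin) are guesses about a typical installation, and `parse_dotenv` is a deliberately simplified reader for `KEY=value` lines, not a full dotenv parser.

```python
import shutil
import sys
from typing import Callable, Mapping, Optional

# Assumed executable names; adjust them to match your installation.
REQUIRED_TOOLS = ("uv", "kubectl", "tilt", "kubectl-testkube")


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_prerequisites(
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
    python_version: tuple[int, int] = sys.version_info[:2],
) -> list[str]:
    """Return the names of prerequisites that could not be found."""
    missing: list[str] = []
    if python_version < (3, 12):
        missing.append("python>=3.12")
    missing.extend(tool for tool in REQUIRED_TOOLS if which(tool) is None)
    if not env.get("GOOGLE_API_KEY"):
        missing.append("GOOGLE_API_KEY")
    return missing


# Example usage (reads a .env file from the current directory):
#   from pathlib import Path
#   env = parse_dotenv(Path(".env").read_text())
#   print(missing_prerequisites(env) or "all prerequisites found")
```

The `which` and `python_version` parameters exist only so the check can be exercised without the tools installed.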

Getting Started

# 1. Start local infrastructure (AI Gateway, OTLP collector, sample agents, Testkube)
#    Create a .env file with GOOGLE_API_KEY=your-key first
tilt up

# 2. Run the example evaluation workflow
kubectl testkube run tw example-workflow --watch

See the how-to guide for detailed pipeline usage, including dataset format, metric configuration, and custom workflows.

Development

Command	Description
`uv run poe check`	Run all quality checks (tests, mypy, bandit, ruff)
`uv run poe test`	Unit tests
`uv run poe format`	Format with Ruff
`uv run poe lint`	Lint and auto-fix with Ruff
`uv run poe ruff`	Format and lint with Ruff
`uv run poe mypy`	Static type checking
`uv run poe bandit`	Security vulnerability scanning

E2E Testing

Requires a running Tilt environment (tilt up).

# Configure (optional — defaults target the Tilt environment)
export E2E_DATASET_URL="http://data-server.data-server:8000/dataset.csv"
export E2E_AGENT_URL="http://weather-agent.sample-agents:8000"
export E2E_MODEL="gemini-2.5-flash-lite"

# Run
uv run poe test_e2e

Contributing

See the Contribution Guide for details on contributing and the process for submitting pull requests.
