Kubernetes-native agent evaluation system that executes test datasets via A2A protocol, evaluates responses using pluggable metrics (RAGAS by default), and publishes scores via OpenTelemetry.
Full documentation is available at docs.agentic-layer.ai.
- Python 3.12+ and uv
- Kubernetes cluster (e.g. kind) with Tilt
- Testkube CLI
- `GOOGLE_API_KEY` for LLM-as-a-judge evaluation via Gemini models
```shell
# 1. Start local infrastructure (AI Gateway, OTLP collector, sample agents, Testkube)
# Create a .env file with GOOGLE_API_KEY=your-key first
tilt up

# 2. Run the example evaluation workflow
kubectl testkube run tw example-workflow --watch
```

See the how-to guide for detailed pipeline usage, including dataset format, metric configuration, and custom workflows.
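Because the first step depends on a `.env` file containing `GOOGLE_API_KEY`, a small pre-flight check can catch a missing key before `tilt up` is run. The sketch below is illustrative only: `read_env_file` and `has_google_api_key` are hypothetical helpers, not part of this project, and they assume a simple `KEY=value` file format, which may differ from how the Tilt setup actually parses the file.

```python
from pathlib import Path


def read_env_file(path: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_google_api_key(path: str = ".env") -> bool:
    """Return True if the file exists and defines a non-empty GOOGLE_API_KEY."""
    if not Path(path).is_file():
        return False
    return bool(read_env_file(path).get("GOOGLE_API_KEY"))


if __name__ == "__main__":
    if not has_google_api_key():
        print("Missing GOOGLE_API_KEY in .env; create it before running 'tilt up'.")
```

Running the script from the repository root prints a hint only when the key is absent; it never prints the key itself.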
| Command | Description |
|---|---|
| `uv run poe check` | Run all quality checks (tests, mypy, bandit, ruff) |
| `uv run poe test` | Unit tests |
| `uv run poe format` | Format with Ruff |
| `uv run poe lint` | Lint and auto-fix with Ruff |
| `uv run poe ruff` | Both format and lint |
| `uv run poe mypy` | Static type checking |
| `uv run poe bandit` | Security vulnerability scanning |
The end-to-end tests require the Tilt environment to be running (`tilt up`).
```shell
# Configure (optional; defaults target the Tilt environment)
export E2E_DATASET_URL="http://data-server.data-server:8000/dataset.csv"
export E2E_AGENT_URL="http://weather-agent.sample-agents:8000"
export E2E_MODEL="gemini-2.5-flash-lite"

# Run
uv run poe test_e2e
```

See the Contribution Guide for details on contributing and the process for submitting pull requests.