diff --git a/docs/contributing/README.md b/docs/contributing/README.md index e6155b33a..63921fba6 100644 --- a/docs/contributing/README.md +++ b/docs/contributing/README.md @@ -106,6 +106,151 @@ sudo mv ./bin/holodeck /usr/local/bin/holodeck - Update existing tests when modifying features - Run the full test suite with `make test` +## E2E Testing + +Holodeck's end-to-end tests run on real AWS infrastructure. They are organized +into two tiers that control when tests execute in CI. + +### E2E Test Structure + +**Smoke tier (pre-merge)** — `.github/workflows/e2e-smoke.yaml` + +Runs on every PR push. Covers two label filters: + +- `default && !rpm` — standard single-node environment without RPM distros +- `cluster && minimal` — smallest valid multinode cluster + +Each job takes roughly 20 minutes, giving fast feedback before merge. + +**Full tier (post-merge)** — `.github/workflows/e2e.yaml` + +Runs only when a commit lands on `main` or a `release-*` branch. +Covers 13 label filters plus a separate arm64 job and an +integration-test job that exercises holodeck as a GitHub Action. + +| Label filter | What it covers | +|---|---| +| `legacy` | Kubernetes using a legacy version | +| `dra` | Dynamic Resource Allocation enabled | +| `kernel` | Kernel features / custom kernel | +| `ctk-git` | Container Toolkit installed from git source | +| `k8s-git` | Kubernetes built from git (kubeadm) | +| `k8s-kind-git` | Kubernetes built from git (KIND) | +| `k8s-latest` | Kubernetes tracking master branch | +| `cluster && gpu && !minimal && !ha && !dedicated` | Standard GPU cluster | +| `cluster && dedicated` | Cluster with dedicated CPU control-plane | +| `cluster && ha` | HA cluster (3 control-plane nodes) | +| `rpm-rocky` | Rocky Linux 9 — multiple container runtimes | +| `rpm-al2023` | Amazon Linux 2023 — multiple container runtimes | +| `rpm-fedora` | Fedora 42 — multiple container runtimes | + +The `arm64` job is a separate workflow job (not a matrix entry) that only +runs on `main`. It uses `--label-filter='arm64'` — a test must carry +`Label("arm64")` to be selected. + +### Label Taxonomy + +Tests are tagged with Ginkgo `Label()` annotations. Each test can carry +multiple labels; CI selects tests using boolean filter expressions. + +**Single-node labels** (defined in `tests/aws_test.go`): + +| Label | Description | +|---|---| +| `default` | Basic AWS environment, default configuration | +| `legacy` | Legacy Kubernetes version | +| `dra` | Dynamic Resource Allocation | +| `kernel` | Custom kernel features | +| `ctk-git` | CTK from git source | +| `k8s-git` | Kubernetes from git (kubeadm) | +| `k8s-kind-git` | Kubernetes from git (KIND) | +| `k8s-latest` | Kubernetes master branch | +| `rpm` | Any RPM-based distribution | +| `rpm-rocky` | Rocky Linux 9 | +| `rpm-al2023` | Amazon Linux 2023 | +| `rpm-fedora` | Fedora 42 | +| `post-merge` | Excluded from smoke tier; full tier only | + +**Cluster labels** (defined in `tests/aws_cluster_test.go`): + +| Label | Description | +|---|---| +| `cluster` | Multinode cluster test | +| `multinode` | Two or more nodes | +| `gpu` | GPU worker nodes | +| `minimal` | Smallest valid configuration (1 CP + 1 worker) | +| `dedicated` | Dedicated CPU control-plane node | +| `ha` | High-availability control plane (3 nodes) | +| `rpm` | RPM-based cluster OS | +| `rpm-rocky` | Rocky Linux 9 cluster | +| `rpm-al2023` | Amazon Linux 2023 cluster | +| `post-merge` | Excluded from smoke tier; full tier only | + +The `post-merge` label is the mechanism that keeps a test out of the smoke +tier. The smoke workflow's label filter `"default && !rpm"` already excludes +RPM tests, but adding `post-merge` makes the intent explicit and ensures the +test is skipped by any future smoke filter that might otherwise match it. + +### How to Add New Tests + +1. **Single-node test** — add an `Entry(...)` to the `DescribeTable` in + `tests/aws_test.go`. +1. **Cluster test** — add an `Entry(...)` to `tests/aws_cluster_test.go`. +1. Create the corresponding fixture file under `tests/data/`. +1. Assign Ginkgo labels with `Label("label1", "label2", ...)` as the last + argument of the `Entry`. +1. If the test is an edge case, platform-specific variant, or is expensive + (> 30 min), add `"post-merge"` to its label list so it runs only in the + full tier. + +Example: + +```go +Entry("My New Feature Test", testConfig{ + name: "my-feature-test", + filePath: filepath.Join(packagePath, "data", "test_aws_my_feature.yml"), + description: "Tests my new feature end-to-end", +}, Label("default", "my-feature")), +``` + +### Which Tier to Target + +| Use smoke tier (no `post-merge`) | Use full tier (`post-merge`) | +|---|---| +| Core functionality every PR should validate | Edge cases and less-common paths | +| Fast tests (< 25 min) | Slow tests (> 30 min) | +| Platform-agnostic defaults | Platform-specific variants (RPM distros, arm64) | +| Minimal cluster configurations | Full-scale, HA, or dedicated cluster topologies | + +If in doubt, start with `post-merge` and promote the label out of the full +tier once the test has demonstrated stability. + +### Running E2E Tests Locally + +Use the Ginkgo label filter to select which tests to run: + +```bash +# Run only the smoke-equivalent tests +make -f tests/Makefile test GINKGO_ARGS="--label-filter='default && !rpm'" + +# Run a specific label +make -f tests/Makefile test GINKGO_ARGS="--label-filter='cluster && minimal'" + +# Run all RPM tests for Rocky 9 +make -f tests/Makefile test GINKGO_ARGS="--label-filter='rpm-rocky'" +``` + +Required environment variables: + +```bash +export AWS_ACCESS_KEY_ID= +export AWS_SECRET_ACCESS_KEY= +export E2E_SSH_KEY= +``` + +> **Important:** Always validate E2E tests locally before pushing. CI E2E +> runs provision real GPU instances on AWS, and unnecessary runs are expensive. + ## Documentation - Update relevant documentation when adding features