From 61982ef79bea109f1d327b5a2b9ed3e9aa10d613 Mon Sep 17 00:00:00 2001
From: Carlos Eduardo Arango Gutierrez
Date: Mon, 30 Mar 2026 11:58:56 +0200
Subject: [PATCH 1/3] docs: add E2E test tier documentation to contributing guide

Signed-off-by: Carlos Eduardo Arango Gutierrez
---
 docs/contributing/README.md | 143 ++++++++++++++++++++++++++++++++++++
 1 file changed, 143 insertions(+)

diff --git a/docs/contributing/README.md b/docs/contributing/README.md
index e6155b33a..3e87ff303 100644
--- a/docs/contributing/README.md
+++ b/docs/contributing/README.md
@@ -106,6 +106,149 @@ sudo mv ./bin/holodeck /usr/local/bin/holodeck
 - Update existing tests when modifying features
 - Run the full test suite with `make test`
 
+## E2E Testing
+
+Holodeck's end-to-end tests run on real AWS infrastructure. They are organized
+into two tiers that control when tests execute in CI.
+
+### E2E Test Structure
+
+**Smoke tier (pre-merge)** — `.github/workflows/e2e-smoke.yaml`
+
+Runs on every PR push. Covers two label filters:
+
+- `default && !rpm` — standard single-node environment without RPM distros
+- `cluster && minimal` — smallest valid multinode cluster
+
+Each job takes roughly 20 minutes, giving fast feedback before merge.
+
+**Full tier (post-merge)** — `.github/workflows/e2e.yaml`
+
+Runs only when a commit lands on `main`
+(`github.ref == 'refs/heads/main'`). Covers 13 label filters plus an
+arm64 job and an integration-test job that exercises holodeck as a
+GitHub Action.
+
+| Label filter | What it covers |
+|---|---|
+| `legacy` | Kubernetes using a legacy version |
+| `dra` | Dynamic Resource Allocation enabled |
+| `kernel` | Kernel features / custom kernel |
+| `ctk-git` | Container Toolkit installed from git source |
+| `k8s-git` | Kubernetes built from git (kubeadm) |
+| `k8s-kind-git` | Kubernetes built from git (KIND) |
+| `k8s-latest` | Kubernetes tracking master branch |
+| `cluster && gpu && !minimal && !ha && !dedicated` | Standard GPU cluster |
+| `cluster && dedicated` | Cluster with dedicated CPU control-plane |
+| `cluster && ha` | HA cluster (3 control-plane nodes) |
+| `rpm-rocky` | Rocky Linux 9 — multiple container runtimes |
+| `rpm-al2023` | Amazon Linux 2023 — multiple container runtimes |
+| `rpm-fedora` | Fedora 42 — multiple container runtimes |
+| `arm64` | ARM64 GPU instance (g5g) — run separately |
+
+### Label Taxonomy
+
+Tests are tagged with Ginkgo `Label()` annotations. Each test can carry
+multiple labels; CI selects tests using boolean filter expressions.
+
+**Single-node labels** (defined in `tests/aws_test.go`):
+
+| Label | Description |
+|---|---|
+| `default` | Basic AWS environment, default configuration |
+| `legacy` | Legacy Kubernetes version |
+| `dra` | Dynamic Resource Allocation |
+| `kernel` | Custom kernel features |
+| `ctk-git` | CTK from git source |
+| `k8s-git` | Kubernetes from git (kubeadm) |
+| `k8s-kind-git` | Kubernetes from git (KIND) |
+| `k8s-latest` | Kubernetes master branch |
+| `rpm` | Any RPM-based distribution |
+| `rpm-rocky` | Rocky Linux 9 |
+| `rpm-al2023` | Amazon Linux 2023 |
+| `rpm-fedora` | Fedora 42 |
+| `post-merge` | Excluded from smoke tier; full tier only |
+
+**Cluster labels** (defined in `tests/aws_cluster_test.go`):
+
+| Label | Description |
+|---|---|
+| `cluster` | Multinode cluster test |
+| `multinode` | Two or more nodes |
+| `gpu` | GPU worker nodes |
+| `minimal` | Smallest valid configuration (1 CP + 1 worker) |
+| `dedicated` | Dedicated CPU control-plane node |
+| `ha` | High-availability control plane (3 nodes) |
+| `rpm` | RPM-based cluster OS |
+| `rpm-rocky` | Rocky Linux 9 cluster |
+| `rpm-al2023` | Amazon Linux 2023 cluster |
+| `post-merge` | Excluded from smoke tier; full tier only |
+
+The `post-merge` label is the mechanism that keeps a test out of the smoke
+tier. The smoke workflow's label filter `"default && !rpm"` already excludes
+RPM tests, but adding `post-merge` makes the intent explicit and ensures the
+test is skipped by any future smoke filter that might otherwise match it.
+
+### How to Add New Tests
+
+1. **Single-node test** — add an `Entry(...)` to the `DescribeTable` in
+   `tests/aws_test.go`.
+2. **Cluster test** — add an `Entry(...)` to `tests/aws_cluster_test.go`.
+3. Create the corresponding fixture file under `tests/data/`.
+4. Assign Ginkgo labels with `Label("label1", "label2", ...)` as the last
+   argument of the `Entry`.
+5. If the test is an edge case, platform-specific variant, or is expensive
+   (> 30 min), add `"post-merge"` to its label list so it runs only in the
+   full tier.
+
+Example:
+
+```go
+Entry("My New Feature Test", testConfig{
+	name:        "my-feature-test",
+	filePath:    filepath.Join(packagePath, "data", "test_aws_my_feature.yml"),
+	description: "Tests my new feature end-to-end",
+}, Label("default", "my-feature")),
+```
+
+### Which Tier to Target
+
+| Use smoke tier (no `post-merge`) | Use full tier (`post-merge`) |
+|---|---|
+| Core functionality every PR should validate | Edge cases and less-common paths |
+| Fast tests (< 25 min) | Slow tests (> 30 min) |
+| Platform-agnostic defaults | Platform-specific variants (RPM distros, arm64) |
+| Minimal cluster configurations | Full-scale, HA, or dedicated cluster topologies |
+
+If in doubt, start with `post-merge` and promote the label out of the full
+tier once the test has demonstrated stability.
+
+### Running E2E Tests Locally
+
+Use the Ginkgo label filter to select which tests to run:
+
+```bash
+# Run only the smoke-equivalent tests
+make -f tests/Makefile test GINKGO_ARGS="--label-filter='default && !rpm'"
+
+# Run a specific label
+make -f tests/Makefile test GINKGO_ARGS="--label-filter='cluster && minimal'"
+
+# Run all RPM tests for Rocky 9
+make -f tests/Makefile test GINKGO_ARGS="--label-filter='rpm-rocky'"
+```
+
+Required environment variables:
+
+```bash
+export AWS_ACCESS_KEY_ID=
+export AWS_SECRET_ACCESS_KEY=
+export E2E_SSH_KEY=
+```
+
+See [Memory: no push without local E2E validation](../../.claude/memory/) for
+the project policy on validating E2E tests before opening a PR.
+
 ## Documentation
 
 - Update relevant documentation when adding features

From eb50067f8509115409ce3ba241c7cc04f90df1e2 Mon Sep 17 00:00:00 2001
From: Carlos Eduardo Arango Gutierrez
Date: Mon, 30 Mar 2026 12:14:32 +0200
Subject: [PATCH 2/3] docs: fix ordered list prefix for markdown lint (MD029)

Signed-off-by: Carlos Eduardo Arango Gutierrez
---
 docs/contributing/README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/contributing/README.md b/docs/contributing/README.md
index 3e87ff303..667a92f05 100644
--- a/docs/contributing/README.md
+++ b/docs/contributing/README.md
@@ -193,11 +193,11 @@ test is skipped by any future smoke filter that might otherwise match it.
 
 1. **Single-node test** — add an `Entry(...)` to the `DescribeTable` in
    `tests/aws_test.go`.
-2. **Cluster test** — add an `Entry(...)` to `tests/aws_cluster_test.go`.
-3. Create the corresponding fixture file under `tests/data/`.
-4. Assign Ginkgo labels with `Label("label1", "label2", ...)` as the last
+1. **Cluster test** — add an `Entry(...)` to `tests/aws_cluster_test.go`.
+1. Create the corresponding fixture file under `tests/data/`.
+1. Assign Ginkgo labels with `Label("label1", "label2", ...)` as the last
    argument of the `Entry`.
-5. If the test is an edge case, platform-specific variant, or is expensive
+1. If the test is an edge case, platform-specific variant, or is expensive
    (> 30 min), add `"post-merge"` to its label list so it runs only in the
    full tier.
 
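
The tier rules in the list above reduce to boolean label filters evaluated per `Entry`. The Go sketch below hand-codes the two smoke filters quoted in patch 1 (`default && !rpm` and `cluster && minimal`) as predicates over a label set, assuming ordinary AND/NOT semantics. It does not use Ginkgo's own filter parser, and `labelSet`, `inSmokeTier`, and `inSmokeTierExcludingPostMerge` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// labelSet models the Ginkgo labels attached to a single Entry.
// Hypothetical helper for illustration only.
type labelSet map[string]bool

func newLabelSet(labels ...string) labelSet {
	s := labelSet{}
	for _, l := range labels {
		s[l] = true
	}
	return s
}

// inSmokeTier hand-codes the two smoke filters quoted in the patch:
// "default && !rpm" OR "cluster && minimal".
func inSmokeTier(s labelSet) bool {
	defaultNoRPM := s["default"] && !s["rpm"]
	minimalCluster := s["cluster"] && s["minimal"]
	return defaultNoRPM || minimalCluster
}

// inSmokeTierExcludingPostMerge is a hypothetical variant in which each
// smoke filter also carries "&& !post-merge".
func inSmokeTierExcludingPostMerge(s labelSet) bool {
	return inSmokeTier(s) && !s["post-merge"]
}

func main() {
	entries := []struct {
		name   string
		labels labelSet
	}{
		{"my-feature", newLabelSet("default", "my-feature")},
		{"rocky", newLabelSet("rpm", "rpm-rocky", "post-merge")},
		{"minimal-cluster", newLabelSet("cluster", "multinode", "gpu", "minimal")},
		{"default-post-merge", newLabelSet("default", "post-merge")},
	}
	for _, e := range entries {
		fmt.Printf("%-20s smoke=%-5t smoke(!post-merge)=%t\n",
			e.name, inSmokeTier(e.labels), inSmokeTierExcludingPostMerge(e.labels))
	}
}
```

Under these assumptions, an entry labeled `default` plus `post-merge` is still selected by `default && !rpm`. In that case `post-merge` only keeps the test out of the smoke tier if the smoke filters also negate it, as in the hypothetical `&& !post-merge` variant. Checking the filters in `.github/workflows/e2e-smoke.yaml` would confirm which behavior applies.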
From 05c46621c6ebb59140ee1b8500218232d1b95abc Mon Sep 17 00:00:00 2001
From: Carlos Eduardo Arango Gutierrez
Date: Mon, 30 Mar 2026 12:15:38 +0200
Subject: [PATCH 3/3] docs: fix DE review feedback — full tier trigger, arm64 job, broken link
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Carlos Eduardo Arango Gutierrez
---
 docs/contributing/README.md | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/docs/contributing/README.md b/docs/contributing/README.md
index 667a92f05..63921fba6 100644
--- a/docs/contributing/README.md
+++ b/docs/contributing/README.md
@@ -124,10 +124,9 @@ Each job takes roughly 20 minutes, giving fast feedback before merge.
 
 **Full tier (post-merge)** — `.github/workflows/e2e.yaml`
 
-Runs only when a commit lands on `main`
-(`github.ref == 'refs/heads/main'`). Covers 13 label filters plus an
-arm64 job and an integration-test job that exercises holodeck as a
-GitHub Action.
+Runs only when a commit lands on `main` or a `release-*` branch.
+Covers 13 label filters plus a separate arm64 job and an
+integration-test job that exercises holodeck as a GitHub Action.
 
 | Label filter | What it covers |
 |---|---|
@@ -144,7 +143,10 @@ GitHub Action.
 | `rpm-rocky` | Rocky Linux 9 — multiple container runtimes |
 | `rpm-al2023` | Amazon Linux 2023 — multiple container runtimes |
 | `rpm-fedora` | Fedora 42 — multiple container runtimes |
-| `arm64` | ARM64 GPU instance (g5g) — run separately |
+
+The `arm64` job is a separate workflow job (not a matrix entry) that only
+runs on `main`. It uses `--label-filter='arm64'` — a test must carry
+`Label("arm64")` to be selected.
 
 ### Label Taxonomy
 
@@ -246,8 +248,8 @@ export AWS_SECRET_ACCESS_KEY=
 export E2E_SSH_KEY=
 ```
 
-See [Memory: no push without local E2E validation](../../.claude/memory/) for
-the project policy on validating E2E tests before opening a PR.
+> **Important:** Always validate E2E tests locally before pushing. CI E2E
+> runs provision real GPU instances on AWS, and unnecessary runs are expensive.
 
 ## Documentation
 