Open
37 commits
4521c22
Add CUE v0.16.0 to y-bin runner tools
Apr 2, 2026
887d317
qemu provisioner: delete disk on teardown by default
Apr 2, 2026
dc6757f
Add kubectl-yconverge to ystack with converge-mode support
Apr 2, 2026
085d366
Draft CUE-based converge DAG design
Apr 2, 2026
f3e61d8
Implement CUE converge engine with y-k8s.cue for all k3s modules
Apr 2, 2026
b63205b
Replace y-cluster-converge-ystack with CUE engine invocation
Apr 2, 2026
bbfbae7
Document yconverge.cue integration and y-kustomize refresh tracking
Apr 2, 2026
f9e5aca
Implement yconverge.cue design: rename, auto-check, --skip-checks
Apr 2, 2026
ae8c32d
Add itest suite and implement prechecks/postchecks in kubectl-yconverge
Apr 2, 2026
a670b9a
Support multiple -k args in kubectl-yconverge
Apr 2, 2026
fcf15a8
Add y-cue to runner.Dockerfile, add itest to CI workflow
Apr 2, 2026
0998b7b
Add itest for converge-mode labels and empty selector handling
Apr 3, 2026
a5b824a
Remove enabled:false handling from kubectl-yconverge
Apr 3, 2026
147ce44
Add tool binary prereq check to itest, fix GHA failures
Apr 3, 2026
508625e
Set YSTACK_HOME in GHA itest job for binary downloads
Apr 3, 2026
173a5aa
Simplify schema per PR review, restore #Wait/#Rollout with namespace
Apr 3, 2026
0ae85e2
Add namespaceGuess field, resolve from CLI/-n/kustomization/context
Apr 3, 2026
83e225d
Resolve namespaceGuess from referenced base on indirection
Apr 3, 2026
4fbb49f
Fail on broken yconverge.cue instead of silently skipping checks
Apr 3, 2026
d856bf1
Simplify itest to oneliner kubectl-yconverge calls
Apr 3, 2026
88a582a
Prefix all itest output with [cue itest]
Apr 3, 2026
538710c
Add output assertions via tee to itest
Apr 3, 2026
f2759be
Drop multi -k support from kubectl-yconverge
Apr 3, 2026
cf79ad0
Remove explicit namespace from checks where namespaceGuess suffices
Apr 3, 2026
10081e6
Move dependency resolution into kubectl-yconverge, remove CUE engine
Apr 4, 2026
825cdba
Move cue/ to yconverge/, update all import paths
Apr 5, 2026
ba2ddc7
Rename converge package to verify: verify.#Step
Apr 5, 2026
fade800
Add --keep flag to itest, reject unknown flags
Apr 5, 2026
f465252
Use ephemeral kubeconfig for itest, stable path with --keep
Apr 5, 2026
46b38c1
Fix kubeconfig for kubie compatibility
Apr 5, 2026
02356da
Add --teardown to itest for cleaning up kept clusters
Apr 5, 2026
c5f4249
esbuild 0.25.11->0.28.0
solsson Apr 8, 2026
63e01fe
see https://github.com/solsson/turbo/pull/1
solsson Apr 9, 2026
190557f
Merge branch 'turbo-fork-hashdepends' into converge-dag
solsson Apr 13, 2026
f95811f
yconverge: basic prod/qa reuse example
solsson Apr 5, 2026
8b68256
yconverge: mode selector, inlined dep walker, shared checks, replace …
solsson Apr 15, 2026
f487023
Rename CI workflow from "lint" to "checks"
solsson Apr 16, 2026
22 changes: 21 additions & 1 deletion .github/workflows/lint.yaml → .github/workflows/checks.yaml
@@ -1,4 +1,4 @@
-name: lint
+name: checks

on:
push:
@@ -26,3 +26,23 @@ jobs:
with:
key: script-lint-${{ github.ref_name }}-${{ github.run_id }}
path: ~/.cache/ystack

itest:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/cache/restore@v4
with:
key: itest-${{ github.ref_name }}-
restore-keys: |
itest-main-
path: ~/.cache/ystack
- name: Integration tests (yconverge framework)
run: yconverge/itest/test.sh
env:
YSTACK_HOME: ${{ github.workspace }}
PATH: ${{ github.workspace }}/bin:/usr/local/bin:/usr/bin:/bin
- uses: actions/cache/save@v4
with:
key: itest-${{ github.ref_name }}-${{ github.run_id }}
path: ~/.cache/ystack
6 changes: 3 additions & 3 deletions .github/workflows/images.yaml
@@ -6,10 +6,10 @@ on:
- main

jobs:
-lint:
-uses: ./.github/workflows/lint.yaml
+checks:
+uses: ./.github/workflows/checks.yaml
docker:
-needs: lint
+needs: checks
runs-on: ubuntu-latest
permissions:
packages: write
258 changes: 258 additions & 0 deletions TODO_CONVERGE_DAG.md
@@ -0,0 +1,258 @@
# Converge DAG: CUE-based cluster convergence

## Problem

Dependencies between backends and modules are implicit in script ordering
(MANUAL_STEPS_FOR_NEW_SITES, y-site-upgrade, y-cluster-converge-dev).
Adding or reordering a module means editing a bash script.

## Design

Every kustomize base that can be applied with `kubectl yconverge -k`
is a **step**. Each step declares its readiness via **checks**.
Dependencies between steps are expressed as **CUE imports** —
importing another step's package makes it a precondition.

The dependency graph is the CUE import graph.
`cue cmd converge` walks it in topological order.

## ystack provides

Schema in `cue/converge/schema.cue`:

```cue
package converge

// A convergence step: apply a kustomize base, then verify.
#Step: {
	// Path to kustomize directory, relative to repo root.
	kustomization: string
	// Namespace override. If unset, kustomization must set it.
	namespace?: string
	// Checks that must pass after apply (and that downstream steps
	// use as preconditions by importing this package).
	// Empty list means no checks: the step is ready after apply.
	checks: [...#Check]
}

// Check is a discriminated union. Each variant maps to a kubectl
// subcommand that manages its own timeout and output.
#Check: #Wait | #Rollout | #Exec

// Thin wrapper around kubectl wait.
// Timeout and output are managed by kubectl.
#Wait: {
	kind: "wait"
	resource: string // e.g. "pod/redpanda-0" or "job/setup-topic"
	for: string // e.g. "condition=Ready" or "condition=Complete"
	namespace?: string
	timeout: *"60s" | string
	description: *"" | string
}

// Thin wrapper around kubectl rollout status.
// Timeout and output are managed by kubectl.
#Rollout: {
	kind: "rollout"
	resource: string // e.g. "deploy/gateway-v4" or "statefulset/redpanda"
	namespace?: string
	timeout: *"60s" | string
	description: *"" | string
}

// Arbitrary command for checks that don't map to kubectl builtins.
// The engine retries until timeout.
#Exec: {
	kind: "exec"
	command: string
	timeout: *"60s" | string
	description: string
}
```

## Validation

`cue vet` validates that every `y-k8s.cue` file conforms to the schema.
This runs without a cluster — it's a static check on the declarations.

```
y-cue vet ./...
```

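This catches missing required fields, wrong check types, and typos in
field names (CUE definitions such as `#Step` are closed, so unknown
fields are errors). The schema above types `timeout` as a plain
`string`, so a malformed duration is caught only if a pattern
constraint is added to that field.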

CI can run `cue vet` to ensure all modules comply before merge.

## Engine

The engine in `cue/converge/converge_tool.cue` translates checks
to kubectl commands:

```cue
// #Wait -> kubectl wait --for=$for --timeout=$timeout $resource [-n $namespace]
// #Rollout -> kubectl rollout status --timeout=$timeout $resource [-n $namespace]
// #Exec -> retry with $timeout: sh -c $command
```

`kubectl wait` and `kubectl rollout status` handle their own polling
and timeout — the engine just propagates the timeout value and
passes through stdout/stderr.
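
The mapping for `#Wait` and `#Rollout` can be sketched in shell. This is a minimal illustration, not the engine's code: the function name `check_command` and its positional arguments are hypothetical, and it only prints the command line rather than running it.

```shell
# Hypothetical helper: print the kubectl command line for one check.
# Usage: check_command KIND RESOURCE TIMEOUT NAMESPACE [FOR]
# NAMESPACE may be empty; FOR is only used by "wait" checks.
check_command() {
  local kind="$1" resource="$2" timeout="$3" namespace="$4" for_cond="$5"
  local ns_args=""
  if [ -n "$namespace" ]; then
    ns_args=" -n $namespace"
  fi
  case "$kind" in
    wait)    echo "kubectl wait --for=$for_cond --timeout=$timeout $resource$ns_args" ;;
    rollout) echo "kubectl rollout status --timeout=$timeout $resource$ns_args" ;;
    *)       echo "unsupported check kind: $kind" >&2; return 1 ;;
  esac
}
```

For example, `check_command rollout statefulset/redpanda 120s kafka` prints the command for the rollout check in the kafka-v3 example.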

For `#Exec` checks the engine manages the retry loop.
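
A retry loop of that kind might look like the sketch below. It is an assumption-laden illustration: `retry_until` is a hypothetical name, the timeout is assumed to be whole seconds with an optional `s` suffix (other duration units are not handled), and the polling interval is a made-up third argument.

```shell
# Hypothetical helper, not the actual engine code.
# Usage: retry_until TIMEOUT COMMAND [INTERVAL_SECONDS]
# TIMEOUT is whole seconds with an optional "s" suffix, e.g. "60s".
retry_until() {
  local timeout_s="${1%s}" command="$2" interval_s="${3:-2}"
  local deadline
  deadline=$(( $(date +%s) + timeout_s ))
  # Run the command through sh -c until it exits 0 or the deadline passes.
  until sh -c "$command"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "check timed out after ${timeout_s}s: $command" >&2
      return 1
    fi
    sleep "$interval_s"
  done
}
```

The command's own stdout and stderr are passed through unchanged, matching how the kubectl-based checks report progress.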

`kubectl-yconverge` handles the apply modes (create, replace,
serverside, serverside-force, regular).
The engine does not need to know about apply strategies.

## Modules provide

Each module has a `y-k8s.cue` that declares its step and checks.
A module with no checks is valid — it just declares the kustomization.

Example `kafka-v3/y-k8s.cue` (backend with rollout check):

```cue
package kafka_v3

import "yolean.se/ystack/cue/converge"

step: converge.#Step & {
	kustomization: "cluster-local/kafka-v3"
	checks: [
		{kind: "rollout", resource: "statefulset/redpanda", namespace: "kafka", timeout: "120s"},
		{kind: "exec", command: "kubectl exec -n kafka redpanda-0 -- rpk cluster info", description: "redpanda cluster healthy"},
	]
}
```

Example `cluster-local/mysql/y-k8s.cue` (backend, no checks needed):

```cue
package mysql

import "yolean.se/ystack/cue/converge"

step: converge.#Step & {
	kustomization: "cluster-local/mysql"
	checks: []
}
```

Example `gateway-v4/y-k8s.cue` (module with dependencies):

```cue
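package gateway_v4

import (
	"yolean.se/ystack/cue/converge"
	// The last path element is not a valid identifier, so the package
	// name is given explicitly after the colon.
	"yolean.se/checkit/cluster-local/kafka-v3:kafka_v3"
	"yolean.se/checkit/keycloak-v3:keycloak_v3"
)

// Importing kafka_v3 and keycloak_v3 makes their checks
// preconditions for this step. The engine ensures they
// converge and pass before applying gateway-v4.

// CUE rejects unused imports, so each dependency must be referenced.
// A hidden field is one possible way to do that.
_deps: [kafka_v3.step, keycloak_v3.step]

step: converge.#Step & {
	kustomization: "gateway-v4/site-apply-namespaced"
	checks: [
		{kind: "rollout", resource: "deploy/gateway-v4", namespace: "dev"},
		{kind: "wait", resource: "job/setup-topic-gateway-v4-userstate", namespace: "dev", for: "condition=Complete"},
	]
}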
```

## Dependency resolution

The engine collects all `y-k8s.cue` files, inspects their imports,
and builds a topological sort. A step runs only after all imported
steps have converged and their checks pass.

Import cycles are a CUE compile error — no runtime cycle detection needed.
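
The ordering step can be illustrated with the standard `tsort` utility. This is a sketch under assumptions, not the engine's method: it presumes the import graph has already been flattened into "dependency dependent" pairs, and the function names `converge_order` and `converge_plan` are hypothetical.

```shell
# Hypothetical sketch: read "dependency dependent" pairs on stdin, one pair
# per line, and print the steps in an order that respects every pair.
# A step without dependencies is listed as a pair with itself.
converge_order() {
  tsort
}

# Dry run: print the apply command for each step instead of running it.
converge_plan() {
  converge_order | while read -r step; do
    echo "kubectl yconverge -k $step"
  done
}
```

Feeding it the pairs `kafka-v3 gateway-v4`, `keycloak-v3 gateway-v4` and `mysql mysql` lists `gateway-v4` after both of its dependencies. The position of the independent `mysql` step is not fixed by the input.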

## Namespace binding

`y-site-generate` determines which modules a site needs and which
namespace they target. The CUE engine receives `--context=` and
site name as inputs. Namespace is either set in the kustomization
or passed as a CUE value that templates into check commands.

## Convergence is cheap

`kubectl yconverge -k` is idempotent. Re-running a fully converged
step is a no-op (unchanged resources) followed by passing checks.
This means:

- No "has this been applied" state tracking
- Re-running after a failure retries only what's needed
- Checks serve double duty: post-apply verification AND
precondition for downstream steps

## CLI surface

```
y-cue cmd converge --context=local dev # full site
y-cue cmd converge --context=local dev gateway-v4 # one module + deps
y-cue cmd check --context=local dev # checks only, no apply
y-cue vet ./... # validate all y-k8s.cue
```

## Proposed: yconverge.cue integration with kubectl-yconverge

Rename `y-k8s.cue` to `yconverge.cue`. The only valid location is
next to a `kustomization.yaml` file.

When `kubectl yconverge -k <dir>` completes with exit 0, it looks for
`yconverge.cue` in `<dir>/`. If found, it invokes the framework to
run that step's checks. This means any script that uses `kubectl yconverge`
automatically gets check verification — no separate orchestration needed.

One level of `resources:` indirection: if the kustomization has exactly
one `resources:` item pointing to a local directory, and the current
directory has no `yconverge.cue`, look for `yconverge.cue` in that
resource directory. This handles the common pattern where
`cluster-local/kafka-v3/kustomization.yaml` has `resources: [../../kafka-v3/cluster-backend]`
— the checks can live in `kafka-v3/cluster-backend/yconverge.cue`.
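
The lookup rule above could be sketched as follows. The function name `find_yconverge_cue` is hypothetical, and the `resources:` parsing is deliberately naive: it assumes unquoted entries written either as a one-line flow list or as `- item` lines, where a real implementation would more likely use a YAML parser.

```shell
# Hypothetical helper, not the actual kubectl-yconverge implementation.
# Prints the path of the yconverge.cue that applies to kustomize dir $1,
# or returns 1 when none applies.
find_yconverge_cue() {
  local dir="$1" resources count
  # A yconverge.cue next to kustomization.yaml always wins.
  if [ -f "$dir/yconverge.cue" ]; then
    echo "$dir/yconverge.cue"
    return 0
  fi
  [ -f "$dir/kustomization.yaml" ] || return 1
  # Assumption: resources are unquoted and written either as a flow list
  # on one line or as "- item" lines below the resources: key.
  resources=$(awk '
    /^resources:[ \t]*\[/ {
      sub(/^resources:[ \t]*\[/, ""); sub(/\].*$/, "")
      n = split($0, items, ",")
      for (i = 1; i <= n; i++) {
        gsub(/^[ \t]+|[ \t]+$/, "", items[i])
        if (items[i] != "") print items[i]
      }
      next
    }
    /^resources:/ { in_list = 1; next }
    in_list && /^[ \t]*-/ { sub(/^[ \t]*-[ \t]*/, ""); print; next }
    in_list && /^[^ \t#]/ { in_list = 0 }
  ' "$dir/kustomization.yaml")
  count=$(printf '%s\n' "$resources" | grep -c . || true)
  # Follow the indirection only for exactly one local directory.
  if [ "$count" -eq 1 ] && [ -f "$dir/$resources/yconverge.cue" ]; then
    echo "$dir/$resources/yconverge.cue"
    return 0
  fi
  return 1
}
```

Returning a non-zero status when nothing is found lets the caller treat "no checks declared" as a distinct case from a check failure.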

### Trade-off

Pro: Breaks up monolithic provision scripts. Any `kubectl yconverge -k`
call becomes self-validating. No separate engine invocation needed.

Con: Adds checking overhead to every `kubectl yconverge` call. In
`y-site-upgrade` which converges many modules in sequence, each apply
would trigger CUE evaluation + checks. Mitigation: checks should be
fast (rollout status and kubectl wait already return quickly when
resources are already ready). Could add `--skip-checks` flag for
batch operations that do their own validation.

## y-kustomize refresh tracking

When y-kustomize serves content from secrets mounted as volumes,
it needs a restart when those secrets change. Currently handled by
an explicit action in `40-kafka-ystack`.

Proposed: y-kustomize stores a hash of its secret contents as an
annotation on its own deployment:

```
yolean.se/y-kustomize-secrets-hash: sha256:<hash>
```

After any step applies secrets in the ystack namespace matching
`y-kustomize.*`, the engine computes the current hash and compares
to the annotation. Restart only on mismatch. This makes re-converge
of a fully converged cluster skip the restart entirely.
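
The compare-then-restart decision can be sketched as two small shell functions. The names `secrets_hash` and `needs_restart` are hypothetical, and how the secret contents are fetched and fed to the hash is left open here. `sha256sum` is the GNU coreutils tool; other platforms may need `shasum -a 256` instead.

```shell
# Hypothetical sketch of the hash comparison, not the actual implementation.

# Print "sha256:<hex>" for whatever secret content arrives on stdin.
secrets_hash() {
  printf 'sha256:%s\n' "$(sha256sum | cut -d' ' -f1)"
}

# Return 0 (restart needed) when the current hash differs from the value
# stored in the yolean.se/y-kustomize-secrets-hash annotation.
needs_restart() {
  local current="$1" annotated="$2"
  [ "$current" != "$annotated" ]
}
```

A deployment without the annotation yields an empty second argument, so the first converge always counts as a mismatch and triggers one restart.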

## Migration path

1. Add schema to ystack `cue/converge/`
2. Write `y-k8s.cue` for ystack backends (kafka, blobs, builds-registry)
3. Write `y-k8s.cue` for checkit backends (mysql, keycloak-v3)
4. Write `y-k8s.cue` for a few site modules (gateway-v4, events-v1)
5. `cue cmd converge --context=local dev` replaces
`y-cluster-provision-first-site dev` for local clusters
6. Extend to `y-site-upgrade` by adding upgrade-specific checks
7. Extend to non-local clusters by parameterizing context and namespace