From 2f5fb14708dc73c46602aa2e9abe5ce91c0bacb4 Mon Sep 17 00:00:00 2001 From: Wojtek Date: Thu, 9 Apr 2026 13:51:19 -0400 Subject: [PATCH] docs: add runner refresh ADR and plan --- .../024-runner-base-refresh-from-upstream.md | 229 ++++++++ ...6-04-09-128-runner-update-from-upstream.md | 515 ++++++++++++++++++ 2 files changed, 744 insertions(+) create mode 100644 docs/decisions/024-runner-base-refresh-from-upstream.md create mode 100644 docs/plans/2026-04-09-128-runner-update-from-upstream.md diff --git a/docs/decisions/024-runner-base-refresh-from-upstream.md b/docs/decisions/024-runner-base-refresh-from-upstream.md new file mode 100644 index 0000000..88a614b --- /dev/null +++ b/docs/decisions/024-runner-base-refresh-from-upstream.md @@ -0,0 +1,229 @@ +# ADR-024: Runner Base Refresh from Upstream Sources + +**Date:** 2026-04-09 +**Status:** Draft +**Amends:** ADR-022 §3 (build-time runner base freshness moves from `claw build` to `claw pull`, narrowly) +**Depends on:** ADR-010 (CLI Surface Simplification), ADR-022 (Infrastructure Image Lifecycle and the Four-Verb Operator Surface) +**Implementation:** `docs/plans/2026-04-09-128-runner-update-from-upstream.md` + +## Context + +ADR-022 fixed runtime infra freshness with pinned, published images on `ghcr.io/mostlydev`. That model does not extend to built-in runner bases (`openclaw`, `microclaw`, `nullclaw`, `nanobot`, `picoclaw`, `nanoclaw-orchestrator`): + +- Each driver's `BaseImage()` (`internal/driver//baseimage.go`) returns an inline Dockerfile and a synthetic local tag like `openclaw:latest`. +- `internal/build/build.go:ensureBaseImage` only auto-builds the base image when it is missing locally (`build.go:63`). Once built, the local tag is treated as authoritative indefinitely. +- `claw pull` skips `build:` services entirely (ADR-022 §3, "without exception"), so it never refreshes runner bases. 
+- `claw build` runs `docker build` without `--pull` or `--no-cache` (`build.go:131`), so even rebuilding a Clawfile reuses any stale runner base layer that already exists. +- `docker pull openclaw:latest` fails with `repository does not exist` because `openclaw:latest` is not a registry artifact. + +The operator question — *"how do I get the latest OpenClaw onto Tiverton?"* — has no good answer in current `claw`. This already bit the live trading pod. + +An alternative direction would close the gap by introducing an explicit `claw runners update` top-level verb. That direction is internally defensible (runner refresh is mechanically a local build, not a network pull, so the verb should match the operation). This ADR proposes a different direction. + +## The trust argument against publishing runner bases + +ADR-022's published infra surface (`cllama`, `claw-api`, `clawdash`, `claw-wall`, `hermes-base`) consists of *mostlydev source code* compiled into images. Trusting `ghcr.io/mostlydev/cllama` is the same trust decision as trusting the Go source in this repo — there is no third-party software involved that the operator might want to verify independently. + +Built-in runner bases are different. Each one packages an upstream third-party harness: + +| Driver | Upstream package | +|---|---| +| openclaw | `https://openclaw.ai/install.sh` | +| microclaw | upstream microclaw distribution | +| nullclaw | upstream nullclaw distribution | +| nanobot | upstream nanobot distribution | +| picoclaw | `docker.io/sipeed/picoclaw` | +| nanoclaw-orchestrator | `https://github.com/qwibitai/nanoclaw.git` | + +If mostlydev publishes `ghcr.io/mostlydev/openclaw-base:v0.5.2`, an operator who installs OpenClaw via that path is trusting *mostlydev's repackaging* of OpenClaw. They have to take it on faith that the bytes inside the published image are exactly what the upstream installer would produce. 
Some operators will not want to grant that trust — and they should not have to, because the recipe is short, fully described in `baseimage.go`, and trivially reproducible from upstream sources on the operator's own machine. + +The trust model that fits this surface is **the Homebrew model**, not the registry model: + +- Mostlydev ships the *recipe* (the inline Dockerfile in `baseimage.go`). +- The operator's machine builds the runner base locally from upstream sources. +- The result is provably "what upstream said today, plus a thin mostlydev integration shim that the operator can audit in the recipe." + +Hermes-base remains pinned and published per ADR-022 §3, deliberately. Hermes-base is a mostlydev compatibility-shim image (`patch-hermes-runtime.py` in `dockerfiles/hermes-base/` is mostlydev source code), so trusting our published image is identical to trusting our Go source. The trust argument above does not apply. + +## Relationship to ADR-022 + +This ADR **amends ADR-022 §3 narrowly**. ADR-022 §3 says: + +> Build-time base images stay with `claw build`. ... Services with `build:` blocks are skipped by `claw pull`, without exception. + +That rule was justified by an anti-collision concern: preventing a registry `docker pull` from silently overwriting a locally built service image that happens to share a tag. **The anti-collision rationale is preserved intact** by this ADR, because runner-base refresh is a `docker build` of a recipe producing a local tag that never exists in any registry — there is no collision surface to begin with. + +What changes: `claw pull` now refreshes the *build-time runner base* of a pod's `build:` services, while still leaving the service images themselves to `claw build`. 
The split becomes: + +- **`claw pull`** — pinned runtime infra (unchanged) + runner base refresh (new, narrow amendment) +- **`claw build`** — pod service image compilation (unchanged) +- **`claw up`** — authority on staleness (unchanged semantics, enriched drift signal) + +The alternative would be introducing a new top-level verb (`claw runners update`), which preserves ADR-022 §3 verbatim but breaks ADR-022 §2's explicit "no new top-level verbs are introduced" promise. This ADR chooses to amend §3 narrowly rather than expand §2. Both are amendments; the choice is between amending §2 (verb surface) and amending §3 (pull scope). Amending §3 keeps the operator-visible surface smaller, so we choose it. + +## Decision + +### 1. Runner bases are built locally from upstream, never pinned by Clawdapus + +The six synthetic-tag drivers (`openclaw`, `microclaw`, `nullclaw`, `nanobot`, `picoclaw`, `nanoclaw-orchestrator`) keep their inline `BaseImage()` Dockerfiles. Clawdapus does not publish runner base images for these drivers and does not pin upstream runner versions in the binary's release manifest. + +`hermes-base` continues to be pinned and published per ADR-022 §3, as the deliberate exception. + +### 2. `claw pull` becomes the runner base refresh verb, with an explicit mode matrix + +`claw pull` is the sole verb for runner base freshness. 
Its input modes are: + +| Invocation | Behavior | +|---|---| +| `claw pull --file ` or `claw pull ` (positional) | **Pod mode**: pull pinned infra + refresh runner bases needed by the pod's `build:` services | +| `claw pull` with `claw-pod.yml` in cwd (no args) | **Pod mode via auto-resolution** — unchanged from current behavior (`cmd/claw/image_lifecycle.go:137`) | +| `claw pull ` | **Single-Clawfile mode**: resolve the input with the same single-Clawfile rules as `claw build `, then refresh that driver's runner base | +| `claw pull` with no args and no pod in cwd | **Bare mode**: core infra pull (unchanged) + refresh any *locally-tagged* managed runner aliases | + +Disambiguation preserves the repo's existing authoring behavior rather than narrowing it. `--file` remains pod-only. Positional `.yml`/`.yaml` inputs stay pod mode. Any other positional input is resolved using the same single-Clawfile rules that `claw build ` already uses: directories resolve to `/Clawfile`, filenames starting with `Clawfile` (including flat-layout names like `Clawfile.westin` and example files like `Clawfile.nanoclaw`) are treated as Clawfiles, and other custom paths continue to work if they already pass Clawfile detection. This avoids breaking existing flat-layout projects and custom-named Clawfiles just to support runner refresh. + +The bare mode's "refresh locally-tagged managed aliases" is deliberately lazy: on a fresh machine with no runner aliases locally, it is a no-op for runner bases (you only refresh what you're already using). This keeps `claw pull` cheap as a sanity-check command and makes single-Clawfile authoring workflows smooth: after refreshing once with `claw pull my.Clawfile`, subsequent `claw pull` invocations from any directory keep that alias current. + +For each runner driver that `claw pull` refreshes, it: + +1. Builds the inline `BaseImage()` Dockerfile with `docker build --pull --no-cache`. 
The `--pull` flag forces Docker to fetch the upstream FROM image (e.g., `node:22-slim`) fresh from Docker Hub. The `--no-cache` flag forces every `RUN` instruction to re-execute, so `curl https://openclaw.ai/install.sh | bash` actually re-runs against the current upstream installer. +2. Runs the driver's *version probe* inside the freshly built image to extract the upstream runner version (e.g., `openclaw 0.5.2` → `0.5.2`). +3. Computes the image ID (`docker inspect --format '{{.Id}}'`) and the recipe SHA (sha256 of the inline Dockerfile content). +4. Tags the result as **both** `:v` and `:latest` in the local Docker daemon. +5. Prints a one-line operator-visible upgrade message: `openclaw: installed v0.5.2 (was v0.5.0)`. + +For drivers that do not implement the version probe, the fallback tag is `:built-YYYYMMDD-` — the build date plus the first 12 characters of the image ID. The image-ID suffix prevents same-day collisions between multiple refreshes. + +### 3. `claw build` rewrites `FROM` and stamps three provenance labels + +When `clawfile.Emit` produces `Dockerfile.generated`, it replaces `FROM :latest` with `FROM :v` where `` is a known runner driver alias. The version is resolved from the local `:latest` tag's sibling version-prefixed tag in `RepoTags`. + +`claw build` also injects three labels into the generated Dockerfile: + +```dockerfile +LABEL claw.runner.built-against="openclaw:v0.5.2" +LABEL claw.runner.image-id="sha256:abc123def456..." +LABEL claw.runner.recipe-sha="sha256:789abc..." +``` + +- **`built-against`** — human-readable upstream version tag. Used in operator-facing hint messages. This is what the operator sees when they ask "what version of OpenClaw is this service built against?" +- **`image-id`** — the runner base image's Docker image ID at build time. This is the **strong drift fingerprint**. 
Two refreshes that produce the same version string but different image IDs (e.g., upstream re-released `0.5.2` with patches, or the same-day fallback built twice) are detected as drift. Drift comparison uses *this label*, not the version string. +- **`recipe-sha`** — sha256 of the inline `BaseImage()` Dockerfile at the moment the base was built. Detects when mostlydev edits the recipe itself (e.g., switches base image from `node:22-slim` to `node:24-slim`). Tertiary metadata; not used for default drift detection, but available to `claw inspect` and future tooling. + +The generated artifact is self-describing: `cat Dockerfile.generated` tells the operator exactly which upstream OpenClaw version the service image was built against, and inspecting the service image gives a cryptographically strong provenance trail. + +If the local runner base is missing or the version cannot be resolved, `claw build` fails closed with a remediation message that matches the caller's invocation shape: + +- Called from a pod context: `run: claw pull` (or `claw pull ` if `-f` was used) +- Called as `claw build `: `run: claw pull ` + +Both of these commands are honored by the mode matrix in §2, so the remediation always leads to a command that can fix the problem. + +### 4. `claw up` surfaces drift as a soft hint, using image IDs + +When `claw up` validates pod service images, it inspects each one for the `claw.runner.image-id` label and compares it to the current local `:latest` image ID. If the IDs differ, `claw up` prints: + +``` +analyst: built against openclaw v0.5.0 (image abc123def456), current is v0.5.2 (image 789abcd...) — consider running: claw build +``` + +This is informational, not fail-closed. The older runner base is still functionally valid — the operator may have intentionally avoided rebuilding. Service images without any `claw.runner.*` labels (built by older `claw` binaries) produce no drift hint; default `claw up` treats them as not-yet-migrated. 
`claw up --fix` *will* rebuild them to pick up provenance labels. + +**Epistemic boundary.** Clawdapus **cannot honestly know** that the local runner alias is older than upstream latest without an explicit refresh. `claw up` only compares image IDs that are already present locally. If an operator has not run `claw pull` recently, `claw up` will report "no drift" even if upstream has shipped a new release in the meantime. Upstream freshness is an explicit operator action, always gated by `claw pull`. This boundary is deliberate: we do not want `claw up` to probe upstream sources on every pod start, and we do not want it to claim knowledge it does not have. + +If the runner base is *missing entirely* (not just drifted), that is a hard failure with the same remediation as ADR-022's strict mode: `run: claw pull`. + +### 5. The `BaseImageProvider` interface gains optional siblings + +```go +// RunnerBaseProvider is optionally implemented by drivers whose base image is +// built from upstream sources rather than pulled from a pinned registry tag. +// Implementing this interface signals that claw pull should refresh the base +// against fresh upstream sources, and that claw build should rewrite +// FROM :latest to FROM :v at emit time. +type RunnerBaseProvider interface { + BaseImageProvider + RunnerAlias() string +} + +// RunnerVersionProber is optionally implemented by RunnerBaseProvider drivers +// that can report the installed upstream runner version. Drivers that do not +// implement this interface fall back to a build-date-plus-image-ID tag. +type RunnerVersionProber interface { + RunnerVersionProbe() []string +} +``` + +Hermes implements neither interface (its base image is pinned per ADR-022). The six synthetic-tag drivers implement `RunnerBaseProvider`, with `RunnerVersionProber` opted in per-driver after the probe is verified against the driver's installed toolchain (see plan §2). + +### 6. 
Contributor-only repackaging is unchanged + +Contributors hacking on `baseimage.go` can still rebuild a runner base manually with `docker build`. There is no separate dev tooling. The new mechanism is purely additive — it gives end users a refresh path through `claw pull` without taking anything away. + +## Consequences + +**Positive:** + +- Operators get an intuitive refresh verb (`claw pull`) without Clawdapus assuming the role of upstream packager for third-party harnesses. +- The trust boundary is honest: the operator trusts the recipe in `baseimage.go` (auditable Go source) and the upstream installer URL (third-party trust they already accept). Mostlydev never sits between. +- Pod images carry both human-readable version strings and a strong image-ID provenance stamp — the operator can answer "what version?" *and* drift detection survives upstream version-string instability. +- Drift detection works even when upstream re-releases the same version string with different bytes, because comparison is by image ID. +- Same-day fallback tags do not collide, because the fallback tag includes the image-ID suffix. +- No new publishing infrastructure, no new ghcr.io packages, no new manifest entries. +- The four-verb operator surface is preserved (ADR-022 §2). The amendment is entirely contained in `claw pull`'s internals; no new top-level verb. +- The single-Clawfile authoring path works end-to-end: `claw pull my.Clawfile` → `claw build my.Clawfile` → container runs. + +**Negative:** + +- `claw pull` becomes slower for pods with `build:` services. A clean OpenClaw rebuild downloads `node:22-slim` and runs the install script — minutes, not seconds. `claw pull` prints a "this may take a few minutes" warning at the start of each refresh phase. +- Reproducibility *across* `claw pull` runs is limited by upstream source stability. Two operators running `claw pull` on different days may get different versions. This is the explicit cost of not pinning. 
+- Runner refresh depends on upstream availability. If `openclaw.ai` is down, `claw pull` fails — but no worse than today, just more visible. +- `clawfile.Emit` couples (loosely) to local Docker state, since it needs the provenance info for FROM rewriting and label injection. Mitigated by passing the resolved provenance as an explicit parameter — the function stays pure if called with nil provenance. + +**Amends ADR-022 §3:** + +- `claw pull` no longer skips build-time runner bases "without exception" — it refreshes them. The anti-collision rationale of §3 is preserved (no registry collision surface exists for locally built bases) but the letter of the rule is narrowed. + +**Breaking / behavioral changes:** + +- `Dockerfile.generated` now contains `FROM openclaw:v0.5.2` instead of `FROM openclaw:latest`. Tests in `internal/clawfile/emit_test.go` that assert on literal `:latest` need updating. +- Pod images built before the upgrade lack the three provenance labels. On the next `claw up --fix`, they'll rebuild because the labels are missing. Default `claw up` prints nothing special for unlabeled images (treated as not-yet-migrated) and continues. + +**Risks:** + +- A driver's version probe could break silently if upstream changes `--version` output format. Mitigated by: (a) the per-driver probe verification step (see plan §2), which locks the parser against a captured sample; (b) the image-ID drift check, which is format-independent and catches changes even if the version parser silently degrades to the fallback tag. +- An operator who runs `claw pull` without `claw build` afterwards leaves their pod images stale relative to the refreshed runner base. The soft hint in `claw up` mitigates this but does not prevent it. + +## Migration + +There is no publishing pipeline to set up. The migration is entirely in-tree code: + +1. Add the `RunnerBaseProvider` and `RunnerVersionProber` interfaces in `internal/driver/types.go`. +2. 
Implement them in each of the six runner driver `baseimage.go` files, **verifying each probe against the driver's installed toolchain** (plan §2). +3. Add `RefreshRunnerBase` to `internal/build/build.go`. +4. Extend `cmd/claw/pull.go` with the three-mode dispatch (pod / Clawfile / bare). +5. Extend `internal/clawfile/emit.go` to take a runner-provenance struct and rewrite FROM lines + inject three labels. +6. Update `internal/build/build.go:Generate` to resolve provenance from local `docker image inspect` and pass it to emit. +7. Extend `cmd/claw/compose_up.go` to read `claw.runner.image-id` labels and emit the soft drift hint. +8. Update tests: `clawfile/emit_test.go`, `cmd/claw/pull_test.go`, `internal/build/build_test.go`. +9. Update operator-facing docs: `AGENTS.md`, `README.md`, `site/guide/cli.md`, regenerate `cmd/claw/skill_data/SKILL.md`. + +The implementation plan in `docs/plans/2026-04-09-128-runner-update-from-upstream.md` walks each step. + +## Alternatives Considered + +1. **An explicit `claw runners update` verb.** Preserves ADR-022 §3's "pull skips build:" rule verbatim, at the cost of introducing a new top-level verb that breaks ADR-022 §2's "no new top-level verbs" promise. **Rejected** because amending §3 narrowly (with the anti-collision spirit preserved) is less operator-visible than amending §2. + +2. **Publish and pin runner base images.** Internally consistent with ADR-022. **Rejected on the trust argument** — making mostlydev the implicit packaging authority for upstream third-party harnesses is the wrong relationship. + +3. **Add a new top-level verb `claw runners refresh`** without subcommand structure. **Same rejection as the explicit `claw runners update` verb above.** + +4. **Force `claw build` to always run with `--pull --no-cache`.** Simplest possible change. **Rejected** because it conflates pod-image compilation (fast, frequent) with runner-base refresh (slow, rare). + +5. 
**Single-label provenance via version string only.** Considered in an earlier draft of this ADR. **Rejected** because semver is not content-stable: upstream can re-release the same version with patches, and the same-day fallback tag collides across multiple refreshes. Image-ID comparison is format-independent and collision-free, so we adopt a three-label scheme where the version string is operator-facing and the image ID is the drift fingerprint. + +6. **Cache the resolved version in a local state file** (`~/.claw/runner-versions.json`). **Rejected** because Docker already manages the local image state; a parallel state file creates divergence risk. + +7. **Stamp the version label inside the Dockerfile during build with `--label`.** **Rejected** because the version is not known until the install completes — `--label` requires the value at `docker build` invocation time. The probe-then-tag-then-relabel-at-emit approach is the correct ordering. + +8. **Make `claw pull` with no args refresh *all* managed runner aliases unconditionally.** Considered as a simpler bare mode. **Rejected** because it would force 30+ minutes of refresh on a fresh machine for users who only care about one driver. The "refresh locally-tagged aliases only" bare mode is cheap on fresh machines and honest about what the operator is already using. 
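The image-ID drift comparison adopted in Decision §4 (and preferred over version-string comparison in alternative 5) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `runnerStamp` type, the `driftHint` function, and the simplified hint wording are hypothetical and are not taken from the `claw` source.

```go
package main

import "fmt"

// runnerStamp is a hypothetical holder for the provenance a service image
// carries (claw.runner.built-against / claw.runner.image-id) or that the
// local runner alias currently resolves to.
type runnerStamp struct {
	Version string // human-readable, e.g. "v0.5.2"
	ImageID string // e.g. "sha256:abc123..."
}

// driftHint reports a soft hint when the stamped image ID differs from the
// current local base image ID. Unlabeled images (empty ImageID) yield no
// hint, mirroring the "not-yet-migrated" treatment described in §4.
func driftHint(service, alias string, built, current runnerStamp) (string, bool) {
	if built.ImageID == "" || built.ImageID == current.ImageID {
		return "", false
	}
	msg := fmt.Sprintf("%s: built against %s %s, current is %s; consider running: claw build",
		service, alias, built.Version, current.Version)
	return msg, true
}

func main() {
	// Same version string, different bytes: still reported as drift.
	built := runnerStamp{Version: "v0.5.2", ImageID: "sha256:aaa"}
	current := runnerStamp{Version: "v0.5.2", ImageID: "sha256:bbb"}
	if msg, ok := driftHint("analyst", "openclaw", built, current); ok {
		fmt.Println(msg)
	}
}
```

Because the comparison never looks at the version string, a re-released `0.5.2` with different contents is still flagged, which is the property alternative 5 was rejected for lacking.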
diff --git a/docs/plans/2026-04-09-128-runner-update-from-upstream.md b/docs/plans/2026-04-09-128-runner-update-from-upstream.md new file mode 100644 index 0000000..8c1f636 --- /dev/null +++ b/docs/plans/2026-04-09-128-runner-update-from-upstream.md @@ -0,0 +1,515 @@ +# Implementation Plan: Issue #128 — Runner Refresh from Upstream + +**Date:** 2026-04-09 +**Status:** Draft (alternative) +**Issue:** #128 +**ADR:** `docs/decisions/024-runner-base-refresh-from-upstream.md` +**Alternative considered:** an explicit `claw runners update` top-level verb + +## Goal + +Make `claw pull` the operator-facing refresh verb for built-in runner bases, building each base from clean upstream sources on the local machine, surfacing the resolved upstream version explicitly, rewriting `Dockerfile.generated` to reference the explicit version tag, and stamping service images with human-readable + image-ID + recipe-SHA provenance. + +No mostlydev-published runner images. No release manifest entries. No new publishing workflows. + +## Shipping target + +This is a single focused implementation session **after** a per-driver probe verification pass. The surface is small (~9 files, ~500 lines net), but the probe verification is a blocker — each driver's `RunnerVersionProbe` must be validated against the driver's actual installed toolchain before the probe table in §2 is final. "Lands in one PR" is realistic; "ships without per-driver verification" is not. 
+ +## Current-state verification + +- `internal/driver/openclaw/baseimage.go:3` — synthetic tag `openclaw:latest` +- `internal/driver/microclaw/baseimage.go` — synthetic tag `microclaw:latest` +- `internal/driver/nullclaw/baseimage.go` — synthetic tag `nullclaw:latest` +- `internal/driver/nanobot/baseimage.go` — synthetic tag `nanobot:latest` +- `internal/driver/picoclaw/baseimage.go:3` — synthetic tag `picoclaw:latest`, base is `docker.io/sipeed/picoclaw:latest` +- `internal/driver/nanoclaw/baseimage.go:3` — synthetic tag `nanoclaw-orchestrator:latest`; recipe installs `ca-certificates git python3 make g++` in builder and `ca-certificates git procps tini` in runtime (**note: no jq**) +- `internal/driver/hermes/baseimage.go:5` — already pinned to `ghcr.io/mostlydev/hermes-base:v2026.3.17` (out of scope) +- `internal/build/build.go:57-84` — `ensureBaseImage` only auto-builds when missing locally; never refreshes +- `internal/build/build.go:131` — `BuildFromDockerfileContent` runs plain `docker build`, no `--pull --no-cache` +- `internal/clawfile/emit.go:10` — `Emit(result *ParseResult) (string, error)` is pure-Go, copies FROM lines verbatim +- `cmd/claw/pull.go:9-27` — `runPull` has two code paths: no pod (`pullCoreInfraImages`) and pod mode (infra + registry service pulls). Auto-resolves `claw-pod.yml` in cwd via `resolveOptionalPodFile`. +- `cmd/claw/image_lifecycle.go:137` — `resolveOptionalPodFile` returns `(path, true, nil)` when `claw-pod.yml` exists in cwd, `("", false, nil)` otherwise. +- `cmd/claw/image_lifecycle.go:281-348` — `requiredPodPullInfraSpecs` computes which infra a pod needs + +The current behavior: a machine that has built `openclaw:latest` once will use that stale image forever, with no `claw` verb that updates it. + +## Target operator flow + +Pod mode: + +``` +$ claw pull +[claw] pulling pinned infra (claw-api v0.8.0, cllama v0.3.3, ...) 
+[claw] refreshing runner bases for pod (1 driver: openclaw) +[claw] openclaw: building base from upstream (this may take a few minutes) +[claw] FROM node:22-slim (pulled fresh) +[claw] curl https://openclaw.ai/install.sh ... +[claw] openclaw: installed v0.5.2 (was v0.5.0) +[claw] tagged: openclaw:v0.5.2, openclaw:latest +[claw] pull complete +``` + +Single-Clawfile authoring mode: + +``` +$ claw pull my.Clawfile +[claw] refreshing runner base for clawfile (driver: openclaw) +[claw] openclaw: building base from upstream ... +[claw] openclaw: installed v0.5.2 +[claw] pull complete +``` + +Bare mode (refresh locally-tagged only): + +``` +$ claw pull +[claw] pulling pinned infra (...) +[claw] refreshing locally-tagged managed runner aliases (1: openclaw) +[claw] openclaw: already at v0.5.2 +[claw] pull complete +``` + +After a fresh `claw pull`, if the operator forgot to rebuild: + +``` +$ claw up -d +[claw] analyst: built against openclaw v0.5.0 (image abc123def456), current is v0.5.2 (image 789abcd...) — consider running: claw build +[claw] pod up +``` + +## Ordered work breakdown + +### 1. Add the optional driver interfaces + +**File:** `internal/driver/types.go` + +Append after `BaseImageProvider`: + +```go +// RunnerBaseProvider is optionally implemented by drivers whose base image is +// built from upstream sources rather than pulled from a pinned registry tag. +// Implementing this interface signals to claw pull that the base should be +// refreshed against fresh upstream sources, and to claw build that +// FROM :latest should be rewritten to FROM :v at emit time. +type RunnerBaseProvider interface { + BaseImageProvider + RunnerAlias() string +} + +// RunnerVersionProber is optionally implemented by RunnerBaseProvider drivers +// that can report the installed upstream runner version. The returned command +// is run inside the freshly built base image; stdout is parsed as the version +// string. 
Drivers that do not implement this interface fall back to a +// build-date-plus-image-ID tag like "built-20260409-abc123def456". +type RunnerVersionProber interface { + RunnerVersionProbe() []string +} +``` + +No changes to the existing `Driver` interface or `BaseImageProvider`. + +### 2. Per-driver probe verification (BLOCKING prerequisite) + +Each driver that opts into `RunnerVersionProber` must have its probe command verified against the driver's actual recipe before the probe is committed. Verification procedure: + +1. Build the driver's current `BaseImage()` Dockerfile manually: `docker build --pull --no-cache -t :verify -f .` +2. Run the candidate probe: `docker run --rm :verify ` +3. Capture stdout and confirm it parses into a clean version string. +4. Add the captured sample to `internal/driver//baseimage_test.go` as a golden fixture for the parser. + +Initial probe table — **each row must be verified in step §2.1 before implementation lands**: + +| Driver | Recipe installs | Candidate probe | Verified? | +|---|---|---|---| +| openclaw | node, npm, openclaw via install.sh | `openclaw --version \| awk '{print $NF}'` | ☐ | +| microclaw | (verify recipe) | `microclaw --version \| awk '{print $NF}'` | ☐ | +| nullclaw | (verify recipe) | `nullclaw --version \| awk '{print $NF}'` | ☐ | +| nanobot | python + nanobot package | `python -c "import nanobot; print(nanobot.__version__)"` | ☐ | +| nanoclaw | **node in builder + runtime; NO jq** | `node -e 'console.log(require("/workspace/package.json").version)'` | ☐ | +| picoclaw | `docker.io/sipeed/picoclaw:latest`, no tools added | **omit `RunnerVersionProber`** — falls back to `built-YYYYMMDD-` | N/A | + +**Explicit correction from the prior draft:** the nanoclaw probe is `node`-based, not `jq`-based. The nanoclaw recipe (`internal/driver/nanoclaw/baseimage.go:21`) does not install `jq`, and using `jq` would silently fall back to the build-date tag on every refresh. 
`node` is guaranteed present (the recipe is `FROM node:22-bookworm-slim`), and `/workspace/package.json` is guaranteed present (copied from the cloned repo in the builder stage). + +Drivers whose probe cannot be verified against their current recipe in a single verification pass are shipped with `RunnerVersionProber` **unimplemented**. They use the fallback tag scheme and can gain a probe later in a follow-up PR without touching the core mechanism. + +### 3. Implement `RunnerBaseProvider` (and optional probe) in each driver + +**Files:** `internal/driver/{openclaw,microclaw,nullclaw,nanobot,picoclaw,nanoclaw}/baseimage.go` + +For openclaw: + +```go +func (d *Driver) RunnerAlias() string { + return "openclaw" +} + +func (d *Driver) RunnerVersionProbe() []string { + return []string{"sh", "-c", "openclaw --version 2>/dev/null | awk '{print $NF}'"} +} +``` + +Repeat for microclaw, nullclaw, nanobot, nanoclaw using the verified probes from §2. Picoclaw implements only `RunnerAlias()`. + +### 4. Add `RefreshRunnerBase` in `internal/build/build.go` + +Add a new exported function: + +```go +// RefreshResult captures everything claw pull learned about a freshly built +// runner base. All three fields are passed to claw build later so service +// images can be stamped with full provenance. +type RefreshResult struct { + Alias string // "openclaw" + Version string // "0.5.2" (probed) or "built-20260409-abc123def456" (fallback) + ImageID string // "sha256:abc..." — strong drift fingerprint + RecipeSHA string // "sha256:def..." — recipe content hash + PreviousVer string // what was tagged before this refresh, for the upgrade message +} + +// RefreshRunnerBase rebuilds the driver's base image against fresh upstream +// sources, probes the resulting image for its upstream runner version, tags +// the result, and returns the provenance needed by the caller. +func RefreshRunnerBase(d driver.RunnerBaseProvider) (*RefreshResult, error) +``` + +Implementation shape: + +1. 
`alias := d.RunnerAlias(); _, dockerfile := d.BaseImage()` +2. Record the previous version (if any) via `lookupLocalRunnerVersion(alias)`. +3. Compute `recipeSHA := sha256.Sum256([]byte(dockerfile))`. +4. Generate a unique interim tag: `interim := alias + ":refreshing-" + shortRand()`. +5. Run `docker build --pull --no-cache -t `. +6. Query the image ID: `docker inspect --format '{{.Id}}' `. +7. Resolve the version via `RunnerVersionProber` if implemented; otherwise fall back to `built-YYYYMMDD-`. +8. Compute versioned tag: `versioned := alias + ":v" + version` (or `alias + ":" + fallbackTag` for non-probe drivers). +9. `docker tag ` and `docker tag :latest`. +10. `docker rmi ` (best-effort; not a hard error if it fails). +11. Return the `RefreshResult`. + +Helpers `dockerTag`, `dockerRmiQuiet`, `shortRand`, and `lookupLocalRunnerVersion` wrap shell invocations of `docker` — the file already shells out to docker (see `build.go:131`), so this is a continuation of the existing pattern. + +### 5. Three-mode dispatch in `cmd/claw/pull.go` + +The current `pullCmd` uses `resolveOptionalPodFile` (`image_lifecycle.go:137`), which handles both explicit `--file` and positional-arg cases plus the `claw-pod.yml`-in-cwd auto-resolution. Extend that logic to also recognize the full set of single-Clawfile inputs that `claw build ` already accepts, rather than inventing a narrower filename-only classifier. 
+ +Change `pullCmd` definition in `cmd/claw/pull.go`: + +```go +var pullCmd = &cobra.Command{ + Use: "pull [pod-file-or-clawfile]", + Short: "Fetch pinned infra, refresh runner bases, and pull pod registry images", + Args: cobra.MaximumNArgs(1), + RunE: func(cmd *cobra.Command, args []string) error { + target, err := resolvePullTarget(composePodFile, args) + if err != nil { + return err + } + switch target.Kind { + case pullTargetPod: + return runPullPod(target.Path) + case pullTargetClawfile: + return runPullClawfile(target.Path) + case pullTargetBare: + return runPullBare() + } + return fmt.Errorf("unreachable") + }, +} +``` + +New helper in `cmd/claw/image_lifecycle.go` (next to `resolveOptionalPodFile`): + +```go +type pullTargetKind int + +const ( + pullTargetBare pullTargetKind = iota + pullTargetPod + pullTargetClawfile +) + +type pullTarget struct { + Kind pullTargetKind + Path string +} + +// resolvePullTarget implements the ADR-024 §2 mode matrix: +// - explicit -f → pullTargetPod +// - positional (.yml/.yaml) → pullTargetPod +// - positional accepted by claw build's +// single-Clawfile resolution rules → pullTargetClawfile +// - no args, claw-pod.yml in cwd → pullTargetPod (auto) +// - no args, no pod in cwd → pullTargetBare +// - unclassifiable positional input → hard error +func resolvePullTarget(explicit string, args []string) (pullTarget, error) { + if explicit != "" && len(args) > 0 { + return pullTarget{}, fmt.Errorf("pull target specified twice: use either '--file %s' or positional '%s', not both", explicit, args[0]) + } + if explicit != "" { + return pullTarget{Kind: pullTargetPod, Path: explicit}, nil + } + if len(args) > 0 { + return classifyPullArg(args[0]) + } + if _, err := os.Stat("claw-pod.yml"); err == nil { + return pullTarget{Kind: pullTargetPod, Path: "claw-pod.yml"}, nil + } else if !errors.Is(err, os.ErrNotExist) { + return pullTarget{}, fmt.Errorf("stat claw-pod.yml: %w", err) + } + return pullTarget{Kind: pullTargetBare}, nil +} + 
+func classifyPullArg(path string) (pullTarget, error) { + ext := strings.ToLower(filepath.Ext(path)) + if ext == ".yml" || ext == ".yaml" { + return pullTarget{Kind: pullTargetPod, Path: path}, nil + } + + resolved, err := resolveClawfilePath(path) + if err == nil && isClawBuildFile(resolved) { + return pullTarget{Kind: pullTargetClawfile, Path: resolved}, nil + } + + return pullTarget{}, fmt.Errorf("cannot classify pull target %q: expected a pod file (*.yml/*.yaml) or any path accepted by 'claw build '", path) +} +``` + +This deliberately reuses existing Clawfile behavior instead of narrowing it. Flat-layout filenames like `Clawfile.westin`, example files like `Clawfile.nanoclaw`, directories containing a `Clawfile`, and other custom paths that already satisfy `isClawBuildFile` continue to work in `claw pull`. We should not make runner refresh less capable than `claw build`, because the remediation path in §8 must accept the same input the user just built from. + +The `runPullBare` function preserves current behavior (`pullCoreInfraImages()`) and additionally refreshes any managed runner aliases already locally tagged. The `runPullPod` function is the existing logic plus a call to `refreshPodRunnerBases(p)`. The `runPullClawfile` function parses the Clawfile and refreshes just that driver's base. + +Backward compatibility: `resolveOptionalPodFile` at `image_lifecycle.go:137` is kept intact for other callers (`compose_up.go`, etc.). Only `pullCmd` switches to `resolvePullTarget`. + +### 6. Runner driver discovery helpers in `cmd/claw/image_lifecycle.go` + +```go +// requiredRunnerDriversForPod returns the unique RunnerBaseProvider drivers +// used by the pod's build: services. +func requiredRunnerDriversForPod(podDir string, p *pod.Pod, plans []plannedServiceImage) ([]driver.RunnerBaseProvider, error) + +// runnerDriverForClawfile returns the RunnerBaseProvider for a single clawfile +// path, or nil if the driver does not implement RunnerBaseProvider. 
+func runnerDriverForClawfile(clawfilePath string) (driver.RunnerBaseProvider, error) + +// locallyTaggedRunnerDrivers returns runner drivers whose :latest is +// already present in the local Docker daemon. Used by the bare claw pull mode. +func locallyTaggedRunnerDrivers() []driver.RunnerBaseProvider +``` + +Each helper calls out to `driver.Lookup` / `driver.All` (whichever is the current registry API) and filters by `RunnerBaseProvider` assertion. + +The refresh driver: + +```go +func refreshRunnerBases(drivers []driver.RunnerBaseProvider) (map[string]*build.RefreshResult, error) { + results := make(map[string]*build.RefreshResult) + for _, d := range drivers { + alias := d.RunnerAlias() + fmt.Printf("[claw] %s: building base from upstream (this may take a few minutes)\n", alias) + res, err := build.RefreshRunnerBase(d) + if err != nil { + return nil, fmt.Errorf("refresh runner base %s: %w", alias, err) + } + switch { + case res.PreviousVer == "": + fmt.Printf("[claw] %s: installed v%s\n", alias, res.Version) + case res.PreviousVer == res.Version: + fmt.Printf("[claw] %s: already at v%s\n", alias, res.Version) + default: + fmt.Printf("[claw] %s: installed v%s (was v%s)\n", alias, res.Version, res.PreviousVer) + } + results[alias] = res + } + return results, nil +} +``` + +### 7. FROM rewriting and label injection in `internal/clawfile/emit.go` + +Change the signature: + +```go +// RunnerProvenance describes the resolved runner base for a single driver +// alias. Passed to Emit so it can rewrite FROM lines and inject provenance +// labels into Dockerfile.generated. +type RunnerProvenance struct { + Alias string // "openclaw" + Version string // "0.5.2" or "built-20260409-abc123def456" + ImageID string // "sha256:..." + RecipeSHA string // "sha256:..." +} + +// Emit renders the generated Dockerfile. If runner is non-nil, FROM :latest +// is rewritten to FROM :v (or FROM :) and +// three provenance labels are injected. 
If runner is nil, Emit preserves the +// current literal-copy behavior. +func Emit(result *ParseResult, runner *RunnerProvenance) (string, error) +``` + +Inside `Emit`, when walking `result.DockerNodes`: + +- If `node.Value` is `"from"` (case-insensitive) and `runner != nil` and the FROM image matches `runner.Alias + ":latest"`, rewrite the line to `FROM :` where `` is `"v" + runner.Version` if `runner.Version` starts with a semver-looking string, otherwise `runner.Version` itself (for the fallback tag that already includes its full form). + +In `buildGeneratedLines`, append three lines when `runner != nil`: + +```go +if runner != nil { + lines = append(lines, + formatLabel("claw.runner.built-against", fmt.Sprintf("%s:%s", runner.Alias, runnerTagFor(runner.Version))), + formatLabel("claw.runner.image-id", runner.ImageID), + formatLabel("claw.runner.recipe-sha", runner.RecipeSHA), + ) +} +``` + +Where `runnerTagFor("0.5.2") = "v0.5.2"` and `runnerTagFor("built-20260409-abc...") = "built-20260409-abc..."`. + +Existing tests in `emit_test.go` pass `nil` for the new parameter and continue to pass. + +### 8. Populate provenance in `build.Generate` + +**File:** `internal/build/build.go` + +Modify `Generate`: + +```go +func Generate(clawfilePath string) (string, error) { + // existing parse + driver lookup ... + + var provenance *clawfile.RunnerProvenance + if rbp, ok := d.(driver.RunnerBaseProvider); ok { + p, err := resolveLocalRunnerProvenance(rbp) + if err != nil { + return "", remediationErrorf("claw pull", "%w", err) + } + provenance = p + } + + rendered, err := clawfile.Emit(parsed, provenance) + if err != nil { + return "", fmt.Errorf("emit dockerfile: %w", err) + } + // ... +} +``` + +`resolveLocalRunnerProvenance` looks up `:latest` in local Docker, extracts its image ID and the version-or-fallback tag from `RepoTags`, recomputes the recipe SHA from the driver's current `BaseImage()` Dockerfile content, and returns the provenance struct. 
Returns a remediation error if `:latest` is missing locally. + +The remediation error message adapts to invocation context: if `Generate` was called from `claw build `, the remediation is `run: claw pull `; if from `claw build` pod mode, `run: claw pull`. (Wiring this requires threading the caller's invocation context into the error; alternately, the error carries a hint that the caller formats.) + +### 9. Drift hint in `claw up` + +**File:** `cmd/claw/compose_up.go` + +In the pod service image validation phase, for each service image that has a `claw.runner.image-id` label: + +1. Read `claw.runner.built-against` (e.g., `openclaw:v0.5.0`) to extract the alias. +2. Read `claw.runner.image-id` (e.g., `sha256:abc...`). +3. Query the current local `:latest` image ID via `docker inspect`. +4. If different, print: + ``` + [claw] : built against (image ), current is (image ) — consider running: claw build + ``` +5. If `--fix` is set, automatically trigger a rebuild of the affected services. + +Service images without any `claw.runner.*` labels (built by older `claw` binaries) are silently treated as not-yet-migrated. `--fix` rebuilds them to pick up the labels. + +**Epistemic boundary (matches ADR-024 §4):** this check compares local image IDs only. `claw up` does not probe upstream sources, does not assert that the local alias is "latest," and does not refresh the runner base itself. Runner refresh is always an explicit `claw pull`. + +### 10. Tests + +**Unit tests:** + +- `internal/clawfile/emit_test.go`: + - Existing cases pass `nil` for `runner` and continue to pass (no regression). + - New case: pass a populated `RunnerProvenance`, assert FROM rewriting from `openclaw:latest` to `openclaw:v0.5.2` and presence of the three labels. + - Edge case: FROM with explicit non-`:latest` tag (e.g., `FROM openclaw:pinned`) should be left alone. 
+ - Edge case: multi-stage Dockerfile with `FROM node:22 AS builder` followed by `FROM openclaw:latest` — only the runner FROM is rewritten. + +- `internal/build/build_test.go`: + - `RefreshRunnerBase` against a fake `RunnerBaseProvider` that returns a trivial Dockerfile (`FROM busybox\nRUN echo 0.0.1 > /version\nRUN chmod +x /bin/true`) and a probe that reads `/version`. Verify the interim build, probe, tagging, and image ID capture flow. + - Fake driver without `RunnerVersionProber` → fallback tag format `built-YYYYMMDD-`. + - Two consecutive refreshes of the same fake driver in the same test (simulating same-day) produce different image IDs and different fallback tags. **This is the regression test against codex's same-day collision finding.** + +- `cmd/claw/image_lifecycle_test.go` (or new `pull_test.go`): + - `resolvePullTarget` for all mode-matrix cases: + - explicit `--file foo.pod.yml` → pod mode + - positional `foo.pod.yml` → pod mode + - positional `Clawfile.nanoclaw` → clawfile mode + - positional `Clawfile` (basename, no extension) → clawfile mode + - positional directory containing `Clawfile` → clawfile mode + - positional custom file accepted by `isClawBuildFile` → clawfile mode + - positional `unknown.txt` → error + - no args, `claw-pod.yml` in cwd → pod mode (auto) + - no args, no pod in cwd → bare mode + - both `--file` and positional → error + - `locallyTaggedRunnerDrivers` returns drivers whose alias tag exists in a mocked docker state. + +- `cmd/claw/compose_up_test.go`: + - Drift detection: service image labeled `claw.runner.image-id=sha256:abc`, current `openclaw:latest` has image ID `sha256:def` → soft hint printed, `claw up` continues. 
+ - **Same-version-different-id regression test:** service image labeled with version `openclaw:v0.5.2` and image-id `sha256:abc`, current `openclaw:latest` is also tagged `openclaw:v0.5.2` but has image-id `sha256:def` → drift detected (this is the exact case codex flagged as the semver-not-content-stable hole). + - Service image without `claw.runner.*` labels → no hint, `claw up` continues. + +**Integration tests:** + +- `cmd/claw/` integration test that exercises `claw pull` against a fixture pod with one OpenClaw `build:` service, using a stub driver registered with `RunnerBaseProvider`. Verifies the full pull → rebuild → tag → build → up flow without hitting real upstream installers. + +**Spike tests:** + +- Update `TestSpikeRollCall` / `TestSpikeComposeUp` assertions that touch `Dockerfile.generated` content — the FROM line is no longer literal `:latest`. + +**Full test sweep:** + +``` +go vet ./... +go test ./... +go test -tags integration ./... +``` + +Spike tests (live docker): + +``` +go test -tags spike -run TestSpikeRollCall ./cmd/claw/... +``` + +### 11. Documentation sweep + +- `AGENTS.md` (and the `CLAUDE.md` symlink): remove the OpenClaw refresh footgun from "Repo-Specific Gotchas." Add a note that `claw pull` refreshes runner bases for the pod's `build:` services (or for a single Clawfile) and that this can take minutes. Document the three-mode `claw pull` matrix. +- `README.md`: extend the four-verb explanation to mention runner refresh under `claw pull`, including the single-Clawfile mode. +- `site/guide/cli.md`: same. +- `site/guide/quickstart.md`: confirm the quickstart still works end-to-end. +- `cmd/claw/skill_data/SKILL.md` and `skills/clawdapus/SKILL.md`: regenerate via `go generate ./cmd/claw/...`. +- `site/changelog.md`: add an entry under the next version describing the runner refresh path, the FROM-rewrite change, and the three-label provenance. + +## Manual smoke test + +1. 
`docker rmi openclaw:latest openclaw:v* openclaw:built-* 2>/dev/null || true` +2. `cd examples/quickstart && claw up -d` — expect `claw build` failure with `run: claw pull` remediation. +3. `claw pull` — expect openclaw rebuild with version output. +4. `cat compose.generated.yml` and `cat /Dockerfile.generated` — verify `FROM openclaw:v` and three `LABEL claw.runner.*` lines. +5. `docker image inspect openclaw:latest` — verify both `openclaw:v` and `openclaw:latest` in `RepoTags`. +6. `claw build && claw up -d` — pod starts. +7. **Single-Clawfile mode:** `claw pull examples/trading-desk/Clawfile.nanoclaw` and `claw pull examples/quickstart` — expect runner refresh to work for both a custom-named Clawfile and a directory input, using the same resolution rules as `claw build`. +8. **Same-version drift simulation:** `docker tag openclaw:v openclaw:v-spoof && docker rmi openclaw:latest && docker tag busybox openclaw:latest` → `claw up` should print the soft drift hint because image IDs differ. +9. **Live Tiverton smoke:** on the trading pod host, run `claw pull -f claw-pod.yml`, then `claw build -f claw-pod.yml`, then `claw up -d -f claw-pod.yml`, then `claw compose exec analyst openclaw --version` to confirm the new runner version. + +## Open questions to settle during implementation + +1. **Probe output stability per driver.** Resolved per-driver in §2. Non-blocking for the core mechanism because graceful fallback is built in, but blocking for the probe table being final. +2. **Picoclaw versioning.** Confirmed: no probe, always uses the fallback tag. Drift detection still works via image-id comparison. +3. **Multi-stage Dockerfile FROM rewriting.** Covered by a test case in §10; implementation must walk `DockerNodes` and only rewrite the FROM whose image matches a known alias, leaving intermediate stages alone. +4. **Concurrent `claw pull` runs on the same machine.** Handled by the unique interim tag (`alias:refreshing-`) in `RefreshRunnerBase`. 
Worth a note in the plan for future reviewers. +5. **Network failures during refresh.** If `curl install.sh` fails mid-build, the operator gets a docker-build error. First implementation surfaces the raw error; friendlier wrapping can be a follow-up. +6. **Error-message adaptation for remediation.** `build.Generate`'s remediation error needs to know whether it was called from pod mode or single-Clawfile mode so the `run: claw pull ` hint matches the exact `claw build ` input. Plan: thread an invocation context through `Generate` or return a structured error the caller formats. + +## Non-goals for this work + +- Publishing any runner base images to ghcr.io. +- Changing `internal/infraimages/release_manifest.go` or its tests. +- Touching hermes-base (remains pinned per ADR-022). +- Adding a new top-level CLI verb. +- Per-driver configuration of which upstream version to install (operators get whatever the upstream installer picks at refresh time). +- Air-gapped operator support (this model assumes network access to upstream installers). +- Pinning runner versions in any artifact other than the local Docker daemon's tag list. +- Stronger upstream freshness checks (`claw up` comparing local alias against upstream latest). ADR-024 §4 explicitly rejects this as an epistemic over-claim.
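For open question 3 (multi-stage FROM rewriting), the following is a minimal sketch of the intended behavior, not the planned implementation. It works on raw Dockerfile text instead of the parser's `DockerNodes`, whose shape is not shown in this plan, and `rewriteRunnerFrom` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteRunnerFrom replaces the image in every "FROM <alias>:latest" line
// with "<alias>:<tag>". Other stages, and FROM lines that already pin a
// non-latest tag, are returned unchanged.
func rewriteRunnerFrom(dockerfile, alias, tag string) string {
	lines := strings.Split(dockerfile, "\n")
	for i, line := range lines {
		fields := strings.Fields(line)
		if len(fields) < 2 || !strings.EqualFold(fields[0], "FROM") {
			continue
		}
		// Skip FROM flags such as --platform=linux/amd64.
		img := 1
		for img < len(fields)-1 && strings.HasPrefix(fields[img], "--") {
			img++
		}
		if fields[img] != alias+":latest" {
			continue
		}
		fields[img] = alias + ":" + tag
		// Rejoining the fields normalizes spacing on the rewritten line only.
		lines[i] = strings.Join(fields, " ")
	}
	return strings.Join(lines, "\n")
}

func main() {
	in := "FROM node:22 AS builder\nRUN true\nFROM openclaw:latest\nCOPY --from=builder /a /a"
	fmt.Println(rewriteRunnerFrom(in, "openclaw", "v0.5.2"))
}
```

Matching on the exact `alias + ":latest"` string is what leaves `FROM node:22 AS builder` and `FROM openclaw:pinned` alone, which corresponds to the two edge cases listed for `emit_test.go` in §10.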