claw up cleanup of .claw-runtime.previous-* fails on cross-UID openclaw cron dirs #169

@mostlydev

Summary

Since the runtime rotation introduced in v0.8.11 (#153), claw up -d rotates the previous .claw-runtime to a .claw-runtime.previous-<gen> sibling and removes it after compose apply succeeds. On the Tiverton trading-desk deployment (clawdbot@tiverton:tiverton-house), that removal now hits permission denied on cross-UID openclaw cron run directories. The cleanup logs a warning and continues, so every claw up leaves a stale .claw-runtime.previous-* sibling on disk.

The active deploy path is unaffected — the pod is healthy, post-apply verifies, and v0.8.14 came up clean. But unbounded .claw-runtime.previous-* accumulation on a long-running production deployment is real disk debt.

Reproduction

Observed on clawdbot@tiverton:tiverton-house upgrading to v0.8.14:

[claw] tiverton (70ebde4f00cb): post-apply verified
[claw] warning: could not remove previous runtime dir /home/clawdbot/tiverton-house/.claw-runtime.previous-1776369405501791510:
  openfdat /home/clawdbot/tiverton-house/.claw-runtime.previous-1776369405501791510/weston/cron/runs: permission denied
[claw] pod is up

The path that fails (weston/cron/runs/) is written by the openclaw container under its in-container UID. The host-side os.RemoveAll running as clawdbot cannot traverse or unlink the entries inside it, so the rotation cleanup gives up after the warning.

Proposed shape

Same family of fix as #163 (portable memory cross-UID repair): when the in-process removal hits permission denied, route the cleanup through a privileged helper container (busybox:1.36.1 rm -rf running as root) rather than warn-and-continue. Root would be used only on the cleanup leg of an already-succeeded apply, and only after the unprivileged removal has failed; the live pod is unaffected.
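
A minimal sketch of that fallback, to make the shape concrete. It is not the existing internal/runtime code: the function names (helperRemoveArgs, removePreviousRuntime, dockerHelper), the /work mount point, and the choice to mount the parent directory are all assumptions for illustration. It also assumes a docker CLI on PATH; the real helper may be launched differently.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"os/exec"
	"path/filepath"
)

// helperImage is the image named in the proposal.
const helperImage = "busybox:1.36.1"

// helperRemoveArgs builds a hypothetical `docker run` argv that mounts the
// parent of dir at /work and removes the stale sibling from inside it.
func helperRemoveArgs(dir string) []string {
	parent, base := filepath.Dir(dir), filepath.Base(dir)
	return []string{
		"run", "--rm",
		"--user", "0:0", // explicit root so foreign-UID entries can be unlinked
		"-v", parent + ":/work",
		helperImage,
		"rm", "-rf", "/work/" + base,
	}
}

// removePreviousRuntime tries the unprivileged removal first and falls back
// to the helper only when the failure is a permission error.
func removePreviousRuntime(dir string, remove, helper func(string) error) error {
	err := remove(dir)
	if err == nil || !errors.Is(err, fs.ErrPermission) {
		return err // success, or a failure the helper is not meant to paper over
	}
	if herr := helper(dir); herr != nil {
		return fmt.Errorf("helper cleanup of %s failed: %w (in-process error: %v)", dir, herr, err)
	}
	return nil
}

// dockerHelper runs the helper container (assumption: docker CLI available).
func dockerHelper(dir string) error {
	out, err := exec.Command("docker", helperRemoveArgs(dir)...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("%w: %s", err, out)
	}
	return nil
}

func main() {
	// A temp dir owned by the caller: os.RemoveAll succeeds, so the helper never runs.
	tmp, err := os.MkdirTemp("", "claw-runtime-previous-")
	if err != nil {
		panic(err)
	}
	fmt.Println("cleanup error:", removePreviousRuntime(tmp, os.RemoveAll, dockerHelper))
	fmt.Println("helper argv:", helperRemoveArgs("/home/clawdbot/tiverton-house/.claw-runtime.previous-1776369405501791510"))
}
```

Mounting the parent rather than the stale directory itself lets rm -rf remove the directory entry too, at the cost of exposing the whole project directory to the root helper. Whether that trade-off is acceptable is a design question for the real fix.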

Alternative: pin the openclaw runtime user the same way Hermes/Nanobot/Nanoclaw were pinned in v0.8.13 so the cron runs/ directory is owned by the host UID in the first place. This was intentionally NOT done for openclaw (per v0.8.13 changelog: "OpenClaw and NullClaw are intentionally not pinned — they still accept variable upstream users — and the portable memory repair helper above covers the cross-UID case for those"). So the helper-container path is the consistent shape.

Scope

  • internal/runtime/ (rotation cleanup path that emits this warning)
  • Mirror the busybox helper pattern from PreparePortableMemory's repair helper
  • Spike test that proves cleanup succeeds even when the previous runtime contains files owned by a foreign UID
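
As a starting point for the spike test, the failure can be approximated without a container. The sketch below locks a weston/cron/runs directory with mode 0o000 as a stand-in for foreign-UID ownership and checks that os.RemoveAll reports a permission error. The function name, the .claw-runtime.previous-test directory, and the run.json file are hypothetical. A real spike test would need files actually written under another UID, and when run as root the removal simply succeeds.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// reproducePermissionDenied builds a throwaway previous-runtime tree, locks
// weston/cron/runs, and reports whether os.RemoveAll fails with a permission
// error. The chmod only simulates the cross-UID case.
func reproducePermissionDenied() (bool, error) {
	root, err := os.MkdirTemp("", "claw-spike-")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root) // runs last, after permissions are restored

	prev := filepath.Join(root, ".claw-runtime.previous-test")
	runs := filepath.Join(prev, "weston", "cron", "runs")
	if err := os.MkdirAll(runs, 0o755); err != nil {
		return false, err
	}
	// Hypothetical run record; the directory must be non-empty to block removal.
	if err := os.WriteFile(filepath.Join(runs, "run.json"), []byte("{}"), 0o644); err != nil {
		return false, err
	}
	if err := os.Chmod(runs, 0o000); err != nil {
		return false, err
	}
	defer os.Chmod(runs, 0o700) // restore access so the deferred RemoveAll can clean up

	rmErr := os.RemoveAll(prev)
	return errors.Is(rmErr, fs.ErrPermission), nil
}

func main() {
	denied, err := reproducePermissionDenied()
	if err != nil {
		panic(err)
	}
	fmt.Println("in-process removal hit permission denied:", denied)
}
```

The eventual test would then assert that the cleanup path under test removes the same tree successfully once the helper fallback is wired in.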

Trust order context

Per CLAUDE.md, runtime directories created by Materialize() use 0o777 so cross-UID writes work. The cron runs/ directory is openclaw-written, so it lands under whatever UID the openclaw container runs as — explaining why the cleanup leg specifically fails on openclaw runs/ paths and not on driver-materialized paths.
