fix(sandbox): restrict /sandbox to read-only via Landlock (#804) #1121
…stem policy (NVIDIA#804)

Tighten the Landlock filesystem policy so agents cannot write arbitrary files in the /sandbox home directory. Only explicitly declared paths remain writable (/sandbox/.openclaw-data, /sandbox/.nemoclaw, /tmp).

- Set include_workdir to false (verified against OpenShell landlock.rs: when true, WORKDIR is added to read_write, overriding read_only)
- Move /sandbox from read_write to read_only in the policy
- Add /sandbox/.nemoclaw to read_write for plugin state/config writes
- DAC-protect blueprints with root ownership (defense-in-depth)
- Pre-create .bashrc/.profile at build time (read-only home prevents runtime writes); source proxy config from writable proxy-env.sh
- Redirect tool dotfiles (npm, git, pip, bash, claude, node) to /tmp via env vars in both the entrypoint and the sourced proxy-env.sh so interactive connect sessions also get the redirects

Closes NVIDIA#804
…proach The entrypoint no longer writes proxy config directly to ~/.bashrc (read-only home). Tests now verify that proxy-env.sh is written to the writable data dir and that .bashrc sourcing works correctly.
The sed-extracted block contains the path in comments before the variable assignment. replace() only swaps the first occurrence (the comment), leaving the actual _PROXY_ENV_FILE assignment pointing at /sandbox/.openclaw-data/ which doesn't exist in CI.
Actionable comments posted: 2
🧹 Nitpick comments (2)
docs/deployment/sandbox-hardening.md (2)
85-86: Keep each sentence on its own source line in this intro. The first sentence is split across two source lines, and the second shares the same line as the end of the first. Please give each sentence its own line. As per coding guidelines, "One sentence per line in source (makes diffs readable). Flag paragraphs where multiple sentences appear on the same line."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/deployment/sandbox-hardening.md` around lines 85 - 86, The intro currently has two sentences on the same source line; split them so each sentence is on its own line: ensure "The sandbox Landlock policy restricts `/sandbox` (the agent's home directory) to read-only access." is one line and "Only explicitly declared directories are writable:" is the following line, updating the text in the same paragraph (no other changes).
103-105: Rewrite this in active voice and keep one sentence per line.
`are pre-created` is passive, and the sentence is wrapped across multiple source lines. As per coding guidelines, "Active voice required. Flag passive constructions." and "One sentence per line in source (makes diffs readable). Flag paragraphs where multiple sentences appear on the same line."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/deployment/sandbox-hardening.md` around lines 103 - 105, Rewrite the two-line passive sentence into active voice and ensure each sentence sits on its own source line: change "Shell init files (`.bashrc`, `.profile`) are pre-created at image build time and source runtime proxy configuration from the writable `/sandbox/.openclaw-data/proxy-env.sh`." into two active-voice sentences such as "The image build process pre-creates shell init files `.bashrc` and `.profile`." and "These files source runtime proxy configuration from `/sandbox/.openclaw-data/proxy-env.sh`." Place each sentence on its own line in the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Dockerfile`:
- Around line 151-159: The Dockerfile currently only makes
/sandbox/.nemoclaw/blueprints root-owned; instead ensure the parent is locked:
in the RUN that touches /sandbox/.nemoclaw, set ownership and permissions on the
parent directory (chown root:root /sandbox/.nemoclaw && chmod 755
/sandbox/.nemoclaw) before adjusting the blueprints subtree, then create the
runtime dirs (/sandbox/.nemoclaw/state and /sandbox/.nemoclaw/migration) and
chown those to sandbox:sandbox so only those are writable; update the existing
RUN that uses chown/chmod/mkdir to apply root ownership and 755 permissions to
/sandbox/.nemoclaw itself (and keep /sandbox/.nemoclaw/blueprints root:root) and
then chown only the state and migration dirs to sandbox.
In `@scripts/nemoclaw-start.sh`:
- Around line 248-273: The proxy env file is written into a sandbox-writable
directory (_PROXY_ENV_FILE="/sandbox/.openclaw-data/proxy-env.sh") which allows
a sandbox user to replace it with malicious shell code; instead write the proxy
env to a non-user-writable, root-owned location (for example create and use a
system-owned directory like /etc/openclaw or /var/lib/openclaw and set
ownership/mode) and update whatever startup/profile sourcing to point at that
path; ensure the write is done atomically and safely (create a temporary file in
the root-owned dir, set owner to root, chmod 0644, then rename into place) and
avoid following attacker symlinks (use safe file creation APIs or the install
command rather than plain cat > "$_PROXY_ENV_FILE"); also remove or stop
auto-sourcing any file from the sandbox-writable tree so agent-controlled files
cannot be executed at session startup.
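
The atomic-write advice in the `scripts/nemoclaw-start.sh` comment above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: `write_atomic`, the demo directory, and the proxy URL are hypothetical, and the ownership step is left out because it needs root.

```bash
# Requires bash. Illustrative sketch only; names below are hypothetical.
set -euo pipefail

# write_atomic DEST: read content from stdin and publish it at DEST in one step.
write_atomic() {
  local dest="$1" dir tmp
  dir="$(dirname "$dest")"
  # Create the temp file next to DEST so the final mv is a same-filesystem
  # rename, which swaps the new file into place in a single operation.
  tmp="$(mktemp "$dir/.proxy-env.XXXXXX")"
  cat >"$tmp"
  chmod 0644 "$tmp"
  # When DEST is a symlink to a file, the rename replaces the link itself
  # rather than writing through it. (A symlink to a directory would need
  # extra care, for example GNU mv's -T option.)
  mv -f "$tmp" "$dest"
}

demo_dir="$(mktemp -d)"
printf 'export HTTP_PROXY=%s\n' 'http://proxy.example:3128' \
  | write_atomic "$demo_dir/proxy-env.sh"
```

Readers of the published file see either the old content or the new content, never a partially written file.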
📒 Files selected for processing (5)
- Dockerfile
- Dockerfile.base
- docs/deployment/sandbox-hardening.md
- nemoclaw-blueprint/policies/openclaw-sandbox.yaml
- scripts/nemoclaw-start.sh
🧹 Nitpick comments (1)
test/service-env.test.js (1)
187-225: Add guard for empty sed extraction to improve debuggability.
Unlike the `extractProxyVars` helper (lines 110-115), which throws a descriptive error when the sed extraction fails, this test would fail with a confusing `ENOENT` error at line 211 if the script structure changes and `persistBlock` is empty.
🛠️ Proposed fix to add consistency with existing pattern
```diff
     const persistBlock = execFileSync(
       "sed",
       ["-n", "/^_PROXY_URL=/,/^chmod 644/p", scriptPath],
       { encoding: "utf-8" }
     );
+    if (!persistBlock.trim()) {
+      throw new Error(
+        "Failed to extract proxy persistence block from scripts/nemoclaw-start.sh — " +
+          "the _PROXY_URL..chmod block may have been moved or renamed"
+      );
+    }
     const wrapper = [
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/service-env.test.js` around lines 187 - 225, The test "entrypoint writes proxy-env.sh to writable data dir" extracts a persistBlock via sed but doesn't guard against an empty result, causing a confusing ENOENT later; add the same defensive check used by extractProxyVars (throw a descriptive error when the sed extraction returns an empty string) before writing/executing tmpFile so failures in script structure are reported clearly—specifically check the persistBlock variable after the execFileSync sed call in this test and throw or assert with a helpful message if it's empty (refer to persistBlock and the extractProxyVars pattern for the exact guard behavior).
📒 Files selected for processing (1)
test/service-env.test.js
Address CodeRabbit review findings:

- Lock /sandbox/.nemoclaw parent directory (root:root 755) so the agent cannot rename or replace the root-owned blueprints directory
- Pre-create config.json and snapshots/ as sandbox-owned for runtime writes
- Move proxy-env.sh from sandbox-writable .openclaw-data to /tmp where sticky-bit protection prevents the sandbox user from tampering with the root-owned file
- Add rm -f before write to prevent symlink-following attacks
- Add empty sed extraction guards in proxy persistence tests
- Fix docs: one sentence per line, active voice

Ref: NVIDIA#804
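
The "rm -f before write" step can be illustrated with a small sketch. It assumes a throwaway directory in place of /tmp, and `write_fresh` is a hypothetical helper, not a function from `nemoclaw-start.sh`. The `noclobber` option is an extra precaution added here for illustration and is not mentioned in the commit.

```bash
# Requires bash. Illustrative sketch only; names below are hypothetical.
set -euo pipefail

# write_fresh DEST CONTENT: drop whatever sits at DEST, then create a new file.
write_fresh() {
  local dest="$1" content="$2"
  # rm removes the directory entry. If DEST is a symlink planted by another
  # user, only the link is deleted and its target is left untouched.
  rm -f "$dest"
  # With noclobber set, the redirection refuses to overwrite an existing
  # regular file, so the write fails if something reappeared at DEST.
  ( set -o noclobber; printf '%s\n' "$content" >"$dest" )
  chmod 644 "$dest"
}

work="$(mktemp -d)"
printf 'original\n' >"$work/victim"
ln -s "$work/victim" "$work/proxy-env.sh"   # simulate a planted symlink
write_fresh "$work/proxy-env.sh" 'export HTTPS_PROXY=http://proxy.example:3128'
```

Without the `rm -f`, a plain `cat > "$dest"` would follow the link and overwrite `victim`.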
✨ Thanks for submitting this PR with a detailed summary. It proposes restricting the sandbox environment to address a potential security issue.
…DIA#804) DAC tests (Docker-only, test/e2e-gateway-isolation.sh): - Tests 13-25: verify sandbox user cannot write to /sandbox, .nemoclaw parent, blueprints, .openclaw dir; verify sandbox CAN write to state, migration, snapshots, staging, config.json, .openclaw-data - Fix test 9: add missing `memory` symlink to verification list Landlock tests (OpenShell/Brev, checks/04-landlock-readonly.sh): - 8 tests verifying kernel-level read-only enforcement on /sandbox - Closes DAC gap: .bashrc/.profile are sandbox-owned but Landlock read_only prevents agent from injecting malicious env vars Signed-off-by: Prekshi Vimadalal <pvimadalal@nvidia.com>
…DIA#804) /sandbox is sandbox-owned (DAC allows writes). Read-only enforcement comes from Landlock at runtime, which is tested in the Brev e2e suite (checks/04-landlock-readonly.sh). Renumber remaining tests 13-24. Signed-off-by: Prekshi Vimadalal <pvimadalal@nvidia.com>
…IDIA#804) The base image on GHCR hasn't been rebuilt with pre-baked shell init files yet. Skip tests 23-24 gracefully instead of failing when the files don't exist. Tests will auto-activate after base image rebuild. Signed-off-by: Prekshi Vimadalal <pvimadalal@nvidia.com>
Signed-off-by: Prekshi Vimadalal <pvimadalal@nvidia.com>
86b36bc to 00b1190
Merge resolution:

- Dockerfile.base: keep logs, credentials, sandbox dirs plus telegram from main; remove duplicate credentials symlink
- sandbox-hardening.md: keep NVIDIA#804 reference, adopt main colon format
- Policy YAML: only Landlock changes, all endpoint rules from main preserved

Review concerns addressed:

- Single source of truth for tool-cache redirects (_TOOL_REDIRECTS array)
- .env chmod logs warning instead of silently swallowing failure
- Landlock kernel requirements documented in sandbox-hardening.md
- telegram added to e2e symlink check list
- Non-root mode e2e test (Test 25)
- Symlink attack prevention test for proxy-env.sh
- Updated test assertions for new proxy-env.sh format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
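
A single-source-of-truth array for the redirects might look like the sketch below. Only the array name `_TOOL_REDIRECTS` and the four entries shown come from this thread; the real array may hold more entries, and `emit_redirects` is a hypothetical helper for illustration.

```bash
# Requires bash (arrays). Illustrative sketch only.
set -euo pipefail

# Entries quoted in this thread; the actual list in nemoclaw-start.sh may differ.
_TOOL_REDIRECTS=(
  'GNUPGHOME=/tmp/.gnupg'
  'HISTFILE=/tmp/.bash_history'
  'GIT_CONFIG_GLOBAL=/tmp/.gitconfig'
  'PYTHONUSERBASE=/tmp/.local'
)

# Use 1: export every redirect into the entrypoint's own environment.
for kv in "${_TOOL_REDIRECTS[@]}"; do
  export "$kv"
done

# Use 2 (hypothetical helper): print the same redirects as export lines,
# suitable for appending to a file that interactive shells source.
emit_redirects() {
  local kv
  for kv in "${_TOOL_REDIRECTS[@]}"; do
    printf 'export %s\n' "$kv"
  done
}
```

Because both uses iterate over the same array, the entrypoint and the sourced env file cannot drift apart.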
) The variable was referenced at 3 locations in export_gateway_token() and install_configure_guard() but never assigned, causing an unbound-variable crash under set -euo pipefail during sandbox creation. Closes NVIDIA#1609 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
install -o/-g requires root to chown. In non-root mode (uid != 0), use mkdir -p instead — directories are already owned by the current user. Fixes e2e test-25 (non-root command execution). Signed-off-by: Prekshi Vyas <prekshivyas@gmail.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
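
The root versus non-root branch described in this commit could be sketched as below. `ensure_dir` and its argument order are assumptions made for illustration; the commit only states that `install -o/-g` is used as root and `mkdir -p` otherwise.

```bash
# Requires bash. Illustrative sketch only; ensure_dir is a hypothetical name.
set -euo pipefail

# ensure_dir DIR OWNER GROUP MODE
ensure_dir() {
  local dir="$1" owner="$2" group="$3" mode="$4"
  if [ "$(id -u)" -eq 0 ]; then
    # Root may assign ownership, so create the directory and chown in one call.
    install -d -o "$owner" -g "$group" -m "$mode" "$dir"
  else
    # A non-root user cannot chown; the directory is already theirs.
    mkdir -p "$dir"
    chmod "$mode" "$dir"
  fi
}

demo_root="$(mktemp -d)"
ensure_dir "$demo_root/.gnupg" "$(id -u)" "$(id -g)" 700
```

Numeric IDs are passed in the demo so that it also runs in containers where the current UID has no passwd entry.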
d9005af to 2845415
7a25686 to 80482b2
…VIDIA#1607) OpenShell's prepare_filesystem() chowns every read_write path to run_as_user at sandbox start, flipping /sandbox/.nemoclaw from root:root to sandbox:sandbox. This removed the DAC protection preventing the agent from renaming blueprints/. The sticky bit (1755) survives the ownership flip and prevents the sandbox user from renaming or deleting root-owned entries like blueprints/, while still allowing writes to sandbox-owned subdirs (state/, migration/, snapshots/, staging/, config.json). Note: this mitigates the security impact but does not prevent the ownership change itself — that requires an OpenShell-side fix in prepare_filesystem(). Mitigates NVIDIA#1607 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
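
For readers unfamiliar with mode 1755, the sketch below sets and checks the sticky bit on a stand-in directory. The paths are throwaway stand-ins for `/sandbox/.nemoclaw`, and the demo cannot reproduce the cross-user behaviour because that needs root and a second account.

```bash
# Requires bash. Illustrative sketch only; paths are temporary stand-ins.
set -euo pipefail

base="$(mktemp -d)"
parent="$base/.nemoclaw"
mkdir -p "$parent/blueprints" "$parent/state"

# Leading 1 = sticky (restricted deletion) bit; 755 = rwxr-xr-x.
chmod 1755 "$parent"

# The final permission character shows as 't' when the sticky bit is set.
ls -ld "$parent"
```

On Linux, the restricted-deletion check lets an entry be renamed or removed by the entry's owner, the directory's owner, or a privileged process. How much protection the bit gives `blueprints/` after the ownership flip therefore depends on who owns the parent at runtime, which is worth confirming in the target environment.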
80482b2 to 29f44ca
The sandbox base image (`ghcr.io/nvidia/nemoclaw/sandbox-base`) does not include the gnupg package, even though `nemoclaw-start.sh` already exports `GNUPGHOME=/tmp/.gnupg` and `test/service-env.test.js` asserts the redirect is in place. As a result `gpg --list-keys` (or any other gpg invocation) inside the sandbox fails with `bash: gpg: command not found`, breaking workflows that expect signing/verification to be available, including the smoke check QA reported on DGX Spark (aarch64).

The GNUPGHOME redirect was introduced in NVIDIA#1121 ("restrict /sandbox to read-only via Landlock") to keep gpg writable when `~/.gnupg` became unwritable, but the matching `apt-get install gnupg` line was never added to `Dockerfile.base`. The service-env tests assert the env var setup but don't actually invoke gpg, so CI never noticed the binary was missing.

This adds `gnupg=2.2.40-1.1+deb12u2` (the bookworm-pinned version, matching the existing `=<version>` pinning style for every other package in the same `apt-get install` block) right after `git`. No other changes: same `--no-install-recommends`, same cleanup tail. The package brings in dirmngr, gpg-wks-server, and gpg-wks-client as dependencies (per a clean install probe in the exact base image SHA). Total layer cost ~3 MB compressed.

Smoke tested locally by building Dockerfile.base with the fix and running the exact failing command from the bug report:

    $ docker build -f Dockerfile.base -t nemoclaw-base-test:gnupg .
    $ docker run --rm nemoclaw-base-test:gnupg gpg --version
    gpg (GnuPG) 2.2.40
    $ docker run --rm nemoclaw-base-test:gnupg gpg --list-keys
    gpg: directory '/root/.gnupg' created
    gpg: keybox '/root/.gnupg/pubring.kbx' created
    gpg: /root/.gnupg/trustdb.gpg: trustdb created
    (exit 0)
    $ docker run --rm -e GNUPGHOME=/tmp/.gnupg nemoclaw-base-test:gnupg \
        sh -c 'mkdir -p /tmp/.gnupg && chmod 700 /tmp/.gnupg && gpg --list-keys'
    gpg: keybox '/tmp/.gnupg/pubring.kbx' created
    (exit 0)

Both the default `~/.gnupg` and the runtime-redirected `/tmp/.gnupg` (matching what `nemoclaw-start.sh` exports) work as expected.

Closes NVIDIA#1640.

Signed-off-by: T Savo <evilgenius@nefariousplan.com>
…1649)

<!-- markdownlint-disable MD041 -->
## Summary

The sandbox base image (`ghcr.io/nvidia/nemoclaw/sandbox-base`) is missing the `gnupg` package: `gpg --list-keys` (and any other gpg invocation) fails with `bash: gpg: command not found` inside the sandbox.
This adds a single pinned `gnupg=2.2.40-1.1+deb12u2` line to the existing `apt-get install` block in `Dockerfile.base`, restoring the binary that the rest of the codebase already assumes is present.

## Related Issue

Closes #1640.

## Changes

`Dockerfile.base`: add `gnupg=2.2.40-1.1+deb12u2` to the existing `apt-get install` block, slotted right after `git`.
Same `--no-install-recommends`, same cleanup tail, same `=<version>` pinning style as every other package in the block.

```diff
     curl=7.88.1-10+deb12u14 \
     git=1:2.39.5-0+deb12u3 \
+    gnupg=2.2.40-1.1+deb12u2 \
     ca-certificates=20230311+deb12u1 \
```

The pinned version is the bookworm-stable `2.2.40-1.1+deb12u2`, verified by `apt-cache madison gnupg` against the exact base image SHA `node:22-slim@sha256:4f77a690...`.
The package brings in `dirmngr`, `gpg-wks-server`, and `gpg-wks-client` as dependencies.
Total layer cost ~3 MB compressed.

Diff: **+1 / 0** in 1 file.

### Why this is the right fix (and not "lower the env var" or "remove the test")

The fix isn't obvious unless you trace where `GNUPGHOME` came from. Walking that chain:

1. **PR #1121** (`fix(sandbox): restrict /sandbox to read-only via Landlock (#804)`, authored by @prekshivyas, merged 2026-04-08) made the `/sandbox` home directory Landlock-read-only to prevent agents from modifying their own runtime environment.
2. To keep tools that normally write under `~/...` working (gpg, git config, python history, npm prefix, etc.), that PR redirected each tool's homedir to a writable `/tmp/...` path via env vars in `scripts/nemoclaw-start.sh`. The relevant line is at `scripts/nemoclaw-start.sh:53`:

   ```sh
   'GNUPGHOME=/tmp/.gnupg'
   ```

   alongside `HISTFILE=/tmp/.bash_history`, `GIT_CONFIG_GLOBAL=/tmp/.gitconfig`, `PYTHONUSERBASE=/tmp/.local`, etc.
3. PR #1121 also added three matching assertions in `test/service-env.test.js` (lines 177, 191, 347) verifying that the redirect is set:

   ```js
   expect(src).toContain("GNUPGHOME=/tmp/.gnupg");
   ```

4. **What PR #1121 didn't do**: add `gnupg` to the `apt-get install` list in `Dockerfile.base`. The env var setup landed and the test assertions landed, but the install line was missed.
5. CI never noticed because `service-env.test.js` only asserts that the env var is *set* in the source; it never spawns a subprocess that actually runs `gpg`. So a working test suite + a missing binary coexist silently.

The QA report (this issue, #1640) catches it as a runtime failure on DGX Spark aarch64 because their test step does invoke `gpg --list-keys`.

The clear intent of #1121 was to **enable** gpg under a redirected `GNUPGHOME`; you wouldn't redirect the homedir if you wanted gpg blocked.
This PR is the matching install line that #1121 should have included, closing a one-line oversight rather than adding new capability or rolling anything back.

### Why not just remove the GNUPGHOME redirect

The env var redirect from #1121 is doing real work: without it, any future `apt-get install gnupg` would still leave gpg unable to write to its homedir under Landlock-read-only `/sandbox`.
The redirect is the "right" half of the pair; the install is the missing left half.

### Why this isn't a security regression

The sandbox runs LLM-driven agents and gpg is a credential-handling tool, so it's worth justifying explicitly:

- The redirected `GNUPGHOME=/tmp/.gnupg` is **fresh and empty** per session, with no preloaded keys.
- Without keys, gpg can hash/check signatures of public material but cannot decrypt or sign anything.
- An agent would have to first import a key (which requires the user to provide it; keys are not pulled from anywhere automatically) before gpg becomes capable of any sensitive operation.
- This is the same threat model as `git` and `curl`, which are already in the image and could equally be used to fetch arbitrary content. gpg adds no new capability that the existing toolchain doesn't already have.

If the project explicitly *did* want gpg unavailable to agents, the right fix would be to remove the GNUPGHOME redirect from #1121 *and* the matching test assertions, not to keep the env wiring while leaving the binary missing, which is just confusing.

## Type of Change

- [x] Code change for a new feature, bug fix, or refactor.
- [ ] Code change with doc updates.
- [ ] Doc only. Prose changes without code sample modifications.
- [ ] Doc only. Includes code sample changes.

## Testing

Smoke-tested locally by building `Dockerfile.base` with the fix and running the exact failing command from the bug report:

```sh
$ docker build -f Dockerfile.base -t nemoclaw-base-test:gnupg .
[...]
 => exporting to image 46.7s done

$ docker run --rm nemoclaw-base-test:gnupg gpg --version
gpg (GnuPG) 2.2.40
libgcrypt 1.10.1

$ docker run --rm nemoclaw-base-test:gnupg gpg --list-keys
gpg: directory '/root/.gnupg' created
gpg: keybox '/root/.gnupg/pubring.kbx' created
gpg: /root/.gnupg/trustdb.gpg: trustdb created
(exit 0)

# And with the runtime-redirected GNUPGHOME from nemoclaw-start.sh:
$ docker run --rm -e GNUPGHOME=/tmp/.gnupg nemoclaw-base-test:gnupg \
    sh -c 'mkdir -p /tmp/.gnupg && chmod 700 /tmp/.gnupg && gpg --list-keys'
gpg: keybox '/tmp/.gnupg/pubring.kbx' created
(exit 0)
```

Both the default `~/.gnupg` and the runtime-redirected `/tmp/.gnupg` (matching what `nemoclaw-start.sh` exports) work as expected.
The exact `gpg --list-keys` failure from the bug report no longer reproduces.

- [x] `hadolint Dockerfile.base`: clean (no warnings)
- [x] `docker build -f Dockerfile.base`: succeeds, exports to image cleanly
- [x] `gpg --version` in built image: works (`gpg (GnuPG) 2.2.40`)
- [x] `gpg --list-keys` in built image: works (was `bash: gpg: command not found` before this PR)
- [x] `gpg --list-keys` with `GNUPGHOME=/tmp/.gnupg`: works (matches the runtime env from `nemoclaw-start.sh`)
- [ ] `npx prek run --all-files`: partial. Ran the affected hooks (commitlint, gitleaks, hadolint), which all pass; did NOT run `test-cli` against the full local suite because two pre-existing baseline failures on stock `main` get in the way on a WSL2 dev host (the `shouldPatchCoredns` issue addressed by PR #1626 (merged) and the install-preflight PATH leakage addressed by PR #1628 (open)). Upstream CI runs on Linux GHA runners and doesn't hit either of those, so it'll exercise the full suite normally.
- [ ] `npm test`: same caveat as above, ran the relevant projects in isolation
- [ ] `make docs` builds without warnings. (for doc-only changes: N/A)

## Checklist

### General

- [x] I have read and followed the [contributing guide](https://github.com/NVIDIA/NemoClaw/blob/main/CONTRIBUTING.md).
- [ ] I have read and followed the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md). (for doc-only changes: N/A)

### Code Changes

- [x] Formatters applied: `hadolint Dockerfile.base` clean. No JS/TS/Python files touched.
- [x] Tests added or updated for new or changed behavior: N/A. The existing `service-env.test.js` already asserts the `GNUPGHOME` redirect introduced in #1121; this PR makes the corresponding binary available so those assertions reflect a runtime that actually works. A new test that spawns `gpg` directly inside a container would arguably be worth a follow-up (it would have caught this gap originally), but it's a separate concern from this one-line install fix.
- [x] No secrets, API keys, or credentials committed.
- [ ] Doc pages updated for any user-facing behavior changes: N/A. The bug report describes the expected behavior; this PR just makes runtime match it. No docs claim gpg is unavailable.

### Doc Changes

- N/A (no doc changes)

---

Signed-off-by: T Savo <evilgenius@nefariousplan.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **Chores**
  * Base system image now includes GnuPG as a pinned OS package.
* **Bug Fixes / Security**
  * GnuPG runtime directory is now created in a separate step with stricter permissions and sandbox ownership when applicable, reducing exposure.
* **Tests**
  * Test suite updated to verify the new directory creation and permission/ownership behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: T Savo <evilgenius@nefariousplan.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
…VIDIA#1649) <!-- markdownlint-disable MD041 --> ## Summary The sandbox base image (`ghcr.io/nvidia/nemoclaw/sandbox-base`) is missing the `gnupg` package — `gpg --list-keys` (and any other gpg invocation) fails with `bash: gpg: command not found` inside the sandbox. This adds a single pinned `gnupg=2.2.40-1.1+deb12u2` line to the existing `apt-get install` block in `Dockerfile.base`, restoring the binary that the rest of the codebase already assumes is present. ## Related Issue Closes NVIDIA#1640. ## Changes `Dockerfile.base`: add `gnupg=2.2.40-1.1+deb12u2` to the existing `apt-get install` block, slotted right after `git`. Same `--no-install-recommends`, same cleanup tail, same `=<version>` pinning style as every other package in the block. ```diff curl=7.88.1-10+deb12u14 \ git=1:2.39.5-0+deb12u3 \ + gnupg=2.2.40-1.1+deb12u2 \ ca-certificates=20230311+deb12u1 \ ``` The pinned version is the bookworm-stable `2.2.40-1.1+deb12u2`, verified by `apt-cache madison gnupg` against the exact base image SHA `node:22-slim@sha256:4f77a690...`. The package brings in `dirmngr`, `gpg-wks-server`, and `gpg-wks-client` as dependencies. Total layer cost ~3 MB compressed. Diff: **+1 / 0** in 1 file. ### Why this is the right fix (and not "lower the env var" or "remove the test") The fix isn't obvious unless you trace where `GNUPGHOME` came from. Walking that chain: 1. **PR NVIDIA#1121** (`fix(sandbox): restrict /sandbox to read-only via Landlock (NVIDIA#804)`, authored by @prekshivyas, merged 2026-04-08) made the `/sandbox` home directory Landlock-read-only to prevent agents from modifying their own runtime environment. 2. To keep tools that normally write under `~/...` working (gpg, git config, python history, npm prefix, etc.), that PR redirected each tool's homedir to a writable `/tmp/...` path via env vars in `scripts/nemoclaw-start.sh`. 
The relevant line is at `scripts/nemoclaw-start.sh:53`: ```sh 'GNUPGHOME=/tmp/.gnupg' ``` alongside `HISTFILE=/tmp/.bash_history`, `GIT_CONFIG_GLOBAL=/tmp/.gitconfig`, `PYTHONUSERBASE=/tmp/.local`, etc. 3. PR NVIDIA#1121 also added three matching assertions in `test/service-env.test.js` (lines 177, 191, 347) verifying that the redirect is set: ```js expect(src).toContain("GNUPGHOME=/tmp/.gnupg"); ``` 4. **What PR NVIDIA#1121 didn't do**: add `gnupg` to the `apt-get install` list in `Dockerfile.base`. The env var setup landed and the test assertions landed, but the install line was missed. 5. CI never noticed because `service-env.test.js` only asserts that the env var is *set* in the source — it never spawns a subprocess that actually runs `gpg`. So a working test suite + a missing binary coexist silently. The QA report (this issue, NVIDIA#1640) catches it as a runtime failure on DGX Spark aarch64 because their test step does invoke `gpg --list-keys`. The clear intent of NVIDIA#1121 was to **enable** gpg under a redirected `GNUPGHOME` — you wouldn't redirect the homedir if you wanted gpg blocked. This PR is the matching install line that NVIDIA#1121 should have included, closing a one-line oversight rather than adding new capability or rolling anything back. ### Why not just remove the GNUPGHOME redirect The env var redirect from NVIDIA#1121 is doing real work — without it, any future `apt-get install gnupg` would still leave gpg unable to write to its homedir under Landlock-read-only `/sandbox`. The redirect is the "right" half of the pair; the install is the missing left half. ### Why this isn't a security regression The sandbox runs LLM-driven agents and gpg is a credential-handling tool, so it's worth justifying explicitly: - The redirected `GNUPGHOME=/tmp/.gnupg` is **fresh and empty** per session — no preloaded keys. - Without keys, gpg can hash/check signatures of public material but cannot decrypt or sign anything. 
- An agent would have to first import a key (which requires the user to provide it — keys are not pulled from anywhere automatically) before gpg becomes capable of any sensitive operation. - This is the same threat model as `git` and `curl`, which are already in the image and could equally be used to fetch arbitrary content. gpg adds no new capability that the existing toolchain doesn't already have. If the project explicitly *did* want gpg unavailable to agents, the right fix would be to remove the GNUPGHOME redirect from NVIDIA#1121 *and* the matching test assertions, not to keep the env wiring while leaving the binary missing — that's just confusing. ## Type of Change - [x] Code change for a new feature, bug fix, or refactor. - [ ] Code change with doc updates. - [ ] Doc only. Prose changes without code sample modifications. - [ ] Doc only. Includes code sample changes. ## Testing Smoke-tested locally by building `Dockerfile.base` with the fix and running the exact failing command from the bug report: ```sh $ docker build -f Dockerfile.base -t nemoclaw-base-test:gnupg . [...] => exporting to image 46.7s done $ docker run --rm nemoclaw-base-test:gnupg gpg --version gpg (GnuPG) 2.2.40 libgcrypt 1.10.1 $ docker run --rm nemoclaw-base-test:gnupg gpg --list-keys gpg: directory '/root/.gnupg' created gpg: keybox '/root/.gnupg/pubring.kbx' created gpg: /root/.gnupg/trustdb.gpg: trustdb created (exit 0) # And with the runtime-redirected GNUPGHOME from nemoclaw-start.sh: $ docker run --rm -e GNUPGHOME=/tmp/.gnupg nemoclaw-base-test:gnupg \ sh -c 'mkdir -p /tmp/.gnupg && chmod 700 /tmp/.gnupg && gpg --list-keys' gpg: keybox '/tmp/.gnupg/pubring.kbx' created (exit 0) ``` Both the default `~/.gnupg` and the runtime-redirected `/tmp/.gnupg` (matching what `nemoclaw-start.sh` exports) work as expected. The exact `gpg --list-keys` failure from the bug report no longer reproduces. 
- [x] `hadolint Dockerfile.base` — clean (no warnings) - [x] `docker build -f Dockerfile.base` — succeeds, exports to image cleanly - [x] `gpg --version` in built image — works (`gpg (GnuPG) 2.2.40`) - [x] `gpg --list-keys` in built image — works (was `bash: gpg: command not found` before this PR) - [x] `gpg --list-keys` with `GNUPGHOME=/tmp/.gnupg` — works (matches the runtime env from `nemoclaw-start.sh`) - [ ] `npx prek run --all-files` — partial: ran the affected hooks (commitlint, gitleaks, hadolint) which all pass; did NOT run `test-cli` against the full local suite because two pre-existing baseline failures on stock `main` get in the way on a WSL2 dev host (the `shouldPatchCoredns` issue addressed by PR NVIDIA#1626 (merged) and the install-preflight PATH leakage addressed by PR NVIDIA#1628 (open)). Upstream CI runs on Linux GHA runners and doesn't hit either of those, so it'll exercise the full suite normally. - [ ] `npm test` — same caveat as above, ran the relevant projects in isolation - [ ] `make docs` builds without warnings. (for doc-only changes — N/A) ## Checklist ### General - [x] I have read and followed the [contributing guide](https://github.com/NVIDIA/NemoClaw/blob/main/CONTRIBUTING.md). - [ ] I have read and followed the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md). (for doc-only changes — N/A) ### Code Changes - [x] Formatters applied — `hadolint Dockerfile.base` clean. No JS/TS/Python files touched. - [x] Tests added or updated for new or changed behavior — N/A. The existing `service-env.test.js` already asserts the `GNUPGHOME` redirect introduced in NVIDIA#1121; this PR makes the corresponding binary available so those assertions reflect a runtime that actually works. A new test that spawns `gpg` directly inside a container would arguably be worth a follow-up (it would have caught this gap originally), but it's a separate concern from this one-line install fix. 
- [x] No secrets, API keys, or credentials committed.
- [ ] Doc pages updated for any user-facing behavior changes: N/A. The bug report describes the expected behavior; this PR just makes runtime match it. No docs claim gpg is unavailable.

### Doc Changes

- N/A (no doc changes)

---

Signed-off-by: T Savo <evilgenius@nefariousplan.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **Chores**
  * Base system image now includes GnuPG as a pinned OS package.
* **Bug Fixes / Security**
  * GnuPG runtime directory is now created in a separate step with stricter permissions and sandbox ownership when applicable, reducing exposure.
* **Tests**
  * Test suite updated to verify the new directory creation and permission/ownership behavior.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: T Savo <evilgenius@nefariousplan.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
## Summary

Restricts the `/sandbox` home directory to Landlock read-only, preventing agents from creating arbitrary files or modifying their runtime environment. Only explicitly declared paths remain writable.

Key changes:
- Set `include_workdir: false` in the filesystem policy. Verified against OpenShell's `landlock.rs` that `include_workdir: true` adds WORKDIR to `read_write`, which would override our `read_only` entry (Landlock grants the union of all matching rules).
- Move `/sandbox` from `read_write` to `read_only`.
- Declare `/sandbox/.openclaw-data` (agent state) and `/sandbox/.nemoclaw` (plugin state) as `read_write`.
- DAC-protect the `/sandbox/.nemoclaw` parent (root:root 755) so the agent cannot rename or replace the root-owned `blueprints/` directory. Only `state/`, `migration/`, `snapshots/`, `staging/`, and `config.json` are sandbox-owned for runtime writes.
- Pre-create `.bashrc`/`.profile` at image build time. They source proxy config from `/tmp/nemoclaw-proxy-env.sh` (sticky-bit protected, root-owned in root mode).
- Write the proxy config to `/tmp` instead of the sandbox-writable `.openclaw-data` to prevent agent content injection; `rm -f` before the write prevents symlink-following attacks.
- Redirect tool dotfiles to `/tmp` via env vars in both the entrypoint and the sourced `proxy-env.sh` (so `openshell sandbox connect` sessions also get the redirects).

Writable surface after this change:
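| Path | Access |
|---|---|
| `/sandbox` | read-only |
| `/sandbox/.openclaw` | read-only |
| `/sandbox/.openclaw-data` | read-write (agent state) |
| `/sandbox/.nemoclaw` | read-write (plugin state; parent and `blueprints/` are root-owned) |
| `/tmp` | read-write |

## Related Issue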
Closes #804
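
As a rough illustration of the `rm -f` before write step listed under Key changes, the sketch below shows one way an entrypoint might write the proxy env file so that a pre-planted symlink is never followed. The function name `write_proxy_env`, the noclobber guard, and every exported variable except `GNUPGHOME` are assumptions made for illustration; they are not taken from `scripts/nemoclaw-start.sh`.

```sh
#!/bin/sh
# Hypothetical sketch: write_proxy_env and its default path are illustrative,
# not copied from scripts/nemoclaw-start.sh.
write_proxy_env() {
    target="${1:-/tmp/nemoclaw-proxy-env.sh}"

    # Remove whatever is at the path first. If a symlink was planted there,
    # rm -f deletes the link itself, never the file it points to.
    rm -f "$target"

    # noclobber (set -C) makes the redirection fail if something reappears at
    # the path between the rm and the write, instead of writing through it.
    (
        set -C
        umask 022
        cat > "$target" <<'EOF'
# Redirect tool dotfiles away from the read-only home.
# Variable choices and paths below are assumptions, except GNUPGHOME.
export HISTFILE=/tmp/.bash_history
export GNUPGHOME=/tmp/.gnupg
export NPM_CONFIG_CACHE=/tmp/.npm
export PIP_CACHE_DIR=/tmp/.pip-cache
EOF
    )
}
```

A pre-created `.bashrc` could then pick the file up with a guarded `[ -r /tmp/nemoclaw-proxy-env.sh ] && . /tmp/nemoclaw-proxy-env.sh`, which is how interactive `openshell sandbox connect` sessions would inherit the same redirects. The sketch does not replace the ownership and sticky-bit protections on `/tmp` described above; it only shows the ordering of removal and write.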

## Changes
| File | Change |
|---|---|
| `nemoclaw-blueprint/policies/openclaw-sandbox.yaml` | `include_workdir: false`, `/sandbox` → read_only, `/sandbox/.nemoclaw` → read_write |
| `Dockerfile` | DAC-protect `.nemoclaw` parent + blueprints (root ownership), pre-create state/migration/snapshots/staging dirs and `config.json` |
| `Dockerfile.base` | Pre-create `.bashrc`/`.profile` sourcing `/tmp/nemoclaw-proxy-env.sh` |
| `scripts/nemoclaw-start.sh` | Write proxy config to `/tmp/nemoclaw-proxy-env.sh` with symlink protection, redirect tool dotfiles to `/tmp` |
| `docs/deployment/sandbox-hardening.md` | |
| `test/service-env.test.js` | |

## Testing
- `nemoclaw onboard` completes successfully (sandbox creation with new policy)
- `openshell sandbox connect` → interactive shell works, proxy env vars are set
- Agent can write to its workspace (`/sandbox/.openclaw-data/workspace`)
- Agent cannot create files in `/sandbox/` (e.g., `touch /sandbox/test` fails)
- `openclaw gateway run` starts correctly (reads from read-only `.openclaw/`)
- Plugin state writes succeed (`/sandbox/.nemoclaw/state/`)
- Snapshot and staging writes succeed (`/sandbox/.nemoclaw/snapshots/`, `/sandbox/.nemoclaw/staging/`)
- Blueprints are DAC-protected (`/sandbox/.nemoclaw/blueprints/` is root-owned, parent is root-owned)

Signed-off-by: Prekshi Vyas <prekshivyas@gmail.com>