fix(sandbox): restrict /sandbox to read-only via Landlock (#804) #1121

Merged: brandonpelfrey merged 46 commits into NVIDIA:main from prekshivyas:security/read-only-sandbox-filesystem on Apr 8, 2026

prekshivyas (Contributor) commented Mar 30, 2026

Summary

Restricts the /sandbox home directory to Landlock read-only, preventing agents from creating arbitrary files or modifying their runtime environment. Only explicitly declared paths remain writable.

Key changes:

  • Set include_workdir: false in the filesystem policy — verified against OpenShell's landlock.rs that include_workdir: true adds WORKDIR to read_write, which would override our read_only entry (Landlock grants the union of all matching rules)
  • Move /sandbox from read_write to read_only
  • Keep /sandbox/.openclaw-data (agent state) and /sandbox/.nemoclaw (plugin state) as read_write
  • DAC-protect /sandbox/.nemoclaw parent (root:root 755) so the agent cannot rename or replace the root-owned blueprints/ directory. Only state/, migration/, snapshots/, staging/, and config.json are sandbox-owned for runtime writes.
  • Pre-create .bashrc/.profile at image build time — they source proxy config from /tmp/nemoclaw-proxy-env.sh (sticky-bit protected, root-owned in root mode)
  • Write proxy-env.sh to /tmp instead of sandbox-writable .openclaw-data to prevent agent content injection; rm -f before write prevents symlink-following attacks
  • Redirect tool dotfiles (npm, git, pip, bash, claude, node) to /tmp via env vars in both the entrypoint and the sourced proxy-env.sh (so openshell sandbox connect sessions also get the redirects)
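The proxy-env write and the tool redirects described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's script: `write_env_file`, the scratch paths, the `HTTPS_PROXY` variable, and the proxy URL are all assumptions made for the example. The four `/tmp` redirect values are the ones quoted later in this thread.

```sh
#!/bin/sh
# Sketch only: write_env_file and the demo paths are hypothetical names.
# HTTPS_PROXY and the proxy URL are placeholders; the four /tmp redirects
# are values quoted elsewhere in this thread.
write_env_file() {
  target=$1
  proxy_url=$2
  # Remove whatever is at the path first. If it is a symlink, rm deletes
  # the link itself, so the write below cannot follow it.
  rm -f "$target"
  {
    printf 'export HTTPS_PROXY=%s\n' "$proxy_url"
    printf 'export HISTFILE=/tmp/.bash_history\n'
    printf 'export GIT_CONFIG_GLOBAL=/tmp/.gitconfig\n'
    printf 'export PYTHONUSERBASE=/tmp/.local\n'
    printf 'export GNUPGHOME=/tmp/.gnupg\n'
  } > "$target"
  chmod 644 "$target"
}

# Demo in a scratch directory: plant a symlink where the env file will go.
demo_dir=$(mktemp -d)
victim="$demo_dir/victim"
printf 'original\n' > "$victim"
ln -s "$victim" "$demo_dir/proxy-env.sh"

write_env_file "$demo_dir/proxy-env.sh" "http://proxy.example:3128"

# The victim file is untouched and the env file is now a regular file.
cat "$victim"
# Source it in a subshell, as .bashrc/.profile would, and read one value back.
( . "$demo_dir/proxy-env.sh"; printf '%s\n' "$GNUPGHOME" )
```

A short window remains between the `rm -f` and the redirect, so this pattern narrows the symlink risk rather than eliminating it; the sticky bit on `/tmp` and root ownership of the file are what the description relies on for the rest.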

Writable surface after this change:

| Path | Access | Purpose |
| --- | --- | --- |
| /sandbox | read-only | Home directory |
| /sandbox/.openclaw | read-only | Immutable gateway config |
| /sandbox/.openclaw-data | read-write | Agent state, workspace, plugins (via symlinks) |
| /sandbox/.nemoclaw | read-write (Landlock) / root-owned (DAC) | Plugin runtime dirs; parent and blueprints are root-owned |
| /tmp | read-write | Temp files, logs, tool caches, proxy-env.sh |

Related Issue

Closes #804

Changes

| File | Change |
| --- | --- |
| nemoclaw-blueprint/policies/openclaw-sandbox.yaml | include_workdir: false, /sandbox → read_only, /sandbox/.nemoclaw → read_write |
| Dockerfile | DAC-protect .nemoclaw parent + blueprints (root ownership), pre-create state/migration/snapshots/staging dirs and config.json |
| Dockerfile.base | Pre-create .bashrc/.profile sourcing /tmp/nemoclaw-proxy-env.sh |
| scripts/nemoclaw-start.sh | Write proxy config to /tmp/nemoclaw-proxy-env.sh with symlink protection, redirect tool dotfiles to /tmp |
| docs/deployment/sandbox-hardening.md | New "Read-Only Home Directory" section |
| test/service-env.test.js | Updated proxy persistence tests for new path, added empty sed extraction guards |

Testing

  • nemoclaw onboard completes successfully (sandbox creation with new policy)
  • openshell sandbox connect → interactive shell works, proxy env vars are set
  • Agent can write to workspace (/sandbox/.openclaw-data/workspace)
  • Agent cannot create files directly in /sandbox/ (e.g., touch /sandbox/test fails)
  • openclaw gateway run starts correctly (reads from read-only .openclaw/)
  • Plugin state persists across agent invocations (/sandbox/.nemoclaw/state/)
  • Snapshots and staging work (/sandbox/.nemoclaw/snapshots/, /sandbox/.nemoclaw/staging/)
  • Agent cannot rename or delete blueprints (/sandbox/.nemoclaw/blueprints/ is root-owned, parent is root-owned)
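The write checks in this list could be scripted along the following lines. This is a hedged sketch rather than the PR's test suite: `can_create` and `report` are hypothetical helper names, and the script only observes whether a file can be created, so outside the sandbox it reflects ordinary permissions and missing paths, not Landlock.

```sh
#!/bin/sh
# can_create DIR: succeed if a file can be created in DIR, then clean up.
# Hypothetical helper; the probe runs in a subshell so a failed redirection
# cannot abort the calling script.
can_create() {
  probe="$1/.write-probe.$$"
  if ( : > "$probe" ) 2>/dev/null; then
    rm -f "$probe"
    return 0
  fi
  return 1
}

# report DIR...: print one "<dir> writable" or "<dir> blocked" line per path.
report() {
  for dir in "$@"; do
    if can_create "$dir"; then
      printf '%s writable\n' "$dir"
    else
      printf '%s blocked\n' "$dir"
    fi
  done
}

# Inside the sandbox, the paths from the list above would be probed like this.
report /sandbox /sandbox/.openclaw-data/workspace /sandbox/.nemoclaw/state /tmp
```

Under the policy described here, one would expect `/sandbox` to report `blocked` and the other three to report `writable` when run as the agent.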

Signed-off-by: Prekshi Vyas <prekshivyas@gmail.com>

…stem policy (NVIDIA#804)

Tighten the Landlock filesystem policy so agents cannot write arbitrary
files in the /sandbox home directory. Only explicitly declared paths
remain writable (/sandbox/.openclaw-data, /sandbox/.nemoclaw, /tmp).

- Set include_workdir to false (verified against OpenShell landlock.rs:
  when true, WORKDIR is added to read_write, overriding read_only)
- Move /sandbox from read_write to read_only in the policy
- Add /sandbox/.nemoclaw to read_write for plugin state/config writes
- DAC-protect blueprints with root ownership (defense-in-depth)
- Pre-create .bashrc/.profile at build time (read-only home prevents
  runtime writes); source proxy config from writable proxy-env.sh
- Redirect tool dotfiles (npm, git, pip, bash, claude, node) to /tmp
  via env vars in both the entrypoint and the sourced proxy-env.sh
  so interactive connect sessions also get the redirects

Closes NVIDIA#804

coderabbitai bot commented Mar 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Build-time hardening pre-creates and DAC-protects /sandbox/.nemoclaw and shell init files; runtime entrypoint writes /tmp/nemoclaw-proxy-env.sh and redirects caches to /tmp; OpenShell/Landlock policy makes /sandbox read-only with only explicit subpaths writable.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Docker image build: Dockerfile, Dockerfile.base | Create /sandbox/.nemoclaw and subdirs; set ownership/perms (root:root 755 for blueprint dirs, sandbox:sandbox for runtime-writable subdirs); ensure /sandbox/.nemoclaw/config.json exists; pre-create /sandbox/.bashrc and /sandbox/.profile owned by sandbox:sandbox. |
| Sandbox policy: nemoclaw-blueprint/policies/openclaw-sandbox.yaml | Set include_workdir: false; mark /sandbox as read_only; remove broad /sandbox from read_write and add explicit /sandbox/.nemoclaw as read_write. |
| Entrypoint / startup script: scripts/nemoclaw-start.sh | Stop in-place edits to ~/.bashrc/~/.profile; emit /tmp/nemoclaw-proxy-env.sh containing proxy + cache/history/git/python env exports and chmod 644; make .env chmod failures non-fatal. |
| Documentation: docs/deployment/sandbox-hardening.md | Add "Read-Only Home Directory" section describing Landlock/OpenShell restrictions, allowed writable subpaths, prevented persistence vectors, pre-creation of shell init files; add reference to issue #804. |
| Tests: test/service-env.test.js | Switch tests to validate generated proxy-env.sh in a temp writable location (instead of modifying ~/.bashrc/~/.profile); update assertions for proxy values, cache redirect vars, idempotency, and gateway IP handling. |

Sequence Diagram(s)

mermaid
sequenceDiagram
rect rgba(200,200,255,0.5)
participant Builder as Image Build
end
rect rgba(200,255,200,0.5)
participant Entrypoint as Container Entrypoint
end
rect rgba(255,200,200,0.5)
participant Policy as OpenShell/Landlock Policy
end
rect rgba(255,255,200,0.5)
participant Agent as Agent Process
end
Builder->>Filesystem: Create /sandbox/.bashrc, /sandbox/.profile, /sandbox/.nemoclaw/* and set owners/perms
Entrypoint->>Filesystem: Write /tmp/nemoclaw-proxy-env.sh (proxy + cache/history/git/python exports) and chmod 644
Entrypoint->>Env: Export cache/history/git/python paths pointing to /tmp
Policy->>Agent: Enforce /sandbox read-only, allow explicit writable subpaths
Agent->>Filesystem: Source /tmp/nemoclaw-proxy-env.sh and read allowed files
Agent->>Filesystem: Write only to declared writable subpaths (/sandbox/.nemoclaw, /sandbox/.openclaw-data, /tmp)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

🐰
I nudged the burrow, locked the blueprint chest,
Pre-made my shells so temp-proxy can rest,
Read-only tunnels, a few doors left to peep,
Safe crumbs in /tmp where caches softly sleep,
Hop in — the sandbox guards dreams while you sleep.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately reflects the main change: restricting /sandbox to read-only via Landlock, with clear reference to issue #804. |
| Linked Issues check | ✅ Passed | The PR addresses all coding objectives from issue #804: read-only root filesystem, restricted writable paths, blueprint DAC hardening, environment variable redirects, and comprehensive filesystem policy updates. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the read-only sandbox objective: Landlock policy updates, Docker build hardening, entrypoint proxy/cache redirects, tests for proxy config, and documentation of the new security model. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |

prekshivyas changed the title from "security(sandbox): restrict /sandbox to read-only via Landlock (#804)" to "fix(sandbox): restrict /sandbox to read-only via Landlock (#804)" on Mar 30, 2026
…proach

The entrypoint no longer writes proxy config directly to ~/.bashrc
(read-only home). Tests now verify that proxy-env.sh is written to
the writable data dir and that .bashrc sourcing works correctly.
The sed-extracted block contains the path in comments before the
variable assignment. replace() only swaps the first occurrence
(the comment), leaving the actual _PROXY_ENV_FILE assignment
pointing at /sandbox/.openclaw-data/ which doesn't exist in CI.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
docs/deployment/sandbox-hardening.md (2)

85-86: Keep each sentence on its own source line in this intro.

The first sentence is split across two source lines, and the second shares the same line as the end of the first. Please give each sentence its own line. As per coding guidelines, "One sentence per line in source (makes diffs readable). Flag paragraphs where multiple sentences appear on the same line."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/deployment/sandbox-hardening.md` around lines 85 - 86, The intro
currently has two sentences on the same source line; split them so each sentence
is on its own line: ensure "The sandbox Landlock policy restricts `/sandbox`
(the agent's home directory) to read-only access." is one line and "Only
explicitly declared directories are writable:" is the following line, updating
the text in the same paragraph (no other changes).

103-105: Rewrite this in active voice and keep one sentence per line.

"are pre-created" is passive, and the sentence is wrapped across multiple source lines. As per coding guidelines, "Active voice required. Flag passive constructions." and "One sentence per line in source (makes diffs readable). Flag paragraphs where multiple sentences appear on the same line."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/deployment/sandbox-hardening.md` around lines 103 - 105, Rewrite the
two-line passive sentence into active voice and ensure each sentence sits on its
own source line: change "Shell init files (`.bashrc`, `.profile`) are
pre-created at image build time and source runtime proxy configuration from the
writable `/sandbox/.openclaw-data/proxy-env.sh`." into two active-voice
sentences such as "The image build process pre-creates shell init files
`.bashrc` and `.profile`." and "These files source runtime proxy configuration
from `/sandbox/.openclaw-data/proxy-env.sh`." Place each sentence on its own
line in the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Dockerfile`:
- Around line 151-159: The Dockerfile currently only makes
/sandbox/.nemoclaw/blueprints root-owned; instead ensure the parent is locked:
in the RUN that touches /sandbox/.nemoclaw, set ownership and permissions on the
parent directory (chown root:root /sandbox/.nemoclaw && chmod 755
/sandbox/.nemoclaw) before adjusting the blueprints subtree, then create the
runtime dirs (/sandbox/.nemoclaw/state and /sandbox/.nemoclaw/migration) and
chown those to sandbox:sandbox so only those are writable; update the existing
RUN that uses chown/chmod/mkdir to apply root ownership and 755 permissions to
/sandbox/.nemoclaw itself (and keep /sandbox/.nemoclaw/blueprints root:root) and
then chown only the state and migration dirs to sandbox.

In `@scripts/nemoclaw-start.sh`:
- Around line 248-273: The proxy env file is written into a sandbox-writable
directory (_PROXY_ENV_FILE="/sandbox/.openclaw-data/proxy-env.sh") which allows
a sandbox user to replace it with malicious shell code; instead write the proxy
env to a non-user-writable, root-owned location (for example create and use a
system-owned directory like /etc/openclaw or /var/lib/openclaw and set
ownership/mode) and update whatever startup/profile sourcing to point at that
path; ensure the write is done atomically and safely (create a temporary file in
the root-owned dir, set owner to root, chmod 0644, then rename into place) and
avoid following attacker symlinks (use safe file creation APIs or the install
command rather than plain cat > "$_PROXY_ENV_FILE"); also remove or stop
auto-sourcing any file from the sandbox-writable tree so agent-controlled files
cannot be executed at session startup.
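The temp-file-then-rename write that the second comment asks for might look roughly like this. It is a sketch under stated assumptions: `atomic_write` is a hypothetical helper, the demo uses a scratch directory instead of a root-owned one, and the `HTTPS_PROXY` line is placeholder content.

```sh
#!/bin/sh
# atomic_write TARGET: replace TARGET with the content read from stdin.
# Hypothetical helper, not taken from scripts/nemoclaw-start.sh.
atomic_write() {
  target=$1
  dir=$(dirname "$target")
  # mktemp creates a new file with an unpredictable name, so a symlink
  # planted in advance cannot redirect the write.
  tmp=$(mktemp "$dir/.proxy-env.XXXXXX") || return 1
  # Write and set the mode on the temp file, then rename it into place.
  if cat > "$tmp" && chmod 644 "$tmp" && mv -f "$tmp" "$target"; then
    return 0
  fi
  rm -f "$tmp"
  return 1
}

# Demo in a scratch directory (a real deployment would use a root-owned one).
demo_dir=$(mktemp -d)
printf 'export HTTPS_PROXY=%s\n' 'http://proxy.example:3128' \
  | atomic_write "$demo_dir/proxy-env.sh"
cat "$demo_dir/proxy-env.sh"
```

Because the temp file lives in the same directory as the target, the final `mv` is a rename within one filesystem, so readers see either the old file or the complete new one. If the target could ever be a symlink to a directory, GNU `mv -T` would be worth considering.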

📥 Commits

Reviewing files that changed from the base of the PR and between 711b98e and ca44773.

📒 Files selected for processing (5)
  • Dockerfile
  • Dockerfile.base
  • docs/deployment/sandbox-hardening.md
  • nemoclaw-blueprint/policies/openclaw-sandbox.yaml
  • scripts/nemoclaw-start.sh

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
test/service-env.test.js (1)

187-225: Add guard for empty sed extraction to improve debuggability.

Unlike the extractProxyVars helper (lines 110-115) which throws a descriptive error when the sed extraction fails, this test would fail with a confusing ENOENT error at line 211 if the script structure changes and persistBlock is empty.

🛠️ Proposed fix to add consistency with existing pattern
       const persistBlock = execFileSync(
         "sed",
         ["-n", "/^_PROXY_URL=/,/^chmod 644/p", scriptPath],
         { encoding: "utf-8" }
       );
+      if (!persistBlock.trim()) {
+        throw new Error(
+          "Failed to extract proxy persistence block from scripts/nemoclaw-start.sh — " +
+          "the _PROXY_URL..chmod block may have been moved or renamed"
+        );
+      }
       const wrapper = [
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/service-env.test.js` around lines 187 - 225, The test "entrypoint writes
proxy-env.sh to writable data dir" extracts a persistBlock via sed but doesn't
guard against an empty result, causing a confusing ENOENT later; add the same
defensive check used by extractProxyVars (throw a descriptive error when the sed
extraction returns an empty string) before writing/executing tmpFile so failures
in script structure are reported clearly—specifically check the persistBlock
variable after the execFileSync sed call in this test and throw or assert with a
helpful message if it's empty (refer to persistBlock and the extractProxyVars
pattern for the exact guard behavior).

📥 Commits

Reviewing files that changed from the base of the PR and between ca44773 and 7c42034.

📒 Files selected for processing (1)
  • test/service-env.test.js

prekshivyas and others added 6 commits March 30, 2026 16:36
Address CodeRabbit review findings:

- Lock /sandbox/.nemoclaw parent directory (root:root 755) so the agent
  cannot rename or replace the root-owned blueprints directory
- Pre-create config.json and snapshots/ as sandbox-owned for runtime writes
- Move proxy-env.sh from sandbox-writable .openclaw-data to /tmp where
  sticky-bit protection prevents the sandbox user from tampering with
  the root-owned file
- Add rm -f before write to prevent symlink-following attacks
- Add empty sed extraction guards in proxy persistence tests
- Fix docs: one sentence per line, active voice

Ref: NVIDIA#804
wscurran added the labels security, priority: high, and fix on Mar 31, 2026
@wscurran
Contributor

✨ Thanks for submitting this PR with a detailed summary. It proposes restricting the sandbox environment to address a potential security issue.

prekshivyas and others added 6 commits March 31, 2026 11:58
…DIA#804)

DAC tests (Docker-only, test/e2e-gateway-isolation.sh):
- Tests 13-25: verify sandbox user cannot write to /sandbox, .nemoclaw
  parent, blueprints, .openclaw dir; verify sandbox CAN write to state,
  migration, snapshots, staging, config.json, .openclaw-data
- Fix test 9: add missing `memory` symlink to verification list

Landlock tests (OpenShell/Brev, checks/04-landlock-readonly.sh):
- 8 tests verifying kernel-level read-only enforcement on /sandbox
- Closes DAC gap: .bashrc/.profile are sandbox-owned but Landlock
  read_only prevents agent from injecting malicious env vars

Signed-off-by: Prekshi Vimadalal <pvimadalal@nvidia.com>
…DIA#804)

/sandbox is sandbox-owned (DAC allows writes). Read-only enforcement
comes from Landlock at runtime, which is tested in the Brev e2e suite
(checks/04-landlock-readonly.sh). Renumber remaining tests 13-24.

Signed-off-by: Prekshi Vimadalal <pvimadalal@nvidia.com>
…IDIA#804)

The base image on GHCR hasn't been rebuilt with pre-baked shell init
files yet. Skip tests 23-24 gracefully instead of failing when the
files don't exist. Tests will auto-activate after base image rebuild.

Signed-off-by: Prekshi Vimadalal <pvimadalal@nvidia.com>
Signed-off-by: Prekshi Vimadalal <pvimadalal@nvidia.com>
@prekshivyas prekshivyas force-pushed the security/read-only-sandbox-filesystem branch from 86b36bc to 00b1190 Compare April 7, 2026 21:18
prekshivyas and others added 4 commits April 7, 2026 16:08
Merge resolution:
- Dockerfile.base: keep logs, credentials, sandbox dirs plus telegram
  from main; remove duplicate credentials symlink
- sandbox-hardening.md: keep NVIDIA#804 reference, adopt main colon format
- Policy YAML: only Landlock changes, all endpoint rules from main preserved

Review concerns addressed:
- Single source of truth for tool-cache redirects (_TOOL_REDIRECTS array)
- .env chmod logs warning instead of silently swallowing failure
- Landlock kernel requirements documented in sandbox-hardening.md
- telegram added to e2e symlink check list
- Non-root mode e2e test (Test 25)
- Symlink attack prevention test for proxy-env.sh
- Updated test assertions for new proxy-env.sh format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
)

The variable was referenced at 3 locations in export_gateway_token() and
install_configure_guard() but never assigned, causing an unbound-variable
crash under set -euo pipefail during sandbox creation.

Closes NVIDIA#1609

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
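The failure mode in this commit message can be reproduced in isolation. The sketch below is an assumption-laden illustration: the commit title is truncated and does not name the variable, so `DEMO_TOKEN_DIR` and `run_strict` are hypothetical, and `set -eu` is used instead of `set -euo pipefail` because `pipefail` is not needed to trigger the crash.

```sh
#!/bin/sh
# DEMO_TOKEN_DIR is a hypothetical name; the commit does not name the variable.
unset DEMO_TOKEN_DIR

# run_strict SNIPPET: run SNIPPET in a child shell with errexit and nounset.
run_strict() {
  sh -c "set -eu; $1" >/dev/null 2>&1
}

# Referencing a variable that was never assigned aborts under set -u.
if run_strict 'echo "$DEMO_TOKEN_DIR/token"'; then
  unassigned=ok
else
  unassigned=crash
fi

# Assigning it before use keeps the script alive.
if run_strict 'DEMO_TOKEN_DIR=/tmp/demo; echo "$DEMO_TOKEN_DIR/token"'; then
  assigned=ok
else
  assigned=crash
fi

# A ${VAR:-default} expansion is another way to avoid the abort.
if run_strict 'echo "${DEMO_TOKEN_DIR:-/tmp/demo}/token"'; then
  defaulted=ok
else
  defaulted=crash
fi

printf 'unassigned=%s assigned=%s defaulted=%s\n' \
  "$unassigned" "$assigned" "$defaulted"
```

Running it prints `unassigned=crash assigned=ok defaulted=ok`.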
install -o/-g requires root to chown. In non-root mode (uid != 0),
use mkdir -p instead — directories are already owned by the current
user. Fixes e2e test-25 (non-root command execution).

Signed-off-by: Prekshi Vyas <prekshivyas@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
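The root versus non-root branch described in this commit message could be sketched as follows. `ensure_dir` is a hypothetical helper and the demo path is a scratch directory; the actual script may structure this differently.

```sh
#!/bin/sh
# ensure_dir PATH OWNER GROUP: create PATH, setting ownership only when root.
# Hypothetical helper for illustration.
ensure_dir() {
  if [ "$(id -u)" -eq 0 ]; then
    # install -o/-g changes ownership, which requires root.
    install -d -o "$2" -g "$3" -m 755 "$1"
  else
    # Non-root: the directory is created as, and owned by, the current user.
    mkdir -p "$1"
  fi
}

# Demo: use the caller's own numeric IDs so both branches succeed.
demo_dir=$(mktemp -d)
ensure_dir "$demo_dir/state" "$(id -u)" "$(id -g)"
ls -ld "$demo_dir/state"
```

Numeric IDs are used in the demo because a container started with an arbitrary `--user` may have no matching passwd entry.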
@prekshivyas prekshivyas force-pushed the security/read-only-sandbox-filesystem branch from d9005af to 2845415 Compare April 8, 2026 16:04
@brandonpelfrey
Collaborator

 Verified as root inside container
 - /sandbox is sandbox-owned
 - /sandbox/.openclaw is root:root
 - /sandbox/.openclaw-data is sandbox:sandbox
 - /sandbox/.nemoclaw is root:root
 - /sandbox/.nemoclaw/blueprints is root:root
 - writable subdirs are sandbox:sandbox:
     - /sandbox/.nemoclaw/state
     - /sandbox/.nemoclaw/migration
     - /sandbox/.nemoclaw/snapshots
     - /sandbox/.nemoclaw/staging
 - /tmp is writable sticky tmpfs-style location

 Verified as sandbox user
 Blocked writes:
 - cannot write /sandbox/.openclaw/openclaw.json
 - cannot create files in /sandbox/.openclaw/
 - cannot write /sandbox/.nemoclaw/blueprints/
 - cannot create files directly in /sandbox/.nemoclaw/

 Allowed writes:
 - can write /sandbox/.nemoclaw/state/
 - can write /sandbox/.nemoclaw/migration/
 - can write /sandbox/.nemoclaw/snapshots/
 - can write /sandbox/.nemoclaw/staging/
 - can write /sandbox/.nemoclaw/config.json
 - can write /sandbox/.openclaw-data/

 Integrity check
 - sha256sum -c /sandbox/.openclaw/.config-hash passed

 Non-root fallback
 - Re-tested the earlier failure case:
     - docker run --rm --entrypoint "" --user 1000:1000 nemoclaw-rw-claim echo NON_ROOT_EXEC_OK
 - Result:
     - passed
 - So the previous non-root regression is fixed in the latest PR head.

 Conclusion
 - The PR’s R/W accessibility claims are now holding up in the built sandbox image:
     - immutable/root-owned:
           - /sandbox/.openclaw
           - /sandbox/.nemoclaw
           - /sandbox/.nemoclaw/blueprints
     - writable by sandbox:
           - /sandbox/.openclaw-data
           - /sandbox/.nemoclaw/state
           - /sandbox/.nemoclaw/migration
           - /sandbox/.nemoclaw/snapshots
           - /sandbox/.nemoclaw/staging
           - /sandbox/.nemoclaw/config.json
           - /tmp
 - The earlier non-root execution issue appears resolved by the latest PR update.

@prekshivyas prekshivyas force-pushed the security/read-only-sandbox-filesystem branch from 7a25686 to 80482b2 Compare April 8, 2026 18:00
…VIDIA#1607)

OpenShell's prepare_filesystem() chowns every read_write path to
run_as_user at sandbox start, flipping /sandbox/.nemoclaw from root:root
to sandbox:sandbox. This removed the DAC protection preventing the agent
from renaming blueprints/.

The sticky bit (1755) survives the ownership flip and prevents the
sandbox user from renaming or deleting root-owned entries like
blueprints/, while still allowing writes to sandbox-owned subdirs
(state/, migration/, snapshots/, staging/, config.json).

Note: this mitigates the security impact but does not prevent the
ownership change itself — that requires an OpenShell-side fix in
prepare_filesystem().

Mitigates NVIDIA#1607

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@prekshivyas prekshivyas force-pushed the security/read-only-sandbox-filesystem branch from 80482b2 to 29f44ca Compare April 8, 2026 18:04
@brandonpelfrey brandonpelfrey merged commit 045a340 into NVIDIA:main Apr 8, 2026
10 checks passed
TSavo added a commit to TSavo/NemoClaw that referenced this pull request Apr 8, 2026
The sandbox base image (`ghcr.io/nvidia/nemoclaw/sandbox-base`) does not
include the gnupg package, even though `nemoclaw-start.sh` already
exports `GNUPGHOME=/tmp/.gnupg` and `test/service-env.test.js` asserts
the redirect is in place. As a result `gpg --list-keys` (or any other
gpg invocation) inside the sandbox fails with `bash: gpg: command not
found`, breaking workflows that expect signing/verification to be
available — including the smoke check QA reported on DGX Spark
(aarch64).

The GNUPGHOME redirect was introduced in NVIDIA#1121 ("restrict /sandbox to
read-only via Landlock") to keep gpg writable when `~/.gnupg` became
unwritable, but the matching `apt-get install gnupg` line was never
added to `Dockerfile.base`. The service-env tests assert the env var
setup but don't actually invoke gpg, so CI never noticed the binary
was missing.

This adds `gnupg=2.2.40-1.1+deb12u2` (the bookworm-pinned version,
matching the existing `=<version>` pinning style for every other
package in the same `apt-get install` block) right after `git`. No
other changes — same `--no-install-recommends`, same cleanup tail.

The package brings in dirmngr, gpg-wks-server, and gpg-wks-client as
dependencies (per a clean install probe in the exact base image SHA).
Total layer cost ~3 MB compressed.

Smoke tested locally by building Dockerfile.base with the fix and
running the exact failing command from the bug report:

  $ docker build -f Dockerfile.base -t nemoclaw-base-test:gnupg .
  $ docker run --rm nemoclaw-base-test:gnupg gpg --version
  gpg (GnuPG) 2.2.40
  $ docker run --rm nemoclaw-base-test:gnupg gpg --list-keys
  gpg: directory '/root/.gnupg' created
  gpg: keybox '/root/.gnupg/pubring.kbx' created
  gpg: /root/.gnupg/trustdb.gpg: trustdb created
  (exit 0)
  $ docker run --rm -e GNUPGHOME=/tmp/.gnupg nemoclaw-base-test:gnupg \
      sh -c 'mkdir -p /tmp/.gnupg && chmod 700 /tmp/.gnupg && gpg --list-keys'
  gpg: keybox '/tmp/.gnupg/pubring.kbx' created
  (exit 0)

Both the default `~/.gnupg` and the runtime-redirected `/tmp/.gnupg`
(matching what `nemoclaw-start.sh` exports) work as expected.

Closes NVIDIA#1640.

Signed-off-by: T Savo <evilgenius@nefariousplan.com>
cv added a commit that referenced this pull request Apr 9, 2026
…1649)

## Summary

The sandbox base image (`ghcr.io/nvidia/nemoclaw/sandbox-base`) is
missing the `gnupg` package — `gpg --list-keys` (and any other gpg
invocation) fails with `bash: gpg: command not found` inside the
sandbox. This adds a single pinned `gnupg=2.2.40-1.1+deb12u2` line to
the existing `apt-get install` block in `Dockerfile.base`, restoring the
binary that the rest of the codebase already assumes is present.

## Related Issue

Closes #1640.

## Changes

`Dockerfile.base`: add `gnupg=2.2.40-1.1+deb12u2` to the existing
`apt-get install` block, slotted right after `git`. Same
`--no-install-recommends`, same cleanup tail, same `=<version>` pinning
style as every other package in the block.

```diff
        curl=7.88.1-10+deb12u14 \
        git=1:2.39.5-0+deb12u3 \
+       gnupg=2.2.40-1.1+deb12u2 \
        ca-certificates=20230311+deb12u1 \
```

The pinned version is the bookworm-stable `2.2.40-1.1+deb12u2`, verified
by `apt-cache madison gnupg` against the exact base image SHA
`node:22-slim@sha256:4f77a690...`. The package brings in `dirmngr`,
`gpg-wks-server`, and `gpg-wks-client` as dependencies. Total layer cost
~3 MB compressed.

Diff: **+1 / 0** in 1 file.

### Why this is the right fix (and not "lower the env var" or "remove the test")

The fix isn't obvious unless you trace where `GNUPGHOME` came from.
Walking that chain:

1. **PR #1121** (`fix(sandbox): restrict /sandbox to read-only via
Landlock (#804)`, authored by @prekshivyas, merged 2026-04-08) made the
`/sandbox` home directory Landlock-read-only to prevent agents from
modifying their own runtime environment.
2. To keep tools that normally write under `~/...` working (gpg, git
config, python history, npm prefix, etc.), that PR redirected each
tool's homedir to a writable `/tmp/...` path via env vars in
`scripts/nemoclaw-start.sh`. The relevant line is at
`scripts/nemoclaw-start.sh:53`:
   ```sh
   'GNUPGHOME=/tmp/.gnupg'
   ```
alongside `HISTFILE=/tmp/.bash_history`,
`GIT_CONFIG_GLOBAL=/tmp/.gitconfig`, `PYTHONUSERBASE=/tmp/.local`, etc.
3. PR #1121 also added three matching assertions in
`test/service-env.test.js` (lines 177, 191, 347) verifying that the
redirect is set:
   ```js
   expect(src).toContain("GNUPGHOME=/tmp/.gnupg");
   ```
4. **What PR #1121 didn't do**: add `gnupg` to the `apt-get install`
list in `Dockerfile.base`. The env var setup landed and the test
assertions landed, but the install line was missed.
5. CI never noticed because `service-env.test.js` only asserts that the
env var is *set* in the source — it never spawns a subprocess that
actually runs `gpg`. So a working test suite + a missing binary coexist
silently. The QA report (this issue, #1640) catches it as a runtime
failure on DGX Spark aarch64 because their test step does invoke `gpg
--list-keys`.
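A check that would have caught the gap in step 5 needs to invoke the binary, not just inspect the env var. The following is one possible shape for such a check, offered as a sketch: `check_redirected_tool` is a hypothetical helper, and the scratch home directory stands in for `/tmp/.gnupg`.

```sh
#!/bin/sh
# check_redirected_tool TOOL DIR: fail if TOOL is not on PATH, otherwise
# prepare DIR as its private (mode 700) home directory.
# Hypothetical helper for illustration.
check_redirected_tool() {
  if ! command -v "$1" >/dev/null 2>&1; then
    printf 'missing binary: %s\n' "$1" >&2
    return 1
  fi
  mkdir -p "$2" && chmod 700 "$2"
}

# Demo: a scratch directory stands in for the redirected GNUPGHOME.
demo_home="$(mktemp -d)/gnupg"
if check_redirected_tool gpg "$demo_home"; then
  # Actually run the tool with the redirected home.
  GNUPGHOME=$demo_home gpg --version | head -n 1
else
  echo "gpg is not installed; an env-var-only assertion would still pass"
fi
```

The design point is that the assertion exercises both halves of the pair: the binary must exist and the redirected directory must be usable.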

The clear intent of #1121 was to **enable** gpg under a redirected
`GNUPGHOME` — you wouldn't redirect the homedir if you wanted gpg
blocked. This PR is the matching install line that #1121 should have
included, closing a one-line oversight rather than adding new capability
or rolling anything back.

### Why not just remove the GNUPGHOME redirect

The env var redirect from #1121 is doing real work — without it, any
future `apt-get install gnupg` would still leave gpg unable to write to
its homedir under Landlock-read-only `/sandbox`. The redirect is the
"right" half of the pair; the install is the missing left half.

### Why this isn't a security regression

The sandbox runs LLM-driven agents and gpg is a credential-handling
tool, so it's worth justifying explicitly:

- The redirected `GNUPGHOME=/tmp/.gnupg` is **fresh and empty** per
session — no preloaded keys.
- Without keys, gpg can hash/check signatures of public material but
cannot decrypt or sign anything.
- An agent would have to first import a key (which requires the user to
provide it — keys are not pulled from anywhere automatically) before gpg
becomes capable of any sensitive operation.
- This is the same threat model as `git` and `curl`, which are already
in the image and could equally be used to fetch arbitrary content. gpg
adds no new capability that the existing toolchain doesn't already have.
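
The "fresh and empty" claim can be checked mechanically by asking gpg for its secret keys in machine-readable form and looking for `sec` records. The sketch below is illustrative only: `has_secret_keys` is a hypothetical helper, and the `GPG` override variable is an assumption added so the gpg binary can be swapped out.

```sh
#!/bin/sh
# Hypothetical helper (not part of the NemoClaw repository).

# has_secret_keys HOMEDIR: succeed if the keyring under HOMEDIR contains at
# least one secret key. In --with-colons output each secret key is a line
# starting with "sec:". GPG may name an alternative gpg command (assumption).
has_secret_keys() {
    "${GPG:-gpg}" --homedir "$1" --batch --list-secret-keys --with-colons \
        2>/dev/null | grep -q '^sec:'
}
```

In a new session, `has_secret_keys /tmp/.gnupg` is expected to fail (no keys) until a user deliberately imports one.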

If the project explicitly *did* want gpg unavailable to agents, the
right fix would be to remove the GNUPGHOME redirect from #1121 *and* the
matching test assertions, not to keep the env wiring while leaving the
binary missing — that's just confusing.

## Type of Change

- [x] Code change for a new feature, bug fix, or refactor.
- [ ] Code change with doc updates.
- [ ] Doc only. Prose changes without code sample modifications.
- [ ] Doc only. Includes code sample changes.

## Testing

Smoke-tested locally by building `Dockerfile.base` with the fix and
running the exact failing command from the bug report:

```sh
$ docker build -f Dockerfile.base -t nemoclaw-base-test:gnupg .
[...]
=> exporting to image  46.7s done

$ docker run --rm nemoclaw-base-test:gnupg gpg --version
gpg (GnuPG) 2.2.40
libgcrypt 1.10.1

$ docker run --rm nemoclaw-base-test:gnupg gpg --list-keys
gpg: directory '/root/.gnupg' created
gpg: keybox '/root/.gnupg/pubring.kbx' created
gpg: /root/.gnupg/trustdb.gpg: trustdb created
(exit 0)

# And with the runtime-redirected GNUPGHOME from nemoclaw-start.sh:
$ docker run --rm -e GNUPGHOME=/tmp/.gnupg nemoclaw-base-test:gnupg \
    sh -c 'mkdir -p /tmp/.gnupg && chmod 700 /tmp/.gnupg && gpg --list-keys'
gpg: keybox '/tmp/.gnupg/pubring.kbx' created
(exit 0)
```

Both the default `~/.gnupg` and the runtime-redirected `/tmp/.gnupg`
(matching what `nemoclaw-start.sh` exports) work as expected. The exact
`gpg --list-keys` failure from the bug report no longer reproduces.

- [x] `hadolint Dockerfile.base` — clean (no warnings)
- [x] `docker build -f Dockerfile.base` — succeeds, exports to image
cleanly
- [x] `gpg --version` in built image — works (`gpg (GnuPG) 2.2.40`)
- [x] `gpg --list-keys` in built image — works (was `bash: gpg: command
not found` before this PR)
- [x] `gpg --list-keys` with `GNUPGHOME=/tmp/.gnupg` — works (matches
the runtime env from `nemoclaw-start.sh`)
- [ ] `npx prek run --all-files` — partial: ran the affected hooks
(commitlint, gitleaks, hadolint) which all pass; did NOT run `test-cli`
against the full local suite because two pre-existing baseline failures
on stock `main` get in the way on a WSL2 dev host (the
`shouldPatchCoredns` issue addressed by PR #1626 (merged) and the
install-preflight PATH leakage addressed by PR #1628 (open)). Upstream
CI runs on Linux GHA runners and doesn't hit either of those, so it'll
exercise the full suite normally.
- [ ] `npm test` — same caveat as above, ran the relevant projects in
isolation
- [ ] `make docs` builds without warnings. (for doc-only changes — N/A)

## Checklist

### General

- [x] I have read and followed the [contributing
guide](https://github.com/NVIDIA/NemoClaw/blob/main/CONTRIBUTING.md).
- [ ] I have read and followed the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md).
(for doc-only changes — N/A)

### Code Changes

- [x] Formatters applied — `hadolint Dockerfile.base` clean. No
JS/TS/Python files touched.
- [x] Tests added or updated for new or changed behavior — N/A. The
existing `service-env.test.js` already asserts the `GNUPGHOME` redirect
introduced in #1121; this PR makes the corresponding binary available so
those assertions reflect a runtime that actually works. A new test that
spawns `gpg` directly inside a container would arguably be worth a
follow-up (it would have caught this gap originally), but it's a
separate concern from this one-line install fix.
- [x] No secrets, API keys, or credentials committed.
- [ ] Doc pages updated for any user-facing behavior changes — N/A. The
bug report describes the expected behavior; this PR just makes runtime
match it. No docs claim gpg is unavailable.

### Doc Changes

- N/A (no doc changes)

---
Signed-off-by: T Savo <evilgenius@nefariousplan.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
  * Base system image now includes GnuPG as a pinned OS package.

* **Bug Fixes / Security**
  * GnuPG runtime directory is now created in a separate step with
    stricter permissions and sandbox ownership when applicable, reducing
    exposure.

* **Tests**
  * Test suite updated to verify the new directory creation and
    permission/ownership behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: T Savo <evilgenius@nefariousplan.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
ericksoa pushed a commit to cheese-head/NemoClaw that referenced this pull request Apr 14, 2026
…VIDIA#1649)

Labels

fix, priority: high (Important issue that should be resolved in the next release), security (Something isn't secure)

Development

Successfully merging this pull request may close these issues.

[SECURITY] No read-only root filesystem — writable /sandbox increases attack surface
