fix(platform): allow shouldPatchCoredns isWsl override for deterministic WSL2 tests#1626
Conversation
📝 WalkthroughWalkthroughAdds an explicit Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
This PR makes shouldPatchCoredns tests deterministic on WSL2 hosts by allowing callers to explicitly override WSL detection, avoiding reliance on host kernel introspection during unit tests.
Changes:
- Add an
opts.isWslboolean short-circuit toisWsl()so callers can bypass host detection when desired. - Update
shouldPatchCorednstests to pass{ isWsl: false }for non-WSL behavior and add a{ isWsl: true }case to lock the WSL branch.
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| bin/lib/platform.js | Adds an explicit opts.isWsl override to make WSL detection controllable by callers (primarily for deterministic tests). |
| test/platform.test.js | Pins shouldPatchCoredns behavior via { isWsl: ... } and adds coverage for the WSL path. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| it("patches CoreDNS for Colima and Podman (non-WSL host)", () => { | ||
| expect(shouldPatchCoredns("colima", { isWsl: false })).toBe(true); | ||
| expect(shouldPatchCoredns("podman", { isWsl: false })).toBe(true); | ||
| expect(shouldPatchCoredns("docker-desktop", { isWsl: false })).toBe(false); | ||
| expect(shouldPatchCoredns("docker", { isWsl: false })).toBe(false); | ||
| }); | ||
|
|
||
| it("never patches CoreDNS on WSL2 (host DNS unreachable from k3s pods)", () => { | ||
| expect(shouldPatchCoredns("colima", { isWsl: true })).toBe(false); | ||
| expect(shouldPatchCoredns("podman", { isWsl: true })).toBe(false); | ||
| expect(shouldPatchCoredns("docker-desktop", { isWsl: true })).toBe(false); | ||
| expect(shouldPatchCoredns("docker", { isWsl: true })).toBe(false); |
There was a problem hiding this comment.
The test titles/descriptions imply they depend on the host being non-WSL/WSL2, but the behavior is actually pinned via the { isWsl: ... } override and should pass on any host. Consider renaming these to reflect the simulated condition (e.g., "when isWsl is false" / "when isWsl is true" or "on WSL" instead of "on WSL2") to avoid confusion for future maintainers running the suite on different environments.
…tic WSL2 tests
`shouldPatchCoredns` consults `isWsl()`, which checks `os.release()` for the
"microsoft" tag in addition to env vars. On WSL2 hosts (a common Linux dev
setup under Windows), `os.release()` always returns a "microsoft"-tagged
string, so `isWsl()` returns true regardless of how the test sets up env
vars — and `shouldPatchCoredns("colima")` flips from `true` to `false`. The
existing test `expect(shouldPatchCoredns("colima")).toBe(true)` then fails
on WSL2 dev machines while passing in CI.
This adds an explicit `opts.isWsl` short-circuit to `isWsl()`. If a caller
passes a boolean for `isWsl`, that value is returned directly and the
host-detection logic is bypassed. Production callers (which don't pass an
`isWsl` opt) keep the existing behavior unchanged.
The `shouldPatchCoredns` test now passes `{ isWsl: false }` so the four
runtime-matching assertions exercise the *function* rather than the
*kernel*. A second test case pins the WSL2 branch with `{ isWsl: true }`
so future changes to "skip CoreDNS patching on WSL" are caught.
Closes NVIDIA#1621 (sub-item 3 of three).
Signed-off-by: T Savo <evilgenius@nefariousplan.com>
c16e506 to
15840cb
Compare
|
Force-pushed to add a Ready for a maintainer to approve workflow runs whenever you have a moment — first-time-contributor gate. |
cv
left a comment
There was a problem hiding this comment.
LGTM — security review PASS.
- Override is unreachable in production (all callers pass no
opts) - Strict
typeof === "boolean"guard prevents non-boolean values - Consistent with existing dependency-injection pattern in
platform.js - Improves test coverage of the WSL2 branch
No concerns.
|
Thanks for the merge and the explicit security review @cv — particularly noting the production-unreachability and the strict For tracking: this is one of three test-environment fixes from #1621. The other two (#1627 cli timeout, #1628 install-preflight sysbin isolation) are still open and have actionable reviewer feedback I just addressed in fresh commits. |
…1649) <!-- markdownlint-disable MD041 --> ## Summary The sandbox base image (`ghcr.io/nvidia/nemoclaw/sandbox-base`) is missing the `gnupg` package — `gpg --list-keys` (and any other gpg invocation) fails with `bash: gpg: command not found` inside the sandbox. This adds a single pinned `gnupg=2.2.40-1.1+deb12u2` line to the existing `apt-get install` block in `Dockerfile.base`, restoring the binary that the rest of the codebase already assumes is present. ## Related Issue Closes #1640. ## Changes `Dockerfile.base`: add `gnupg=2.2.40-1.1+deb12u2` to the existing `apt-get install` block, slotted right after `git`. Same `--no-install-recommends`, same cleanup tail, same `=<version>` pinning style as every other package in the block. ```diff curl=7.88.1-10+deb12u14 \ git=1:2.39.5-0+deb12u3 \ + gnupg=2.2.40-1.1+deb12u2 \ ca-certificates=20230311+deb12u1 \ ``` The pinned version is the bookworm-stable `2.2.40-1.1+deb12u2`, verified by `apt-cache madison gnupg` against the exact base image SHA `node:22-slim@sha256:4f77a690...`. The package brings in `dirmngr`, `gpg-wks-server`, and `gpg-wks-client` as dependencies. Total layer cost ~3 MB compressed. Diff: **+1 / 0** in 1 file. ### Why this is the right fix (and not "lower the env var" or "remove the test") The fix isn't obvious unless you trace where `GNUPGHOME` came from. Walking that chain: 1. **PR #1121** (`fix(sandbox): restrict /sandbox to read-only via Landlock (#804)`, authored by @prekshivyas, merged 2026-04-08) made the `/sandbox` home directory Landlock-read-only to prevent agents from modifying their own runtime environment. 2. To keep tools that normally write under `~/...` working (gpg, git config, python history, npm prefix, etc.), that PR redirected each tool's homedir to a writable `/tmp/...` path via env vars in `scripts/nemoclaw-start.sh`. The relevant line is at `scripts/nemoclaw-start.sh:53`: ```sh 'GNUPGHOME=/tmp/.gnupg' ``` alongside `HISTFILE=/tmp/.bash_history`, `GIT_CONFIG_GLOBAL=/tmp/.gitconfig`, `PYTHONUSERBASE=/tmp/.local`, etc. 3. PR #1121 also added three matching assertions in `test/service-env.test.js` (lines 177, 191, 347) verifying that the redirect is set: ```js expect(src).toContain("GNUPGHOME=/tmp/.gnupg"); ``` 4. **What PR #1121 didn't do**: add `gnupg` to the `apt-get install` list in `Dockerfile.base`. The env var setup landed and the test assertions landed, but the install line was missed. 5. CI never noticed because `service-env.test.js` only asserts that the env var is *set* in the source — it never spawns a subprocess that actually runs `gpg`. So a working test suite + a missing binary coexist silently. The QA report (this issue, #1640) catches it as a runtime failure on DGX Spark aarch64 because their test step does invoke `gpg --list-keys`. The clear intent of #1121 was to **enable** gpg under a redirected `GNUPGHOME` — you wouldn't redirect the homedir if you wanted gpg blocked. This PR is the matching install line that #1121 should have included, closing a one-line oversight rather than adding new capability or rolling anything back. ### Why not just remove the GNUPGHOME redirect The env var redirect from #1121 is doing real work — without it, any future `apt-get install gnupg` would still leave gpg unable to write to its homedir under Landlock-read-only `/sandbox`. The redirect is the "right" half of the pair; the install is the missing left half. ### Why this isn't a security regression The sandbox runs LLM-driven agents and gpg is a credential-handling tool, so it's worth justifying explicitly: - The redirected `GNUPGHOME=/tmp/.gnupg` is **fresh and empty** per session — no preloaded keys. - Without keys, gpg can hash/check signatures of public material but cannot decrypt or sign anything. - An agent would have to first import a key (which requires the user to provide it — keys are not pulled from anywhere automatically) before gpg becomes capable of any sensitive operation. - This is the same threat model as `git` and `curl`, which are already in the image and could equally be used to fetch arbitrary content. gpg adds no new capability that the existing toolchain doesn't already have. If the project explicitly *did* want gpg unavailable to agents, the right fix would be to remove the GNUPGHOME redirect from #1121 *and* the matching test assertions, not to keep the env wiring while leaving the binary missing — that's just confusing. ## Type of Change - [x] Code change for a new feature, bug fix, or refactor. - [ ] Code change with doc updates. - [ ] Doc only. Prose changes without code sample modifications. - [ ] Doc only. Includes code sample changes. ## Testing Smoke-tested locally by building `Dockerfile.base` with the fix and running the exact failing command from the bug report: ```sh $ docker build -f Dockerfile.base -t nemoclaw-base-test:gnupg . [...] => exporting to image 46.7s done $ docker run --rm nemoclaw-base-test:gnupg gpg --version gpg (GnuPG) 2.2.40 libgcrypt 1.10.1 $ docker run --rm nemoclaw-base-test:gnupg gpg --list-keys gpg: directory '/root/.gnupg' created gpg: keybox '/root/.gnupg/pubring.kbx' created gpg: /root/.gnupg/trustdb.gpg: trustdb created (exit 0) # And with the runtime-redirected GNUPGHOME from nemoclaw-start.sh: $ docker run --rm -e GNUPGHOME=/tmp/.gnupg nemoclaw-base-test:gnupg \ sh -c 'mkdir -p /tmp/.gnupg && chmod 700 /tmp/.gnupg && gpg --list-keys' gpg: keybox '/tmp/.gnupg/pubring.kbx' created (exit 0) ``` Both the default `~/.gnupg` and the runtime-redirected `/tmp/.gnupg` (matching what `nemoclaw-start.sh` exports) work as expected. The exact `gpg --list-keys` failure from the bug report no longer reproduces. - [x] `hadolint Dockerfile.base` — clean (no warnings) - [x] `docker build -f Dockerfile.base` — succeeds, exports to image cleanly - [x] `gpg --version` in built image — works (`gpg (GnuPG) 2.2.40`) - [x] `gpg --list-keys` in built image — works (was `bash: gpg: command not found` before this PR) - [x] `gpg --list-keys` with `GNUPGHOME=/tmp/.gnupg` — works (matches the runtime env from `nemoclaw-start.sh`) - [ ] `npx prek run --all-files` — partial: ran the affected hooks (commitlint, gitleaks, hadolint) which all pass; did NOT run `test-cli` against the full local suite because two pre-existing baseline failures on stock `main` get in the way on a WSL2 dev host (the `shouldPatchCoredns` issue addressed by PR #1626 (merged) and the install-preflight PATH leakage addressed by PR #1628 (open)). Upstream CI runs on Linux GHA runners and doesn't hit either of those, so it'll exercise the full suite normally. - [ ] `npm test` — same caveat as above, ran the relevant projects in isolation - [ ] `make docs` builds without warnings. (for doc-only changes — N/A) ## Checklist ### General - [x] I have read and followed the [contributing guide](https://github.com/NVIDIA/NemoClaw/blob/main/CONTRIBUTING.md). - [ ] I have read and followed the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md). (for doc-only changes — N/A) ### Code Changes - [x] Formatters applied — `hadolint Dockerfile.base` clean. No JS/TS/Python files touched. - [x] Tests added or updated for new or changed behavior — N/A. The existing `service-env.test.js` already asserts the `GNUPGHOME` redirect introduced in #1121; this PR makes the corresponding binary available so those assertions reflect a runtime that actually works. A new test that spawns `gpg` directly inside a container would arguably be worth a follow-up (it would have caught this gap originally), but it's a separate concern from this one-line install fix. - [x] No secrets, API keys, or credentials committed. - [ ] Doc pages updated for any user-facing behavior changes — N/A. The bug report describes the expected behavior; this PR just makes runtime match it. No docs claim gpg is unavailable. ### Doc Changes - N/A (no doc changes) --- Signed-off-by: T Savo <evilgenius@nefariousplan.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Base system image now includes GnuPG as a pinned OS package. * **Bug Fixes / Security** * GnuPG runtime directory is now created in a separate step with stricter permissions and sandbox ownership when applicable, reducing exposure. * **Tests** * Test suite updated to verify the new directory creation and permission/ownership behavior. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: T Savo <evilgenius@nefariousplan.com> Co-authored-by: Carlos Villela <cvillela@nvidia.com> Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
…VIDIA#1649) <!-- markdownlint-disable MD041 --> ## Summary The sandbox base image (`ghcr.io/nvidia/nemoclaw/sandbox-base`) is missing the `gnupg` package — `gpg --list-keys` (and any other gpg invocation) fails with `bash: gpg: command not found` inside the sandbox. This adds a single pinned `gnupg=2.2.40-1.1+deb12u2` line to the existing `apt-get install` block in `Dockerfile.base`, restoring the binary that the rest of the codebase already assumes is present. ## Related Issue Closes NVIDIA#1640. ## Changes `Dockerfile.base`: add `gnupg=2.2.40-1.1+deb12u2` to the existing `apt-get install` block, slotted right after `git`. Same `--no-install-recommends`, same cleanup tail, same `=<version>` pinning style as every other package in the block. ```diff curl=7.88.1-10+deb12u14 \ git=1:2.39.5-0+deb12u3 \ + gnupg=2.2.40-1.1+deb12u2 \ ca-certificates=20230311+deb12u1 \ ``` The pinned version is the bookworm-stable `2.2.40-1.1+deb12u2`, verified by `apt-cache madison gnupg` against the exact base image SHA `node:22-slim@sha256:4f77a690...`. The package brings in `dirmngr`, `gpg-wks-server`, and `gpg-wks-client` as dependencies. Total layer cost ~3 MB compressed. Diff: **+1 / 0** in 1 file. ### Why this is the right fix (and not "lower the env var" or "remove the test") The fix isn't obvious unless you trace where `GNUPGHOME` came from. Walking that chain: 1. **PR NVIDIA#1121** (`fix(sandbox): restrict /sandbox to read-only via Landlock (NVIDIA#804)`, authored by @prekshivyas, merged 2026-04-08) made the `/sandbox` home directory Landlock-read-only to prevent agents from modifying their own runtime environment. 2. To keep tools that normally write under `~/...` working (gpg, git config, python history, npm prefix, etc.), that PR redirected each tool's homedir to a writable `/tmp/...` path via env vars in `scripts/nemoclaw-start.sh`. The relevant line is at `scripts/nemoclaw-start.sh:53`: ```sh 'GNUPGHOME=/tmp/.gnupg' ``` alongside `HISTFILE=/tmp/.bash_history`, `GIT_CONFIG_GLOBAL=/tmp/.gitconfig`, `PYTHONUSERBASE=/tmp/.local`, etc. 3. PR NVIDIA#1121 also added three matching assertions in `test/service-env.test.js` (lines 177, 191, 347) verifying that the redirect is set: ```js expect(src).toContain("GNUPGHOME=/tmp/.gnupg"); ``` 4. **What PR NVIDIA#1121 didn't do**: add `gnupg` to the `apt-get install` list in `Dockerfile.base`. The env var setup landed and the test assertions landed, but the install line was missed. 5. CI never noticed because `service-env.test.js` only asserts that the env var is *set* in the source — it never spawns a subprocess that actually runs `gpg`. So a working test suite + a missing binary coexist silently. The QA report (this issue, NVIDIA#1640) catches it as a runtime failure on DGX Spark aarch64 because their test step does invoke `gpg --list-keys`. The clear intent of NVIDIA#1121 was to **enable** gpg under a redirected `GNUPGHOME` — you wouldn't redirect the homedir if you wanted gpg blocked. This PR is the matching install line that NVIDIA#1121 should have included, closing a one-line oversight rather than adding new capability or rolling anything back. ### Why not just remove the GNUPGHOME redirect The env var redirect from NVIDIA#1121 is doing real work — without it, any future `apt-get install gnupg` would still leave gpg unable to write to its homedir under Landlock-read-only `/sandbox`. The redirect is the "right" half of the pair; the install is the missing left half. ### Why this isn't a security regression The sandbox runs LLM-driven agents and gpg is a credential-handling tool, so it's worth justifying explicitly: - The redirected `GNUPGHOME=/tmp/.gnupg` is **fresh and empty** per session — no preloaded keys. - Without keys, gpg can hash/check signatures of public material but cannot decrypt or sign anything. - An agent would have to first import a key (which requires the user to provide it — keys are not pulled from anywhere automatically) before gpg becomes capable of any sensitive operation. - This is the same threat model as `git` and `curl`, which are already in the image and could equally be used to fetch arbitrary content. gpg adds no new capability that the existing toolchain doesn't already have. If the project explicitly *did* want gpg unavailable to agents, the right fix would be to remove the GNUPGHOME redirect from NVIDIA#1121 *and* the matching test assertions, not to keep the env wiring while leaving the binary missing — that's just confusing. ## Type of Change - [x] Code change for a new feature, bug fix, or refactor. - [ ] Code change with doc updates. - [ ] Doc only. Prose changes without code sample modifications. - [ ] Doc only. Includes code sample changes. ## Testing Smoke-tested locally by building `Dockerfile.base` with the fix and running the exact failing command from the bug report: ```sh $ docker build -f Dockerfile.base -t nemoclaw-base-test:gnupg . [...] => exporting to image 46.7s done $ docker run --rm nemoclaw-base-test:gnupg gpg --version gpg (GnuPG) 2.2.40 libgcrypt 1.10.1 $ docker run --rm nemoclaw-base-test:gnupg gpg --list-keys gpg: directory '/root/.gnupg' created gpg: keybox '/root/.gnupg/pubring.kbx' created gpg: /root/.gnupg/trustdb.gpg: trustdb created (exit 0) # And with the runtime-redirected GNUPGHOME from nemoclaw-start.sh: $ docker run --rm -e GNUPGHOME=/tmp/.gnupg nemoclaw-base-test:gnupg \ sh -c 'mkdir -p /tmp/.gnupg && chmod 700 /tmp/.gnupg && gpg --list-keys' gpg: keybox '/tmp/.gnupg/pubring.kbx' created (exit 0) ``` Both the default `~/.gnupg` and the runtime-redirected `/tmp/.gnupg` (matching what `nemoclaw-start.sh` exports) work as expected. The exact `gpg --list-keys` failure from the bug report no longer reproduces. - [x] `hadolint Dockerfile.base` — clean (no warnings) - [x] `docker build -f Dockerfile.base` — succeeds, exports to image cleanly - [x] `gpg --version` in built image — works (`gpg (GnuPG) 2.2.40`) - [x] `gpg --list-keys` in built image — works (was `bash: gpg: command not found` before this PR) - [x] `gpg --list-keys` with `GNUPGHOME=/tmp/.gnupg` — works (matches the runtime env from `nemoclaw-start.sh`) - [ ] `npx prek run --all-files` — partial: ran the affected hooks (commitlint, gitleaks, hadolint) which all pass; did NOT run `test-cli` against the full local suite because two pre-existing baseline failures on stock `main` get in the way on a WSL2 dev host (the `shouldPatchCoredns` issue addressed by PR NVIDIA#1626 (merged) and the install-preflight PATH leakage addressed by PR NVIDIA#1628 (open)). Upstream CI runs on Linux GHA runners and doesn't hit either of those, so it'll exercise the full suite normally. - [ ] `npm test` — same caveat as above, ran the relevant projects in isolation - [ ] `make docs` builds without warnings. (for doc-only changes — N/A) ## Checklist ### General - [x] I have read and followed the [contributing guide](https://github.com/NVIDIA/NemoClaw/blob/main/CONTRIBUTING.md). - [ ] I have read and followed the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md). (for doc-only changes — N/A) ### Code Changes - [x] Formatters applied — `hadolint Dockerfile.base` clean. No JS/TS/Python files touched. - [x] Tests added or updated for new or changed behavior — N/A. The existing `service-env.test.js` already asserts the `GNUPGHOME` redirect introduced in NVIDIA#1121; this PR makes the corresponding binary available so those assertions reflect a runtime that actually works. A new test that spawns `gpg` directly inside a container would arguably be worth a follow-up (it would have caught this gap originally), but it's a separate concern from this one-line install fix. - [x] No secrets, API keys, or credentials committed. - [ ] Doc pages updated for any user-facing behavior changes — N/A. The bug report describes the expected behavior; this PR just makes runtime match it. No docs claim gpg is unavailable. ### Doc Changes - N/A (no doc changes) --- Signed-off-by: T Savo <evilgenius@nefariousplan.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Base system image now includes GnuPG as a pinned OS package. * **Bug Fixes / Security** * GnuPG runtime directory is now created in a separate step with stricter permissions and sandbox ownership when applicable, reducing exposure. * **Tests** * Test suite updated to verify the new directory creation and permission/ownership behavior. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: T Savo <evilgenius@nefariousplan.com> Co-authored-by: Carlos Villela <cvillela@nvidia.com> Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
…tic WSL2 tests (NVIDIA#1626) <!-- markdownlint-disable MD041 --> ## Summary Adds an `opts.isWsl` short-circuit to `isWsl()` in `bin/lib/platform.js` so the `shouldPatchCoredns` test can pin its assertions on every host. Without the override the existing test fails on WSL2 dev machines because `os.release()` always reports a "microsoft"-tagged kernel string there. ## Related Issue Closes NVIDIA#1621 (sub-item 3 of three: `shouldPatchCoredns` non-deterministic on WSL2 hosts). ## Changes - `bin/lib/platform.js`: `isWsl()` now returns `opts.isWsl` directly when it's a boolean, before falling through to the existing detection logic. Production callers (which don't pass `opts.isWsl`) keep the existing behavior unchanged. - `test/platform.test.js`: existing `shouldPatchCoredns` test now passes `{ isWsl: false }` so the runtime-matching assertions exercise the *function* rather than the *kernel*. A second test case is added with `{ isWsl: true }` to lock the "skip CoreDNS patching on WSL2" branch so a future change to that branch is caught. Diff: +23 / −5 across 2 files. ## Type of Change - [x] Code change for a new feature, bug fix, or refactor. - [ ] Code change with doc updates. - [ ] Doc only. Prose changes without code sample modifications. - [ ] Doc only. Includes code sample changes. ## Testing - [x] `npx vitest run --project cli test/platform.test.js -t shouldPatchCoredns` — both new test cases pass on a WSL2 Ubuntu host (the existing test was the failure that prompted this PR) - [x] `npx prettier --check bin/lib/platform.js test/platform.test.js` clean - [x] `npx tsc --noEmit -p tsconfig.cli.json` clean - [ ] `npx prek run --all-files` / `npm test` — *not run on the local machine because the full CLI test suite hits two separate baseline failures on a WSL2 host: this very `shouldPatchCoredns` issue (which this PR fixes) and the `install-preflight` sysbin leakage covered by NVIDIA#1628. Upstream CI runs on Linux runners so it'll exercise the full suite normally.* - [ ] `make docs` builds without warnings. (for doc-only changes — N/A) ## Checklist ### General - [x] I have read and followed the [contributing guide](https://github.com/NVIDIA/NemoClaw/blob/main/CONTRIBUTING.md). - [ ] I have read and followed the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md). (for doc-only changes — N/A) ### Code Changes - [x] Formatters applied — `npx prettier --check` clean for both touched files. - [x] Tests added or updated for new or changed behavior — the existing `shouldPatchCoredns` test now uses the `opts.isWsl` override; a new test case pins the WSL2 branch. - [x] No secrets, API keys, or credentials committed. - [ ] Doc pages updated for any user-facing behavior changes — N/A, the new opts argument is test-only API surface; production callers don't pass it. ### Doc Changes - N/A (no doc changes) --- Signed-off-by: T Savo <evilgenius@nefariousplan.com> Signed-off-by: T Savo <evilgenius@nefariousplan.com> Co-authored-by: TSavo <TSavo@users.noreply.github.com>
…VIDIA#1649) <!-- markdownlint-disable MD041 --> ## Summary The sandbox base image (`ghcr.io/nvidia/nemoclaw/sandbox-base`) is missing the `gnupg` package — `gpg --list-keys` (and any other gpg invocation) fails with `bash: gpg: command not found` inside the sandbox. This adds a single pinned `gnupg=2.2.40-1.1+deb12u2` line to the existing `apt-get install` block in `Dockerfile.base`, restoring the binary that the rest of the codebase already assumes is present. ## Related Issue Closes NVIDIA#1640. ## Changes `Dockerfile.base`: add `gnupg=2.2.40-1.1+deb12u2` to the existing `apt-get install` block, slotted right after `git`. Same `--no-install-recommends`, same cleanup tail, same `=<version>` pinning style as every other package in the block. ```diff curl=7.88.1-10+deb12u14 \ git=1:2.39.5-0+deb12u3 \ + gnupg=2.2.40-1.1+deb12u2 \ ca-certificates=20230311+deb12u1 \ ``` The pinned version is the bookworm-stable `2.2.40-1.1+deb12u2`, verified by `apt-cache madison gnupg` against the exact base image SHA `node:22-slim@sha256:4f77a690...`. The package brings in `dirmngr`, `gpg-wks-server`, and `gpg-wks-client` as dependencies. Total layer cost ~3 MB compressed. Diff: **+1 / 0** in 1 file. ### Why this is the right fix (and not "lower the env var" or "remove the test") The fix isn't obvious unless you trace where `GNUPGHOME` came from. Walking that chain: 1. **PR NVIDIA#1121** (`fix(sandbox): restrict /sandbox to read-only via Landlock (NVIDIA#804)`, authored by @prekshivyas, merged 2026-04-08) made the `/sandbox` home directory Landlock-read-only to prevent agents from modifying their own runtime environment. 2. To keep tools that normally write under `~/...` working (gpg, git config, python history, npm prefix, etc.), that PR redirected each tool's homedir to a writable `/tmp/...` path via env vars in `scripts/nemoclaw-start.sh`. The relevant line is at `scripts/nemoclaw-start.sh:53`: ```sh 'GNUPGHOME=/tmp/.gnupg' ``` alongside `HISTFILE=/tmp/.bash_history`, `GIT_CONFIG_GLOBAL=/tmp/.gitconfig`, `PYTHONUSERBASE=/tmp/.local`, etc. 3. PR NVIDIA#1121 also added three matching assertions in `test/service-env.test.js` (lines 177, 191, 347) verifying that the redirect is set: ```js expect(src).toContain("GNUPGHOME=/tmp/.gnupg"); ``` 4. **What PR NVIDIA#1121 didn't do**: add `gnupg` to the `apt-get install` list in `Dockerfile.base`. The env var setup landed and the test assertions landed, but the install line was missed. 5. CI never noticed because `service-env.test.js` only asserts that the env var is *set* in the source — it never spawns a subprocess that actually runs `gpg`. So a working test suite + a missing binary coexist silently. The QA report (this issue, NVIDIA#1640) catches it as a runtime failure on DGX Spark aarch64 because their test step does invoke `gpg --list-keys`. The clear intent of NVIDIA#1121 was to **enable** gpg under a redirected `GNUPGHOME` — you wouldn't redirect the homedir if you wanted gpg blocked. This PR is the matching install line that NVIDIA#1121 should have included, closing a one-line oversight rather than adding new capability or rolling anything back. ### Why not just remove the GNUPGHOME redirect The env var redirect from NVIDIA#1121 is doing real work — without it, any future `apt-get install gnupg` would still leave gpg unable to write to its homedir under Landlock-read-only `/sandbox`. The redirect is the "right" half of the pair; the install is the missing left half. ### Why this isn't a security regression The sandbox runs LLM-driven agents and gpg is a credential-handling tool, so it's worth justifying explicitly: - The redirected `GNUPGHOME=/tmp/.gnupg` is **fresh and empty** per session — no preloaded keys. - Without keys, gpg can hash/check signatures of public material but cannot decrypt or sign anything. - An agent would have to first import a key (which requires the user to provide it — keys are not pulled from anywhere automatically) before gpg becomes capable of any sensitive operation. - This is the same threat model as `git` and `curl`, which are already in the image and could equally be used to fetch arbitrary content. gpg adds no new capability that the existing toolchain doesn't already have. If the project explicitly *did* want gpg unavailable to agents, the right fix would be to remove the GNUPGHOME redirect from NVIDIA#1121 *and* the matching test assertions, not to keep the env wiring while leaving the binary missing — that's just confusing. ## Type of Change - [x] Code change for a new feature, bug fix, or refactor. - [ ] Code change with doc updates. - [ ] Doc only. Prose changes without code sample modifications. - [ ] Doc only. Includes code sample changes. ## Testing Smoke-tested locally by building `Dockerfile.base` with the fix and running the exact failing command from the bug report: ```sh $ docker build -f Dockerfile.base -t nemoclaw-base-test:gnupg . [...] => exporting to image 46.7s done $ docker run --rm nemoclaw-base-test:gnupg gpg --version gpg (GnuPG) 2.2.40 libgcrypt 1.10.1 $ docker run --rm nemoclaw-base-test:gnupg gpg --list-keys gpg: directory '/root/.gnupg' created gpg: keybox '/root/.gnupg/pubring.kbx' created gpg: /root/.gnupg/trustdb.gpg: trustdb created (exit 0) # And with the runtime-redirected GNUPGHOME from nemoclaw-start.sh: $ docker run --rm -e GNUPGHOME=/tmp/.gnupg nemoclaw-base-test:gnupg \ sh -c 'mkdir -p /tmp/.gnupg && chmod 700 /tmp/.gnupg && gpg --list-keys' gpg: keybox '/tmp/.gnupg/pubring.kbx' created (exit 0) ``` Both the default `~/.gnupg` and the runtime-redirected `/tmp/.gnupg` (matching what `nemoclaw-start.sh` exports) work as expected. The exact `gpg --list-keys` failure from the bug report no longer reproduces. - [x] `hadolint Dockerfile.base` — clean (no warnings) - [x] `docker build -f Dockerfile.base` — succeeds, exports to image cleanly - [x] `gpg --version` in built image — works (`gpg (GnuPG) 2.2.40`) - [x] `gpg --list-keys` in built image — works (was `bash: gpg: command not found` before this PR) - [x] `gpg --list-keys` with `GNUPGHOME=/tmp/.gnupg` — works (matches the runtime env from `nemoclaw-start.sh`) - [ ] `npx prek run --all-files` — partial: ran the affected hooks (commitlint, gitleaks, hadolint) which all pass; did NOT run `test-cli` against the full local suite because two pre-existing baseline failures on stock `main` get in the way on a WSL2 dev host (the `shouldPatchCoredns` issue addressed by PR NVIDIA#1626 (merged) and the install-preflight PATH leakage addressed by PR NVIDIA#1628 (open)). Upstream CI runs on Linux GHA runners and doesn't hit either of those, so it'll exercise the full suite normally. - [ ] `npm test` — same caveat as above, ran the relevant projects in isolation - [ ] `make docs` builds without warnings. (for doc-only changes — N/A) ## Checklist ### General - [x] I have read and followed the [contributing guide](https://github.com/NVIDIA/NemoClaw/blob/main/CONTRIBUTING.md). - [ ] I have read and followed the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md). (for doc-only changes — N/A) ### Code Changes - [x] Formatters applied — `hadolint Dockerfile.base` clean. No JS/TS/Python files touched. - [x] Tests added or updated for new or changed behavior — N/A. The existing `service-env.test.js` already asserts the `GNUPGHOME` redirect introduced in NVIDIA#1121; this PR makes the corresponding binary available so those assertions reflect a runtime that actually works. A new test that spawns `gpg` directly inside a container would arguably be worth a follow-up (it would have caught this gap originally), but it's a separate concern from this one-line install fix. - [x] No secrets, API keys, or credentials committed. - [ ] Doc pages updated for any user-facing behavior changes — N/A. The bug report describes the expected behavior; this PR just makes runtime match it. No docs claim gpg is unavailable. ### Doc Changes - N/A (no doc changes) --- Signed-off-by: T Savo <evilgenius@nefariousplan.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Base system image now includes GnuPG as a pinned OS package. * **Bug Fixes / Security** * GnuPG runtime directory is now created in a separate step with stricter permissions and sandbox ownership when applicable, reducing exposure. * **Tests** * Test suite updated to verify the new directory creation and permission/ownership behavior. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: T Savo <evilgenius@nefariousplan.com> Co-authored-by: Carlos Villela <cvillela@nvidia.com> Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
Summary
Adds an
opts.isWslshort-circuit toisWsl()inbin/lib/platform.jsso theshouldPatchCorednstest can pin its assertions on every host. Without the override the existing test fails on WSL2 dev machines becauseos.release()always reports a "microsoft"-tagged kernel string there.Related Issue
Closes #1621 (sub-item 3 of three:
shouldPatchCorednsnon-deterministic on WSL2 hosts).Changes
bin/lib/platform.js:isWsl()now returnsopts.isWsldirectly when it's a boolean, before falling through to the existing detection logic. Production callers (which don't passopts.isWsl) keep the existing behavior unchanged.test/platform.test.js: existingshouldPatchCorednstest now passes{ isWsl: false }so the runtime-matching assertions exercise the function rather than the kernel. A second test case is added with{ isWsl: true }to lock the "skip CoreDNS patching on WSL2" branch so a future change to that branch is caught.Diff: +23 / −5 across 2 files.
Type of Change
Testing
npx vitest run --project cli test/platform.test.js -t shouldPatchCoredns— both new test cases pass on a WSL2 Ubuntu host (the existing test was the failure that prompted this PR)npx prettier --check bin/lib/platform.js test/platform.test.jscleannpx tsc --noEmit -p tsconfig.cli.jsoncleannpx prek run --all-files/npm test— not run on the local machine because the full CLI test suite hits two separate baseline failures on a WSL2 host: this veryshouldPatchCorednsissue (which this PR fixes) and theinstall-preflightsysbin leakage covered by fix(test): isolate sysbin in install-preflight tests to prevent host PATH leakage #1628. Upstream CI runs on Linux runners so it'll exercise the full suite normally.make docsbuilds without warnings. (for doc-only changes — N/A)Checklist
General
Code Changes
npx prettier --checkclean for both touched files.shouldPatchCorednstest now uses theopts.isWsloverride; a new test case pins the WSL2 branch.Doc Changes
Signed-off-by: T Savo evilgenius@nefariousplan.com