Skip to content

fix(platform): allow shouldPatchCoredns isWsl override for deterministic WSL2 tests#1626

Merged
cv merged 1 commit intoNVIDIA:mainfrom
TSavo:fix/platform-shouldpatchcoredns-wsl
Apr 8, 2026
Merged

fix(platform): allow shouldPatchCoredns isWsl override for deterministic WSL2 tests#1626
cv merged 1 commit intoNVIDIA:mainfrom
TSavo:fix/platform-shouldpatchcoredns-wsl

Conversation

@TSavo
Copy link
Copy Markdown
Contributor

@TSavo TSavo commented Apr 8, 2026

Summary

Adds an opts.isWsl short-circuit to isWsl() in bin/lib/platform.js so the shouldPatchCoredns test can pin its assertions on every host. Without the override the existing test fails on WSL2 dev machines because os.release() always reports a "microsoft"-tagged kernel string there.

Related Issue

Closes #1621 (sub-item 3 of three: shouldPatchCoredns non-deterministic on WSL2 hosts).

Changes

  • bin/lib/platform.js: isWsl() now returns opts.isWsl directly when it's a boolean, before falling through to the existing detection logic. Production callers (which don't pass opts.isWsl) keep the existing behavior unchanged.
  • test/platform.test.js: existing shouldPatchCoredns test now passes { isWsl: false } so the runtime-matching assertions exercise the function rather than the kernel. A second test case is added with { isWsl: true } to lock the "skip CoreDNS patching on WSL2" branch so a future change to that branch is caught.

Diff: +23 / −5 across 2 files.

Type of Change

  • Code change for a new feature, bug fix, or refactor.
  • Code change with doc updates.
  • Doc only. Prose changes without code sample modifications.
  • Doc only. Includes code sample changes.

Testing

  • npx vitest run --project cli test/platform.test.js -t shouldPatchCoredns — both new test cases pass on a WSL2 Ubuntu host (the existing test was the failure that prompted this PR)
  • npx prettier --check bin/lib/platform.js test/platform.test.js clean
  • npx tsc --noEmit -p tsconfig.cli.json clean
  • npx prek run --all-files / npm testnot run on the local machine because the full CLI test suite hits two separate baseline failures on a WSL2 host: this very shouldPatchCoredns issue (which this PR fixes) and the install-preflight sysbin leakage covered by fix(test): isolate sysbin in install-preflight tests to prevent host PATH leakage #1628. Upstream CI runs on Linux runners so it'll exercise the full suite normally.
  • make docs builds without warnings. (for doc-only changes — N/A)

Checklist

General

Code Changes

  • Formatters applied — npx prettier --check clean for both touched files.
  • Tests added or updated for new or changed behavior — the existing shouldPatchCoredns test now uses the opts.isWsl override; a new test case pins the WSL2 branch.
  • No secrets, API keys, or credentials committed.
  • Doc pages updated for any user-facing behavior changes — N/A, the new opts argument is test-only API surface; production callers don't pass it.

Doc Changes

  • N/A (no doc changes)

Signed-off-by: T Savo evilgenius@nefariousplan.com

Copilot AI review requested due to automatic review settings April 8, 2026 20:49
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 8, 2026

📝 Walkthrough

Walkthrough

Adds an explicit opts.isWsl boolean override to isWsl() in bin/lib/platform.js, so callers can force WSL/non-WSL detection. Tests in test/platform.test.js were updated to split the previous host-ambiguous shouldPatchCoredns assertion into two deterministic cases covering { isWsl: false } and { isWsl: true }.

Changes

Cohort / File(s) Summary
WSL Override Support
bin/lib/platform.js
isWsl(opts = {}) now returns immediately when opts.isWsl is a boolean; existing environment-based detection remains for other cases.
Platform Test Coverage
test/platform.test.js
Replaced a single host-dependent shouldPatchCoredns test with two explicit cases passing { isWsl: false } and { isWsl: true } to make assertions deterministic across hosts.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I toggled WSL with a gentle hop,
Tests now pass and never stop,
One false, one true — no flake or fright,
The rabbit's patch makes CI bright ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: adding an isWsl override to the shouldPatchCoredns function for deterministic WSL2 testing.
Linked Issues check ✅ Passed The PR implements objective 3 from issue #1621: allowing tests to pass an explicit isWsl option to override host detection and adding a WSL-specific test case.
Out of Scope Changes check ✅ Passed All changes are scoped to implementing objective 3 from #1621. No modifications address objectives 1 or 2, and no unrelated changes are present.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes shouldPatchCoredns tests deterministic on WSL2 hosts by allowing callers to explicitly override WSL detection, avoiding reliance on host kernel introspection during unit tests.

Changes:

  • Add an opts.isWsl boolean short-circuit to isWsl() so callers can bypass host detection when desired.
  • Update shouldPatchCoredns tests to pass { isWsl: false } for non-WSL behavior and add a { isWsl: true } case to lock the WSL branch.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.

File Description
bin/lib/platform.js Adds an explicit opts.isWsl override to make WSL detection controllable by callers (primarily for deterministic tests).
test/platform.test.js Pins shouldPatchCoredns behavior via { isWsl: ... } and adds coverage for the WSL path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +166 to +177
it("patches CoreDNS for Colima and Podman (non-WSL host)", () => {
expect(shouldPatchCoredns("colima", { isWsl: false })).toBe(true);
expect(shouldPatchCoredns("podman", { isWsl: false })).toBe(true);
expect(shouldPatchCoredns("docker-desktop", { isWsl: false })).toBe(false);
expect(shouldPatchCoredns("docker", { isWsl: false })).toBe(false);
});

it("never patches CoreDNS on WSL2 (host DNS unreachable from k3s pods)", () => {
expect(shouldPatchCoredns("colima", { isWsl: true })).toBe(false);
expect(shouldPatchCoredns("podman", { isWsl: true })).toBe(false);
expect(shouldPatchCoredns("docker-desktop", { isWsl: true })).toBe(false);
expect(shouldPatchCoredns("docker", { isWsl: true })).toBe(false);
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test titles/descriptions imply they depend on the host being non-WSL/WSL2, but the behavior is actually pinned via the { isWsl: ... } override and should pass on any host. Consider renaming these to reflect the simulated condition (e.g., "when isWsl is false" / "when isWsl is true" or "on WSL" instead of "on WSL2") to avoid confusion for future maintainers running the suite on different environments.

Copilot uses AI. Check for mistakes.
…tic WSL2 tests

`shouldPatchCoredns` consults `isWsl()`, which checks `os.release()` for the
"microsoft" tag in addition to env vars. On WSL2 hosts (a common Linux dev
setup under Windows), `os.release()` always returns a "microsoft"-tagged
string, so `isWsl()` returns true regardless of how the test sets up env
vars — and `shouldPatchCoredns("colima")` flips from `true` to `false`. The
existing test `expect(shouldPatchCoredns("colima")).toBe(true)` then fails
on WSL2 dev machines while passing in CI.

This adds an explicit `opts.isWsl` short-circuit to `isWsl()`. If a caller
passes a boolean for `isWsl`, that value is returned directly and the
host-detection logic is bypassed. Production callers (which don't pass an
`isWsl` opt) keep the existing behavior unchanged.

The `shouldPatchCoredns` test now passes `{ isWsl: false }` so the four
runtime-matching assertions exercise the *function* rather than the
*kernel*. A second test case pins the WSL2 branch with `{ isWsl: true }`
so future changes to "skip CoreDNS patching on WSL" are caught.

Closes NVIDIA#1621 (sub-item 3 of three).

Signed-off-by: T Savo <evilgenius@nefariousplan.com>
@TSavo TSavo force-pushed the fix/platform-shouldpatchcoredns-wsl branch from c16e506 to 15840cb Compare April 8, 2026 21:06
@TSavo
Copy link
Copy Markdown
Contributor Author

TSavo commented Apr 8, 2026

Force-pushed to add a Signed-off-by trailer (DCO compliance) — the diff itself is unchanged. CodeRabbit's prior review still applies to the same content.

Ready for a maintainer to approve workflow runs whenever you have a moment — first-time-contributor gate.

Copy link
Copy Markdown
Contributor

@cv cv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — security review PASS.

  • Override is unreachable in production (all callers pass no opts)
  • Strict typeof === "boolean" guard prevents non-boolean values
  • Consistent with existing dependency-injection pattern in platform.js
  • Improves test coverage of the WSL2 branch

No concerns.

@cv cv merged commit 196d21b into NVIDIA:main Apr 8, 2026
1 check passed
@TSavo
Copy link
Copy Markdown
Contributor Author

TSavo commented Apr 8, 2026

Thanks for the merge and the explicit security review @cv — particularly noting the production-unreachability and the strict typeof === "boolean" guard. Glad the dependency-injection pattern in platform.js carried the day on the design choice.

For tracking: this is one of three test-environment fixes from #1621. The other two (#1627 cli timeout, #1628 install-preflight sysbin isolation) are still open and have actionable reviewer feedback I just addressed in fresh commits.

cv added a commit that referenced this pull request Apr 9, 2026
…1649)

<!-- markdownlint-disable MD041 -->
## Summary

The sandbox base image (`ghcr.io/nvidia/nemoclaw/sandbox-base`) is
missing the `gnupg` package — `gpg --list-keys` (and any other gpg
invocation) fails with `bash: gpg: command not found` inside the
sandbox. This adds a single pinned `gnupg=2.2.40-1.1+deb12u2` line to
the existing `apt-get install` block in `Dockerfile.base`, restoring the
binary that the rest of the codebase already assumes is present.

## Related Issue

Closes #1640.

## Changes

`Dockerfile.base`: add `gnupg=2.2.40-1.1+deb12u2` to the existing
`apt-get install` block, slotted right after `git`. Same
`--no-install-recommends`, same cleanup tail, same `=<version>` pinning
style as every other package in the block.

```diff
        curl=7.88.1-10+deb12u14 \
        git=1:2.39.5-0+deb12u3 \
+       gnupg=2.2.40-1.1+deb12u2 \
        ca-certificates=20230311+deb12u1 \
```

The pinned version is the bookworm-stable `2.2.40-1.1+deb12u2`, verified
by `apt-cache madison gnupg` against the exact base image SHA
`node:22-slim@sha256:4f77a690...`. The package brings in `dirmngr`,
`gpg-wks-server`, and `gpg-wks-client` as dependencies. Total layer cost
~3 MB compressed.

Diff: **+1 / 0** in 1 file.

### Why this is the right fix (and not "lower the env var" or "remove
the test")

The fix isn't obvious unless you trace where `GNUPGHOME` came from.
Walking that chain:

1. **PR #1121** (`fix(sandbox): restrict /sandbox to read-only via
Landlock (#804)`, authored by @prekshivyas, merged 2026-04-08) made the
`/sandbox` home directory Landlock-read-only to prevent agents from
modifying their own runtime environment.
2. To keep tools that normally write under `~/...` working (gpg, git
config, python history, npm prefix, etc.), that PR redirected each
tool's homedir to a writable `/tmp/...` path via env vars in
`scripts/nemoclaw-start.sh`. The relevant line is at
`scripts/nemoclaw-start.sh:53`:
   ```sh
   'GNUPGHOME=/tmp/.gnupg'
   ```
alongside `HISTFILE=/tmp/.bash_history`,
`GIT_CONFIG_GLOBAL=/tmp/.gitconfig`, `PYTHONUSERBASE=/tmp/.local`, etc.
3. PR #1121 also added three matching assertions in
`test/service-env.test.js` (lines 177, 191, 347) verifying that the
redirect is set:
   ```js
   expect(src).toContain("GNUPGHOME=/tmp/.gnupg");
   ```
4. **What PR #1121 didn't do**: add `gnupg` to the `apt-get install`
list in `Dockerfile.base`. The env var setup landed and the test
assertions landed, but the install line was missed.
5. CI never noticed because `service-env.test.js` only asserts that the
env var is *set* in the source — it never spawns a subprocess that
actually runs `gpg`. So a working test suite + a missing binary coexist
silently. The QA report (this issue, #1640) catches it as a runtime
failure on DGX Spark aarch64 because their test step does invoke `gpg
--list-keys`.

The clear intent of #1121 was to **enable** gpg under a redirected
`GNUPGHOME` — you wouldn't redirect the homedir if you wanted gpg
blocked. This PR is the matching install line that #1121 should have
included, closing a one-line oversight rather than adding new capability
or rolling anything back.

### Why not just remove the GNUPGHOME redirect

The env var redirect from #1121 is doing real work — without it, any
future `apt-get install gnupg` would still leave gpg unable to write to
its homedir under Landlock-read-only `/sandbox`. The redirect is the
"right" half of the pair; the install is the missing left half.

### Why this isn't a security regression

The sandbox runs LLM-driven agents and gpg is a credential-handling
tool, so it's worth justifying explicitly:

- The redirected `GNUPGHOME=/tmp/.gnupg` is **fresh and empty** per
session — no preloaded keys.
- Without keys, gpg can hash/check signatures of public material but
cannot decrypt or sign anything.
- An agent would have to first import a key (which requires the user to
provide it — keys are not pulled from anywhere automatically) before gpg
becomes capable of any sensitive operation.
- This is the same threat model as `git` and `curl`, which are already
in the image and could equally be used to fetch arbitrary content. gpg
adds no new capability that the existing toolchain doesn't already have.

If the project explicitly *did* want gpg unavailable to agents, the
right fix would be to remove the GNUPGHOME redirect from #1121 *and* the
matching test assertions, not to keep the env wiring while leaving the
binary missing — that's just confusing.

## Type of Change

- [x] Code change for a new feature, bug fix, or refactor.
- [ ] Code change with doc updates.
- [ ] Doc only. Prose changes without code sample modifications.
- [ ] Doc only. Includes code sample changes.

## Testing

Smoke-tested locally by building `Dockerfile.base` with the fix and
running the exact failing command from the bug report:

```sh
$ docker build -f Dockerfile.base -t nemoclaw-base-test:gnupg .
[...]
=> exporting to image  46.7s done

$ docker run --rm nemoclaw-base-test:gnupg gpg --version
gpg (GnuPG) 2.2.40
libgcrypt 1.10.1

$ docker run --rm nemoclaw-base-test:gnupg gpg --list-keys
gpg: directory '/root/.gnupg' created
gpg: keybox '/root/.gnupg/pubring.kbx' created
gpg: /root/.gnupg/trustdb.gpg: trustdb created
(exit 0)

# And with the runtime-redirected GNUPGHOME from nemoclaw-start.sh:
$ docker run --rm -e GNUPGHOME=/tmp/.gnupg nemoclaw-base-test:gnupg \
    sh -c 'mkdir -p /tmp/.gnupg && chmod 700 /tmp/.gnupg && gpg --list-keys'
gpg: keybox '/tmp/.gnupg/pubring.kbx' created
(exit 0)
```

Both the default `~/.gnupg` and the runtime-redirected `/tmp/.gnupg`
(matching what `nemoclaw-start.sh` exports) work as expected. The exact
`gpg --list-keys` failure from the bug report no longer reproduces.

- [x] `hadolint Dockerfile.base` — clean (no warnings)
- [x] `docker build -f Dockerfile.base` — succeeds, exports to image
cleanly
- [x] `gpg --version` in built image — works (`gpg (GnuPG) 2.2.40`)
- [x] `gpg --list-keys` in built image — works (was `bash: gpg: command
not found` before this PR)
- [x] `gpg --list-keys` with `GNUPGHOME=/tmp/.gnupg` — works (matches
the runtime env from `nemoclaw-start.sh`)
- [ ] `npx prek run --all-files` — partial: ran the affected hooks
(commitlint, gitleaks, hadolint) which all pass; did NOT run `test-cli`
against the full local suite because two pre-existing baseline failures
on stock `main` get in the way on a WSL2 dev host (the
`shouldPatchCoredns` issue addressed by PR #1626 (merged) and the
install-preflight PATH leakage addressed by PR #1628 (open)). Upstream
CI runs on Linux GHA runners and doesn't hit either of those, so it'll
exercise the full suite normally.
- [ ] `npm test` — same caveat as above, ran the relevant projects in
isolation
- [ ] `make docs` builds without warnings. (for doc-only changes — N/A)

## Checklist

### General

- [x] I have read and followed the [contributing
guide](https://github.com/NVIDIA/NemoClaw/blob/main/CONTRIBUTING.md).
- [ ] I have read and followed the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md).
(for doc-only changes — N/A)

### Code Changes

- [x] Formatters applied — `hadolint Dockerfile.base` clean. No
JS/TS/Python files touched.
- [x] Tests added or updated for new or changed behavior — N/A. The
existing `service-env.test.js` already asserts the `GNUPGHOME` redirect
introduced in #1121; this PR makes the corresponding binary available so
those assertions reflect a runtime that actually works. A new test that
spawns `gpg` directly inside a container would arguably be worth a
follow-up (it would have caught this gap originally), but it's a
separate concern from this one-line install fix.
- [x] No secrets, API keys, or credentials committed.
- [ ] Doc pages updated for any user-facing behavior changes — N/A. The
bug report describes the expected behavior; this PR just makes runtime
match it. No docs claim gpg is unavailable.

### Doc Changes

- N/A (no doc changes)

---
Signed-off-by: T Savo <evilgenius@nefariousplan.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
  * Base system image now includes GnuPG as a pinned OS package.

* **Bug Fixes / Security**
* GnuPG runtime directory is now created in a separate step with
stricter permissions and sandbox ownership when applicable, reducing
exposure.

* **Tests**
* Test suite updated to verify the new directory creation and
permission/ownership behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: T Savo <evilgenius@nefariousplan.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
ericksoa pushed a commit to cheese-head/NemoClaw that referenced this pull request Apr 14, 2026
…VIDIA#1649)

<!-- markdownlint-disable MD041 -->
## Summary

The sandbox base image (`ghcr.io/nvidia/nemoclaw/sandbox-base`) is
missing the `gnupg` package — `gpg --list-keys` (and any other gpg
invocation) fails with `bash: gpg: command not found` inside the
sandbox. This adds a single pinned `gnupg=2.2.40-1.1+deb12u2` line to
the existing `apt-get install` block in `Dockerfile.base`, restoring the
binary that the rest of the codebase already assumes is present.

## Related Issue

Closes NVIDIA#1640.

## Changes

`Dockerfile.base`: add `gnupg=2.2.40-1.1+deb12u2` to the existing
`apt-get install` block, slotted right after `git`. Same
`--no-install-recommends`, same cleanup tail, same `=<version>` pinning
style as every other package in the block.

```diff
        curl=7.88.1-10+deb12u14 \
        git=1:2.39.5-0+deb12u3 \
+       gnupg=2.2.40-1.1+deb12u2 \
        ca-certificates=20230311+deb12u1 \
```

The pinned version is the bookworm-stable `2.2.40-1.1+deb12u2`, verified
by `apt-cache madison gnupg` against the exact base image SHA
`node:22-slim@sha256:4f77a690...`. The package brings in `dirmngr`,
`gpg-wks-server`, and `gpg-wks-client` as dependencies. Total layer cost
~3 MB compressed.

Diff: **+1 / 0** in 1 file.

### Why this is the right fix (and not "lower the env var" or "remove
the test")

The fix isn't obvious unless you trace where `GNUPGHOME` came from.
Walking that chain:

1. **PR NVIDIA#1121** (`fix(sandbox): restrict /sandbox to read-only via
Landlock (NVIDIA#804)`, authored by @prekshivyas, merged 2026-04-08) made the
`/sandbox` home directory Landlock-read-only to prevent agents from
modifying their own runtime environment.
2. To keep tools that normally write under `~/...` working (gpg, git
config, python history, npm prefix, etc.), that PR redirected each
tool's homedir to a writable `/tmp/...` path via env vars in
`scripts/nemoclaw-start.sh`. The relevant line is at
`scripts/nemoclaw-start.sh:53`:
   ```sh
   'GNUPGHOME=/tmp/.gnupg'
   ```
alongside `HISTFILE=/tmp/.bash_history`,
`GIT_CONFIG_GLOBAL=/tmp/.gitconfig`, `PYTHONUSERBASE=/tmp/.local`, etc.
3. PR NVIDIA#1121 also added three matching assertions in
`test/service-env.test.js` (lines 177, 191, 347) verifying that the
redirect is set:
   ```js
   expect(src).toContain("GNUPGHOME=/tmp/.gnupg");
   ```
4. **What PR NVIDIA#1121 didn't do**: add `gnupg` to the `apt-get install`
list in `Dockerfile.base`. The env var setup landed and the test
assertions landed, but the install line was missed.
5. CI never noticed because `service-env.test.js` only asserts that the
env var is *set* in the source — it never spawns a subprocess that
actually runs `gpg`. So a working test suite + a missing binary coexist
silently. The QA report (this issue, NVIDIA#1640) catches it as a runtime
failure on DGX Spark aarch64 because their test step does invoke `gpg
--list-keys`.

The clear intent of NVIDIA#1121 was to **enable** gpg under a redirected
`GNUPGHOME` — you wouldn't redirect the homedir if you wanted gpg
blocked. This PR is the matching install line that NVIDIA#1121 should have
included, closing a one-line oversight rather than adding new capability
or rolling anything back.

### Why not just remove the GNUPGHOME redirect

The env var redirect from NVIDIA#1121 is doing real work — without it, any
future `apt-get install gnupg` would still leave gpg unable to write to
its homedir under Landlock-read-only `/sandbox`. The redirect is the
"right" half of the pair; the install is the missing left half.

### Why this isn't a security regression

The sandbox runs LLM-driven agents and gpg is a credential-handling
tool, so it's worth justifying explicitly:

- The redirected `GNUPGHOME=/tmp/.gnupg` is **fresh and empty** per
session — no preloaded keys.
- Without keys, gpg can hash/check signatures of public material but
cannot decrypt or sign anything.
- An agent would have to first import a key (which requires the user to
provide it — keys are not pulled from anywhere automatically) before gpg
becomes capable of any sensitive operation.
- This is the same threat model as `git` and `curl`, which are already
in the image and could equally be used to fetch arbitrary content. gpg
adds no new capability that the existing toolchain doesn't already have.

If the project explicitly *did* want gpg unavailable to agents, the
right fix would be to remove the GNUPGHOME redirect from NVIDIA#1121 *and* the
matching test assertions, not to keep the env wiring while leaving the
binary missing — that's just confusing.

## Type of Change

- [x] Code change for a new feature, bug fix, or refactor.
- [ ] Code change with doc updates.
- [ ] Doc only. Prose changes without code sample modifications.
- [ ] Doc only. Includes code sample changes.

## Testing

Smoke-tested locally by building `Dockerfile.base` with the fix and
running the exact failing command from the bug report:

```sh
$ docker build -f Dockerfile.base -t nemoclaw-base-test:gnupg .
[...]
=> exporting to image  46.7s done

$ docker run --rm nemoclaw-base-test:gnupg gpg --version
gpg (GnuPG) 2.2.40
libgcrypt 1.10.1

$ docker run --rm nemoclaw-base-test:gnupg gpg --list-keys
gpg: directory '/root/.gnupg' created
gpg: keybox '/root/.gnupg/pubring.kbx' created
gpg: /root/.gnupg/trustdb.gpg: trustdb created
(exit 0)

# And with the runtime-redirected GNUPGHOME from nemoclaw-start.sh:
$ docker run --rm -e GNUPGHOME=/tmp/.gnupg nemoclaw-base-test:gnupg \
    sh -c 'mkdir -p /tmp/.gnupg && chmod 700 /tmp/.gnupg && gpg --list-keys'
gpg: keybox '/tmp/.gnupg/pubring.kbx' created
(exit 0)
```

Both the default `~/.gnupg` and the runtime-redirected `/tmp/.gnupg`
(matching what `nemoclaw-start.sh` exports) work as expected. The exact
`gpg --list-keys` failure from the bug report no longer reproduces.

- [x] `hadolint Dockerfile.base` — clean (no warnings)
- [x] `docker build -f Dockerfile.base` — succeeds, exports to image
cleanly
- [x] `gpg --version` in built image — works (`gpg (GnuPG) 2.2.40`)
- [x] `gpg --list-keys` in built image — works (was `bash: gpg: command
not found` before this PR)
- [x] `gpg --list-keys` with `GNUPGHOME=/tmp/.gnupg` — works (matches
the runtime env from `nemoclaw-start.sh`)
- [ ] `npx prek run --all-files` — partial: ran the affected hooks
(commitlint, gitleaks, hadolint) which all pass; did NOT run `test-cli`
against the full local suite because two pre-existing baseline failures
on stock `main` get in the way on a WSL2 dev host (the
`shouldPatchCoredns` issue addressed by PR NVIDIA#1626 (merged) and the
install-preflight PATH leakage addressed by PR NVIDIA#1628 (open)). Upstream
CI runs on Linux GHA runners and doesn't hit either of those, so it'll
exercise the full suite normally.
- [ ] `npm test` — same caveat as above, ran the relevant projects in
isolation
- [ ] `make docs` builds without warnings. (for doc-only changes — N/A)

## Checklist

### General

- [x] I have read and followed the [contributing
guide](https://github.com/NVIDIA/NemoClaw/blob/main/CONTRIBUTING.md).
- [ ] I have read and followed the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md).
(for doc-only changes — N/A)

### Code Changes

- [x] Formatters applied — `hadolint Dockerfile.base` clean. No
JS/TS/Python files touched.
- [x] Tests added or updated for new or changed behavior — N/A. The
existing `service-env.test.js` already asserts the `GNUPGHOME` redirect
introduced in NVIDIA#1121; this PR makes the corresponding binary available so
those assertions reflect a runtime that actually works. A new test that
spawns `gpg` directly inside a container would arguably be worth a
follow-up (it would have caught this gap originally), but it's a
separate concern from this one-line install fix.
- [x] No secrets, API keys, or credentials committed.
- [ ] Doc pages updated for any user-facing behavior changes — N/A. The
bug report describes the expected behavior; this PR just makes runtime
match it. No docs claim gpg is unavailable.

### Doc Changes

- N/A (no doc changes)

---
Signed-off-by: T Savo <evilgenius@nefariousplan.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
  * Base system image now includes GnuPG as a pinned OS package.

* **Bug Fixes / Security**
* GnuPG runtime directory is now created in a separate step with
stricter permissions and sandbox ownership when applicable, reducing
exposure.

* **Tests**
* Test suite updated to verify the new directory creation and
permission/ownership behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: T Savo <evilgenius@nefariousplan.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
gemini2026 pushed a commit to gemini2026/NemoClaw that referenced this pull request Apr 14, 2026
…tic WSL2 tests (NVIDIA#1626)

<!-- markdownlint-disable MD041 -->
## Summary

Adds an `opts.isWsl` short-circuit to `isWsl()` in `bin/lib/platform.js`
so the `shouldPatchCoredns` test can pin its assertions on every host.
Without the override the existing test fails on WSL2 dev machines
because `os.release()` always reports a "microsoft"-tagged kernel string
there.

## Related Issue

Closes NVIDIA#1621 (sub-item 3 of three: `shouldPatchCoredns`
non-deterministic on WSL2 hosts).

## Changes

- `bin/lib/platform.js`: `isWsl()` now returns `opts.isWsl` directly
when it's a boolean, before falling through to the existing detection
logic. Production callers (which don't pass `opts.isWsl`) keep the
existing behavior unchanged.
- `test/platform.test.js`: existing `shouldPatchCoredns` test now passes
`{ isWsl: false }` so the runtime-matching assertions exercise the
*function* rather than the *kernel*. A second test case is added with `{
isWsl: true }` to lock the "skip CoreDNS patching on WSL2" branch so a
future change to that branch is caught.

Diff: +23 / −5 across 2 files.

## Type of Change

- [x] Code change for a new feature, bug fix, or refactor.
- [ ] Code change with doc updates.
- [ ] Doc only. Prose changes without code sample modifications.
- [ ] Doc only. Includes code sample changes.

## Testing

- [x] `npx vitest run --project cli test/platform.test.js -t
shouldPatchCoredns` — both new test cases pass on a WSL2 Ubuntu host
(the existing test was the failure that prompted this PR)
- [x] `npx prettier --check bin/lib/platform.js test/platform.test.js`
clean
- [x] `npx tsc --noEmit -p tsconfig.cli.json` clean
- [ ] `npx prek run --all-files` / `npm test` — *not run on the local
machine because the full CLI test suite hits two separate baseline
failures on a WSL2 host: this very `shouldPatchCoredns` issue (which
this PR fixes) and the `install-preflight` sysbin leakage covered by
NVIDIA#1628. Upstream CI runs on Linux runners so it'll exercise the full
suite normally.*
- [ ] `make docs` builds without warnings. (for doc-only changes — N/A)

## Checklist

### General

- [x] I have read and followed the [contributing
guide](https://github.com/NVIDIA/NemoClaw/blob/main/CONTRIBUTING.md).
- [ ] I have read and followed the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md).
(for doc-only changes — N/A)

### Code Changes

- [x] Formatters applied — `npx prettier --check` clean for both touched
files.
- [x] Tests added or updated for new or changed behavior — the existing
`shouldPatchCoredns` test now uses the `opts.isWsl` override; a new test
case pins the WSL2 branch.
- [x] No secrets, API keys, or credentials committed.
- [ ] Doc pages updated for any user-facing behavior changes — N/A, the
new opts argument is test-only API surface; production callers don't
pass it.

### Doc Changes

- N/A (no doc changes)

---
Signed-off-by: T Savo <evilgenius@nefariousplan.com>

Signed-off-by: T Savo <evilgenius@nefariousplan.com>
Co-authored-by: TSavo <TSavo@users.noreply.github.com>
gemini2026 pushed a commit to gemini2026/NemoClaw that referenced this pull request Apr 14, 2026
…VIDIA#1649)

<!-- markdownlint-disable MD041 -->
## Summary

The sandbox base image (`ghcr.io/nvidia/nemoclaw/sandbox-base`) is
missing the `gnupg` package — `gpg --list-keys` (and any other gpg
invocation) fails with `bash: gpg: command not found` inside the
sandbox. This adds a single pinned `gnupg=2.2.40-1.1+deb12u2` line to
the existing `apt-get install` block in `Dockerfile.base`, restoring the
binary that the rest of the codebase already assumes is present.

## Related Issue

Closes NVIDIA#1640.

## Changes

`Dockerfile.base`: add `gnupg=2.2.40-1.1+deb12u2` to the existing
`apt-get install` block, slotted right after `git`. Same
`--no-install-recommends`, same cleanup tail, same `=<version>` pinning
style as every other package in the block.

```diff
        curl=7.88.1-10+deb12u14 \
        git=1:2.39.5-0+deb12u3 \
+       gnupg=2.2.40-1.1+deb12u2 \
        ca-certificates=20230311+deb12u1 \
```

The pinned version is the bookworm-stable `2.2.40-1.1+deb12u2`, verified
by `apt-cache madison gnupg` against the exact base image SHA
`node:22-slim@sha256:4f77a690...`. The package brings in `dirmngr`,
`gpg-wks-server`, and `gpg-wks-client` as dependencies. Total layer cost
~3 MB compressed.

Diff: **+1 / 0** in 1 file.

### Why this is the right fix (and not "lower the env var" or "remove
the test")

The fix isn't obvious unless you trace where `GNUPGHOME` came from.
Walking that chain:

1. **PR NVIDIA#1121** (`fix(sandbox): restrict /sandbox to read-only via
Landlock (NVIDIA#804)`, authored by @prekshivyas, merged 2026-04-08) made the
`/sandbox` home directory Landlock-read-only to prevent agents from
modifying their own runtime environment.
2. To keep tools that normally write under `~/...` working (gpg, git
config, python history, npm prefix, etc.), that PR redirected each
tool's homedir to a writable `/tmp/...` path via env vars in
`scripts/nemoclaw-start.sh`. The relevant line is at
`scripts/nemoclaw-start.sh:53`:
   ```sh
   'GNUPGHOME=/tmp/.gnupg'
   ```
alongside `HISTFILE=/tmp/.bash_history`,
`GIT_CONFIG_GLOBAL=/tmp/.gitconfig`, `PYTHONUSERBASE=/tmp/.local`, etc.
3. PR NVIDIA#1121 also added three matching assertions in
`test/service-env.test.js` (lines 177, 191, 347) verifying that the
redirect is set:
   ```js
   expect(src).toContain("GNUPGHOME=/tmp/.gnupg");
   ```
4. **What PR NVIDIA#1121 didn't do**: add `gnupg` to the `apt-get install`
list in `Dockerfile.base`. The env var setup landed and the test
assertions landed, but the install line was missed.
5. CI never noticed because `service-env.test.js` only asserts that the
env var is *set* in the source — it never spawns a subprocess that
actually runs `gpg`. So a working test suite + a missing binary coexist
silently. The QA report (this issue, NVIDIA#1640) catches it as a runtime
failure on DGX Spark aarch64 because their test step does invoke `gpg
--list-keys`.

The clear intent of NVIDIA#1121 was to **enable** gpg under a redirected
`GNUPGHOME` — you wouldn't redirect the homedir if you wanted gpg
blocked. This PR is the matching install line that NVIDIA#1121 should have
included, closing a one-line oversight rather than adding new capability
or rolling anything back.

### Why not just remove the GNUPGHOME redirect

The env var redirect from NVIDIA#1121 is doing real work — without it, any
future `apt-get install gnupg` would still leave gpg unable to write to
its homedir under Landlock-read-only `/sandbox`. The redirect is the
"right" half of the pair; the install is the missing left half.

### Why this isn't a security regression

The sandbox runs LLM-driven agents and gpg is a credential-handling
tool, so it's worth justifying explicitly:

- The redirected `GNUPGHOME=/tmp/.gnupg` is **fresh and empty** per
session — no preloaded keys.
- Without keys, gpg can hash/check signatures of public material but
cannot decrypt or sign anything.
- An agent would have to first import a key (which requires the user to
provide it — keys are not pulled from anywhere automatically) before gpg
becomes capable of any sensitive operation.
- This is the same threat model as `git` and `curl`, which are already
in the image and could equally be used to fetch arbitrary content. gpg
adds no new capability that the existing toolchain doesn't already have.

If the project explicitly *did* want gpg unavailable to agents, the
right fix would be to remove the GNUPGHOME redirect from NVIDIA#1121 *and* the
matching test assertions, not to keep the env wiring while leaving the
binary missing — that's just confusing.

## Type of Change

- [x] Code change for a new feature, bug fix, or refactor.
- [ ] Code change with doc updates.
- [ ] Doc only. Prose changes without code sample modifications.
- [ ] Doc only. Includes code sample changes.

## Testing

Smoke-tested locally by building `Dockerfile.base` with the fix and
running the exact failing command from the bug report:

```sh
$ docker build -f Dockerfile.base -t nemoclaw-base-test:gnupg .
[...]
=> exporting to image  46.7s done

$ docker run --rm nemoclaw-base-test:gnupg gpg --version
gpg (GnuPG) 2.2.40
libgcrypt 1.10.1

$ docker run --rm nemoclaw-base-test:gnupg gpg --list-keys
gpg: directory '/root/.gnupg' created
gpg: keybox '/root/.gnupg/pubring.kbx' created
gpg: /root/.gnupg/trustdb.gpg: trustdb created
(exit 0)

# And with the runtime-redirected GNUPGHOME from nemoclaw-start.sh:
$ docker run --rm -e GNUPGHOME=/tmp/.gnupg nemoclaw-base-test:gnupg \
    sh -c 'mkdir -p /tmp/.gnupg && chmod 700 /tmp/.gnupg && gpg --list-keys'
gpg: keybox '/tmp/.gnupg/pubring.kbx' created
(exit 0)
```

Both the default `~/.gnupg` and the runtime-redirected `/tmp/.gnupg`
(matching what `nemoclaw-start.sh` exports) work as expected. The exact
`gpg --list-keys` failure from the bug report no longer reproduces.

- [x] `hadolint Dockerfile.base` — clean (no warnings)
- [x] `docker build -f Dockerfile.base` — succeeds, exports to image
cleanly
- [x] `gpg --version` in built image — works (`gpg (GnuPG) 2.2.40`)
- [x] `gpg --list-keys` in built image — works (was `bash: gpg: command
not found` before this PR)
- [x] `gpg --list-keys` with `GNUPGHOME=/tmp/.gnupg` — works (matches
the runtime env from `nemoclaw-start.sh`)
- [ ] `npx prek run --all-files` — partial: ran the affected hooks
(commitlint, gitleaks, hadolint) which all pass; did NOT run `test-cli`
against the full local suite because two pre-existing baseline failures
on stock `main` get in the way on a WSL2 dev host (the
`shouldPatchCoredns` issue addressed by PR NVIDIA#1626 (merged) and the
install-preflight PATH leakage addressed by PR NVIDIA#1628 (open)). Upstream
CI runs on Linux GHA runners and doesn't hit either of those, so it'll
exercise the full suite normally.
- [ ] `npm test` — same caveat as above, ran the relevant projects in
isolation
- [ ] `make docs` builds without warnings. (for doc-only changes — N/A)

## Checklist

### General

- [x] I have read and followed the [contributing
guide](https://github.com/NVIDIA/NemoClaw/blob/main/CONTRIBUTING.md).
- [ ] I have read and followed the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md).
(for doc-only changes — N/A)

### Code Changes

- [x] Formatters applied — `hadolint Dockerfile.base` clean. No
JS/TS/Python files touched.
- [x] Tests added or updated for new or changed behavior — N/A. The
existing `service-env.test.js` already asserts the `GNUPGHOME` redirect
introduced in NVIDIA#1121; this PR makes the corresponding binary available so
those assertions reflect a runtime that actually works. A new test that
spawns `gpg` directly inside a container would arguably be worth a
follow-up (it would have caught this gap originally), but it's a
separate concern from this one-line install fix.
- [x] No secrets, API keys, or credentials committed.
- [ ] Doc pages updated for any user-facing behavior changes — N/A. The
bug report describes the expected behavior; this PR just makes runtime
match it. No docs claim gpg is unavailable.

### Doc Changes

- N/A (no doc changes)

---
Signed-off-by: T Savo <evilgenius@nefariousplan.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
  * Base system image now includes GnuPG as a pinned OS package.

* **Bug Fixes / Security**
* GnuPG runtime directory is now created in a separate step with
stricter permissions and sandbox ownership when applicable, reducing
exposure.

* **Tests**
* Test suite updated to verify the new directory creation and
permission/ownership behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: T Savo <evilgenius@nefariousplan.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[NemoClaw][Tests] Three test-environment fragility issues (slow hosts, system PATH leakage, WSL2)

3 participants