[codex] fix(security): harden control UI and k8s defaults#1149
[codex] fix(security): harden control UI and k8s defaults#114913ernkastel wants to merge 14 commits intoNVIDIA:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds an opt-in build/runtime gate Changes
sequenceDiagram
rect rgba(200,230,255,0.5)
participant User
participant OnboardCLI as Onboard CLI
participant DockerBuild as Dockerfile/Build
participant Runtime as Gateway Runtime
end
User->>OnboardCLI: run installer (optionally NEMOCLAW_INSECURE_LOCAL_UI=1)
OnboardCLI->>OnboardCLI: resolveInsecureLocalUiPreference(session)
OnboardCLI->>DockerBuild: patchStagedDockerfile(..., options.insecureLocalUi)
DockerBuild->>DockerBuild: bake ENV NEMOCLAW_INSECURE_LOCAL_UI
DockerBuild->>Runtime: start/runtime loads env & generates gateway config
Runtime->>Runtime: parse insecure flag & compute loopback_only_origins
Runtime->>Runtime: enable_insecure_local_ui = insecure && loopback_only_origins
Runtime->>Runtime: set controlUi.allowInsecureAuth / dangerouslyDisableDeviceAuth based on enable_insecure_local_ui
Runtime-->>User: Control UI auth behavior set
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
README.md (1)
74-79:⚠️ Potential issue | 🟡 MinorPut the opt-in flag in a copy-pasteable install example.
The note lands after the one-liner that already starts onboarding, so readers can miss the only chance to set it on the first run. Please move this above the command or show the exact installer invocation with the flag set.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 74 - 79, Move the NEMOCLAW_INSECURE_LOCAL_UI opt-in note so it appears before the one‑liner installer or add a copy‑pasteable installer example that sets the env var (e.g., show the exact curl invocation with NEMOCLAW_INSECURE_LOCAL_UI=1 | bash), ensuring the README’s one‑liner example includes the environment flag so users can set it on first run; update the block containing the curl -fsSL ... | bash line and the explanatory text to reflect the new placement/example.k8s/README.md (1)
25-30:⚠️ Potential issue | 🟡 MinorSurface the Secret prerequisite from the deploy step.
The new Secret guidance is below the Quick Start
kubectl apply, so authenticated-endpoint users can easily start with the dummy fallback on their first deploy. A note near Step 1 would make the required ordering much harder to miss.Also applies to: 63-66
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@k8s/README.md` around lines 25 - 30, Move or add a short prerequisite note about creating the required Secret immediately above the "Deploy NemoClaw" step so users see it before running kubectl apply; reference the "Deploy NemoClaw" heading and the kubectl commands (kubectl create namespace nemoclaw and kubectl apply -f ...) and add a cross-reference that the same note is needed near the other occurrence (the block referenced by lines 63-66) so authenticated-endpoint users create the Secret first rather than relying on the dummy fallback.install.sh (1)
198-220:⚠️ Potential issue | 🟡 MinorShow the exact installer invocation for this flag in
--help.Listing the variable is useful, but the usage block above is still a pipe-to-
bashflow. Please add a one-liner or tell readers to export the variable first, otherwise the opt-in path is still hard to discover from--help.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@install.sh` around lines 198 - 220, Update the usage() help block to show an exact installer invocation for NEMOCLAW_EXPERIMENTAL so users know how to opt in; specifically, augment the NEMOCLAW_EXPERIMENTAL line to include a concrete example of setting the environment variable before running the installer and provide both the inline prefix form (prefix the curl | bash command with the env var) and the export-then-run form so readers can copy a one-liner; reference the same curl URL used in the existing usage text and ensure the change is made inside the usage() function near the NEMOCLAW_EXPERIMENTAL listing.
🧹 Nitpick comments (1)
test/security-configuration-hardening.test.js (1)
31-37: Consider tightening greedy regex patterns for multi-container robustness.The
[\s\S]*patterns match across the entire manifest content. If the manifest evolves to include multiple containers, these assertions could match settings from a different container while still passing.For stronger guarantees, you could extract the workspace container block first, then assert on that substring:
♻️ Optional: Scope assertions to workspace container block
// Extract workspace container section for more precise assertions const workspaceMatch = manifest.match(/- name: workspace[\s\S]*?(?=- name: [a-z]|containers:|$)/); expect(workspaceMatch).not.toBeNull(); const workspaceSection = workspaceMatch[0]; expect(workspaceSection).toMatch(/allowPrivilegeEscalation:\s*false/); expect(workspaceSection).toMatch(/capabilities:\s*[\r\n]+\s*drop:\s*[\r\n]+\s*-\s*ALL/); expect(workspaceSection).toMatch(/seccompProfile:\s*[\r\n]+\s*type:\s*RuntimeDefault/);This is a minor robustness improvement and acceptable to defer if the manifest structure is stable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/security-configuration-hardening.test.js` around lines 31 - 37, The current tests use greedy /[\s\S]*/ patterns against manifest which can accidentally match other containers; update the test in security-configuration-hardening.test.js to first extract the workspace container block from the manifest (use a non-greedy match like manifest.match(/- name: workspace[\s\S]*?(?=- name: |\ncontainers:|$)/)) and assert workspaceMatch is not null, then run the existing allowPrivilegeEscalation, capabilities drop, and seccompProfile assertions against that extracted workspaceSection (replace the global regex checks with checks against workspaceSection) so the assertions only validate the workspace container settings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/onboard.js`:
- Around line 2174-2183: The onboarding flow currently computes
insecureLocalUiEnabled from process.env at runtime but never persists it to
session state, so resumed runs can flip the Control UI auth mode; update the
onboarding session state to save the chosen insecureLocalUi value when first
computed and then read that persisted value (falling back to the env only if
absent) when calling createSandbox(...) and patchStagedDockerfile(...);
specifically, record the boolean under a clear session key (e.g.,
session.insecureLocalUi) at the point where insecureLocalUiEnabled is derived,
and replace direct uses of process.env in subsequent calls (including the
patchStagedDockerfile call that passes { insecureLocalUi: insecureLocalUiEnabled
}) with the stored session value so resumed runs consistently reuse the original
choice.
---
Outside diff comments:
In `@install.sh`:
- Around line 198-220: Update the usage() help block to show an exact installer
invocation for NEMOCLAW_EXPERIMENTAL so users know how to opt in; specifically,
augment the NEMOCLAW_EXPERIMENTAL line to include a concrete example of setting
the environment variable before running the installer and provide both the
inline prefix form (prefix the curl | bash command with the env var) and the
export-then-run form so readers can copy a one-liner; reference the same curl
URL used in the existing usage text and ensure the change is made inside the
usage() function near the NEMOCLAW_EXPERIMENTAL listing.
In `@k8s/README.md`:
- Around line 25-30: Move or add a short prerequisite note about creating the
required Secret immediately above the "Deploy NemoClaw" step so users see it
before running kubectl apply; reference the "Deploy NemoClaw" heading and the
kubectl commands (kubectl create namespace nemoclaw and kubectl apply -f ...)
and add a cross-reference that the same note is needed near the other occurrence
(the block referenced by lines 63-66) so authenticated-endpoint users create the
Secret first rather than relying on the dummy fallback.
In `@README.md`:
- Around line 74-79: Move the NEMOCLAW_INSECURE_LOCAL_UI opt-in note so it
appears before the one‑liner installer or add a copy‑pasteable installer example
that sets the env var (e.g., show the exact curl invocation with
NEMOCLAW_INSECURE_LOCAL_UI=1 | bash), ensuring the README’s one‑liner example
includes the environment flag so users can set it on first run; update the block
containing the curl -fsSL ... | bash line and the explanatory text to reflect
the new placement/example.
---
Nitpick comments:
In `@test/security-configuration-hardening.test.js`:
- Around line 31-37: The current tests use greedy /[\s\S]*/ patterns against
manifest which can accidentally match other containers; update the test in
security-configuration-hardening.test.js to first extract the workspace
container block from the manifest (use a non-greedy match like manifest.match(/-
name: workspace[\s\S]*?(?=- name: |\ncontainers:|$)/)) and assert workspaceMatch
is not null, then run the existing allowPrivilegeEscalation, capabilities drop,
and seccompProfile assertions against that extracted workspaceSection (replace
the global regex checks with checks against workspaceSection) so the assertions
only validate the workspace container settings.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 359595bb-ed4e-46b6-bda9-d72aadc5a85e
📒 Files selected for processing (9)
DockerfileREADME.mdbin/lib/onboard.jsdocs/reference/architecture.mdinstall.shk8s/README.mdk8s/nemoclaw-k8s.yamltest/onboard.test.jstest/security-configuration-hardening.test.js
Signed-off-by: 13ernkastel <LennonCMJ@live.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
bin/lib/onboard.js (1)
3591-3609: Extract this sandbox-security branch out ofonboard().This block adds another decision path and session mutation to a function that is already over the repo’s complexity cap. Moving insecure-local-UI resolution, sandbox creation, and session write-back into a small helper would make the resume flow easier to audit.
As per coding guidelines "Maintain cyclomatic complexity limit of 20 (target: ratchet down to 15) for all functions."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 3591 - 3609, The sandbox-security branch inside onboard() should be extracted into a small helper to reduce cyclomatic complexity: move the insecure-local-UI resolution, sandbox creation call and session write-back into a new function (e.g., createAndRecordSandbox or handleSandboxCreation) that takes the necessary inputs (gpu, model, provider, preferredInferenceApi, sandboxName, session/onboardSession) and returns sandboxName and any metadata; replace the block using resolveInsecureLocalUiPreference, createSandbox, and onboardSession.markStepComplete with a single call to that helper so onboard() no longer performs the decision logic or mutates the session directly.test/onboard-session.test.js (1)
86-118: Add afalseround-trip forinsecureLocalUi.Lines 96 and 203 only prove the opt-in path. Since
falseis the security-default value and the one most likely to disappear in a truthiness regression, it’s worth asserting thatcreateSession() -> saveSession() -> loadSession()andsummarizeForDebug()preserveinsecureLocalUi === falsetoo.Also applies to: 202-215
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/onboard-session.test.js` around lines 86 - 118, The test suite currently only verifies the truthy opt-in path for insecureLocalUi; add an explicit false round-trip to ensure the default/false value isn't lost. Update the test(s) that exercise session.createSession(), session.saveSession(), session.loadSession(), and session.summarizeForDebug() to include a case where insecureLocalUi is false (either set it explicitly or rely on createSession() default) and assert that loaded.insecureLocalUi === false and that summarizeForDebug() output also preserves insecureLocalUi as false; use the same test file and extend the existing "persists safe provider metadata without persisting secrets" test or add a sibling test to cover this negative case.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/onboard.js`:
- Line 981: The assignment for insecureLocalUiValue should use a strict boolean
check like other places in this file: when computing insecureLocalUiValue
(currently using options.insecureLocalUi in the ternary) change the logic to
first test typeof options.insecureLocalUi === "boolean" and then map true -> "1"
and false -> "0" (with a sensible default "0" if not a boolean) so that
patchStagedDockerfile()/the exported behavior only treats actual booleans as
enabling the insecure flag.
In `@test/security-configuration-hardening.test.js`:
- Around line 39-45: The manifest assertions are too loose and can false-pass;
update the test in security-configuration-hardening.test.js to (1) assert the
COMPATIBLE_API_KEY env entry contains a valueFrom->secretKeyRef with name
"nemoclaw-compatible-api-key" and that optional: true appears within the same
env block (use a single non-greedy regex like /-
name:\s*COMPATIBLE_API_KEY[\s\S]*?valueFrom:\s*secretKeyRef[\s\S]*?name:\s*nemoclaw-compatible-api-key[\s\S]*?optional:\s*true/
to target that block) and (2) strengthen the curl rejection to detect any
piped-to-bash pattern rather than one exact URL by asserting the manifest does
not match a regex such as /\bcurl\b[\s\S]*\|\s*bash\b/ so any "curl ... | bash"
is disallowed.
---
Nitpick comments:
In `@bin/lib/onboard.js`:
- Around line 3591-3609: The sandbox-security branch inside onboard() should be
extracted into a small helper to reduce cyclomatic complexity: move the
insecure-local-UI resolution, sandbox creation call and session write-back into
a new function (e.g., createAndRecordSandbox or handleSandboxCreation) that
takes the necessary inputs (gpu, model, provider, preferredInferenceApi,
sandboxName, session/onboardSession) and returns sandboxName and any metadata;
replace the block using resolveInsecureLocalUiPreference, createSandbox, and
onboardSession.markStepComplete with a single call to that helper so onboard()
no longer performs the decision logic or mutates the session directly.
In `@test/onboard-session.test.js`:
- Around line 86-118: The test suite currently only verifies the truthy opt-in
path for insecureLocalUi; add an explicit false round-trip to ensure the
default/false value isn't lost. Update the test(s) that exercise
session.createSession(), session.saveSession(), session.loadSession(), and
session.summarizeForDebug() to include a case where insecureLocalUi is false
(either set it explicitly or rely on createSession() default) and assert that
loaded.insecureLocalUi === false and that summarizeForDebug() output also
preserves insecureLocalUi as false; use the same test file and extend the
existing "persists safe provider metadata without persisting secrets" test or
add a sibling test to cover this negative case.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 21681fdc-0e0b-4fbd-843a-2185e2b4f6b2
📒 Files selected for processing (8)
README.mdbin/lib/onboard-session.jsbin/lib/onboard.jsinstall.shk8s/README.mdtest/onboard-session.test.jstest/onboard.test.jstest/security-configuration-hardening.test.js
✅ Files skipped from review due to trivial changes (2)
- install.sh
- k8s/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
- README.md
- test/onboard.test.js
|
The review feedback on this PR is addressed on the current head ( |
2ffb0ab to
0d16ae7
Compare
Signed-off-by: 13ernkastel <LennonCMJ@live.com>
0d16ae7 to
4bd0255
Compare
|
Hi @13ernkastel — thank you for this contribution. Your approach to gating Control UI auth behind an explicit opt-in ( We had several PRs in flight tackling overlapping parts of the same problem — gateway auth defaults, auto-pair restrictions, and Dockerfile config hardening (#123, #690, and this one). To avoid fragmentation and conflicting env var conventions, we consolidated the approaches into a single PR: #1217 ( #1217 took a similar direction to what you proposed here — secure-by-default with an explicit build arg to opt out ( We referred to your implementation while building the consolidated fix, and the thinking around deterministic image config and explicit operator opt-in directly influenced the final design. Closing this in favor of the merged approach in #1217. Thank you again for the security contribution — we hope to see more from you on NemoClaw! 🙏 |
|
Closing — superseded by #1217 (see comment above). |
Cherry-picked k8s hardening from @13ernkastel's PR NVIDIA#1149, which was closed after its auth changes were superseded by NVIDIA#1217. Changes to k8s/nemoclaw-k8s.yaml: - automountServiceAccountToken: false - enableServiceLinks: false - workspace container: allowPrivilegeEscalation false, drop ALL caps, RuntimeDefault seccomp - COMPATIBLE_API_KEY from optional Secret with dummy fallback - NEMOCLAW_POLICY_MODE default changed from skip to suggested - Replace curl|bash with download-then-execute pattern Also adds k8s/README.md updates and regression test. Co-authored-by: 13ernkastel <LennonCMJ@live.com>
Fixes #1323 — Partially addresses #803. ## Summary Hardens the experimental Kubernetes sample manifest (`k8s/nemoclaw-k8s.yaml`) with safer defaults. This work was cherry-picked from @13ernkastel's PR #1149, which was closed after its auth hardening portion was superseded by #1217. The k8s hardening in #1149 was independent and valuable, so we're carrying it forward here with proper attribution. ## Changes ### Pod-level - `automountServiceAccountToken: false` — pod does not need k8s API access - `enableServiceLinks: false` — prevents service env var injection ### Workspace container security context - `allowPrivilegeEscalation: false` - `capabilities.drop: [ALL]` - `seccompProfile.type: RuntimeDefault` ### Credential handling - `COMPATIBLE_API_KEY` loaded from optional Secret (`nemoclaw-compatible-api-key`) with dummy fallback for unauthenticated endpoints (Dynamo/vLLM) - `NEMOCLAW_POLICY_MODE` default changed from `skip` to `suggested` ### Installer download - Replaced `curl ... | bash` with download-then-execute pattern using `curl --proto =https --tlsv1.2` ### Documentation - Updated `k8s/README.md` with Secret setup instructions and revised config table ### Tests - Added `test/security-configuration-hardening.test.js` with regression coverage for all k8s manifest hardening ## Attribution Co-authored-by: @13ernkastel (from PR #1149) ## Context This came up during post-merge cleanup of #1217 (`fix(security): harden gateway auth defaults and restrict auto-pair`). PR #1149 bundled k8s hardening with auth changes; we split them apart so the k8s work doesn't get lost. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added optional `NEMOCLAW_POLICY_MODE` environment variable for deployment configuration (default: `suggested`). * **Documentation** * Updated Kubernetes deployment guide with revised manifest setup instructions and safer configuration examples. * **Configuration Changes** * API key now sourced from Kubernetes Secret with optional override; defaults to `dummy` when not provided. * Kubernetes manifest applies stricter default configurations. * **Tests** * Added validation tests for manifest configuration. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Aaron Erickson <aerickson@nvidia.com> Co-authored-by: 13ernkastel <LennonCMJ@live.com> Co-authored-by: Aaron Erickson 🦞 <aerickson@nvidia.com>
What changed
This draft hardens two operator-facing security defaults.
NEMOCLAW_INSECURE_LOCAL_UI=1opt-in and only enables token-only auth when all configured UI origins are loopback-only.allowPrivilegeEscalation: false,capabilities.drop: [ALL], andseccompProfile: RuntimeDefault, the manifest no longer uses directcurl | bash, and non-interactive onboarding usesNEMOCLAW_POLICY_MODE=suggestedinstead of skipping policy setup.COMPATIBLE_API_KEYwith a dummy fallback for unauthenticated endpoints like Dynamo/vLLM, so the manifest no longer normalizes inline placeholder credentials.Why it changed
The sandbox image previously baked in insecure Control UI auth settings for every deployment, even when they were not explicitly requested. The Kubernetes sample also disabled policy application during onboarding, used weaker runtime defaults than necessary for an example manifest, and encouraged a less disciplined endpoint-key setup.
This change makes the weaker posture opt-in instead of default while keeping the existing local development escape hatch available when it is explicitly requested.
Impact
NEMOCLAW_INSECURE_LOCAL_UI=1.Validation
bash -n install.shbash -n scripts/nemoclaw-start.shnpx vitest run test/onboard.test.js test/security-c2-dockerfile-injection.test.js test/security-configuration-hardening.test.js./node_modules/.bin/eslint bin/lib/onboard.js test/onboard.test.js test/security-configuration-hardening.test.jsSummary by CodeRabbit
New Features
Documentation
Kubernetes Improvements
Tests
Signed-off-by: 13ernkastel LennonCMJ@live.com