Skip to content

security: add shell-injection prevention and pass NGC_API_KEY to NIM container#219

Closed
brianwtaylor wants to merge 1 commit intoNVIDIA:mainfrom
brianwtaylor:fix/nim-ngc-api-key
Closed

security: add shell-injection prevention and pass NGC_API_KEY to NIM container#219
brianwtaylor wants to merge 1 commit intoNVIDIA:mainfrom
brianwtaylor:fix/nim-ngc-api-key

Conversation

@brianwtaylor
Copy link
Copy Markdown
Contributor

@brianwtaylor brianwtaylor commented Mar 17, 2026

Summary

  • Adds shellQuote() to runner.js — single-quote wrapper that prevents shell injection in interpolated values
  • Passes NGC_API_KEY (with NVIDIA_API_KEY fallback) to NIM docker containers, fixing NGC_API_KEY is not passed to Nim container #210 where containers cannot pull model weights from NGC
  • Quotes CHAT_UI_URL, NVIDIA_API_KEY, and NGC_API_KEY when passed to sandbox env
  • Double-quotes model names in openshell inference set commands
  • Consolidates the local shellQuote from onboard.js into the shared runner.js export

Intentionally does not touch --credential lines — that's #330's territory.

Test plan

Automated Tests

node --test test/nim.test.js

11 new tests covering:

  • shellQuote: simple values, embedded quotes, empty strings, metacharacters, type coercion
  • NGC_API_KEY escaping: normal key, embedded quotes, empty key, shell metacharacters
  • Env var fallback: NGC_API_KEY preferred over NVIDIA_API_KEY

Manual verification

@brianwtaylor brianwtaylor force-pushed the fix/nim-ngc-api-key branch 3 times, most recently from f8b24f5 to 2184f5d Compare March 17, 2026 20:26
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds Nim management helpers (image lookup, pull, start, stop, health, status), GPU detection, injects escaped NGC_API_KEY/NVIDIA_API_KEY into docker runs, extends onboarding with sandbox/provider setup, and adds tests validating API-key escaping.

Changes

Cohort / File(s) Summary
Nim container and GPU helpers
bin/lib/nim.js
New exported helpers: containerName, getImageForModel, listModels, detectGpu, pullNimImage, startNimContainer, waitForNimHealth, stopNimContainer, nimStatus. Implements image lookup/pull, GPU detection (NVIDIA, DGX Spark GB10, Apple Silicon), health polling, and injection of shell-escaped NGC_API_KEY/NVIDIA_API_KEY into docker run.
Onboarding and CLI helpers
bin/lib/onboard.js
Adds onboard(opts) entrypoint and multiple onboarding helpers (step printing, Docker checks, openshell install, preflight, gateway/startup, sandbox creation, setupNim/setupInference/setupOpenclaw/setupPolicies, dashboard). Improves sandbox env escaping and non-interactive handling.
Tests
test/nim.test.js
New "NGC_API_KEY escaping" tests covering embedding single quotes, empty keys, shell metacharacters, and fallback behavior between primary and fallback variables.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as CLI
  participant Host as Host OS
  participant Registry as Image Registry
  participant Docker as Docker Engine
  participant NIM as NIM Container

  CLI->>Host: startNimContainer(sandbox, model, port, env keys)
  Host->>Host: detectGpu()
  Host->>Registry: getImageForModel(model) / pullNimImage()
  Registry-->>Host: image available
  Host->>Docker: docker run --name container -p port:8000 -e NGC_API_KEY='...' -e NVIDIA_API_KEY='...' <image>
  Docker-->>Host: container started
  Host->>NIM: waitForNimHealth(port) (poll /health)
  NIM-->>Host: /health -> healthy
  Host-->>CLI: return container status (running, healthy, device info)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped in code with keys in paw,
Escaped the quotes to keep things law,
Docker sang and GPUs woke bright,
Health checks blinked their green delight,
A nimble rabbit onboarded night.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Out of Scope Changes check ❓ Inconclusive While the core changes align with issue #210, the PR significantly expands the public API surface (8+ new exported functions in nim.js and 1 in onboard.js) beyond the minimal requirement to pass NGC_API_KEY, raising questions about scope. Clarify whether the expanded API functions (containerName, listModels, detectGpu, etc.) and enhanced helpers are necessary for the NGC_API_KEY fix or if they should be addressed in separate issues for better separation of concerns.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed The PR fully addresses the linked issue #210 by implementing NGC_API_KEY passing to the NIM container via -e flag in startNimContainer(), with single-quote escaping for security, and forwarding it through the sandbox environment via onboard.js.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title check ✅ Passed The title directly addresses the main changes: adding shell-injection prevention and passing NGC_API_KEY to NIM container, which aligns with the core fix for issue #210 and the primary modifications in startNimContainer and onboard.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@brianwtaylor
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 18, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
test/nim.test.js (2)

107-115: Fallback tests use static strings rather than environment variable simulation.

These tests verify JavaScript's || operator behavior with literal strings, but they don't actually test the environment variable fallback in nim.js. The assertions will always pass because "ngc-primary" || "nvidia-fallback" is evaluated at parse time.

If testing fallback behavior is important, consider mocking process.env.NGC_API_KEY and process.env.NVIDIA_API_KEY and calling the actual module function.

♻️ Suggested approach for actual fallback testing
it("prefers NGC_API_KEY over NVIDIA_API_KEY in fallback", () => {
  const originalNgc = process.env.NGC_API_KEY;
  const originalNvidia = process.env.NVIDIA_API_KEY;
  try {
    process.env.NGC_API_KEY = "ngc-primary";
    process.env.NVIDIA_API_KEY = "nvidia-fallback";
    // Call the actual function or extract the fallback logic to a testable helper
    const ngcKey = process.env.NGC_API_KEY || process.env.NVIDIA_API_KEY || "";
    assert.equal(ngcKey, "ngc-primary");
  } finally {
    process.env.NGC_API_KEY = originalNgc;
    process.env.NVIDIA_API_KEY = originalNvidia;
  }
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/nim.test.js` around lines 107 - 115, The tests in nim.test.js currently
assert JavaScript OR behavior with string literals instead of exercising the
real environment-variable fallback; update the two tests ("prefers NGC_API_KEY
over NVIDIA_API_KEY in fallback" and "falls back to NVIDIA_API_KEY when
NGC_API_KEY empty") to set and restore process.env.NGC_API_KEY and
process.env.NVIDIA_API_KEY around calling the actual fallback logic (either by
importing the helper from nim.js or evaluating process.env.NGC_API_KEY ||
process.env.NVIDIA_API_KEY || ""), asserting the returned value; ensure you save
original env values and restore them in a finally block so tests are isolated.

78-116: Tests validate the escaping algorithm but don't exercise the actual module code.

The tests duplicate the escaping logic inline rather than calling nim.startNimContainer with mocked environment variables. This verifies the escaping pattern is correct but doesn't confirm the module applies it properly.

Consider adding integration tests that mock process.env and verify the actual startNimContainer function generates the expected docker command (you could mock the run function to capture the command string).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/nim.test.js` around lines 78 - 116, Current tests inline-implement
escaping instead of exercising the real code; update the tests to mock
process.env and call the actual startNimContainer function, then intercept/mock
the run function to capture the generated docker command string and assert it
contains the correctly single-quoted/escaped NGC_API_KEY behavior. Specifically,
set process.env.NGC_API_KEY and process.env.NVIDIA_API_KEY for the different
scenarios, call startNimContainer(...), and verify the captured run invocation
includes the expected "-e NGC_API_KEY='...'" (with embedded quotes escaped as
"'\\''") and that precedence/fallback logic prefers NGC_API_KEY over
NVIDIA_API_KEY; restore process.env and restore the mocked run after each test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/nim.test.js`:
- Around line 107-115: The tests in nim.test.js currently assert JavaScript OR
behavior with string literals instead of exercising the real
environment-variable fallback; update the two tests ("prefers NGC_API_KEY over
NVIDIA_API_KEY in fallback" and "falls back to NVIDIA_API_KEY when NGC_API_KEY
empty") to set and restore process.env.NGC_API_KEY and
process.env.NVIDIA_API_KEY around calling the actual fallback logic (either by
importing the helper from nim.js or evaluating process.env.NGC_API_KEY ||
process.env.NVIDIA_API_KEY || ""), asserting the returned value; ensure you save
original env values and restore them in a finally block so tests are isolated.
- Around line 78-116: Current tests inline-implement escaping instead of
exercising the real code; update the tests to mock process.env and call the
actual startNimContainer function, then intercept/mock the run function to
capture the generated docker command string and assert it contains the correctly
single-quoted/escaped NGC_API_KEY behavior. Specifically, set
process.env.NGC_API_KEY and process.env.NVIDIA_API_KEY for the different
scenarios, call startNimContainer(...), and verify the captured run invocation
includes the expected "-e NGC_API_KEY='...'" (with embedded quotes escaped as
"'\\''") and that precedence/fallback logic prefers NGC_API_KEY over
NVIDIA_API_KEY; restore process.env and restore the mocked run after each test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e699047a-0eff-41ee-aa17-7bbd37bc84d7

📥 Commits

Reviewing files that changed from the base of the PR and between 2a9afbc and ccad222.

📒 Files selected for processing (3)
  • bin/lib/nim.js
  • bin/lib/onboard.js
  • test/nim.test.js

@brianwtaylor
Copy link
Copy Markdown
Contributor Author

@coderabbitai Which specific functions are missing docstrings for the Docstring Coverage check? Please list them.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 18, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all JS files and check for functions missing JSDoc comments
echo "=== bin/lib/nim.js ==="
cat bin/lib/nim.js

echo ""
echo "=== bin/lib/onboard.js ==="
cat bin/lib/onboard.js

echo ""
echo "=== test/nim.test.js ==="
cat test/nim.test.js

Length of output: 31923


@brianwtaylor: Here's the full list of functions missing JSDoc docstrings across the two source files. The only two functions that currently have docstrings are startNimContainer and createSandbox (both added in this PR).


bin/lib/nim.js — missing docstrings (8 functions)

Function Signature
containerName containerName(sandboxName)
getImageForModel getImageForModel(modelName)
listModels listModels()
detectGpu detectGpu()
pullNimImage pullNimImage(model)
waitForNimHealth waitForNimHealth(port = 8000, timeout = 300)
stopNimContainer stopNimContainer(sandboxName)
nimStatus nimStatus(sandboxName)

bin/lib/onboard.js — missing docstrings (11 functions)

Function Signature
step step(n, total, msg)
isDockerRunning isDockerRunning()
isOpenshellInstalled isOpenshellInstalled()
installOpenshell installOpenshell()
preflight preflight()
startGateway startGateway(gpu)
setupNim setupNim(sandboxName, gpu)
setupInference setupInference(sandboxName, model, provider)
setupOpenclaw setupOpenclaw(sandboxName)
setupPolicies setupPolicies(sandboxName)
printDashboard printDashboard(sandboxName, model, provider)
onboard onboard()

Note: test/nim.test.js uses only describe/it blocks (no exported functions), so it is typically excluded from docstring coverage checks.

Would you like me to generate JSDoc stubs for all of these, or address them in a follow-up issue?

@brianwtaylor
Copy link
Copy Markdown
Contributor Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 18, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
bin/lib/onboard.js (1)

223-235: ⚠️ Potential issue | 🔴 Critical

Escape CHAT_UI_URL before shell interpolation.

CHAT_UI_URL is injected into a shell command without quoting/escaping, so a crafted value can break command parsing or inject shell tokens.

🔧 Proposed fix
-  const chatUiUrl = process.env.CHAT_UI_URL || 'http://127.0.0.1:18789';
-  const envArgs = [`CHAT_UI_URL=${chatUiUrl}`];
+  const chatUiUrl = process.env.CHAT_UI_URL || "http://127.0.0.1:18789";
+  const safeChatUiUrl = chatUiUrl.replace(/'/g, "'\\''");
+  const envArgs = [`CHAT_UI_URL='${safeChatUiUrl}'`];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 223 - 235, CHAT_UI_URL is interpolated raw
into the shell command which allows shell injection; create a sanitized value
like the other keys before building envArgs: compute safeChatUiUrl =
(process.env.CHAT_UI_URL || chatUiUrl).replace(/'/g, "'\\''") and push
`CHAT_UI_URL='${safeChatUiUrl}'` into envArgs, then use envArgs.join(" ") in the
run(...) call (symbols: chatUiUrl, envArgs, run(...)). Ensure you always wrap
the escaped value in single quotes when pushing to envArgs.
bin/lib/nim.js (1)

213-234: ⚠️ Potential issue | 🟡 Minor

nimStatus health probe is locked to port 8000.

If startNimContainer is called with a non-default port, nimStatus can report incorrect health on Line 229.

🔧 Proposed fix
 /**
  * Check the running status and health of a sandbox's NIM container.
  * `@param` {string} sandboxName - Sandbox identifier.
+ * `@param` {number} [port=8000] - Host port where NIM is exposed.
  * `@returns` {{ running: boolean, healthy?: boolean, container: string, state?: string }}
  */
-function nimStatus(sandboxName) {
+function nimStatus(sandboxName, port = 8000) {
   const name = containerName(sandboxName);
   try {
@@
     let healthy = false;
     if (state === "running") {
-      const health = runCapture(`curl -sf http://localhost:8000/v1/models 2>/dev/null`, {
+      const health = runCapture(`curl -sf http://localhost:${port}/v1/models 2>/dev/null`, {
         ignoreError: true,
       });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 213 - 234, nimStatus currently hardcodes probing
localhost:8000 which breaks health checks when startNimContainer mapped NIM to a
different host port; change nimStatus so that when state === "running" it
inspects the container's port mapping (via docker inspect) to find the HostPort
for the container's 8000/tcp (or the configured NIM service port) and then curl
localhost:<HostPort>/v1/models instead of always using 8000. Update the logic in
nimStatus (use containerName(sandboxName) to get the container) to parse the
docker inspect output for the mapped host port and fall back to 8000 if no
mapping is found.
🧹 Nitpick comments (2)
bin/lib/onboard.js (1)

229-232: Align sandbox NGC_API_KEY behavior with NIM fallback semantics.

Right now, the sandbox only gets NGC_API_KEY when that exact env var is present. To match startNimContainer behavior, set NGC_API_KEY from NVIDIA_API_KEY when NGC_API_KEY is absent.

♻️ Proposed alignment
-  if (process.env.NGC_API_KEY) {
-    const safeKey = process.env.NGC_API_KEY.replace(/'/g, "'\\''");
+  const sandboxNgcKey = process.env.NGC_API_KEY || process.env.NVIDIA_API_KEY;
+  if (sandboxNgcKey) {
+    const safeKey = sandboxNgcKey.replace(/'/g, "'\\''");
     envArgs.push(`NGC_API_KEY='${safeKey}'`);
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 229 - 232, The sandbox only adds NGC_API_KEY
to envArgs when process.env.NGC_API_KEY exists, but startNimContainer falls back
to process.env.NVIDIA_API_KEY; update the logic around envArgs to use
NVIDIA_API_KEY when NGC_API_KEY is not set: check process.env.NGC_API_KEY first,
otherwise use process.env.NVIDIA_API_KEY, apply the same safeKey escaping
(`replace(/'/g, "'\\''")`) and push the resulting `NGC_API_KEY='...''` string
into envArgs so sandbox behavior matches startNimContainer (references: envArgs,
process.env.NGC_API_KEY, process.env.NVIDIA_API_KEY).
bin/lib/nim.js (1)

150-170: Validate port before composing the docker command.

port is documented as numeric but not validated at runtime. A non-integer value can produce an invalid command string.

🔧 Proposed validation
 function startNimContainer(sandboxName, model, port = 8000) {
+  if (!Number.isInteger(port) || port < 1 || port > 65535) {
+    throw new Error(`Invalid port: ${port}`);
+  }
   const name = containerName(sandboxName);
   const image = getImageForModel(model);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 150 - 170, The startNimContainer function
currently accepts port without runtime validation; add validation in
startNimContainer to coerce port to a number (e.g., Number(port) or parseInt),
verify it is an integer and within the valid TCP port range (1–65535), and if
invalid log an error (e.g., console.error with the offending value) and call
process.exit(1) before composing the docker run command; ensure you use the
validated numeric port variable when building envFlags and the docker run string
so malformed inputs cannot produce an invalid command.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@bin/lib/nim.js`:
- Around line 213-234: nimStatus currently hardcodes probing localhost:8000
which breaks health checks when startNimContainer mapped NIM to a different host
port; change nimStatus so that when state === "running" it inspects the
container's port mapping (via docker inspect) to find the HostPort for the
container's 8000/tcp (or the configured NIM service port) and then curl
localhost:<HostPort>/v1/models instead of always using 8000. Update the logic in
nimStatus (use containerName(sandboxName) to get the container) to parse the
docker inspect output for the mapped host port and fall back to 8000 if no
mapping is found.

In `@bin/lib/onboard.js`:
- Around line 223-235: CHAT_UI_URL is interpolated raw into the shell command
which allows shell injection; create a sanitized value like the other keys
before building envArgs: compute safeChatUiUrl = (process.env.CHAT_UI_URL ||
chatUiUrl).replace(/'/g, "'\\''") and push `CHAT_UI_URL='${safeChatUiUrl}'` into
envArgs, then use envArgs.join(" ") in the run(...) call (symbols: chatUiUrl,
envArgs, run(...)). Ensure you always wrap the escaped value in single quotes
when pushing to envArgs.

---

Nitpick comments:
In `@bin/lib/nim.js`:
- Around line 150-170: The startNimContainer function currently accepts port
without runtime validation; add validation in startNimContainer to coerce port
to a number (e.g., Number(port) or parseInt), verify it is an integer and within
the valid TCP port range (1–65535), and if invalid log an error (e.g.,
console.error with the offending value) and call process.exit(1) before
composing the docker run command; ensure you use the validated numeric port
variable when building envFlags and the docker run string so malformed inputs
cannot produce an invalid command.

In `@bin/lib/onboard.js`:
- Around line 229-232: The sandbox only adds NGC_API_KEY to envArgs when
process.env.NGC_API_KEY exists, but startNimContainer falls back to
process.env.NVIDIA_API_KEY; update the logic around envArgs to use
NVIDIA_API_KEY when NGC_API_KEY is not set: check process.env.NGC_API_KEY first,
otherwise use process.env.NVIDIA_API_KEY, apply the same safeKey escaping
(`replace(/'/g, "'\\''")`) and push the resulting `NGC_API_KEY='...''` string
into envArgs so sandbox behavior matches startNimContainer (references: envArgs,
process.env.NGC_API_KEY, process.env.NVIDIA_API_KEY).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8a2e24a6-7169-4933-a959-950d835a8beb

📥 Commits

Reviewing files that changed from the base of the PR and between ccad222 and 445376c.

📒 Files selected for processing (2)
  • bin/lib/nim.js
  • bin/lib/onboard.js

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
bin/lib/nim.js (1)

218-238: ⚠️ Potential issue | 🟡 Minor

Hardcoded port 8000 in nimStatus ignores the port parameter used elsewhere.

The waitForNimHealth function accepts a configurable port parameter, but nimStatus hardcodes port 8000 when checking health. If a NIM container is started on a different port, nimStatus will check the wrong endpoint.

🐛 Proposed fix to use the actual container port
-function nimStatus(sandboxName) {
+function nimStatus(sandboxName, port = 8000) {
   const name = containerName(sandboxName);
   try {
     const state = runCapture(
       `docker inspect --format '{{.State.Status}}' ${name} 2>/dev/null`,
       { ignoreError: true }
     );
     if (!state) return { running: false, container: name };

     let healthy = false;
     if (state === "running") {
-      const health = runCapture(`curl -sf http://localhost:8000/v1/models 2>/dev/null`, {
+      const health = runCapture(`curl -sf http://localhost:${port}/v1/models 2>/dev/null`, {
         ignoreError: true,
       });
       healthy = !!health;
     }
     return { running: state === "running", healthy, container: name, state };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 218 - 238, nimStatus currently hardcodes port
8000 when probing health; change nimStatus to accept an optional port parameter
(e.g., nimStatus(sandboxName, port = 8000)) and use that port in the health curl
check instead of 8000, then update callers (notably waitForNimHealth) to pass
the configured port through; keep the default at 8000 so behavior is unchanged
when no port is supplied.
🧹 Nitpick comments (1)
test/nim.test.js (1)

107-115: Tests verify JS || semantics rather than actual env var fallback.

These tests use hardcoded string literals ("ngc-primary" || "nvidia-fallback") which test JavaScript's || operator behavior in isolation. They don't exercise the actual process.env.NGC_API_KEY || process.env.NVIDIA_API_KEY logic in nim.js.

Consider adding integration tests that mock process.env to verify the real implementation's fallback behavior, but this is fine for now as a sanity check of the pattern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/nim.test.js` around lines 107 - 115, The tests currently assert JS ||
behavior using hardcoded literals (the ngcKey variables in the two it blocks)
instead of exercising the real fallback in nim.js; update each test to set
process.env.NGC_API_KEY and process.env.NVIDIA_API_KEY to the desired values
(e.g., set NGC_API_KEY="ngc-primary" and NVIDIA_API_KEY="nvidia-fallback" for
the first case, and NGC_API_KEY="" with NVIDIA_API_KEY="nvidia-fallback" for the
second), then import/require the nim.js module or call its exported function
that reads these env vars so the module evaluates the actual fallback, assert
the returned value, and finally restore/delete the modified process.env entries
to avoid cross-test pollution.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@bin/lib/nim.js`:
- Around line 218-238: nimStatus currently hardcodes port 8000 when probing
health; change nimStatus to accept an optional port parameter (e.g.,
nimStatus(sandboxName, port = 8000)) and use that port in the health curl check
instead of 8000, then update callers (notably waitForNimHealth) to pass the
configured port through; keep the default at 8000 so behavior is unchanged when
no port is supplied.

---

Nitpick comments:
In `@test/nim.test.js`:
- Around line 107-115: The tests currently assert JS || behavior using hardcoded
literals (the ngcKey variables in the two it blocks) instead of exercising the
real fallback in nim.js; update each test to set process.env.NGC_API_KEY and
process.env.NVIDIA_API_KEY to the desired values (e.g., set
NGC_API_KEY="ngc-primary" and NVIDIA_API_KEY="nvidia-fallback" for the first
case, and NGC_API_KEY="" with NVIDIA_API_KEY="nvidia-fallback" for the second),
then import/require the nim.js module or call its exported function that reads
these env vars so the module evaluates the actual fallback, assert the returned
value, and finally restore/delete the modified process.env entries to avoid
cross-test pollution.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 90da016d-0491-431c-8a97-871918a66562

📥 Commits

Reviewing files that changed from the base of the PR and between 0d1d2d8 and 445376c.

📒 Files selected for processing (3)
  • bin/lib/nim.js
  • bin/lib/onboard.js
  • test/nim.test.js

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 255-259: The CHAT_UI_URL value is currently injected into the env
shell command unescaped; use the same single-quote-escaping helper used for
NVIDIA_API_KEY and NGC_API_KEY to wrap CHAT_UI_URL so spaces or shell
metacharacters cannot break or inject into the onboarding command, and apply the
same change to the other env-construction site that mirrors this pattern (the
other block that builds the env string later in the file). Locate the env
construction that references CHAT_UI_URL and replace the raw interpolation with
the helper-escaped version (same helper function name used for
NVIDIA_API_KEY/NGC_API_KEY) in both places.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 853216d5-7c69-466e-aeea-b90e47369994

📥 Commits

Reviewing files that changed from the base of the PR and between 445376c and cadbc5f.

📒 Files selected for processing (3)
  • bin/lib/nim.js
  • bin/lib/onboard.js
  • test/nim.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/nim.test.js

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
bin/lib/onboard.js (2)

474-513: ⚠️ Potential issue | 🟠 Major

Check port 8000 before starting local NIM.

This path always binds NIM to localhost:8000, but preflight never reserves that port and the later health/provider wiring assumes the bind succeeded. If something is already listening there—especially the vLLM instance you detect earlier—the failure happens deep inside docker run instead of as a clear onboarding error.

🛠️ Minimal guard
+        const nimPort = 8000;
+        const portCheck = await checkPortAvailable(nimPort);
+        if (!portCheck.ok) {
+          console.error(`  Port ${nimPort} is already in use. Stop the existing local model server or choose another provider.`);
+          process.exit(1);
+        }
+
         console.log(`  Pulling NIM image for ${model}...`);
         nim.pullNimImage(model);
 
         console.log("  Starting NIM container...");
-        nimContainer = nim.startNimContainer(sandboxName, model);
+        nimContainer = nim.startNimContainer(sandboxName, model, nimPort);
 
         console.log("  Waiting for NIM to become healthy...");
-        if (!nim.waitForNimHealth()) {
+        if (!nim.waitForNimHealth(nimPort)) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 474 - 513, Before pulling/starting the local
NIM container, check that localhost:8000 is free and fail fast with a clear
error if it is occupied: call a port-check helper (or implement a quick TCP bind
probe) prior to nim.pullNimImage and nim.startNimContainer, and if the probe
shows the port in use print a descriptive error and process.exit(1). Add this
check in the same branch handling selected.key === "nim" right before the lines
that log "Pulling NIM image..." and reference nim.pullNimImage,
nim.startNimContainer, and nim.waitForNimHealth so the start sequence only runs
when port 8000 is available.

512-517: ⚠️ Potential issue | 🟠 Major

Tear down the failed NIM container before fallback.

When the health check times out, the code forgets the container but does not stop/remove it. That can keep port 8000 or GPU memory occupied for the next onboarding attempt.

🧹 Minimal cleanup
         if (!nim.waitForNimHealth()) {
           console.error("  NIM failed to start. Falling back to cloud API.");
+          nim.stopNimContainer(sandboxName);
           model = null;
           nimContainer = null;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 512 - 517, When nim.waitForNimHealth()
fails, the code currently nulls model and nimContainer but leaves the container
running; update the failure branch to stop and remove the NIM container before
falling back: call the appropriate teardown on nimContainer (e.g.,
nimContainer.stop() and nimContainer.remove() or
nim.stopContainer()/nim.teardownContainer() if such helpers exist), handle any
errors from those calls, then set model = null and nimContainer = null so
ports/GPU resources are freed for the next onboarding attempt.
bin/lib/nim.js (1)

185-197: ⚠️ Potential issue | 🟠 Major

Add per-request timeout to each health probe.

The loop's outer timeout won't fire if a single curl invocation stalls on a localhost socket. runCapture() does not apply a default timeout—it passes options directly to Node.js's execSync() without enforcing one. Pass timeout: 10000 (or similar) when calling runCapture() to bound each probe:

const result = runCapture(`curl -sf http://localhost:${port}/v1/models`, {
  ignoreError: true,
  timeout: 10000,  // 10 seconds per request
});

Alternatively, add curl's own --max-time 10 flag to the command itself.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 185 - 197, The health-check loop can hang if a
single runCapture call blocks; update the runCapture invocation in the while
loop (the call that builds the curl to http://localhost:${port}/v1/models) to
include a per-request timeout (e.g., add timeout: 10000 to the options passed to
runCapture) or alternatively add curl's --max-time 10 flag to the command string
so each probe respects a 10s cap; keep the existing ignoreError: true and leave
the surrounding loop logic (start, timeout, port) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@bin/lib/nim.js`:
- Around line 185-197: The health-check loop can hang if a single runCapture
call blocks; update the runCapture invocation in the while loop (the call that
builds the curl to http://localhost:${port}/v1/models) to include a per-request
timeout (e.g., add timeout: 10000 to the options passed to runCapture) or
alternatively add curl's --max-time 10 flag to the command string so each probe
respects a 10s cap; keep the existing ignoreError: true and leave the
surrounding loop logic (start, timeout, port) unchanged.

In `@bin/lib/onboard.js`:
- Around line 474-513: Before pulling/starting the local NIM container, check
that localhost:8000 is free and fail fast with a clear error if it is occupied:
call a port-check helper (or implement a quick TCP bind probe) prior to
nim.pullNimImage and nim.startNimContainer, and if the probe shows the port in
use print a descriptive error and process.exit(1). Add this check in the same
branch handling selected.key === "nim" right before the lines that log "Pulling
NIM image..." and reference nim.pullNimImage, nim.startNimContainer, and
nim.waitForNimHealth so the start sequence only runs when port 8000 is
available.
- Around line 512-517: When nim.waitForNimHealth() fails, the code currently
nulls model and nimContainer but leaves the container running; update the
failure branch to stop and remove the NIM container before falling back: call
the appropriate teardown on nimContainer (e.g., nimContainer.stop() and
nimContainer.remove() or nim.stopContainer()/nim.teardownContainer() if such
helpers exist), handle any errors from those calls, then set model = null and
nimContainer = null so ports/GPU resources are freed for the next onboarding
attempt.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 84e5cbf7-f697-45dd-bc58-2e0a7a6a9795

📥 Commits

Reviewing files that changed from the base of the PR and between cadbc5f and e9fdff4.

📒 Files selected for processing (2)
  • bin/lib/nim.js
  • bin/lib/onboard.js

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
bin/lib/nim.js (1)

213-238: ⚠️ Potential issue | 🟡 Minor

Hardcoded port in health check may cause incorrect status for non-default ports.

nimStatus() always checks health on port 8000 (line 229), but startNimContainer() accepts a custom port parameter. If a NIM container is started on a different port, the status check will incorrectly report it as unhealthy.

Consider either:

  1. Storing the port in the registry alongside nimContainer, or
  2. Accepting port as a parameter to nimStatus().
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 213 - 238, nimStatus currently hardcodes port
8000 when curling the health endpoint, so containers started with
startNimContainer(port) can be mis-reported as unhealthy; modify nimStatus to
accept an optional port parameter (nimStatus(sandboxName, port)) or read the
port from wherever nimContainer/registry stores it, then use that port in the
curl health check instead of 8000; update all callers (e.g., any place that
queries nimStatus after startNimContainer) to pass the actual port or ensure the
registry entry for nimContainer includes the port and nimStatus reads it before
performing the curl; keep the existing return shape and fall back to 8000 only
if no port is available.
🧹 Nitpick comments (2)
bin/lib/onboard.js (1)

323-323: Consider extracting shellQuote to a shared utility.

The shellQuote helper is defined identically in three places:

  • bin/lib/nim.js:163 (inside startNimContainer)
  • bin/lib/onboard.js:323 (inside createSandbox)
  • bin/lib/onboard.js:578 (inside setupInference)

Extracting this to a shared module (e.g., bin/lib/runner.js or a new bin/lib/shell.js) would reduce duplication and ensure consistent escaping behavior.

♻️ Example extraction

In a shared module (e.g., bin/lib/shell.js):

/** Escape a value for safe interpolation into a POSIX shell command. */
const shellQuote = (value) => `'${String(value).replace(/'/g, "'\\''")}'`;

module.exports = { shellQuote };

Then import where needed:

+const { shellQuote } = require("./shell");
 // ...
-const shellQuote = (value) => `'${String(value).replace(/'/g, "'\\''")}'`;

Also applies to: 578-578

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` at line 323, Duplicate shell escaping logic is defined
inline in createSandbox, setupInference, and startNimContainer as the shellQuote
helper; extract this into a single shared module (e.g., bin/lib/shell.js or
bin/lib/runner.js) that exports shellQuote, then import and use that exported
shellQuote in createSandbox, setupInference, and startNimContainer to remove the
three identical inline definitions and ensure consistent behavior across
bin/lib/onboard.js and bin/lib/nim.js.
bin/lib/nim.js (1)

180-197: Unused variable interval.

The interval constant (line 182) is defined as 5000 but never used. The sleep on line 196 uses a hardcoded "5" instead.

🔧 Proposed fix
 function waitForNimHealth(port = 8000, timeout = 300) {
   const start = Date.now();
-  const interval = 5000;
+  const intervalSec = 5;
   console.log(`  Waiting for NIM health on port ${port} (timeout: ${timeout}s)...`);
 
   while ((Date.now() - start) / 1000 < timeout) {
     try {
       const result = runCapture(`curl -sf http://localhost:${port}/v1/models`, {
         ignoreError: true,
       });
       if (result) {
         console.log("  NIM is healthy.");
         return true;
       }
     } catch {}
     // Synchronous sleep via spawnSync
-    require("child_process").spawnSync("sleep", ["5"]);
+    require("child_process").spawnSync("sleep", [String(intervalSec)]);
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 180 - 197, The function waitForNimHealth defines
an unused constant interval = 5000 but the sleep call uses a hardcoded "5";
replace the hardcoded sleep duration with the interval value (convert ms to
seconds) or remove interval and use a clearly named constant; specifically
update waitForNimHealth to call require("child_process").spawnSync("sleep",
[String(Math.floor(interval / 1000))]) (or set interval to seconds and pass
String(interval)) so the defined interval variable is actually used and the
linter warning is resolved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@bin/lib/nim.js`:
- Around line 213-238: nimStatus currently hardcodes port 8000 when curling the
health endpoint, so containers started with startNimContainer(port) can be
mis-reported as unhealthy; modify nimStatus to accept an optional port parameter
(nimStatus(sandboxName, port)) or read the port from wherever
nimContainer/registry stores it, then use that port in the curl health check
instead of 8000; update all callers (e.g., any place that queries nimStatus
after startNimContainer) to pass the actual port or ensure the registry entry
for nimContainer includes the port and nimStatus reads it before performing the
curl; keep the existing return shape and fall back to 8000 only if no port is
available.

---

Nitpick comments:
In `@bin/lib/nim.js`:
- Around line 180-197: The function waitForNimHealth defines an unused constant
interval = 5000 but the sleep call uses a hardcoded "5"; replace the hardcoded
sleep duration with the interval value (convert ms to seconds) or remove
interval and use a clearly named constant; specifically update waitForNimHealth
to call require("child_process").spawnSync("sleep", [String(Math.floor(interval
/ 1000))]) (or set interval to seconds and pass String(interval)) so the defined
interval variable is actually used and the linter warning is resolved.

In `@bin/lib/onboard.js`:
- Line 323: Duplicate shell escaping logic is defined inline in createSandbox,
setupInference, and startNimContainer as the shellQuote helper; extract this
into a single shared module (e.g., bin/lib/shell.js or bin/lib/runner.js) that
exports shellQuote, then import and use that exported shellQuote in
createSandbox, setupInference, and startNimContainer to remove the three
identical inline definitions and ensure consistent behavior across
bin/lib/onboard.js and bin/lib/nim.js.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a8cc1add-1a1e-47f9-afca-7af5dd12fc2d

📥 Commits

Reviewing files that changed from the base of the PR and between e9fdff4 and d7da4e7.

📒 Files selected for processing (2)
  • bin/lib/nim.js
  • bin/lib/onboard.js

@wscurran wscurran added security Something isn't secure Docker Support for Docker containerization enhancement: feature Use this label to identify requests for new capabilities in NemoClaw. labels Mar 19, 2026
@brianwtaylor brianwtaylor reopened this Mar 19, 2026
@brianwtaylor brianwtaylor changed the title fix: pass NGC_API_KEY to NIM container and sandbox env security: add shell-injection prevention and pass NGC_API_KEY to NIM container Mar 19, 2026
…container

- Add shellQuote() to runner.js for safe shell interpolation
- Pass NGC_API_KEY (with NVIDIA_API_KEY fallback) to NIM docker container,
  fixing issue #210 where containers cannot pull model weights
- Quote CHAT_UI_URL, NVIDIA_API_KEY, and NGC_API_KEY in sandbox env
- Double-quote model names in openshell inference set commands
- Move shellQuote from onboard.js local function to shared runner.js export
- Add 11 tests covering shellQuote, NGC_API_KEY escaping, and env fallback

Intentionally does not touch --credential lines (deferred to #330).

Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
@brianwtaylor
Copy link
Copy Markdown
Contributor Author

brianwtaylor commented Mar 20, 2026

repositioned this to follow ericksoa. pending additional validation and review

@brianwtaylor brianwtaylor closed this by deleting the head repository Mar 20, 2026
@wscurran wscurran requested a review from drobison00 March 23, 2026 16:41
@wscurran wscurran added the priority: high Important issue that should be resolved in the next release label Mar 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Docker Support for Docker containerization enhancement: feature Use this label to identify requests for new capabilities in NemoClaw. priority: high Important issue that should be resolved in the next release security Something isn't secure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants