security: add shell-injection prevention and pass NGC_API_KEY to NIM container#219
security: add shell-injection prevention and pass NGC_API_KEY to NIM container#219brianwtaylor wants to merge 1 commit intoNVIDIA:mainfrom
Conversation
f8b24f5 to
2184f5d
Compare
2184f5d to
ccad222
Compare
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Nim management helpers (image lookup, pull, start, stop, health, status), GPU detection, injects escaped Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as CLI
participant Host as Host OS
participant Registry as Image Registry
participant Docker as Docker Engine
participant NIM as NIM Container
CLI->>Host: startNimContainer(sandbox, model, port, env keys)
Host->>Host: detectGpu()
Host->>Registry: getImageForModel(model) / pullNimImage()
Registry-->>Host: image available
Host->>Docker: docker run --name container -p port:8000 -e NGC_API_KEY='...' -e NVIDIA_API_KEY='...' <image>
Docker-->>Host: container started
Host->>NIM: waitForNimHealth(port) (poll /health)
NIM-->>Host: /health -> healthy
Host-->>CLI: return container status (running, healthy, device info)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
🧹 Nitpick comments (2)
test/nim.test.js (2)
107-115: Fallback tests use static strings rather than environment variable simulation.These tests verify JavaScript's
||operator behavior with literal strings, but they don't actually test the environment variable fallback innim.js. The assertions will always pass because"ngc-primary" || "nvidia-fallback"is evaluated at parse time.If testing fallback behavior is important, consider mocking
process.env.NGC_API_KEYandprocess.env.NVIDIA_API_KEYand calling the actual module function.♻️ Suggested approach for actual fallback testing
it("prefers NGC_API_KEY over NVIDIA_API_KEY in fallback", () => { const originalNgc = process.env.NGC_API_KEY; const originalNvidia = process.env.NVIDIA_API_KEY; try { process.env.NGC_API_KEY = "ngc-primary"; process.env.NVIDIA_API_KEY = "nvidia-fallback"; // Call the actual function or extract the fallback logic to a testable helper const ngcKey = process.env.NGC_API_KEY || process.env.NVIDIA_API_KEY || ""; assert.equal(ngcKey, "ngc-primary"); } finally { process.env.NGC_API_KEY = originalNgc; process.env.NVIDIA_API_KEY = originalNvidia; } });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/nim.test.js` around lines 107 - 115, The tests in nim.test.js currently assert JavaScript OR behavior with string literals instead of exercising the real environment-variable fallback; update the two tests ("prefers NGC_API_KEY over NVIDIA_API_KEY in fallback" and "falls back to NVIDIA_API_KEY when NGC_API_KEY empty") to set and restore process.env.NGC_API_KEY and process.env.NVIDIA_API_KEY around calling the actual fallback logic (either by importing the helper from nim.js or evaluating process.env.NGC_API_KEY || process.env.NVIDIA_API_KEY || ""), asserting the returned value; ensure you save original env values and restore them in a finally block so tests are isolated.
78-116: Tests validate the escaping algorithm but don't exercise the actual module code.The tests duplicate the escaping logic inline rather than calling
nim.startNimContainerwith mocked environment variables. This verifies the escaping pattern is correct but doesn't confirm the module applies it properly.Consider adding integration tests that mock
process.envand verify the actualstartNimContainerfunction generates the expected docker command (you could mock therunfunction to capture the command string).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/nim.test.js` around lines 78 - 116, Current tests inline-implement escaping instead of exercising the real code; update the tests to mock process.env and call the actual startNimContainer function, then intercept/mock the run function to capture the generated docker command string and assert it contains the correctly single-quoted/escaped NGC_API_KEY behavior. Specifically, set process.env.NGC_API_KEY and process.env.NVIDIA_API_KEY for the different scenarios, call startNimContainer(...), and verify the captured run invocation includes the expected "-e NGC_API_KEY='...'" (with embedded quotes escaped as "'\\''") and that precedence/fallback logic prefers NGC_API_KEY over NVIDIA_API_KEY; restore process.env and restore the mocked run after each test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/nim.test.js`:
- Around line 107-115: The tests in nim.test.js currently assert JavaScript OR
behavior with string literals instead of exercising the real
environment-variable fallback; update the two tests ("prefers NGC_API_KEY over
NVIDIA_API_KEY in fallback" and "falls back to NVIDIA_API_KEY when NGC_API_KEY
empty") to set and restore process.env.NGC_API_KEY and
process.env.NVIDIA_API_KEY around calling the actual fallback logic (either by
importing the helper from nim.js or evaluating process.env.NGC_API_KEY ||
process.env.NVIDIA_API_KEY || ""), asserting the returned value; ensure you save
original env values and restore them in a finally block so tests are isolated.
- Around line 78-116: Current tests inline-implement escaping instead of
exercising the real code; update the tests to mock process.env and call the
actual startNimContainer function, then intercept/mock the run function to
capture the generated docker command string and assert it contains the correctly
single-quoted/escaped NGC_API_KEY behavior. Specifically, set
process.env.NGC_API_KEY and process.env.NVIDIA_API_KEY for the different
scenarios, call startNimContainer(...), and verify the captured run invocation
includes the expected "-e NGC_API_KEY='...'" (with embedded quotes escaped as
"'\\''") and that precedence/fallback logic prefers NGC_API_KEY over
NVIDIA_API_KEY; restore process.env and restore the mocked run after each test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e699047a-0eff-41ee-aa17-7bbd37bc84d7
📒 Files selected for processing (3)
bin/lib/nim.jsbin/lib/onboard.jstest/nim.test.js
427c0c7 to
1f73f13
Compare
|
@coderabbitai Which specific functions are missing docstrings for the Docstring Coverage check? Please list them. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 31923
|
| Function | Signature |
|---|---|
containerName |
containerName(sandboxName) |
getImageForModel |
getImageForModel(modelName) |
listModels |
listModels() |
detectGpu |
detectGpu() |
pullNimImage |
pullNimImage(model) |
waitForNimHealth |
waitForNimHealth(port = 8000, timeout = 300) |
stopNimContainer |
stopNimContainer(sandboxName) |
nimStatus |
nimStatus(sandboxName) |
bin/lib/onboard.js — missing docstrings (11 functions)
| Function | Signature |
|---|---|
step |
step(n, total, msg) |
isDockerRunning |
isDockerRunning() |
isOpenshellInstalled |
isOpenshellInstalled() |
installOpenshell |
installOpenshell() |
preflight |
preflight() |
startGateway |
startGateway(gpu) |
setupNim |
setupNim(sandboxName, gpu) |
setupInference |
setupInference(sandboxName, model, provider) |
setupOpenclaw |
setupOpenclaw(sandboxName) |
setupPolicies |
setupPolicies(sandboxName) |
printDashboard |
printDashboard(sandboxName, model, provider) |
onboard |
onboard() |
Note:
test/nim.test.jsuses onlydescribe/itblocks (no exported functions), so it is typically excluded from docstring coverage checks.
Would you like me to generate JSDoc stubs for all of these, or address them in a follow-up issue?
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
bin/lib/onboard.js (1)
223-235:⚠️ Potential issue | 🔴 CriticalEscape
CHAT_UI_URLbefore shell interpolation.
CHAT_UI_URLis injected into a shell command without quoting/escaping, so a crafted value can break command parsing or inject shell tokens.🔧 Proposed fix
- const chatUiUrl = process.env.CHAT_UI_URL || 'http://127.0.0.1:18789'; - const envArgs = [`CHAT_UI_URL=${chatUiUrl}`]; + const chatUiUrl = process.env.CHAT_UI_URL || "http://127.0.0.1:18789"; + const safeChatUiUrl = chatUiUrl.replace(/'/g, "'\\''"); + const envArgs = [`CHAT_UI_URL='${safeChatUiUrl}'`];🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 223 - 235, CHAT_UI_URL is interpolated raw into the shell command which allows shell injection; create a sanitized value like the other keys before building envArgs: compute safeChatUiUrl = (process.env.CHAT_UI_URL || chatUiUrl).replace(/'/g, "'\\''") and push `CHAT_UI_URL='${safeChatUiUrl}'` into envArgs, then use envArgs.join(" ") in the run(...) call (symbols: chatUiUrl, envArgs, run(...)). Ensure you always wrap the escaped value in single quotes when pushing to envArgs.bin/lib/nim.js (1)
213-234:⚠️ Potential issue | 🟡 Minor
nimStatushealth probe is locked to port 8000.If
startNimContaineris called with a non-default port,nimStatuscan report incorrect health on Line 229.🔧 Proposed fix
/** * Check the running status and health of a sandbox's NIM container. * `@param` {string} sandboxName - Sandbox identifier. + * `@param` {number} [port=8000] - Host port where NIM is exposed. * `@returns` {{ running: boolean, healthy?: boolean, container: string, state?: string }} */ -function nimStatus(sandboxName) { +function nimStatus(sandboxName, port = 8000) { const name = containerName(sandboxName); try { @@ let healthy = false; if (state === "running") { - const health = runCapture(`curl -sf http://localhost:8000/v1/models 2>/dev/null`, { + const health = runCapture(`curl -sf http://localhost:${port}/v1/models 2>/dev/null`, { ignoreError: true, });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 213 - 234, nimStatus currently hardcodes probing localhost:8000 which breaks health checks when startNimContainer mapped NIM to a different host port; change nimStatus so that when state === "running" it inspects the container's port mapping (via docker inspect) to find the HostPort for the container's 8000/tcp (or the configured NIM service port) and then curl localhost:<HostPort>/v1/models instead of always using 8000. Update the logic in nimStatus (use containerName(sandboxName) to get the container) to parse the docker inspect output for the mapped host port and fall back to 8000 if no mapping is found.
🧹 Nitpick comments (2)
bin/lib/onboard.js (1)
229-232: Align sandboxNGC_API_KEYbehavior with NIM fallback semantics.Right now, the sandbox only gets
NGC_API_KEYwhen that exact env var is present. To matchstartNimContainerbehavior, setNGC_API_KEYfromNVIDIA_API_KEYwhenNGC_API_KEYis absent.♻️ Proposed alignment
- if (process.env.NGC_API_KEY) { - const safeKey = process.env.NGC_API_KEY.replace(/'/g, "'\\''"); + const sandboxNgcKey = process.env.NGC_API_KEY || process.env.NVIDIA_API_KEY; + if (sandboxNgcKey) { + const safeKey = sandboxNgcKey.replace(/'/g, "'\\''"); envArgs.push(`NGC_API_KEY='${safeKey}'`); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 229 - 232, The sandbox only adds NGC_API_KEY to envArgs when process.env.NGC_API_KEY exists, but startNimContainer falls back to process.env.NVIDIA_API_KEY; update the logic around envArgs to use NVIDIA_API_KEY when NGC_API_KEY is not set: check process.env.NGC_API_KEY first, otherwise use process.env.NVIDIA_API_KEY, apply the same safeKey escaping (`replace(/'/g, "'\\''")`) and push the resulting `NGC_API_KEY='...''` string into envArgs so sandbox behavior matches startNimContainer (references: envArgs, process.env.NGC_API_KEY, process.env.NVIDIA_API_KEY).bin/lib/nim.js (1)
150-170: Validateportbefore composing the docker command.
portis documented as numeric but not validated at runtime. A non-integer value can produce an invalid command string.🔧 Proposed validation
function startNimContainer(sandboxName, model, port = 8000) { + if (!Number.isInteger(port) || port < 1 || port > 65535) { + throw new Error(`Invalid port: ${port}`); + } const name = containerName(sandboxName); const image = getImageForModel(model);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 150 - 170, The startNimContainer function currently accepts port without runtime validation; add validation in startNimContainer to coerce port to a number (e.g., Number(port) or parseInt), verify it is an integer and within the valid TCP port range (1–65535), and if invalid log an error (e.g., console.error with the offending value) and call process.exit(1) before composing the docker run command; ensure you use the validated numeric port variable when building envFlags and the docker run string so malformed inputs cannot produce an invalid command.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@bin/lib/nim.js`:
- Around line 213-234: nimStatus currently hardcodes probing localhost:8000
which breaks health checks when startNimContainer mapped NIM to a different host
port; change nimStatus so that when state === "running" it inspects the
container's port mapping (via docker inspect) to find the HostPort for the
container's 8000/tcp (or the configured NIM service port) and then curl
localhost:<HostPort>/v1/models instead of always using 8000. Update the logic in
nimStatus (use containerName(sandboxName) to get the container) to parse the
docker inspect output for the mapped host port and fall back to 8000 if no
mapping is found.
In `@bin/lib/onboard.js`:
- Around line 223-235: CHAT_UI_URL is interpolated raw into the shell command
which allows shell injection; create a sanitized value like the other keys
before building envArgs: compute safeChatUiUrl = (process.env.CHAT_UI_URL ||
chatUiUrl).replace(/'/g, "'\\''") and push `CHAT_UI_URL='${safeChatUiUrl}'` into
envArgs, then use envArgs.join(" ") in the run(...) call (symbols: chatUiUrl,
envArgs, run(...)). Ensure you always wrap the escaped value in single quotes
when pushing to envArgs.
---
Nitpick comments:
In `@bin/lib/nim.js`:
- Around line 150-170: The startNimContainer function currently accepts port
without runtime validation; add validation in startNimContainer to coerce port
to a number (e.g., Number(port) or parseInt), verify it is an integer and within
the valid TCP port range (1–65535), and if invalid log an error (e.g.,
console.error with the offending value) and call process.exit(1) before
composing the docker run command; ensure you use the validated numeric port
variable when building envFlags and the docker run string so malformed inputs
cannot produce an invalid command.
In `@bin/lib/onboard.js`:
- Around line 229-232: The sandbox only adds NGC_API_KEY to envArgs when
process.env.NGC_API_KEY exists, but startNimContainer falls back to
process.env.NVIDIA_API_KEY; update the logic around envArgs to use
NVIDIA_API_KEY when NGC_API_KEY is not set: check process.env.NGC_API_KEY first,
otherwise use process.env.NVIDIA_API_KEY, apply the same safeKey escaping
(`replace(/'/g, "'\\''")`) and push the resulting `NGC_API_KEY='...''` string
into envArgs so sandbox behavior matches startNimContainer (references: envArgs,
process.env.NGC_API_KEY, process.env.NVIDIA_API_KEY).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8a2e24a6-7169-4933-a959-950d835a8beb
📒 Files selected for processing (2)
bin/lib/nim.jsbin/lib/onboard.js
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
bin/lib/nim.js (1)
218-238:⚠️ Potential issue | 🟡 MinorHardcoded port 8000 in
nimStatusignores the port parameter used elsewhere.The
waitForNimHealthfunction accepts a configurableportparameter, butnimStatushardcodes port 8000 when checking health. If a NIM container is started on a different port,nimStatuswill check the wrong endpoint.🐛 Proposed fix to use the actual container port
-function nimStatus(sandboxName) { +function nimStatus(sandboxName, port = 8000) { const name = containerName(sandboxName); try { const state = runCapture( `docker inspect --format '{{.State.Status}}' ${name} 2>/dev/null`, { ignoreError: true } ); if (!state) return { running: false, container: name }; let healthy = false; if (state === "running") { - const health = runCapture(`curl -sf http://localhost:8000/v1/models 2>/dev/null`, { + const health = runCapture(`curl -sf http://localhost:${port}/v1/models 2>/dev/null`, { ignoreError: true, }); healthy = !!health; } return { running: state === "running", healthy, container: name, state };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 218 - 238, nimStatus currently hardcodes port 8000 when probing health; change nimStatus to accept an optional port parameter (e.g., nimStatus(sandboxName, port = 8000)) and use that port in the health curl check instead of 8000, then update callers (notably waitForNimHealth) to pass the configured port through; keep the default at 8000 so behavior is unchanged when no port is supplied.
🧹 Nitpick comments (1)
test/nim.test.js (1)
107-115: Tests verify JS||semantics rather than actual env var fallback.These tests use hardcoded string literals (
"ngc-primary" || "nvidia-fallback") which test JavaScript's||operator behavior in isolation. They don't exercise the actualprocess.env.NGC_API_KEY || process.env.NVIDIA_API_KEYlogic innim.js.Consider adding integration tests that mock
process.envto verify the real implementation's fallback behavior, but this is fine for now as a sanity check of the pattern.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/nim.test.js` around lines 107 - 115, The tests currently assert JS || behavior using hardcoded literals (the ngcKey variables in the two it blocks) instead of exercising the real fallback in nim.js; update each test to set process.env.NGC_API_KEY and process.env.NVIDIA_API_KEY to the desired values (e.g., set NGC_API_KEY="ngc-primary" and NVIDIA_API_KEY="nvidia-fallback" for the first case, and NGC_API_KEY="" with NVIDIA_API_KEY="nvidia-fallback" for the second), then import/require the nim.js module or call its exported function that reads these env vars so the module evaluates the actual fallback, assert the returned value, and finally restore/delete the modified process.env entries to avoid cross-test pollution.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@bin/lib/nim.js`:
- Around line 218-238: nimStatus currently hardcodes port 8000 when probing
health; change nimStatus to accept an optional port parameter (e.g.,
nimStatus(sandboxName, port = 8000)) and use that port in the health curl check
instead of 8000, then update callers (notably waitForNimHealth) to pass the
configured port through; keep the default at 8000 so behavior is unchanged when
no port is supplied.
---
Nitpick comments:
In `@test/nim.test.js`:
- Around line 107-115: The tests currently assert JS || behavior using hardcoded
literals (the ngcKey variables in the two it blocks) instead of exercising the
real fallback in nim.js; update each test to set process.env.NGC_API_KEY and
process.env.NVIDIA_API_KEY to the desired values (e.g., set
NGC_API_KEY="ngc-primary" and NVIDIA_API_KEY="nvidia-fallback" for the first
case, and NGC_API_KEY="" with NVIDIA_API_KEY="nvidia-fallback" for the second),
then import/require the nim.js module or call its exported function that reads
these env vars so the module evaluates the actual fallback, assert the returned
value, and finally restore/delete the modified process.env entries to avoid
cross-test pollution.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 90da016d-0491-431c-8a97-871918a66562
📒 Files selected for processing (3)
bin/lib/nim.jsbin/lib/onboard.jstest/nim.test.js
445376c to
cadbc5f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/onboard.js`:
- Around line 255-259: The CHAT_UI_URL value is currently injected into the env
shell command unescaped; use the same single-quote-escaping helper used for
NVIDIA_API_KEY and NGC_API_KEY to wrap CHAT_UI_URL so spaces or shell
metacharacters cannot break or inject into the onboarding command, and apply the
same change to the other env-construction site that mirrors this pattern (the
other block that builds the env string later in the file). Locate the env
construction that references CHAT_UI_URL and replace the raw interpolation with
the helper-escaped version (same helper function name used for
NVIDIA_API_KEY/NGC_API_KEY) in both places.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 853216d5-7c69-466e-aeea-b90e47369994
📒 Files selected for processing (3)
bin/lib/nim.jsbin/lib/onboard.jstest/nim.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
- test/nim.test.js
3354b58 to
d7da4e7
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
bin/lib/onboard.js (2)
474-513:⚠️ Potential issue | 🟠 MajorCheck port 8000 before starting local NIM.
This path always binds NIM to
localhost:8000, but preflight never reserves that port and the later health/provider wiring assumes the bind succeeded. If something is already listening there—especially the vLLM instance you detect earlier—the failure happens deep insidedocker runinstead of as a clear onboarding error.🛠️ Minimal guard
+ const nimPort = 8000; + const portCheck = await checkPortAvailable(nimPort); + if (!portCheck.ok) { + console.error(` Port ${nimPort} is already in use. Stop the existing local model server or choose another provider.`); + process.exit(1); + } + console.log(` Pulling NIM image for ${model}...`); nim.pullNimImage(model); console.log(" Starting NIM container..."); - nimContainer = nim.startNimContainer(sandboxName, model); + nimContainer = nim.startNimContainer(sandboxName, model, nimPort); console.log(" Waiting for NIM to become healthy..."); - if (!nim.waitForNimHealth()) { + if (!nim.waitForNimHealth(nimPort)) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 474 - 513, Before pulling/starting the local NIM container, check that localhost:8000 is free and fail fast with a clear error if it is occupied: call a port-check helper (or implement a quick TCP bind probe) prior to nim.pullNimImage and nim.startNimContainer, and if the probe shows the port in use print a descriptive error and process.exit(1). Add this check in the same branch handling selected.key === "nim" right before the lines that log "Pulling NIM image..." and reference nim.pullNimImage, nim.startNimContainer, and nim.waitForNimHealth so the start sequence only runs when port 8000 is available.
512-517:⚠️ Potential issue | 🟠 MajorTear down the failed NIM container before fallback.
When the health check times out, the code forgets the container but does not stop/remove it. That can keep port 8000 or GPU memory occupied for the next onboarding attempt.
🧹 Minimal cleanup
if (!nim.waitForNimHealth()) { console.error(" NIM failed to start. Falling back to cloud API."); + nim.stopNimContainer(sandboxName); model = null; nimContainer = null;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 512 - 517, When nim.waitForNimHealth() fails, the code currently nulls model and nimContainer but leaves the container running; update the failure branch to stop and remove the NIM container before falling back: call the appropriate teardown on nimContainer (e.g., nimContainer.stop() and nimContainer.remove() or nim.stopContainer()/nim.teardownContainer() if such helpers exist), handle any errors from those calls, then set model = null and nimContainer = null so ports/GPU resources are freed for the next onboarding attempt.bin/lib/nim.js (1)
185-197:⚠️ Potential issue | 🟠 MajorAdd per-request timeout to each health probe.
The loop's outer timeout won't fire if a single
curlinvocation stalls on a localhost socket.runCapture()does not apply a default timeout—it passes options directly to Node.js'sexecSync()without enforcing one. Passtimeout: 10000(or similar) when callingrunCapture()to bound each probe:const result = runCapture(`curl -sf http://localhost:${port}/v1/models`, { ignoreError: true, timeout: 10000, // 10 seconds per request });Alternatively, add curl's own
--max-time 10flag to the command itself.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 185 - 197, The health-check loop can hang if a single runCapture call blocks; update the runCapture invocation in the while loop (the call that builds the curl to http://localhost:${port}/v1/models) to include a per-request timeout (e.g., add timeout: 10000 to the options passed to runCapture) or alternatively add curl's --max-time 10 flag to the command string so each probe respects a 10s cap; keep the existing ignoreError: true and leave the surrounding loop logic (start, timeout, port) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@bin/lib/nim.js`:
- Around line 185-197: The health-check loop can hang if a single runCapture
call blocks; update the runCapture invocation in the while loop (the call that
builds the curl to http://localhost:${port}/v1/models) to include a per-request
timeout (e.g., add timeout: 10000 to the options passed to runCapture) or
alternatively add curl's --max-time 10 flag to the command string so each probe
respects a 10s cap; keep the existing ignoreError: true and leave the
surrounding loop logic (start, timeout, port) unchanged.
In `@bin/lib/onboard.js`:
- Around line 474-513: Before pulling/starting the local NIM container, check
that localhost:8000 is free and fail fast with a clear error if it is occupied:
call a port-check helper (or implement a quick TCP bind probe) prior to
nim.pullNimImage and nim.startNimContainer, and if the probe shows the port in
use print a descriptive error and process.exit(1). Add this check in the same
branch handling selected.key === "nim" right before the lines that log "Pulling
NIM image..." and reference nim.pullNimImage, nim.startNimContainer, and
nim.waitForNimHealth so the start sequence only runs when port 8000 is
available.
- Around line 512-517: When nim.waitForNimHealth() fails, the code currently
nulls model and nimContainer but leaves the container running; update the
failure branch to stop and remove the NIM container before falling back: call
the appropriate teardown on nimContainer (e.g., nimContainer.stop() and
nimContainer.remove() or nim.stopContainer()/nim.teardownContainer() if such
helpers exist), handle any errors from those calls, then set model = null and
nimContainer = null so ports/GPU resources are freed for the next onboarding
attempt.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 84e5cbf7-f697-45dd-bc58-2e0a7a6a9795
📒 Files selected for processing (2)
bin/lib/nim.jsbin/lib/onboard.js
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
bin/lib/nim.js (1)
213-238:⚠️ Potential issue | 🟡 MinorHardcoded port in health check may cause incorrect status for non-default ports.
nimStatus()always checks health on port 8000 (line 229), butstartNimContainer()accepts a customportparameter. If a NIM container is started on a different port, the status check will incorrectly report it as unhealthy.Consider either:
- Storing the port in the registry alongside
nimContainer, or- Accepting
portas a parameter tonimStatus().🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 213 - 238, nimStatus currently hardcodes port 8000 when curling the health endpoint, so containers started with startNimContainer(port) can be mis-reported as unhealthy; modify nimStatus to accept an optional port parameter (nimStatus(sandboxName, port)) or read the port from wherever nimContainer/registry stores it, then use that port in the curl health check instead of 8000; update all callers (e.g., any place that queries nimStatus after startNimContainer) to pass the actual port or ensure the registry entry for nimContainer includes the port and nimStatus reads it before performing the curl; keep the existing return shape and fall back to 8000 only if no port is available.
🧹 Nitpick comments (2)
bin/lib/onboard.js (1)
323-323: Consider extractingshellQuoteto a shared utility.The
shellQuotehelper is defined identically in three places:
bin/lib/nim.js:163(insidestartNimContainer)bin/lib/onboard.js:323(insidecreateSandbox)bin/lib/onboard.js:578(insidesetupInference)Extracting this to a shared module (e.g.,
bin/lib/runner.jsor a newbin/lib/shell.js) would reduce duplication and ensure consistent escaping behavior.♻️ Example extraction
In a shared module (e.g.,
bin/lib/shell.js):/** Escape a value for safe interpolation into a POSIX shell command. */ const shellQuote = (value) => `'${String(value).replace(/'/g, "'\\''")}'`; module.exports = { shellQuote };Then import where needed:
+const { shellQuote } = require("./shell"); // ... -const shellQuote = (value) => `'${String(value).replace(/'/g, "'\\''")}'`;Also applies to: 578-578
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` at line 323, Duplicate shell escaping logic is defined inline in createSandbox, setupInference, and startNimContainer as the shellQuote helper; extract this into a single shared module (e.g., bin/lib/shell.js or bin/lib/runner.js) that exports shellQuote, then import and use that exported shellQuote in createSandbox, setupInference, and startNimContainer to remove the three identical inline definitions and ensure consistent behavior across bin/lib/onboard.js and bin/lib/nim.js.bin/lib/nim.js (1)
180-197: Unused variableinterval.The
intervalconstant (line 182) is defined as5000but never used. The sleep on line 196 uses a hardcoded"5"instead.🔧 Proposed fix
function waitForNimHealth(port = 8000, timeout = 300) { const start = Date.now(); - const interval = 5000; + const intervalSec = 5; console.log(` Waiting for NIM health on port ${port} (timeout: ${timeout}s)...`); while ((Date.now() - start) / 1000 < timeout) { try { const result = runCapture(`curl -sf http://localhost:${port}/v1/models`, { ignoreError: true, }); if (result) { console.log(" NIM is healthy."); return true; } } catch {} // Synchronous sleep via spawnSync - require("child_process").spawnSync("sleep", ["5"]); + require("child_process").spawnSync("sleep", [String(intervalSec)]); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 180 - 197, The function waitForNimHealth defines an unused constant interval = 5000 but the sleep call uses a hardcoded "5"; replace the hardcoded sleep duration with the interval value (convert ms to seconds) or remove interval and use a clearly named constant; specifically update waitForNimHealth to call require("child_process").spawnSync("sleep", [String(Math.floor(interval / 1000))]) (or set interval to seconds and pass String(interval)) so the defined interval variable is actually used and the linter warning is resolved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@bin/lib/nim.js`:
- Around line 213-238: nimStatus currently hardcodes port 8000 when curling the
health endpoint, so containers started with startNimContainer(port) can be
mis-reported as unhealthy; modify nimStatus to accept an optional port parameter
(nimStatus(sandboxName, port)) or read the port from wherever
nimContainer/registry stores it, then use that port in the curl health check
instead of 8000; update all callers (e.g., any place that queries nimStatus
after startNimContainer) to pass the actual port or ensure the registry entry
for nimContainer includes the port and nimStatus reads it before performing the
curl; keep the existing return shape and fall back to 8000 only if no port is
available.
---
Nitpick comments:
In `@bin/lib/nim.js`:
- Around line 180-197: The function waitForNimHealth defines an unused constant
interval = 5000 but the sleep call uses a hardcoded "5"; replace the hardcoded
sleep duration with the interval value (convert ms to seconds) or remove
interval and use a clearly named constant; specifically update waitForNimHealth
to call require("child_process").spawnSync("sleep", [String(Math.floor(interval
/ 1000))]) (or set interval to seconds and pass String(interval)) so the defined
interval variable is actually used and the linter warning is resolved.
In `@bin/lib/onboard.js`:
- Line 323: Duplicate shell escaping logic is defined inline in createSandbox,
setupInference, and startNimContainer as the shellQuote helper; extract this
into a single shared module (e.g., bin/lib/shell.js or bin/lib/runner.js) that
exports shellQuote, then import and use that exported shellQuote in
createSandbox, setupInference, and startNimContainer to remove the three
identical inline definitions and ensure consistent behavior across
bin/lib/onboard.js and bin/lib/nim.js.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a8cc1add-1a1e-47f9-afca-7af5dd12fc2d
📒 Files selected for processing (2)
bin/lib/nim.jsbin/lib/onboard.js
4021948 to
e1789c8
Compare
e1789c8 to
9bdcc30
Compare
9bdcc30 to
37e30aa
Compare
…container - Add shellQuote() to runner.js for safe shell interpolation - Pass NGC_API_KEY (with NVIDIA_API_KEY fallback) to NIM docker container, fixing issue #210 where containers cannot pull model weights - Quote CHAT_UI_URL, NVIDIA_API_KEY, and NGC_API_KEY in sandbox env - Double-quote model names in openshell inference set commands - Move shellQuote from onboard.js local function to shared runner.js export - Add 11 tests covering shellQuote, NGC_API_KEY escaping, and env fallback Intentionally does not touch --credential lines (deferred to #330). Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
37e30aa to
f2de42d
Compare
|
repositioned this to follow ericksoa. pending additional validation and review |
Summary
shellQuote()torunner.js— single-quote wrapper that prevents shell injection in interpolated valuesNGC_API_KEY(withNVIDIA_API_KEYfallback) to NIM docker containers, fixing NGC_API_KEY is not passed to Nim container #210 where containers cannot pull model weights from NGCCHAT_UI_URL,NVIDIA_API_KEY, andNGC_API_KEYwhen passed to sandbox envopenshell inference setcommandsshellQuotefromonboard.jsinto the sharedrunner.jsexportIntentionally does not touch
--credentiallines — that's #330's territory.Test plan
Automated Tests
11 new tests covering:
shellQuote: simple values, embedded quotes, empty strings, metacharacters, type coercionManual verification
git diff upstream/main --statshows 4 files, ~109 lines--credentiallines unchanged (complements fix(security): stop leaking API keys in process args visible via ps aux #330)