test: add GPU detection tests with dependency injection #142
brianwtaylor wants to merge 9 commits into NVIDIA:main
Conversation
Force-pushed b949de7 to 0ee1b3f
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds dependency injection to GPU detection (`detectGpu()`).
Sequence Diagram(s)

sequenceDiagram
autonumber
participant Test as "Client / Test"
participant Detect as "detectGpu(opts)"
participant Runner as "runCmd (injected)"
participant System as "OS / External CLI"
Test->>Detect: call detectGpu({ runCmd, platform })
Detect->>Runner: request command (e.g., nvidia-smi, free, sysctl, system_profiler)
Runner->>System: execute shell command
System-->>Runner: command stdout/stderr or error
Runner-->>Detect: return captured output / error
Detect->>Detect: parse outputs, apply fallbacks, determine GPU type/size/cores
Detect-->>Test: return detection object (or null)
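The flow in the diagram can be reduced to a small sketch. This is illustrative only — the parsing below is invented, not the repository's actual implementation; only the idea of passing `runCmd` and `platform` through an `opts` parameter comes from the PR:

```javascript
// Illustrative sketch: detectGpu takes an opts object so tests can inject
// a fake command runner and platform instead of touching the real OS.
function detectGpu(opts = {}) {
  const runCmd = opts.runCmd || (() => null); // injected; real code shells out
  const platform = opts.platform || process.platform;

  if (platform === "linux") {
    // Hypothetical nvidia-smi CSV query: "name, memory.total"
    const out = runCmd("nvidia-smi --query-gpu=name,memory.total --format=csv,noheader");
    if (out) {
      const [name, mem] = out.split(",").map((s) => s.trim());
      return { type: "nvidia", name, vramMb: parseInt(mem, 10) };
    }
  }
  if (platform === "darwin") {
    const bytes = runCmd("sysctl -n hw.memsize"); // unified-memory fallback
    if (bytes) return { type: "apple", unifiedMb: Math.floor(Number(bytes) / 2 ** 20) };
  }
  return null; // no-GPU fallback
}

// Deterministic test: the fake runner returns canned nvidia-smi output.
const fake = (cmd) => (cmd.includes("query-gpu=name") ? "RTX 4090, 24564 MiB" : null);
console.log(JSON.stringify(detectGpu({ runCmd: fake, platform: "linux" })));
// → {"type":"nvidia","name":"RTX 4090","vramMb":24564}
```

Because every external interaction goes through the injected runner, each of the four code paths can be exercised with canned strings and no hardware.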
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/nim.js`:
- Around line 26-33: The JSDoc promises cores?: number but the detector can
return cores: null; update the runtime in the GPU detection function (the
"Detect GPU hardware" function in bin/lib/nim.js that builds the returned
object) so it omits the cores property entirely when the value is unknown
(null/undefined), preserving the optional-number contract, and keep the JSDoc as
cores?: number; ensure the returned object only includes cores when it's a valid
number.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bd571467-8df9-4090-91e7-54c2240aed23
📒 Files selected for processing (1)
bin/lib/nim.js
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
bin/lib/nim.js (2)
132-137: ⚠️ Potential issue | 🟠 Major

Avoid `process.exit(1)` inside library functions. Line 136 and Line 149 terminate the entire process from helper functions, which makes these APIs unsafe for reuse and testing. Throw an error and let the CLI layer decide whether to exit.
🛠️ Suggested fix

```diff
 function pullNimImage(model) {
   const image = getImageForModel(model);
   if (!image) {
-    console.error(`  Unknown model: ${model}`);
-    process.exit(1);
+    throw new Error(`Unknown model: ${model}`);
   }
   console.log(`  Pulling NIM image: ${image}`);
   run(`docker pull ${image}`);
   return image;
 }

 function startNimContainer(sandboxName, model, port = 8000) {
   const name = containerName(sandboxName);
   const image = getImageForModel(model);
   if (!image) {
-    console.error(`  Unknown model: ${model}`);
-    process.exit(1);
+    throw new Error(`Unknown model: ${model}`);
   }
```

Also applies to: 147-150
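The pattern the review suggests can be reduced to a small sketch. The model table and image name below are hypothetical, not the project's real registry; only the throw-instead-of-exit split between library and CLI layers is from the review:

```javascript
// Library layer: throws instead of calling process.exit(1), so it stays
// reusable and testable. The model-to-image table is made up for illustration.
const IMAGES = { "llama-3.1-8b": "nvcr.io/nim/llama-3.1-8b" };

function getImageForModel(model) {
  const image = IMAGES[model];
  if (!image) throw new Error(`Unknown model: ${model}`);
  return image;
}

// CLI layer: the single place that maps errors to an exit status.
function main(model) {
  try {
    console.log(getImageForModel(model));
    return 0;
  } catch (err) {
    console.error(err.message);
    return 1; // caller sets process.exitCode from this
  }
}
```

Keeping `process.exit` out of the library means a test can assert on the thrown error directly instead of spawning a child process.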
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 132 - 137, The helper functions (e.g., pullNimImage) currently call process.exit(1) when getImageForModel(model) returns falsy (and similarly at the other spot around lines 147–150); replace those process.exit calls with throwing a descriptive Error (e.g., throw new Error(`Unknown model: ${model}`)) so callers can handle termination at the CLI layer instead of library code, update any other helper functions that use process.exit to throw instead, and ensure the CLI entrypoint catches these Errors and exits with the appropriate status.
193-210: ⚠️ Potential issue | 🟠 Major

`nimStatus` ignores the configured port and can report a false unhealthy state. `startNimContainer()` allows custom ports, but Line 205 always checks localhost:8000. For non-default ports this returns incorrect health.

✅ Suggested fix

```diff
-/** @param {string} sandboxName @returns {{running: boolean, healthy?: boolean, container: string, state?: string}} */
-function nimStatus(sandboxName) {
+/** @param {string} sandboxName @param {number} [port=8000] @returns {{running: boolean, healthy?: boolean, container: string, state?: string}} */
+function nimStatus(sandboxName, port = 8000) {
   const name = containerName(sandboxName);
   try {
@@
   let healthy = false;
   if (state === "running") {
-    const health = runCapture(`curl -sf http://localhost:8000/v1/models 2>/dev/null`, {
+    const health = runCapture(`curl -sf http://localhost:${port}/v1/models 2>/dev/null`, {
       ignoreError: true,
     });
     healthy = !!health;
   }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 193 - 210, nimStatus currently hardcodes localhost:8000 for the health check which breaks setups where startNimContainer starts the container on a custom port; update nimStatus to determine the actual port used for that container (either by reading the same configuration/env used by startNimContainer or by querying the container inspect output for the HostPort) and use that port in the curl health check URL instead of :8000; keep the existing ignoreError behavior and return fields (running, healthy, container, state) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/nim.js`:
- Around line 10-11: containerName currently returns an interpolated string from
unvalidated sandboxName which can lead to command injection when used in shell
commands (the places where containerName() is later interpolated into commands);
modify containerName to validate and sanitize sandboxName before returning (for
example: enforce a strict allowlist regex like /^[a-z0-9_-]+$/i, throw or reject
inputs that don't match, and optionally normalize or escape input), or perform
this validation at the call sites that build shell commands (where
containerName() is used) to ensure only safe characters are allowed; update any
callers to handle validation errors gracefully.
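A minimal sketch of the suggested allowlist validation. The regex is the one given in the PR's commit message; the `nim-` prefix and error wording are assumptions for illustration:

```javascript
// Reject anything outside a strict allowlist before the name is ever
// interpolated into a shell command.
const SAFE_NAME = /^[a-zA-Z0-9][a-zA-Z0-9_.-]*$/;

function containerName(sandboxName) {
  if (!SAFE_NAME.test(sandboxName)) {
    throw new Error(`Invalid sandbox name: ${sandboxName}`);
  }
  return `nim-${sandboxName}`; // container naming scheme is illustrative
}
```

Validating at the source means every later call site that builds shell commands from `containerName()` inherits the guarantee, instead of each one escaping independently.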
---
Outside diff comments:
In `@bin/lib/nim.js`:
- Around line 132-137: The helper functions (e.g., pullNimImage) currently call
process.exit(1) when getImageForModel(model) returns falsy (and similarly at the
other spot around lines 147–150); replace those process.exit calls with throwing
a descriptive Error (e.g., throw new Error(`Unknown model: ${model}`)) so
callers can handle termination at the CLI layer instead of library code, update
any other helper functions that use process.exit to throw instead, and ensure
the CLI entrypoint catches these Errors and exits with the appropriate status.
- Around line 193-210: nimStatus currently hardcodes localhost:8000 for the
health check which breaks setups where startNimContainer starts the container on
a custom port; update nimStatus to determine the actual port used for that
container (either by reading the same configuration/env used by
startNimContainer or by querying the container inspect output for the HostPort)
and use that port in the curl health check URL instead of :8000; keep the
existing ignoreError behavior and return fields (running, healthy, container,
state) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0c8872b6-d201-4dbc-97c1-5901a45b55d8
📒 Files selected for processing (1)
bin/lib/nim.js
Force-pushed 49ccca8 to 7765ca4
Add dependency injection to detectGpu() via an optional opts parameter, enabling deterministic tests for all 4 code paths: standard NVIDIA, DGX Spark GB10 unified memory, Apple Silicon, and no-GPU fallback.

Signed-off-by: Brian Taylor <brian@briantaylor.xyz>
Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
- Add test for the sysctl hw.memsize fallback when system_profiler reports no VRAM (the actual Apple Silicon code path)
- Rename existing Apple test to clarify it covers the discrete VRAM parsing branch
- Use more specific mock pattern "query-gpu=name" to avoid substring collisions
Aligns runtime behavior with JSDoc contract (cores?: number). When system_profiler does not report core count, the property is now omitted entirely rather than set to null.
Adds suggestModelsForGpu() that ranks NIM models by VRAM fit and marks the optimal model as recommended. Also surfaces GPU name in NVIDIA detection for better display during onboarding.
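A sketch of how such a VRAM-fit ranking could work. The model catalog, VRAM figures, and return shape below are invented for illustration; the real `suggestModelsForGpu()` may differ:

```javascript
// Hypothetical model catalog: ids and VRAM requirements are made up.
const MODELS = [
  { id: "llama-3.1-8b", minVramGb: 16 },
  { id: "llama-3.1-70b", minVramGb: 140 },
  { id: "mistral-7b", minVramGb: 14 },
];

// Rank models that fit the available VRAM, largest (best fit) first,
// and mark the top entry as recommended.
function suggestModelsForGpu(gpuVramGb) {
  return MODELS.filter((m) => m.minVramGb <= gpuVramGb)
    .sort((a, b) => b.minVramGb - a.minVramGb)
    .map((m, i) => ({ ...m, recommended: i === 0 }));
}
```

Ranking by the largest model that still fits keeps the recommendation close to the hardware's capacity without overcommitting VRAM.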
- Validate sandboxName in containerName() to prevent command injection via shell metacharacters (regex: /^[a-zA-Z0-9][a-zA-Z0-9_.-]*$/)
- Replace process.exit(1) with throw in pullNimImage and startNimContainer so callers can handle errors gracefully
- Accept port parameter in nimStatus() instead of hardcoding 8000
- Add tests for sandbox name validation and throw behavior
Verify the NIM port is available before pulling the image and starting the container. Without this check, a port conflict (e.g. an existing vLLM instance) causes an opaque docker failure instead of a clear onboarding error with the blocking process identified.
Without a timeout, a single curl invocation can stall indefinitely on a localhost socket, defeating the outer polling loop's timeout. Add curl --max-time 10 and execSync timeout: 15000 as a belt-and-suspenders bound on each probe iteration.
Force-pushed 413b25a to 8f18ff9
Thanks for adding optional dependency injection to the GPU detection tests; this will help ensure our tests are more robust and reliable.
Summary

- `opts` parameter added to `detectGpu()` for dependency injection of `runCapture` and `process.platform` — existing callers are unaffected (backward compatible)
- `suggestModelsForGpu()` added to rank NIM models by VRAM fit and mark the optimal model as recommended
- Sandbox names validated in `containerName()` to prevent command injection; `process.exit(1)` replaced with thrown errors; configurable port accepted in `nimStatus()`

GPU detection previously had zero test coverage. Any regression in `detectGpu()` silently breaks onboarding for entire hardware classes (DGX Spark, Apple Silicon, multi-GPU) with no CI signal.

Test plan
Automated Tests
28 tests covering GPU detection (8 injected paths: standard NVIDIA, multi-GPU, DGX Spark GB10, Apple Silicon discrete/unified, no-GPU), model recommendation (7 VRAM-based ranking tests), container name sanitization (3 tests), and NIM lifecycle (pull, health, status).
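As an illustration of the "more specific mock pattern" noted in the test plan, a fake runner can key on an exact flag rather than a loose substring. The helper name below is hypothetical:

```javascript
// Map of command patterns to canned output. Using the specific
// "query-gpu=name" pattern avoids colliding with other nvidia-smi calls.
function makeFakeRunner(responses) {
  return (cmd) => {
    for (const [pattern, output] of Object.entries(responses)) {
      if (cmd.includes(pattern)) return output;
    }
    return null; // unmatched commands behave like a failed invocation
  };
}

const runCmd = makeFakeRunner({
  "query-gpu=name": "NVIDIA GB10",
  "hw.memsize": String(128 * 2 ** 30),
});
console.log(runCmd("nvidia-smi --query-gpu=name --format=csv,noheader"));
// → NVIDIA GB10
```

Returning `null` for unmatched commands mirrors a failed shell invocation, which exercises the detector's fallback branches for free.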