test: add GPU detection tests with dependency injection by brianwtaylor · Pull Request #142 · NVIDIA/NemoClaw

brianwtaylor · 2026-03-17T06:10:20Z

Summary

Add optional opts parameter to detectGpu() for dependency injection of runCapture and process.platform — existing callers are unaffected (backward compatible)
Enrich GPU detection output: surface GPU name for NVIDIA, detect DGX Spark GB10 unified memory
Add suggestModelsForGpu() to rank NIM models by VRAM fit and mark the optimal model as recommended
Harden nim.js: validate sandbox names in containerName() to prevent command injection, replace process.exit(1) with thrown errors, accept configurable port in nimStatus()
Add port 8000 preflight check before starting local NIM container
Add per-request timeout to NIM health probe to prevent hung curl from defeating the polling loop

GPU detection previously had zero test coverage. Any regression in detectGpu() silently breaks onboarding for entire hardware classes (DGX Spark, Apple Silicon, multi-GPU) with no CI signal.

Test plan

Automated Tests

node --test test/nim.test.js

28 tests covering GPU detection (8 injected paths: standard NVIDIA, multi-GPU, DGX Spark GB10, Apple Silicon discrete/unified, no-GPU), model recommendation (7 VRAM-based ranking tests), container name sanitization (3 tests), and NIM lifecycle (pull, health, status).

coderabbitai · 2026-03-18T00:08:20Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds dependency injection to GPU detection (detectGpu(opts)), enriches detection output, and centralizes CLI invocation through an injectable runCmd. Also introduces model/image helpers and container lifecycle functions (pull/start/stop/wait/status) in bin/lib/nim.js. Tests added to validate injected GPU detection scenarios.

Changes

Cohort / File(s)	Summary
Nim library (GPU, images, containers) `bin/lib/nim.js`	Refactors GPU detection to `detectGpu(opts)` with injectable `runCmd` and `platform`; richer detection result (type, count, totalMemoryMB, perGpuMB, nimCapable, optional spark/name/cores); adds `containerName`, `getImageForModel`, `listModels`, `pullNimImage`, `startNimContainer`, `waitForNimHealth`, `stopNimContainer`, `nimStatus`; routes CLI calls through injected runner; minor data-shape safety tweaks and docs.
Tests (injected GPU detection) `test/nim.test.js`	Adds "detectGpu (injected)" suite and `mockRunCapture` to simulate CLI outputs across many scenarios (NVIDIA single/multi, DGX Spark GB10, Spark failures, macOS discrete/Apple Silicon, no-GPU, non-GB10 NVIDIA).

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Test as "Client / Test"
    participant Detect as "detectGpu(opts)"
    participant Runner as "runCmd (injected)"
    participant System as "OS / External CLI"

    Test->>Detect: call detectGpu({ runCmd, platform })
    Detect->>Runner: request command (e.g., nvidia-smi, free, sysctl, system_profiler)
    Runner->>System: execute shell command
    System-->>Runner: command stdout/stderr or error
    Runner-->>Detect: return captured output / error
    Detect->>Detect: parse outputs, apply fallbacks, determine GPU type/size/cores
    Detect-->>Test: return detection object (or null)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hop through logs and capture shells,
Inject my runner, count the spells,
From CUDA fans to Apple cores,
I wag my whiskers, check the scores,
A tiny drum — containers roar! 🥕

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'test: add GPU detection tests with dependency injection' accurately describes the main change—adding tests for GPU detection with dependency injection support.
Docstring Coverage	✅ Passed	Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

📝 Coding Plan

Generate coding plan for human review comments

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/nim.js`:
- Around line 26-33: The JSDoc promises cores?: number but the detector can
return cores: null; update the runtime in the GPU detection function (the
"Detect GPU hardware" function in bin/lib/nim.js that builds the returned
object) so it omits the cores property entirely when the value is unknown
(null/undefined), preserving the optional-number contract, and keep the JSDoc as
cores?: number; ensure the returned object only includes cores when it's a valid
number.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bd571467-8df9-4090-91e7-54c2240aed23

📥 Commits

Reviewing files that changed from the base of the PR and between 25ddffd and 02f5f5f.

📒 Files selected for processing (1)

bin/lib/nim.js

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

bin/lib/nim.js (2)

132-137: ⚠️ Potential issue | 🟠 Major

Avoid process.exit(1) inside library functions.

Line 136 and Line 149 terminate the entire process from helper functions, which makes these APIs unsafe for reuse and testing. Throw an error and let the CLI layer decide whether to exit.

🛠️ Suggested fix

 function pullNimImage(model) {
   const image = getImageForModel(model);
   if (!image) {
-    console.error(`  Unknown model: ${model}`);
-    process.exit(1);
+    throw new Error(`Unknown model: ${model}`);
   }
   console.log(`  Pulling NIM image: ${image}`);
   run(`docker pull ${image}`);
   return image;
 }

 function startNimContainer(sandboxName, model, port = 8000) {
   const name = containerName(sandboxName);
   const image = getImageForModel(model);
   if (!image) {
-    console.error(`  Unknown model: ${model}`);
-    process.exit(1);
+    throw new Error(`Unknown model: ${model}`);
   }

Also applies to: 147-150

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 132 - 137, The helper functions (e.g.,
pullNimImage) currently call process.exit(1) when getImageForModel(model)
returns falsy (and similarly at the other spot around lines 147–150); replace
those process.exit calls with throwing a descriptive Error (e.g., throw new
Error(`Unknown model: ${model}`)) so callers can handle termination at the CLI
layer instead of library code, update any other helper functions that use
process.exit to throw instead, and ensure the CLI entrypoint catches these
Errors and exits with the appropriate status.

193-210: ⚠️ Potential issue | 🟠 Major

nimStatus ignores configured port and can report false unhealthy state.

startNimContainer() allows custom ports, but Line 205 always checks localhost:8000. For non-default ports this returns incorrect health.

✅ Suggested fix

-/** `@param` {string} sandboxName `@returns` {{running: boolean, healthy?: boolean, container: string, state?: string}} */
-function nimStatus(sandboxName) {
+/** `@param` {string} sandboxName `@param` {number} [port=8000] `@returns` {{running: boolean, healthy?: boolean, container: string, state?: string}} */
+function nimStatus(sandboxName, port = 8000) {
   const name = containerName(sandboxName);
   try {
@@
     let healthy = false;
     if (state === "running") {
-      const health = runCapture(`curl -sf http://localhost:8000/v1/models 2>/dev/null`, {
+      const health = runCapture(`curl -sf http://localhost:${port}/v1/models 2>/dev/null`, {
         ignoreError: true,
       });
       healthy = !!health;
     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 193 - 210, nimStatus currently hardcodes
localhost:8000 for the health check which breaks setups where startNimContainer
starts the container on a custom port; update nimStatus to determine the actual
port used for that container (either by reading the same configuration/env used
by startNimContainer or by querying the container inspect output for the
HostPort) and use that port in the curl health check URL instead of :8000; keep
the existing ignoreError behavior and return fields (running, healthy,
container, state) unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/nim.js`:
- Around line 10-11: containerName currently returns an interpolated string from
unvalidated sandboxName which can lead to command injection when used in shell
commands (the places where containerName() is later interpolated into commands);
modify containerName to validate and sanitize sandboxName before returning (for
example: enforce a strict allowlist regex like /^[a-z0-9_-]+$/i, throw or reject
inputs that don't match, and optionally normalize or escape input), or perform
this validation at the call sites that build shell commands (where
containerName() is used) to ensure only safe characters are allowed; update any
callers to handle validation errors gracefully.

---

Outside diff comments:
In `@bin/lib/nim.js`:
- Around line 132-137: The helper functions (e.g., pullNimImage) currently call
process.exit(1) when getImageForModel(model) returns falsy (and similarly at the
other spot around lines 147–150); replace those process.exit calls with throwing
a descriptive Error (e.g., throw new Error(`Unknown model: ${model}`)) so
callers can handle termination at the CLI layer instead of library code, update
any other helper functions that use process.exit to throw instead, and ensure
the CLI entrypoint catches these Errors and exits with the appropriate status.
- Around line 193-210: nimStatus currently hardcodes localhost:8000 for the
health check which breaks setups where startNimContainer starts the container on
a custom port; update nimStatus to determine the actual port used for that
container (either by reading the same configuration/env used by
startNimContainer or by querying the container inspect output for the HostPort)
and use that port in the curl health check URL instead of :8000; keep the
existing ignoreError behavior and return fields (running, healthy, container,
state) unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0c8872b6-d201-4dbc-97c1-5901a45b55d8

📥 Commits

Reviewing files that changed from the base of the PR and between dc6f316 and cd21224.

📒 Files selected for processing (1)

bin/lib/nim.js

Add dependency injection to detectGpu() via an optional opts parameter, enabling deterministic tests for all 4 code paths: standard NVIDIA, DGX Spark GB10 unified memory, Apple Silicon, and no-GPU fallback. Signed-off-by: Brian Taylor <brian@briantaylor.xyz> Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>

- Add test for the sysctl hw.memsize fallback when system_profiler reports no VRAM (the actual Apple Silicon code path) - Rename existing Apple test to clarify it covers the discrete VRAM parsing branch - Use more specific mock pattern "query-gpu=name" to avoid substring collisions

Aligns runtime behavior with JSDoc contract (cores?: number). When system_profiler does not report core count, the property is now omitted entirely rather than set to null.

Adds suggestModelsForGpu() that ranks NIM models by VRAM fit and marks the optimal model as recommended. Also surfaces GPU name in NVIDIA detection for better display during onboarding.

- Validate sandboxName in containerName() to prevent command injection via shell metacharacters (regex: /^[a-zA-Z0-9][a-zA-Z0-9_.-]*$/) - Replace process.exit(1) with throw in pullNimImage and startNimContainer so callers can handle errors gracefully - Accept port parameter in nimStatus() instead of hardcoding 8000 - Add tests for sandbox name validation and throw behavior

Verify the NIM port is available before pulling the image and starting the container. Without this check, a port conflict (e.g. an existing vLLM instance) causes an opaque docker failure instead of a clear onboarding error with the blocking process identified.

Without a timeout, a single curl invocation can stall indefinitely on a localhost socket, defeating the outer polling loop's timeout. Add curl --max-time 10 and execSync timeout: 15000 as a belt-and-suspenders bound on each probe iteration.

wscurran · 2026-03-20T22:14:00Z

Thanks for adding optional dependency injection to the GPU detection tests, this will help ensure our tests are more robust and reliable.

brianwtaylor force-pushed the test/gpu-detection-spark-paths branch from b949de7 to 0ee1b3f Compare March 18, 2026 00:08

coderabbitai bot reviewed Mar 18, 2026

View reviewed changes

Comment thread bin/lib/nim.js

coderabbitai bot reviewed Mar 18, 2026

View reviewed changes

Comment thread bin/lib/nim.js

brianwtaylor mentioned this pull request Mar 18, 2026

feat: recommend optimal NIM model based on detected GPU VRAM #270

Closed

kagura-agent mentioned this pull request Mar 18, 2026

fix(gpu): detect Jetson Thor and Orin unified memory GPUs #308

Closed

brianwtaylor force-pushed the test/gpu-detection-spark-paths branch from 49ccca8 to 7765ca4 Compare March 18, 2026 20:33

brianwtaylor added 9 commits March 18, 2026 18:10

docs: add JSDoc to detectGpu for docstring coverage check

4e4d7c1

fix: omit cores property when unknown instead of returning null

3f50def

Aligns runtime behavior with JSDoc contract (cores?: number). When system_profiler does not report core count, the property is now omitted entirely rather than set to null.

docs: add JSDoc to all nim.js functions for docstring coverage

4cfea55

feat(nim): add GPU model pre-selector with recommended model

6f74978

Adds suggestModelsForGpu() that ranks NIM models by VRAM fit and marks the optimal model as recommended. Also surfaces GPU name in NVIDIA detection for better display during onboarding.

brianwtaylor force-pushed the test/gpu-detection-spark-paths branch from 413b25a to 8f18ff9 Compare March 19, 2026 01:12

brianwtaylor closed this by deleting the head repository Mar 20, 2026

wscurran added the enhancement: testing Use this label to identify requests to improve NemoClaw test coverage. label Mar 20, 2026

brianwtaylor mentioned this pull request Mar 20, 2026

feat: recommend optimal NIM model based on detected GPU VRAM #537

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: add GPU detection tests with dependency injection#142

test: add GPU detection tests with dependency injection#142
brianwtaylor wants to merge 9 commits intoNVIDIA:mainfrom
brianwtaylor:test/gpu-detection-spark-paths

brianwtaylor commented Mar 17, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Mar 18, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

wscurran commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

brianwtaylor commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Automated Tests

Uh oh!

coderabbitai bot commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

wscurran commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

brianwtaylor commented Mar 17, 2026 •

edited

Loading

coderabbitai bot commented Mar 18, 2026 •

edited

Loading