
feat: recommend optimal NIM model based on detected GPU VRAM #537

Closed
brianwtaylor wants to merge 2 commits into NVIDIA:main from brianwtaylor:test/gpu-detection-spark-paths

Conversation

@brianwtaylor
Contributor

@brianwtaylor brianwtaylor commented Mar 20, 2026

Continuation of closed #142 and #270.

Summary

  • Refactor detectGpu() to accept injected dependencies (DI) for runCapture and platform, making it fully testable without hardware
  • Query GPU name via nvidia-smi --query-gpu=name for display during onboard
  • Add suggestModelsForGpu(gpu) — filters NIM models by VRAM, sorts descending, marks the largest model at ≤90% VRAM as recommended
  • Update setupNim() in onboard to use suggestModelsForGpu() with (recommended) tag in model selection
  • Add JSDoc to all exported functions in nim.js
  • Add 12 tests: 6 for detectGpu (NVIDIA, multi-GPU, DGX Spark, macOS discrete, Apple Silicon, fallback) and 6 for suggestModelsForGpu

Relates to #404 — improves detectGpu() testability.
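
The recommendation logic summarized above can be sketched as follows. This is a hypothetical reconstruction from the bullet points and review comments, not the actual nim.js source; the sample model list and the minGpuMemoryMB field name are assumptions.

```javascript
// Sketch: filter models by VRAM fit, sort descending, and mark the largest
// model that stays within 90% of VRAM as recommended.
function suggestModelsForGpu(gpu, models) {
  const vram = gpu.totalMemoryMB;
  const fits = models
    .filter((m) => m.minGpuMemoryMB <= vram)
    .sort((a, b) => b.minGpuMemoryMB - a.minGpuMemoryMB);
  const threshold = vram * 0.9;
  // fits is sorted descending, so find() returns the largest model under the threshold
  const pick = fits.find((m) => m.minGpuMemoryMB <= threshold);
  return fits.map((m) => ({ ...m, recommended: m === pick }));
}

// Hypothetical model catalog for illustration only.
const models = [
  { name: "llama-8b", minGpuMemoryMB: 16000 },
  { name: "llama-70b", minGpuMemoryMB: 140000 },
  { name: "mistral-7b", minGpuMemoryMB: 14000 },
];
const out = suggestModelsForGpu({ totalMemoryMB: 24000 }, models);
// llama-70b is filtered out; llama-8b (16000 ≤ 21600) is the largest fit under 90%
console.log(out.map((m) => `${m.name}${m.recommended ? " (recommended)" : ""}`));
```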

Test plan

Automated Tests

npm test -- --grep "nim"

Summary by CodeRabbit

  • New Features

    • Intelligent model recommendation: filters and ranks models by GPU VRAM, marks a recommended default, and surfaces it in prompts and non-interactive defaults.
  • Bug Fixes / Improvements

    • Improved GPU detection (includes GPU names, better macOS/Apple Silicon handling, preserves core/VRAM info).
    • More robust health checks with tighter timeouts and configurable ports.
    • Clearer error handling for container/image failures.
    • Stricter sandbox name validation to prevent invalid names.

@coderabbitai
Contributor

coderabbitai bot commented Mar 21, 2026

📝 Walkthrough


bin/lib/nim.js adds dependency injection to GPU detection, returns richer GPU metadata (NVIDIA name, DGX Spark unified-memory handling), and introduces suggestModelsForGpu(gpu) to rank/recommend models by VRAM fit; container-name validation, error throwing (instead of exits), configurable nimStatus port, and hardened health-check timeouts were added. Tests extended accordingly.

Changes

Cohort / File(s) Summary
Core GPU detection & helpers
bin/lib/nim.js
detectGpu() → detectGpu(opts) accepts injected runCapture and platform; NVIDIA detection now includes name; DGX Spark (GB10) unified-memory handling added; macOS detection may omit cores when absent; new suggestModelsForGpu(gpu) ranks models by VRAM fit and marks one recommended.
Container lifecycle & validation
bin/lib/nim.js
containerName(sandboxName) now validates input (presence, ≤63 chars, lowercase/digits/dashes) and throws Error on invalid names; pullNimImage and startNimContainer throw Error for unknown models (replacing process.exit behavior); nimStatus(sandboxName) → nimStatus(sandboxName, port=8000), and health probes add curl --connect-timeout 2 --max-time 3 and use the provided port.
Tests: GPU, suggestions & container names
test/nim.test.js
Expanded tests for containerName validation and injected detectGpu paths (NVIDIA multi‑GPU, DGX GB10, macOS discrete/unified, Apple Silicon, no‑GPU); added suggestModelsForGpu tests (filtering by VRAM, ordering, single recommended within 90% threshold).
Onboarding changes
bin/lib/onboard.js
Preflight logs include gpu.name when present; onboarding now uses nim.suggestModelsForGpu(gpu) and derives the default selection from the model marked recommended; interactive prompts display “(recommended)”.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through logs and counted VRAM bright,

I named the GPUs and nudged models right.
I banished exits, made checks kind and neat,
picked one to recommend — a tasty treat.
Carrot-coded cheers for changes light and spry. 🥕

🚥 Pre-merge checks: ✅ 5 passed

  • Title check ✅ Passed: The title accurately summarizes the main feature: recommending optimal NIM models based on GPU VRAM detection.
  • Linked Issues check ✅ Passed: All key requirements from #142 are met: detectGpu() accepts injected opts, suggestModelsForGpu() ranks models by VRAM, containerName() validates inputs, error handling replaces process.exit(), nimStatus() accepts a port parameter, health probes have timeouts, and comprehensive tests cover all GPU detection paths and model recommendation logic.
  • Out of Scope Changes check ✅ Passed: All changes directly support the core objectives: GPU detection improvements, model recommendation logic, sandbox name validation, error handling hardening, and corresponding test coverage. No unrelated modifications detected.
  • Docstring Coverage ✅ Passed: Docstring coverage is 92.31%, above the required 80.00% threshold.
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
bin/lib/nim.js (3)

135-137: ⚠️ Potential issue | 🟠 Major

Avoid process.exit(1) in library functions.

Hard-exiting here can terminate parent CLI/test processes unexpectedly. Throw an error and let callers decide handling.

🛠️ Proposed fix
   if (!image) {
-    console.error(`  Unknown model: ${model}`);
-    process.exit(1);
+    throw new Error(`Unknown model: ${model}`);
   }

Also applies to: 148-150

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 135 - 137, In bin/lib/nim.js replace the hard
process.exit(1) calls in the "Unknown model" handling (the console.error(` 
Unknown model: ${model}`) branches at the locations around lines showing that
message, including the second occurrence near lines 148-150) with throwing an
Error (e.g. throw new Error(`Unknown model: ${model}`)) so the library surface
returns an exception instead of terminating the process; ensure callers of the
functions that perform this model validation are prepared to catch or propagate
the error.
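
A minimal sketch of what this change looks like from the caller's side, assuming the throwing behavior described above. The pullNimImage stub below is a stand-in for illustration, not the actual implementation in nim.js.

```javascript
// Stand-in for the library function after the fix: it throws instead of
// calling process.exit(1), so the CLI layer decides how to handle failure.
function pullNimImage(model) {
  const image = undefined; // simulates an unknown model lookup in this sketch
  if (!image) {
    throw new Error(`Unknown model: ${model}`);
  }
  return image;
}

try {
  pullNimImage("not-a-model");
} catch (err) {
  // Only the top-level CLI should decide to log and exit; tests and other
  // callers can catch or propagate instead of being killed mid-run.
  console.error(`  ${err.message}`);
}
```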

170-172: ⚠️ Potential issue | 🟠 Major

Add per-request timeout to the health probe curl.

The loop timeout does not help if one curl invocation hangs; this can block the polling loop indefinitely.

⏱️ Proposed fix
-      const result = runCapture(`curl -sf http://localhost:${port}/v1/models`, {
+      const result = runCapture(
+        `curl -sf --connect-timeout 2 --max-time 3 http://localhost:${port}/v1/models`,
+        {
         ignoreError: true,
-      });
+        }
+      );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 170 - 172, The health-check curl call can hang
indefinitely; update the invocation that builds the command passed to runCapture
(the line using runCapture(`curl -sf http://localhost:${port}/v1/models`, {
ignoreError: true })) to include per-request curl timeouts (e.g. add --max-time
<seconds> and optionally --connect-timeout <seconds>) so each curl returns
within a bounded time; keep the call signature to runCapture and preserve
ignoreError: true while tuning timeout values appropriate for the probe.

193-206: ⚠️ Potential issue | 🟠 Major

nimStatus health check is pinned to port 8000.

startNimContainer supports custom ports, but nimStatus always probes localhost:8000, which can report false unhealthy states.

🔧 Proposed fix
-/** @param {string} sandboxName @returns {{running: boolean, healthy?: boolean, container: string, state?: string}} */
-function nimStatus(sandboxName) {
+/** @param {string} sandboxName @param {number} [port=8000] @returns {{running: boolean, healthy?: boolean, container: string, state?: string}} */
+function nimStatus(sandboxName, port = 8000) {
@@
-      const health = runCapture(`curl -sf http://localhost:8000/v1/models 2>/dev/null`, {
+      const health = runCapture(`curl -sf http://localhost:${port}/v1/models 2>/dev/null`, {
         ignoreError: true,
       });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 193 - 206, nimStatus currently probes
localhost:8000 regardless of actual mapping, causing false unhealthy reports;
modify nimStatus (which uses containerName and runCapture) to determine the host
port mapped to the container's internal port instead of hardcoding 8000 — e.g.,
after confirming the container is running, call docker inspect (via runCapture)
to read NetworkSettings.Ports for the container returned by
containerName(sandboxName) to extract the HostPort for the container's service
and then use that host port in the curl health check; ensure this logic handles
unmapped ports and falls back to reporting running:false or a safe default
consistent with startNimContainer's port configuration.
🧹 Nitpick comments (1)
test/nim.test.js (1)

160-175: Add one assertion for cores omission when unavailable.

Current tests verify presence of cores, but not the updated contract that the property is omitted (not null) when absent.

✅ Suggested test addition
+    it("omits cores when system_profiler does not report it", () => {
+      const gpu = nim.detectGpu({
+        platform: "darwin",
+        runCapture: mockRunCapture([
+          ["memory.total", new Error("no nvidia-smi")],
+          ["query-gpu=name", new Error("no nvidia-smi")],
+          ["system_profiler", "Chipset Model: Apple M4\n      VRAM (Total): 8 GB"],
+        ]),
+      });
+      assert.equal(gpu.type, "apple");
+      assert.equal("cores" in gpu, false);
+    });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/nim.test.js` around lines 160 - 175, Add a test asserting that when
Apple GPU detection runs without a "Total Number of Cores" value, the returned
object omits the cores property (i.e., gpu.cores === undefined, not null).
Update or add a test using nim.detectGpu with platform "darwin" and a runCapture
response for "system_profiler" that lacks the "Total Number of Cores" line, then
include an assertion that gpu.cores is undefined (reference nim.detectGpu and
the test in test/nim.test.js).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/nim.js`:
- Around line 10-11: containerName currently interpolates arbitrary sandboxName
into shell strings, creating a command-injection risk; update the
containerName(sandboxName) function to validate and sanitize sandboxName (e.g.,
allow only a strict whitelist such as [a-z0-9-_], or percent-encode/hex-encode
input) and return a safe deterministic string, and/or throw an error on invalid
characters; also ensure callers that build shell commands use the sanitized
containerName (or proper argument-passing APIs) rather than raw sandboxName to
eliminate injection vectors.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 061be972-e91f-44a2-a548-fa5c9eb03b62

📥 Commits

Reviewing files that changed from the base of the PR and between 867f27d and 656eba1.

📒 Files selected for processing (2)
  • bin/lib/nim.js
  • test/nim.test.js

@brianwtaylor force-pushed the test/gpu-detection-spark-paths branch from 656eba1 to 2741afd on March 21, 2026 at 02:36
@brianwtaylor changed the title from "test: add GPU detection tests with dependency injection" to "feat: recommend optimal NIM model based on detected GPU VRAM" on Mar 21, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
test/nim.test.js (1)

145-176: Minor inconsistency in mock patterns between similar test cases.

Lines 150 and 166 use different patterns ("name" vs "query-gpu=name") for simulating nvidia-smi failure on macOS. Both work since they throw errors, but using consistent patterns would improve test readability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/nim.test.js` around lines 145 - 176, The two tests calling nim.detectGpu
use inconsistent runCapture mock keys ("name" in the first test and
"query-gpu=name" in the second) which reduces readability; update one so both
use the same mock pattern (either both "name" or both "query-gpu=name") when
simulating nvidia-smi failures, e.g., change the first test's mockRunCapture
entry to match the second, and keep references to nim.detectGpu and
mockRunCapture unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f9be487f-c563-4dc7-8655-401ae66daba7

📥 Commits

Reviewing files that changed from the base of the PR and between 656eba1 and 2741afd.

📒 Files selected for processing (3)
  • bin/lib/nim.js
  • bin/lib/onboard.js
  • test/nim.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • bin/lib/nim.js

@brianwtaylor
Contributor Author

Addressed all CodeRabbit findings in the latest push:

Inline (critical):

  • containerName() now validates sandboxName against /^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}$/ and throws on invalid input

Outside-diff fixes (all 3):

  • Replaced process.exit(1) with throw new Error() in pullNimImage and startNimContainer — library functions shouldn't hard-exit
  • Added --connect-timeout 2 --max-time 3 to health probe curl in both waitForNimHealth and nimStatus
  • nimStatus now accepts an optional port parameter (default 8000) instead of hardcoding it

Nitpicks (both):

  • Added test for cores property omission when system_profiler doesn't report core count
  • Normalized mock patterns — both macOS tests now use "query-gpu=name" consistently
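
The validation described in the first bullet can be sketched as follows. Only the regex comes from the comment above; the "nemoclaw-" prefix and the error message wording are assumptions for illustration.

```javascript
// Sketch: reject any sandbox name that could smuggle shell metacharacters
// into docker commands; the regex is the one quoted in the comment above.
function containerName(sandboxName) {
  if (
    typeof sandboxName !== "string" ||
    !/^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}$/.test(sandboxName)
  ) {
    throw new Error(`Invalid sandbox name: ${sandboxName}`);
  }
  return `nemoclaw-${sandboxName}`; // prefix is a placeholder
}

console.log(containerName("dev-box")); // prints "nemoclaw-dev-box"
```

With this in place, an injection attempt such as containerName("x; rm -rf /") throws before any shell command is built.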

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (4)
bin/lib/nim.js (4)

201-218: Unused interval variable.

const interval = 5000; on line 201 is declared but never used. The sleep duration is hardcoded to "5" at line 218. Consider using the variable or removing it.

♻️ Use interval variable
   const interval = 5000;
   console.log(`  Waiting for NIM health on port ${port} (timeout: ${timeout}s)...`);

   while ((Date.now() - start) / 1000 < timeout) {
     try {
       const result = runCapture(
         `curl -sf --connect-timeout 2 --max-time 3 http://localhost:${port}/v1/models`,
         {
           ignoreError: true,
         }
       );
       if (result) {
         console.log("  NIM is healthy.");
         return true;
       }
     } catch {}
     // Synchronous sleep via spawnSync
-    require("child_process").spawnSync("sleep", ["5"]);
+    require("child_process").spawnSync("sleep", [String(interval / 1000)]);
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 201 - 218, The local const interval declared
near the NIM health loop is unused; either remove it or use it for the sleep
call and any messaging. Fix by replacing the hardcoded sleep string in the loop
(the spawnSync("sleep", ["5"]) call) with a value derived from interval (e.g.,
spawnSync("sleep", [String(interval / 1000)]) or otherwise convert interval to
seconds), and optionally use interval in the initial status message; the
relevant symbols to update are the interval constant, the spawnSync sleep
invocation, and the surrounding NIM health loop that uses start and runCapture.

160-166: No model recommended when all fitting models exceed 90% VRAM.

If all models that fit within totalMemoryMB require more than 90% of VRAM, no model is marked recommended: true. This may be intentional (conservative), but could leave users without guidance when a model would technically work.

Consider whether to fall back to recommending the smallest fitting model, or document this behavior so callers can handle the no-recommendation case.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 160 - 166, The code currently sets recommended
based on threshold = vram * 0.9 and may leave all models un-recommended if every
fitting model exceeds 90% VRAM; update the logic in the block that iterates fits
(the fits.map using variables threshold, recommended and m.minGpuMemoryMB) so
that if after the map no model has recommended === true you mark the single
fitting model with the smallest minGpuMemoryMB as recommended (i.e., find the
min by m.minGpuMemoryMB among fits and set its recommended flag to true), or
alternatively document this behavior for callers if you prefer to keep
conservative behavior.
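
The fallback suggested above can be sketched as a small helper. This assumes fits is already sorted by VRAM requirement, descending, as described in the PR summary; the function name and field names are assumptions.

```javascript
// Sketch: mark the largest fit under 90% of VRAM as recommended; if nothing
// passes the threshold, fall back to the smallest fitting model so callers
// always get a recommendation when any model fits at all.
function markRecommended(fits, vramMB) {
  const threshold = vramMB * 0.9;
  // fits is sorted descending, so the last element is the smallest fit
  const pick = fits.find((m) => m.minGpuMemoryMB <= threshold) ?? fits[fits.length - 1];
  return fits.map((m) => ({ ...m, recommended: m === pick }));
}

const marked = markRecommended([{ name: "big", minGpuMemoryMB: 9500 }], 10000);
console.log(marked[0].recommended); // prints true (9500 > 9000 threshold, so the fallback applies)
```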

64-71: perGpuMB assumes homogeneous GPUs in multi-GPU setups.

Line 69 returns perGpuMB: perGpuMB[0], which is only the first GPU's memory. For heterogeneous multi-GPU systems (e.g., mixed A100/A10), this could mislead downstream logic that assumes uniform GPU memory. Consider documenting this assumption or returning the minimum value for conservative VRAM estimation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 64 - 71, The code currently exposes perGpuMB:
perGpuMB[0], assuming homogeneous GPUs; change this to provide a conservative
estimate and avoid misleading consumers by replacing perGpuMB: perGpuMB[0] with
a single conservative value (e.g., the minimum across the perGpuMB array, such
as Math.min(...perGpuMB)), or alternatively return the full perGpuMB array (and
update any consumers) if heterogeneous reporting is desired; update the returned
object from the function that constructs this device info (the object containing
type, name, count, totalMemoryMB, perGpuMB, nimCapable) so perGpuMB reflects the
minimum VRAM or the full array and add a brief inline comment clarifying the
choice.
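
A minimal illustration of the conservative option suggested above, reporting the smallest per-GPU VRAM on heterogeneous systems. The values are examples only, not output from any real detection.

```javascript
// Example per-GPU VRAM readings from a mixed setup (values illustrative).
const perGpuMB = [81920, 24576]; // e.g. one 80 GB card and one 24 GB card
// Conservative scalar: the smallest card bounds what any single GPU can hold.
const conservativePerGpuMB = Math.min(...perGpuMB);
console.log(conservativePerGpuMB); // prints 24576
```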

181-193: Consider validating port parameter.

The port parameter is interpolated into the shell command at line 193 without validation. While the default is 8000 and typical callers pass integers, invalid values (NaN, negative, >65535) would produce malformed Docker commands or unexpected behavior.

🛡️ Optional validation
 function startNimContainer(sandboxName, model, port = 8000) {
+  if (!Number.isInteger(port) || port < 1 || port > 65535) {
+    throw new Error(`Invalid port: ${port}`);
+  }
   const name = containerName(sandboxName);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 181 - 193, The startNimContainer function
interpolates the port directly into a shell command, which can produce malformed
Docker commands if port is non-numeric or out of range; validate the port
parameter (the port variable) before interpolation: coerce/parse it to an
integer (e.g. parseInt), ensure Number.isInteger and that it falls within
1–65535, and either throw a clear Error (or fallback to the default 8000) when
invalid; update the validation near the top of startNimContainer (before using
port in run/docker run) and keep existing symbols containerName,
getImageForModel and run unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b6b83c8d-613c-4ab7-8c02-78aefd852c69

📥 Commits

Reviewing files that changed from the base of the PR and between 2741afd and e9f56c9.

📒 Files selected for processing (2)
  • bin/lib/nim.js
  • test/nim.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/nim.test.js

- Validate sandboxName in containerName() to prevent shell injection
- Replace process.exit(1) with throw in pullNimImage/startNimContainer
- Add per-request curl timeouts (--connect-timeout 2 --max-time 3) to
  health probes in waitForNimHealth and nimStatus
- Accept port parameter in nimStatus instead of hardcoding 8000
- Add containerName validation tests and cores-omission test
- Normalize mock patterns in macOS GPU detection tests
@brianwtaylor force-pushed the test/gpu-detection-spark-paths branch from e9f56c9 to d926f45 on March 21, 2026 at 20:30
Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
bin/lib/nim.js (2)

153-167: Consider edge case: no model marked recommended if all exceed 90% threshold.

If a GPU's only fitting models require >90% of VRAM (e.g., 9500MB model on 10000MB GPU), the returned array has no recommended: true entry. Downstream code (e.g., onboard.js finding defaultModelIndex) should handle this gracefully.

This may be intentional as a safety margin, but worth confirming the caller handles an empty recommendation.

#!/bin/bash
# Check how onboard.js handles the case where no model is recommended
rg -n -A5 'suggestModelsForGpu|recommended' bin/lib/onboard.js
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 153 - 167, suggestModelsForGpu can return no
recommended model if every fit exceeds the 0.9 threshold; update
suggestModelsForGpu so after building fits and applying the threshold logic
(variables: fits, threshold, recommended) you check if none were marked
recommended and, if fits is non-empty, mark the highest-capacity fit (first
element after the current sort) as recommended; this guarantees downstream logic
(e.g., onboard.js defaultModelIndex lookup) always finds a recommended model
when any fit exists.

64-71: Multi-GPU: perGpuMB reports only the first GPU's memory.

When multiple GPUs are present with different VRAM sizes (e.g., mixed GPU configurations), perGpuMB: perGpuMB[0] reflects only the first GPU. If downstream code uses perGpuMB for per-GPU model sizing decisions, it may over- or under-estimate capacity on heterogeneous setups.

If homogeneous GPUs are assumed, consider adding a brief comment. Otherwise, returning the minimum or the full array may be more robust.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 64 - 71, The returned object currently sets
perGpuMB to only the first GPU (perGpuMB[0]), which misreports capacity in
mixed-GPU systems; update the return in the object (the block that builds the
NVIDIA info) to return the full perGpuMB array (or alternatively return the
minimum value if a single scalar is required by downstream code) and ensure any
downstream consumers expect an array (or adjust them if you choose the minimum).
Also add a short inline comment near perGpuMB explaining the shape (array of
per-GPU VRAMs) so future readers know this handles heterogeneous GPUs; keep
totalMemoryMB and nimCapable unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b550f299-260e-4950-ac5f-da5abb926b26

📥 Commits

Reviewing files that changed from the base of the PR and between e9f56c9 and d926f45.

📒 Files selected for processing (3)
  • bin/lib/nim.js
  • bin/lib/onboard.js
  • test/nim.test.js
🚧 Files skipped from review as they are similar to previous changes (2)
  • bin/lib/onboard.js
  • test/nim.test.js

@wscurran added the "enhancement: feature" and "priority: medium" labels on Mar 23, 2026
@wscurran
Contributor

Thanks for submitting this proposed feature to recommend optimal NIM models based on detected GPU VRAM, which could help improve the performance and usability of NemoClaw for users with different GPU configurations.

