feat: recommend optimal NIM model based on detected GPU VRAM #537
brianwtaylor wants to merge 2 commits into NVIDIA:main
Conversation
📝 Walkthrough
Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 5 passed checks (5 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
bin/lib/nim.js (3)
135-137: ⚠️ Potential issue | 🟠 Major — Avoid `process.exit(1)` in library functions.
Hard-exiting here can terminate parent CLI/test processes unexpectedly. Throw an error and let callers decide handling.
🛠️ Proposed fix
```diff
 if (!image) {
-  console.error(` Unknown model: ${model}`);
-  process.exit(1);
+  throw new Error(`Unknown model: ${model}`);
 }
```
Also applies to: 148-150
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 135 - 137, In bin/lib/nim.js replace the hard process.exit(1) calls in the "Unknown model" handling (the console.error(` Unknown model: ${model}`) branches at the locations around lines showing that message, including the second occurrence near lines 148-150) with throwing an Error (e.g. throw new Error(`Unknown model: ${model}`)) so the library surface returns an exception instead of terminating the process; ensure callers of the functions that perform this model validation are prepared to catch or propagate the error.
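The throw-at-the-library, exit-at-the-CLI pattern the fix describes can be sketched as follows. Note this is illustrative only: `IMAGES`, `getImageForModel`, and `cliMain` are hypothetical stand-ins, not the actual symbols in `bin/lib/nim.js`.

```javascript
// Hypothetical catalog; the real module resolves images differently.
const IMAGES = { "llama-3.1-8b": "nvcr.io/nim/meta/llama-3.1-8b-instruct" };

// Library function: throws instead of calling process.exit(1), so test
// runners and parent processes are never killed mid-run.
function getImageForModel(model) {
  const image = IMAGES[model];
  if (!image) {
    throw new Error(`Unknown model: ${model}`);
  }
  return image;
}

// CLI boundary: the only place that decides the process exit code.
function cliMain(model) {
  try {
    return getImageForModel(model);
  } catch (err) {
    console.error(err.message);
    process.exitCode = 1; // defer exit so pending I/O is not truncated
    return null;
  }
}
```

Setting `process.exitCode` instead of calling `process.exit()` also avoids cutting off buffered stdout/stderr.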
170-172: ⚠️ Potential issue | 🟠 Major — Add per-request timeout to the health probe curl.
The loop timeout does not help if one `curl` invocation hangs; this can block the polling loop indefinitely.
⏱️ Proposed fix
```diff
-  const result = runCapture(`curl -sf http://localhost:${port}/v1/models`, {
+  const result = runCapture(
+    `curl -sf --connect-timeout 2 --max-time 3 http://localhost:${port}/v1/models`,
+    {
       ignoreError: true,
-  });
+    }
+  );
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 170 - 172, The health-check curl call can hang indefinitely; update the invocation that builds the command passed to runCapture (the line using runCapture(`curl -sf http://localhost:${port}/v1/models`, { ignoreError: true })) to include per-request curl timeouts (e.g. add --max-time <seconds> and optionally --connect-timeout <seconds>) so each curl returns within a bounded time; keep the call signature to runCapture and preserve ignoreError: true while tuning timeout values appropriate for the probe.
193-206: ⚠️ Potential issue | 🟠 Major — `nimStatus` health check is pinned to port 8000.
`startNimContainer` supports custom ports, but `nimStatus` always probes `localhost:8000`, which can report false unhealthy states.
🔧 Proposed fix
```diff
-/** @param {string} sandboxName @returns {{running: boolean, healthy?: boolean, container: string, state?: string}} */
-function nimStatus(sandboxName) {
+/** @param {string} sandboxName @param {number} [port=8000] @returns {{running: boolean, healthy?: boolean, container: string, state?: string}} */
+function nimStatus(sandboxName, port = 8000) {
 @@
-  const health = runCapture(`curl -sf http://localhost:8000/v1/models 2>/dev/null`, {
+  const health = runCapture(`curl -sf http://localhost:${port}/v1/models 2>/dev/null`, {
     ignoreError: true,
   });
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 193 - 206, nimStatus currently probes localhost:8000 regardless of actual mapping, causing false unhealthy reports; modify nimStatus (which uses containerName and runCapture) to determine the host port mapped to the container's internal port instead of hardcoding 8000 — e.g., after confirming the container is running, call docker inspect (via runCapture) to read NetworkSettings.Ports for the container returned by containerName(sandboxName) to extract the HostPort for the container's service and then use that host port in the curl health check; ensure this logic handles unmapped ports and falls back to reporting running:false or a safe default consistent with startNimContainer's port configuration.
🧹 Nitpick comments (1)
test/nim.test.js (1)
160-175: Add one assertion for `cores` omission when unavailable.
Current tests verify presence of `cores`, but not the updated contract that the property is omitted (not `null`) when absent.
✅ Suggested test addition
```diff
+  it("omits cores when system_profiler does not report it", () => {
+    const gpu = nim.detectGpu({
+      platform: "darwin",
+      runCapture: mockRunCapture([
+        ["memory.total", new Error("no nvidia-smi")],
+        ["query-gpu=name", new Error("no nvidia-smi")],
+        ["system_profiler", "Chipset Model: Apple M4\n VRAM (Total): 8 GB"],
+      ]),
+    });
+    assert.equal(gpu.type, "apple");
+    assert.equal("cores" in gpu, false);
+  });
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/nim.test.js` around lines 160 - 175, Add a test asserting that when Apple GPU detection runs without a "Total Number of Cores" value, the returned object omits the cores property (i.e., gpu.cores === undefined, not null). Update or add a test using nim.detectGpu with platform "darwin" and a runCapture response for "system_profiler" that lacks the "Total Number of Cores" line, then include an assertion that gpu.cores is undefined (reference nim.detectGpu and the test in test/nim.test.js).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/nim.js`:
- Around line 10-11: containerName currently interpolates arbitrary sandboxName
into shell strings, creating a command-injection risk; update the
containerName(sandboxName) function to validate and sanitize sandboxName (e.g.,
allow only a strict whitelist such as [a-z0-9-_], or percent-encode/hex-encode
input) and return a safe deterministic string, and/or throw an error on invalid
characters; also ensure callers that build shell commands use the sanitized
containerName (or proper argument-passing APIs) rather than raw sandboxName to
eliminate injection vectors.
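A whitelist-based validation along the lines the comment describes might look like this. The regex and the `nim-` prefix are assumptions for illustration; the real `containerName` in `bin/lib/nim.js` may use a different character set or naming scheme.

```javascript
// Sketch: reject any sandbox name that could smuggle shell metacharacters
// into an interpolated docker command.
const SAFE_NAME = /^[a-z0-9_-]+$/i;

function containerName(sandboxName) {
  if (typeof sandboxName !== "string" || !SAFE_NAME.test(sandboxName)) {
    throw new Error(`Invalid sandbox name: ${JSON.stringify(sandboxName)}`);
  }
  // Safe to interpolate: the whitelist excludes quotes, spaces, $, ;, |, etc.
  return `nim-${sandboxName.toLowerCase()}`;
}
```

Validating once at the naming boundary means every downstream shell string built from the result is injection-safe, without auditing each call site.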
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 061be972-e91f-44a2-a548-fa5c9eb03b62
📒 Files selected for processing (2)
bin/lib/nim.js, test/nim.test.js
Force-pushed from 656eba1 to 2741afd (Compare)
🧹 Nitpick comments (1)
test/nim.test.js (1)
145-176: Minor inconsistency in mock patterns between similar test cases.
Lines 150 and 166 use different patterns (`"name"` vs `"query-gpu=name"`) for simulating nvidia-smi failure on macOS. Both work since they throw errors, but using consistent patterns would improve test readability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/nim.test.js` around lines 145 - 176, The two tests calling nim.detectGpu use inconsistent runCapture mock keys ("name" in the first test and "query-gpu=name" in the second) which reduces readability; update one so both use the same mock pattern (either both "name" or both "query-gpu=name") when simulating nvidia-smi failures, e.g., change the first test's mockRunCapture entry to match the second, and keep references to nim.detectGpu and mockRunCapture unchanged.
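A `mockRunCapture` of the shape these tests imply — matching commands by substring and returning canned output or throwing — might look like this. This is a sketch; the actual helper in `test/nim.test.js` may be implemented differently.

```javascript
// Sketch: build a fake runCapture from [pattern, result] pairs. The first
// pattern found as a substring of the command wins; an Error result is thrown
// to simulate a failing command (e.g., nvidia-smi missing on macOS).
function mockRunCapture(entries) {
  return (cmd) => {
    for (const [pattern, result] of entries) {
      if (cmd.includes(pattern)) {
        if (result instanceof Error) throw result;
        return result;
      }
    }
    throw new Error(`Unexpected command: ${cmd}`);
  };
}
```

With substring matching, `"name"` and `"query-gpu=name"` both match `nvidia-smi --query-gpu=name`, which is why the inconsistency is cosmetic rather than behavioral.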
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f9be487f-c563-4dc7-8655-401ae66daba7
📒 Files selected for processing (3)
bin/lib/nim.js, bin/lib/onboard.js, test/nim.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
- bin/lib/nim.js
Addressed all CodeRabbit findings in the latest push:
- Inline (critical):
- Outside-diff fixes (all 3):
- Nitpicks (both):
🧹 Nitpick comments (4)
bin/lib/nim.js (4)
201-218: Unused `interval` variable.
`const interval = 5000;` on line 201 is declared but never used. The sleep duration is hardcoded to `"5"` at line 218. Consider using the variable or removing it.
♻️ Use interval variable
```diff
   const interval = 5000;
   console.log(` Waiting for NIM health on port ${port} (timeout: ${timeout}s)...`);
   while ((Date.now() - start) / 1000 < timeout) {
     try {
       const result = runCapture(
         `curl -sf --connect-timeout 2 --max-time 3 http://localhost:${port}/v1/models`,
         {
           ignoreError: true,
         }
       );
       if (result) {
         console.log(" NIM is healthy.");
         return true;
       }
     } catch {}
     // Synchronous sleep via spawnSync
-    require("child_process").spawnSync("sleep", ["5"]);
+    require("child_process").spawnSync("sleep", [String(interval / 1000)]);
   }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 201 - 218, The local const interval declared near the NIM health loop is unused; either remove it or use it for the sleep call and any messaging. Fix by replacing the hardcoded sleep string in the loop (the spawnSync("sleep", ["5"]) call) with a value derived from interval (e.g., spawnSync("sleep", [String(interval / 1000)]) or otherwise convert interval to seconds), and optionally use interval in the initial status message; the relevant symbols to update are the interval constant, the spawnSync sleep invocation, and the surrounding NIM health loop that uses start and runCapture.
160-166: No model recommended when all fitting models exceed 90% VRAM.
If all models that fit within `totalMemoryMB` require more than 90% of VRAM, no model is marked `recommended: true`. This may be intentional (conservative), but could leave users without guidance when a model would technically work. Consider whether to fall back to recommending the smallest fitting model, or document this behavior so callers can handle the no-recommendation case.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 160 - 166, The code currently sets recommended based on threshold = vram * 0.9 and may leave all models un-recommended if every fitting model exceeds 90% VRAM; update the logic in the block that iterates fits (the fits.map using variables threshold, recommended and m.minGpuMemoryMB) so that if after the map no model has recommended === true you mark the single fitting model with the smallest minGpuMemoryMB as recommended (i.e., find the min by m.minGpuMemoryMB among fits and set its recommended flag to true), or alternatively document this behavior for callers if you prefer to keep conservative behavior.
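The threshold-plus-fallback behavior the prompt asks for can be sketched as below. The `minGpuMemoryMB` field name follows the review; the function name and catalog values are illustrative, not the actual `suggestModelsForGpu` implementation.

```javascript
// Sketch: keep models that fit in VRAM (sorted largest-first), recommend the
// largest one at or under 90% of VRAM, and fall back to the smallest fitting
// model when every fit exceeds the 90% safety margin.
function suggestModels(models, totalMemoryMB) {
  const threshold = totalMemoryMB * 0.9;
  const fits = models
    .filter((m) => m.minGpuMemoryMB <= totalMemoryMB)
    .sort((a, b) => b.minGpuMemoryMB - a.minGpuMemoryMB); // largest first
  const bestIdx = fits.findIndex((m) => m.minGpuMemoryMB <= threshold);
  // Fallback: -1 means nothing clears the margin, so take the last (smallest) fit.
  const recIdx = bestIdx !== -1 ? bestIdx : fits.length - 1;
  return fits.map((m, i) => ({ ...m, recommended: i === recIdx }));
}
```

With this shape, downstream code like the `defaultModelIndex` lookup in `onboard.js` only has to handle the truly-empty case (no model fits at all).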
64-71: `perGpuMB` assumes homogeneous GPUs in multi-GPU setups.
Line 69 returns `perGpuMB: perGpuMB[0]`, which is only the first GPU's memory. For heterogeneous multi-GPU systems (e.g., mixed A100/A10), this could mislead downstream logic that assumes uniform GPU memory. Consider documenting this assumption or returning the minimum value for conservative VRAM estimation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 64 - 71, The code currently exposes perGpuMB: perGpuMB[0], assuming homogeneous GPUs; change this to provide a conservative estimate and avoid misleading consumers by replacing perGpuMB: perGpuMB[0] with a single conservative value (e.g., the minimum across the perGpuMB array, such as Math.min(...perGpuMB)), or alternatively return the full perGpuMB array (and update any consumers) if heterogeneous reporting is desired; update the returned object from the function that constructs this device info (the object containing type, name, count, totalMemoryMB, perGpuMB, nimCapable) so perGpuMB reflects the minimum VRAM or the full array and add a brief inline comment clarifying the choice.
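Under the conservative option, the NVIDIA branch would report the smallest per-GPU VRAM. A sketch of the parsing and summary step (the function name is hypothetical; field names follow the review):

```javascript
// Sketch: parse `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`
// style output (one MB value per line) and summarize for model sizing.
function summarizeGpus(memoryTotalOutput) {
  const perGpu = memoryTotalOutput
    .trim()
    .split("\n")
    .map((line) => parseInt(line, 10))
    .filter(Number.isInteger);
  return {
    count: perGpu.length,
    totalMemoryMB: perGpu.reduce((a, b) => a + b, 0),
    // Minimum, not perGpu[0]: on mixed A100/A10 boxes the first GPU's VRAM
    // would overstate what every GPU can hold.
    perGpuMB: perGpu.length ? Math.min(...perGpu) : 0,
  };
}
```

Returning the full array instead is the other option the comment mentions; the minimum keeps the existing scalar contract while staying safe on heterogeneous setups.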
181-193: Consider validating `port` parameter.
The `port` parameter is interpolated into the shell command at line 193 without validation. While the default is 8000 and typical callers pass integers, invalid values (NaN, negative, >65535) would produce malformed Docker commands or unexpected behavior.
🛡️ Optional validation
```diff
 function startNimContainer(sandboxName, model, port = 8000) {
+  if (!Number.isInteger(port) || port < 1 || port > 65535) {
+    throw new Error(`Invalid port: ${port}`);
+  }
   const name = containerName(sandboxName);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 181 - 193, The startNimContainer function interpolates the port directly into a shell command, which can produce malformed Docker commands if port is non-numeric or out of range; validate the port parameter (the port variable) before interpolation: coerce/parse it to an integer (e.g. parseInt), ensure Number.isInteger and that it falls within 1–65535, and either throw a clear Error (or fallback to the default 8000) when invalid; update the validation near the top of startNimContainer (before using port in run/docker run) and keep existing symbols containerName, getImageForModel and run unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b6b83c8d-613c-4ab7-8c02-78aefd852c69
📒 Files selected for processing (2)
bin/lib/nim.js, test/nim.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
- test/nim.test.js
- Validate sandboxName in containerName() to prevent shell injection
- Replace process.exit(1) with throw in pullNimImage/startNimContainer
- Add per-request curl timeouts (--connect-timeout 2 --max-time 3) to health probes in waitForNimHealth and nimStatus
- Accept port parameter in nimStatus instead of hardcoding 8000
- Add containerName validation tests and cores-omission test
- Normalize mock patterns in macOS GPU detection tests
Force-pushed from e9f56c9 to d926f45 (Compare)
🧹 Nitpick comments (2)
bin/lib/nim.js (2)
153-167: Consider edge case: no model marked recommended if all exceed 90% threshold.
If a GPU's only fitting models require >90% of VRAM (e.g., a 9500MB model on a 10000MB GPU), the returned array has no `recommended: true` entry. Downstream code (e.g., `onboard.js` finding `defaultModelIndex`) should handle this gracefully. This may be intentional as a safety margin, but worth confirming the caller handles an empty recommendation.
```bash
#!/bin/bash
# Check how onboard.js handles the case where no model is recommended
rg -n -A5 'suggestModelsForGpu|recommended' bin/lib/onboard.js
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 153 - 167, suggestModelsForGpu can return no recommended model if every fit exceeds the 0.9 threshold; update suggestModelsForGpu so after building fits and applying the threshold logic (variables: fits, threshold, recommended) you check if none were marked recommended and, if fits is non-empty, mark the highest-capacity fit (first element after the current sort) as recommended; this guarantees downstream logic (e.g., onboard.js defaultModelIndex lookup) always finds a recommended model when any fit exists.
64-71: Multi-GPU: `perGpuMB` reports only the first GPU's memory.
When multiple GPUs are present with different VRAM sizes (e.g., mixed GPU configurations), `perGpuMB: perGpuMB[0]` reflects only the first GPU. If downstream code uses `perGpuMB` for per-GPU model sizing decisions, it may over- or under-estimate capacity on heterogeneous setups. If homogeneous GPUs are assumed, consider adding a brief comment. Otherwise, returning the minimum or the full array may be more robust.
🤖 Prompt for AI Agents
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/nim.js` around lines 64 - 71, The returned object currently sets perGpuMB to only the first GPU (perGpuMB[0]), which misreports capacity in mixed-GPU systems; update the return in the object (the block that builds the NVIDIA info) to return the full perGpuMB array (or alternatively return the minimum value if a single scalar is required by downstream code) and ensure any downstream consumers expect an array (or adjust them if you choose the minimum). Also add a short inline comment near perGpuMB explaining the shape (array of per-GPU VRAMs) so future readers know this handles heterogeneous GPUs; keep totalMemoryMB and nimCapable unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b550f299-260e-4950-ac5f-da5abb926b26
📒 Files selected for processing (3)
bin/lib/nim.js, bin/lib/onboard.js, test/nim.test.js
🚧 Files skipped from review as they are similar to previous changes (2)
- bin/lib/onboard.js
- test/nim.test.js
Thanks for submitting this proposed feature to recommend optimal NIM models based on detected GPU VRAM, which could help improve the performance and usability of NemoClaw for users with different GPU configurations.
Continuation of closed #142 and #270.
Summary
- `detectGpu()` accepts injected dependencies (DI) for `runCapture` and `platform`, making it fully testable without hardware
- `nvidia-smi --query-gpu=name` for display during onboard
- `suggestModelsForGpu(gpu)` — filters NIM models by VRAM, sorts descending, marks the largest model at ≤90% VRAM as recommended
- `setupNim()` in onboard uses `suggestModelsForGpu()` with a `(recommended)` tag in model selection
- Tests in `nim.js` for `detectGpu` (NVIDIA, multi-GPU, DGX Spark, macOS discrete, Apple Silicon, fallback) and 6 for `suggestModelsForGpu`

Relates to #404 — improves `detectGpu()` testability.
Test plan
Automated Tests
Summary by CodeRabbit
New Features
Bug Fixes / Improvements