
feat: recommend optimal NIM model based on detected GPU VRAM #270

Closed

brianwtaylor wants to merge 7 commits into NVIDIA:main from brianwtaylor:feat/gpu-model-preselector

Conversation

@brianwtaylor
Contributor

@brianwtaylor commented Mar 18, 2026

Summary

  • Adds suggestModelsForGpu() that ranks NIM models by VRAM fit and marks the largest model using no more than 90% of available VRAM as recommended
  • Surfaces GPU name in NVIDIA detection for better onboarding display
  • Integrates model recommendation into onboarding: interactive mode shows "(recommended)" tags, non-interactive mode auto-selects the best fit

Addresses #66 — users select models that don't fit their GPU, leading to pull failures or OOM crashes. This PR guides them to the right model based on detected VRAM.
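The ranking rule described above can be pictured with a short sketch. This is an illustrative reconstruction, not the PR's actual code: the MODELS catalog, its entries, and the sort order are assumptions based on the review excerpts (minGpuMemoryMB and the recommended flag do appear in the diffs below).

```javascript
// Illustrative sketch of the VRAM-fit ranking; the MODELS catalog and its
// entries are assumptions, not the PR's actual model list.
const MODELS = [
  { name: "llama-3.1-8b-instruct", minGpuMemoryMB: 16000 },
  { name: "llama-3.1-70b-instruct", minGpuMemoryMB: 140000 },
  { name: "mistral-7b-instruct", minGpuMemoryMB: 14000 },
];

function suggestModelsForGpu(gpu) {
  if (!gpu || !gpu.nimCapable) return [];
  // Keep only models that fit the detected VRAM, largest requirement first.
  const fits = MODELS
    .filter((m) => m.minGpuMemoryMB <= gpu.totalMemoryMB)
    .sort((a, b) => b.minGpuMemoryMB - a.minGpuMemoryMB)
    .map((m) => ({ ...m, recommended: false }));
  // Recommend the largest model that still leaves at least 10% VRAM headroom.
  const best = fits.find((m) => m.minGpuMemoryMB <= gpu.totalMemoryMB * 0.9);
  if (best) best.recommended = true;
  return fits;
}
```

On a 24 GB card this would list both smaller models and flag the 16000 MB one as recommended, since it fits under the 21600 MB (90%) headroom cutoff; on a card with exactly 16000 MB, the headroom rule demotes the exact-fit model in favor of the 14000 MB one.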

Test plan

Automated Tests

node --test test/nim.test.js

Tests cover VRAM-based model ranking, recommended flag assignment, edge cases (no GPU, non-nimCapable GPU, exact VRAM fit), and GPU detection with name enrichment.

@coderabbitai
Contributor

coderabbitai bot commented Mar 18, 2026

📝 Walkthrough

Introduce dependency-injectable GPU detection, richer GPU metadata, a new model-suggestion API that ranks and flags a recommended model for given GPU capabilities, integrate recommendations into onboarding selection flow, and extend tests to cover detection and suggestion logic.

Changes

GPU detection & model suggestion — bin/lib/nim.js
  Refactored detectGpu to detectGpu(opts) with injectable runCapture and platform; adds richer GPU fields (name, optional cores, spark); added suggestModelsForGpu(gpu) to filter/sort models and mark one recommended; additional JSDoc-wrapped helpers exported.

Onboarding integration — bin/lib/onboard.js
  Switched model candidate source to nim.suggestModelsForGpu(gpu); display now includes GPU name when present; interactive listings append (recommended); non-interactive defaults prefer the recommended model.

Tests — test/nim.test.js
  Added injected-runCapture tests exercising NVIDIA (single/multi/GB10), DGX Spark handling, macOS discrete and unified-memory detection, no-GPU path, and comprehensive suggestModelsForGpu behavior (filtering, sorting, single recommended model, ≤90% constraint).

Sequence Diagram(s)

sequenceDiagram
  participant Onboard as Onboard CLI
  participant Nim as bin/lib/nim.js
  participant Runner as runCapture/runner
  participant User as User

  Onboard->>Nim: detectGpu(opts with runCapture, platform)
  Nim->>Runner: run platform-specific commands
  Runner-->>Nim: command outputs
  Nim-->>Onboard: gpu object (type,count,totalMemoryMB, name, cores?, spark?)
  Onboard->>Nim: suggestModelsForGpu(gpu)
  Nim-->>Onboard: models[] (one entry marked recommended)
  Onboard->>User: present model choices (default = recommended)
  User-->>Onboard: selected model
  Onboard->>Nim: startNimContainer(selectedModel)
  Nim->>Runner: pull/start container
  Runner-->>Nim: container status
  Nim-->>Onboard: nimStatus

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 I sniffed the GPUs beneath the moon,
counted memory, hummed a tune.
I picked a model snug and tight,
flagged the one that fits just right. 🥕

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)
  • Title check — Passed: the pull request title directly and clearly summarizes the main change: introducing GPU-aware model recommendation based on detected VRAM.
  • Docstring Coverage — Passed: docstring coverage is 84.62%, which meets the required threshold of 80.00%.
  • Description Check — Passed: check skipped because CodeRabbit's high-level summary is enabled.


@brianwtaylor force-pushed the feat/gpu-model-preselector branch 2 times, most recently from e59d8d0 to 746b69b (March 18, 2026 20:36)
Contributor

@coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
bin/lib/onboard.js (1)

448-476: ⚠️ Potential issue | 🟠 Major

Use the recommended model as the default selection path.

nim.suggestModelsForGpu(gpu) computes a recommended candidate, but the current defaults still pick models[0] (largest fit) in both interactive and non-interactive flows. That can bypass the intended VRAM headroom and increase startup/OOM failures.

Suggested fix
       } else {
         let sel;
+        const defaultModelIndex = Math.max(0, models.findIndex((m) => m.recommended));
         if (isNonInteractive()) {
           if (requestedModel) {
             sel = models.find((m) => m.name === requestedModel);
             if (!sel) {
               console.error(`  Unsupported NEMOCLAW_MODEL for NIM: ${requestedModel}`);
               process.exit(1);
             }
           } else {
-            sel = models[0];
+            sel = models[defaultModelIndex];
           }
           console.log(`  [non-interactive] NIM model: ${sel.name}`);
         } else {
           console.log("");
           console.log("  Models that fit your GPU:");
           models.forEach((m, i) => {
             const tag = m.recommended ? " (recommended)" : "";
             console.log(`    ${i + 1}) ${m.name} (min ${m.minGpuMemoryMB} MB)${tag}`);
           });
           console.log("");

-          const modelChoice = await prompt(`  Choose model [1]: `);
-          const midx = parseInt(modelChoice || "1", 10) - 1;
-          sel = models[midx] || models[0];
+          const defaultChoice = String(defaultModelIndex + 1);
+          const modelChoice = await prompt(`  Choose model [${defaultChoice}]: `);
+          const midx = parseInt(modelChoice || defaultChoice, 10) - 1;
+          sel = models[midx] || models[defaultModelIndex];
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 448 - 476, The selection logic for NIM
models currently defaults to models[0] in both interactive and non-interactive
flows; change it to prefer the model marked recommended by
nim.suggestModelsForGpu(gpu). Update the non-interactive branch in the sel
assignment so that when requestedModel is not provided it picks models.find(m =>
m.recommended) || models[0]; in the interactive branch, set the default choice
index (used when modelChoice is empty) to the index of the recommended model
(const defaultIndex = models.findIndex(m => m.recommended); use defaultIndex >=
0 ? defaultIndex : 0) and then compute midx relative to that default; keep the
requestedModel validation (models.find) and fallback behavior intact.
bin/lib/nim.js (1)

75-94: ⚠️ Potential issue | 🟡 Minor

Include name in the GB10 fallback return object.

The fallback detects GB10 from nameOutput but drops it from the returned GPU object. This causes NVIDIA name display to be missing in onboarding for Spark systems.

Suggested fix
   try {
     const nameOutput = runCmd(
       "nvidia-smi --query-gpu=name --format=csv,noheader,nounits",
       { ignoreError: true }
     );
     if (nameOutput && nameOutput.includes("GB10")) {
+      const name = nameOutput.split("\n")[0].trim();
       // GB10 has 128GB unified memory shared with Grace CPU — use system RAM
       let totalMemoryMB = 0;
       try {
         const memLine = runCmd("free -m | awk '/Mem:/ {print $2}'", { ignoreError: true });
         if (memLine) totalMemoryMB = parseInt(memLine.trim(), 10) || 0;
       } catch {}
       return {
         type: "nvidia",
+        name,
         count: 1,
         totalMemoryMB,
         perGpuMB: totalMemoryMB,
         nimCapable: true,
         spark: true,
       };
     }
   } catch {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 75 - 94, The GB10 special-case branch detects
the GPU via nameOutput but omits the GPU name in the returned object; update the
return in that branch (the block using runCmd and memLine) to include a name
property set from nameOutput (e.g., trimmed/first-line value) so the GPU name is
preserved for onboarding display (refer to nameOutput, runCmd, memLine in the
nim.js GB10 branch).
🧹 Nitpick comments (1)
bin/lib/nim.js (1)

141-143: Remove the stray JSDoc block above suggestModelsForGpu.

The first JSDoc line describes pullNimImage(model) but is attached to suggestModelsForGpu(gpu), which makes generated docs misleading.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 141 - 143, The JSDoc for pullNimImage(model) is
mistakenly placed immediately above suggestModelsForGpu(gpu), causing incorrect
documentation; move the JSDoc block so it directly precedes the pullNimImage
function or delete the stray JSDoc above suggestModelsForGpu. Locate the comment
block and either relocate it to the pullNimImage declaration or remove it
entirely so suggestModelsForGpu only has its correct JSDoc (or none).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8ae6ec2a-40d1-4187-b422-7b7a3d97d285

📥 Commits

Reviewing files that changed from the base of the PR and between 2b5febe and 746b69b.

📒 Files selected for processing (3)
  • bin/lib/nim.js
  • bin/lib/onboard.js
  • test/nim.test.js

@wscurran added the Getting Started (setup, installation, or onboarding issues) and Local Models (running NemoClaw with local models) labels Mar 18, 2026
@wscurran
Contributor

Good suggestion. Other users are reporting the same experience with models that don't fit their GPU.

@wscurran added the priority: medium (should be addressed in upcoming releases) label Mar 18, 2026
Add dependency injection to detectGpu() via an optional opts parameter,
enabling deterministic tests for all 4 code paths: standard NVIDIA,
DGX Spark GB10 unified memory, Apple Silicon, and no-GPU fallback.

Signed-off-by: Brian Taylor <brian@briantaylor.xyz>
Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>

- Add test for the sysctl hw.memsize fallback when system_profiler
  reports no VRAM (the actual Apple Silicon code path)
- Rename existing Apple test to clarify it covers the discrete VRAM
  parsing branch
- Use more specific mock pattern "query-gpu=name" to avoid substring
  collisions

Aligns runtime behavior with JSDoc contract (cores?: number).
When system_profiler does not report core count, the property is
now omitted entirely rather than set to null.

Adds suggestModelsForGpu() that ranks NIM models by VRAM fit and
marks the optimal model as recommended. Also surfaces GPU name
in NVIDIA detection for better display during onboarding.

…e GB10 name, remove stray JSDoc

- Default interactive and non-interactive model selection to the
  recommended model instead of models[0]
- Include GPU name in the GB10 fallback return so Spark users see their
  GPU name during onboarding
- Remove stray pullNimImage JSDoc attached to suggestModelsForGpu
- Add name assertion to DGX Spark GB10 test
@brianwtaylor force-pushed the feat/gpu-model-preselector branch from 3d4833b to a990641 (March 19, 2026 01:12)
@brianwtaylor closed this by deleting the head repository Mar 20, 2026
@wscurran removed the priority: medium (should be addressed in upcoming releases) label Mar 23, 2026

Labels

  • Getting Started — setup, installation, or onboarding issues
  • Local Models — running NemoClaw with local models
