
feat: recommend optimal NIM model based on detected GPU VRAM#542

Closed
brianwtaylor wants to merge 1 commit into NVIDIA:main from brianwtaylor:feat/gpu-model-preselector

Conversation

@brianwtaylor
Contributor

@brianwtaylor brianwtaylor commented Mar 21, 2026

Summary

  • Add suggestModelsForGpu(gpu) to nim.js — ranks NIM models by fit for detected GPU, marks the largest model using ≤90% VRAM as recommended
  • Integrate model suggestions into the onboarding flow — default selection is now the recommended model (not the largest)
  • Add dependency-injectable GPU detection (detectGpu(opts)) for testability
  • Include GPU name in DGX Spark GB10 fallback path
  • Add comprehensive tests: GPU detection (NVIDIA, Apple Silicon, unified memory, DGX Spark, fallback) and model suggestion (filtering, sorting, recommendation logic)
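The filter/sort/recommend rule described above can be sketched as follows. This is a simplified illustration, not the actual nim.js code: the MODELS list and its minGpuMemoryMB values are made-up placeholders, and only the field names mentioned in this PR (totalMemoryMB, nimCapable, recommended) come from the description.

```javascript
// Hypothetical model catalog; real NIM model entries and memory floors differ.
const MODELS = [
  { id: "llama-3.1-70b", minGpuMemoryMB: 140000 },
  { id: "llama-3.1-8b", minGpuMemoryMB: 24000 },
  { id: "mistral-7b", minGpuMemoryMB: 16000 },
];

// Sketch of the ranking logic: keep models that fit in VRAM, sort largest
// first, and mark the largest model using at most 90% of VRAM as recommended.
function suggestModelsForGpu(gpu, models = MODELS) {
  if (!gpu || !gpu.nimCapable) return [];
  const fitting = models
    .filter((m) => m.minGpuMemoryMB <= gpu.totalMemoryMB)
    .sort((a, b) => b.minGpuMemoryMB - a.minGpuMemoryMB)
    .map((m) => ({ ...m, recommended: false }));
  const rec = fitting.find((m) => m.minGpuMemoryMB <= gpu.totalMemoryMB * 0.9);
  if (rec) rec.recommended = true;
  return fitting;
}

// For a 32 GB GPU, the 70B model is filtered out and the 8B model
// (24000 MB <= 0.9 * 32000 MB) ends up recommended.
const ranked = suggestModelsForGpu({ totalMemoryMB: 32000, nimCapable: true });
```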

Motivation

Users frequently select NIM models too large for their GPU, causing OOM failures during startup. This is especially common on DGX Spark where the unified memory architecture makes VRAM limits non-obvious.

Continuation of closed #270. wscurran noted: "Other users are reporting the same experience in misfitting model to gpu."

Related to #404 — Jetson Orin Nano detectGpu() returns null on unified memory; the DI refactor and fallback path improvements here help address this
Related to #511 — Jetson AGX (aarch64) installation failures include GPU detection as a contributing factor

Test plan

Automated Tests

npm test -- --grep "suggestModelsForGpu|detectGpu"
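The dependency-injectable detectGpu(opts) mentioned in the summary is what makes tests like the one above possible without real hardware. The sketch below is a hedged illustration of that pattern: the stand-in detectGpu and its parsing are assumptions for demonstration, not the real bin/lib/nim.js implementation.

```javascript
// Illustrative stand-in for the DI-style detectGpu described in this PR;
// the real bin/lib/nim.js implementation may differ.
function detectGpu({ runCapture } = {}) {
  const out = runCapture(
    "nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits"
  );
  if (!out) return null;
  // Parse the first CSV line, e.g. "NVIDIA RTX 4090, 24564"
  const [name, mem] = out.trim().split("\n")[0].split(",").map((s) => s.trim());
  return { name, totalMemoryMB: Number(mem), nimCapable: true };
}

// Inject a fake runCapture so the test needs no GPU or nvidia-smi binary.
const gpu = detectGpu({ runCapture: () => "NVIDIA RTX 4090, 24564" });
```

Injecting `runCapture` this way is also what lets the suite below cover the Apple Silicon, DGX Spark, and error-handling paths deterministically.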

@coderabbitai
Contributor

coderabbitai bot commented Mar 21, 2026

📝 Walkthrough


This PR introduces GPU model recommendation by refactoring GPU detection to support dependency injection, adding GPU name enrichment for NVIDIA devices, creating a new suggestModelsForGpu() function that filters and ranks models by VRAM fit, and integrating model recommendations into the onboarding flow with visual indicators and smart defaults.

Changes

GPU Detection & Recommendation (bin/lib/nim.js)
Refactored detectGpu() to accept options for dependency injection (runCapture, platform), added a GPU name field for NVIDIA GPUs and DGX Spark GB10 detection, changed macOS cores handling to a conditional spread, and introduced the new suggestModelsForGpu(gpu) function, which filters models by VRAM, sorts them in descending order, and marks one model as recommended (≤90% VRAM utilization); both functions are exported. JSDoc type annotations were added for multiple functions.

Onboarding Integration (bin/lib/onboard.js)
Updated GPU logging to conditionally include the GPU model name, replaced direct model filtering with a suggestModelsForGpu() call, added (recommended) tags in the interactive model selection display, and changed the default model selection logic to use the first recommended model's index in both interactive and non-interactive modes instead of always defaulting to the first model.

Test Coverage (test/nim.test.js)
Added comprehensive test suites for detectGpu(opts) covering NVIDIA detection with name extraction, multi-GPU parsing, DGX Spark GB10 detection (including the free -m fallback), macOS discrete and unified memory detection, and error handling; added a test suite for suggestModelsForGpu() covering null and non-capable GPU handling, VRAM-based filtering and sorting, and recommended-flag assignment constraints.

Sequence Diagram

sequenceDiagram
    participant User
    participant Onboarding as Onboarding Flow
    participant GpuDetect as detectGpu(opts)
    participant ModelSuggest as suggestModelsForGpu()
    participant ModelDB as Model Database

    User->>Onboarding: Start NIM setup
    Onboarding->>GpuDetect: Detect GPU (with injected runCapture)
    GpuDetect->>GpuDetect: Query nvidia-smi for name & memory
    GpuDetect-->>Onboarding: {totalMemoryMB, nimCapable, name?, spark?}
    
    Onboarding->>ModelSuggest: suggestModelsForGpu(gpu)
    ModelSuggest->>ModelDB: Fetch available NIM models
    ModelSuggest->>ModelSuggest: Filter by minGpuMemoryMB ≤ totalMemoryMB
    ModelSuggest->>ModelSuggest: Sort by memory descending
    ModelSuggest->>ModelSuggest: Mark first ≤90% fit as recommended
    ModelSuggest-->>Onboarding: [models] with recommended flags
    
    alt Interactive Mode
        Onboarding->>User: Show models with (recommended) tags
        User->>Onboarding: Select model or use default
    else Non-Interactive Mode
        Onboarding->>Onboarding: Auto-select recommended model
    end
    
    Onboarding-->>User: Model selected & NIM starts

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 With twitching whiskers, I hop through VRAM,
Recommending models—"This one's gram!"
No more mismatches, no crashing sprees,
Just perfect fits floating through memory trees ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Title check: Passed. The title clearly and concisely summarizes the main feature: recommending optimal NIM models based on detected GPU VRAM.
Linked Issues check: Passed. All linked issue #270 requirements are met: suggestModelsForGpu() implemented, GPU name surfaced, model recommendation integrated into onboarding, and comprehensive tests added.
Out of Scope Changes check: Passed. All changes are directly aligned with the linked issue objectives; no unrelated modifications detected outside the scope of GPU-based model recommendation.
Docstring Coverage: Passed. Docstring coverage is 84.62%, which is sufficient; the required threshold is 80.00%.
Description Check: Passed. Check skipped because CodeRabbit's high-level summary is enabled.


Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
test/nim.test.js (1)

200-246: Consider adding an explicit “no recommended model” edge-case test.

You already verify the positive recommendation path well; adding one case where all fitting models exceed 90% VRAM would lock in fallback semantics and prevent future regressions.

Suggested test addition
   it("recommended model fits within 90% VRAM", () => {
     const vram = 32000;
     const models = nim.suggestModelsForGpu({ totalMemoryMB: vram, nimCapable: true });
     const rec = models.find((m) => m.recommended);
     if (rec) {
       assert.ok(rec.minGpuMemoryMB <= vram * 0.9,
         `recommended model (${rec.minGpuMemoryMB} MB) should fit within 90% of ${vram} MB`);
     }
   });

+  it("can return no recommended model when all fitting models exceed 90% VRAM", () => {
+    const models = nim.suggestModelsForGpu({ totalMemoryMB: 8000, nimCapable: true });
+    const recommended = models.filter((m) => m.recommended);
+    assert.equal(recommended.length, 0);
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/nim.test.js` around lines 200 - 246, Add a test that verifies the edge
case where no model should be marked recommended: call nim.suggestModelsForGpu
with nimCapable: true and a totalMemoryMB value chosen so that the function
returns one or more fitting models but every returned model has minGpuMemoryMB >
0.9 * totalMemoryMB, then assert that models.filter(m => m.recommended).length
=== 0 (and optionally assert the returned list is non-empty) to lock in the
fallback semantics when no candidate fits within 90% VRAM; reference the tested
function nim.suggestModelsForGpu and the returned model fields minGpuMemoryMB
and recommended when adding this test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3c8e8f6a-6cf9-47bc-8a66-b4c7638b79ae

📥 Commits

Reviewing files that changed from the base of the PR and between 7d4c8e4 and 2741afd.

📒 Files selected for processing (3)
  • bin/lib/nim.js
  • bin/lib/onboard.js
  • test/nim.test.js

@brianwtaylor
Contributor Author

Consolidated into #537.

@wscurran wscurran added the labels enhancement: feature, NemoClaw CLI, and priority: medium on Mar 23, 2026
@wscurran
Contributor

Thanks for adding the suggestModelsForGpu function to nim.js and integrating model suggestions into the onboarding flow, which should improve the user experience by defaulting to the recommended model.
