
fix: onboard fails on GPUs with insufficient VRAM for local NIM#836

Merged
cv merged 2 commits into NVIDIA:main from
CalebDeLeeuwMisfits:fix/nimcapable-vram-check
Mar 30, 2026

Conversation

Contributor

@CalebDeLeeuwMisfits CalebDeLeeuwMisfits commented Mar 24, 2026

Problem

nemoclaw onboard fails at sandbox creation with:

GPU sandbox requested, but the active gateway has no allocatable GPUs.

This happens because detectGpu() sets nimCapable: true for any NVIDIA GPU, regardless of VRAM. The --gpu flag is then passed to both openshell gateway start and openshell sandbox create — even when no NIM model fits in the available VRAM and the user has selected cloud inference.

The sandbox silently fails to create (the error is piped through awk), but the CLI reports success. Then step 7 (policy presets) fails with "sandbox not found", leaving the user stuck.

Affects all consumer NVIDIA GPUs with <40GB VRAM (RTX 3060, 4060, 4070, etc.) and WSL2 environments where GPU passthrough to nested k3s containers is unavailable.

Fix

bin/lib/nim.js — nimCapable now checks whether at least one NIM model in nim-images.json has minGpuMemoryMB <= totalMemoryMB:

const canRunNim = nimImages.models.some((m) => m.minGpuMemoryMB <= totalMemoryMB);
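To make the check concrete, here is a minimal, self-contained sketch of the capability test. The model entries below are hypothetical examples (only the 8192 MB minimum is taken from this PR); the real detectGpu() in bin/lib/nim.js reads VRAM from the driver and models from nim-images.json.

```javascript
// Hypothetical model catalog shaped like nim-images.json; names and the
// 40960 MB entry are illustrative, not the project's actual catalog.
const nimImages = {
  models: [
    { name: "example-small-model", minGpuMemoryMB: 8192 },
    { name: "example-large-model", minGpuMemoryMB: 40960 },
  ],
};

// Capable only if at least one model's minimum VRAM fits in the detected total.
function isNimCapable(totalMemoryMB, models) {
  return models.some((m) => m.minGpuMemoryMB <= totalMemoryMB);
}

console.log(isNimCapable(8188, nimImages.models));  // 8188 < 8192 → false
console.log(isNimCapable(24576, nimImages.models)); // 24576 >= 8192 → true
```

An 8 GB consumer card typically reports slightly under 8192 MB (8188 MB in the author's test), which is exactly why the strict `<=` comparison matters here.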

bin/lib/onboard.js — Added an informational message during preflight when GPU is detected but too small for local NIM:

✓ NVIDIA GPU detected: 1 GPU(s), 8188 MB VRAM
ⓘ GPU VRAM too small for local NIM — will use cloud inference
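The preflight branch can be sketched as follows. Function and field names here are assumptions for illustration; the actual implementation lives in bin/lib/onboard.js and its shape may differ.

```javascript
// Hypothetical preflight reporter; `gpu` mirrors the detectGpu() result
// described in this PR ({ count, totalMemoryMB, nimCapable }).
function reportGpuPreflight(gpu, log = console.log) {
  if (!gpu) return;
  log(`✓ NVIDIA GPU detected: ${gpu.count} GPU(s), ${gpu.totalMemoryMB} MB VRAM`);
  if (!gpu.nimCapable) {
    // No NIM model fits in VRAM: onboard skips --gpu and uses cloud inference.
    log("ⓘ GPU VRAM too small for local NIM — will use cloud inference");
  }
}

reportGpuPreflight({ count: 1, totalMemoryMB: 8188, nimCapable: false });
```

Emitting the note during preflight, rather than failing later at sandbox creation, is what turns the silent step-3 failure into an expected, explained fallback.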

Testing

Tested on WSL2 (Ubuntu) with an 8GB VRAM RTX GPU where the smallest NIM model requires 8192 MB.

Before: Onboard fails every time at step 3 — sandbox never created, policies can't apply.
After: Onboard completes successfully — gateway starts without --gpu, sandbox creates as CPU-only, cloud API inference works, agent responds.

Summary by CodeRabbit

  • Bug Fixes

    • GPU capability detection now correctly validates VRAM requirements against model specifications, rather than assuming any detected GPU is suitable for local execution.
  • New Features

    • Informational message now displays when a GPU lacks sufficient VRAM, notifying users that cloud inference will be used instead.

…Capable

detectGpu() unconditionally set nimCapable=true for any NVIDIA GPU,
even when no NIM model fits in the available VRAM. This caused
onboard to pass --gpu to gateway and sandbox creation, which fails
on systems where GPU passthrough is unavailable (e.g. WSL2) or
the GPU has insufficient memory for any local model.

Now nimCapable is only true when at least one NIM model in
nim-images.json has minGpuMemoryMB <= totalMemoryMB. When false,
onboard skips --gpu flags and auto-selects cloud inference,
with a clear message explaining why.

Tested on an 8GB VRAM GPU (RTX) where the smallest NIM model
requires 8192MB — onboard now completes successfully using
NVIDIA Cloud API.
@coderabbitai
Contributor

coderabbitai bot commented Mar 30, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration
  • Configuration used: Path: .coderabbit.yaml
  • Review profile: CHILL
  • Plan: Pro
  • Run ID: 7803fc0a-8b62-4471-996c-d22962b6cf7e

📥 Commits

Reviewing files that changed from the base of the PR and between bc509b9 and 2b5603d.

📒 Files selected for processing (2)
  • bin/lib/nim.js
  • bin/lib/onboard.js

📝 Walkthrough

Walkthrough

Updated GPU capability detection in NVIDIA path to conditionally set nimCapable based on whether available VRAM meets model requirements. Added informational logging in preflight flow when local NIM is unsuitable due to insufficient VRAM.

Changes

Cohort / File(s) | Summary
  • GPU Capability Detection — bin/lib/nim.js: Modified detectGpu() NVIDIA path to compute nimCapable by checking whether at least one model in nimImages.models fits in available VRAM (minGpuMemoryMB <= totalMemoryMB), replacing the unconditional true assignment.
  • Preflight Logging — bin/lib/onboard.js: Added an informational log in the GPU detection flow when gpu.nimCapable is false, noting that local NIM is unsuitable due to insufficient VRAM and cloud inference will be used.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 GPU memory checked with care,
VRAM requirements laid bare,
When models won't fit locally tight,
Cloud inference shines so bright!
A rabbit's code, precise and fair ✨


@cv cv merged commit 805a958 into NVIDIA:main Mar 30, 2026
1 check was pending
quanticsoul4772 pushed a commit to quanticsoul4772/NemoClaw that referenced this pull request Mar 30, 2026
…Capable (NVIDIA#836)

Co-authored-by: Caleb de Leeuw <cdeleeuw@users.noreply.github.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
realkim93 added a commit to realkim93/NemoClaw that referenced this pull request Apr 1, 2026
Merge origin/main to resolve conflicts from recent changes:
- NVIDIA#1208 core blocker lifecycle regressions
- NVIDIA#1200 Prettier formatting
- NVIDIA#836 GPU VRAM checks

Jetson detection now leverages main's UNIFIED_MEMORY_GPU_TAGS
(Orin/Thor/Xavier) with added jetson flag and /proc/device-tree
fallback. All 118 tests pass.
realkim93 added a commit to realkim93/NemoClaw that referenced this pull request Apr 1, 2026
Merge origin/main into feat/jetson-orin-nano-support to resolve
conflicts from recent changes (NVIDIA#1208, NVIDIA#1200, NVIDIA#836, NVIDIA#1221, NVIDIA#1223).

Jetson detection now leverages main's UNIFIED_MEMORY_GPU_TAGS
with added jetson flag and /proc/device-tree fallback.

All 116 tests pass.
laitingsheng pushed a commit that referenced this pull request Apr 2, 2026
…Capable (#836)

lakamsani pushed a commit to lakamsani/NemoClaw that referenced this pull request Apr 4, 2026
…Capable (NVIDIA#836)

gemini2026 pushed a commit to gemini2026/NemoClaw that referenced this pull request Apr 14, 2026
…Capable (NVIDIA#836)


3 participants