Skip to content

fix(gpu): detect Jetson Thor and Orin unified memory GPUs#308

Closed
kagura-agent wants to merge 4 commits intoNVIDIA:mainfrom
kagura-agent:fix/jetson-gpu-detection
Closed

fix(gpu): detect Jetson Thor and Orin unified memory GPUs#308
kagura-agent wants to merge 4 commits intoNVIDIA:mainfrom
kagura-agent:fix/jetson-gpu-detection

Conversation

@kagura-agent
Copy link
Copy Markdown
Contributor

@kagura-agent kagura-agent commented Mar 18, 2026

Summary

Closes #300

The unified-memory fallback in detectGpu() only checked for "GB10" (DGX Spark). Jetson Thor and Orin report different chip names via nvidia-smi --query-gpu=name, causing GPU detection to fail and fallback to cloud inference.

Changes

bin/lib/nim.js

  • Extract UNIFIED_MEMORY_CHIPS = ["GB10", "Thor", "Orin", "Xavier"] constant
  • Use .some(chip => nameOutput.includes(chip)) instead of .includes("GB10")
  • Add unifiedMemory: true and name properties to the return object
  • Keep spark property true only for GB10 (backward compat)

test/nim-jetson.test.js (new)

  • 11 unit tests covering:
    • Chip name matching for Thor, Orin, GB10, Xavier
    • Real-world nvidia-smi output strings ("Orin (nvgpu)", "NVIDIA Thor", "Orin Nano")
    • Negative cases (RTX 4090, A100, H100 should NOT match)
    • spark flag logic
    • Multi-line name extraction

Testing

All 54 tests pass (43 existing + 11 new):

npm test

Notes

Summary by CodeRabbit

  • New Features

    • Broader detection and handling of unified-memory GPUs across several chip families.
    • Detection now reports unifiedMemory, extracts the primary device name line, and sets a special-case flag only for the specific GB10 family.
    • Improved memory sizing for unified-memory devices using system RAM information.
  • Tests

    • Added tests covering chip-family matching, name extraction, special-case flag behavior, and negative matching.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3a55c3f3-f713-429c-8063-79955c9c6ca3

📥 Commits

Reviewing files that changed from the base of the PR and between 89e5cc6 and 752f1af.

📒 Files selected for processing (2)
  • bin/lib/nim.js
  • test/nim-jetson.test.js
✅ Files skipped from review due to trivial changes (1)
  • test/nim-jetson.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • bin/lib/nim.js

📝 Walkthrough

Walkthrough

Adds and exports a frozen UNIFIED_MEMORY_CHIPS list (GB10, Thor, Orin, Xavier), extends GPU detection fallback to match those names (case-insensitive), marks unified-memory GPUs and computes memory from system RAM, preserves spark only for GB10, and adds unit tests for matching and name extraction.

Changes

Cohort / File(s) Summary
GPU Detection Logic
bin/lib/nim.js
Add and export UNIFIED_MEMORY_CHIPS (["GB10","Thor","Orin","Xavier"]); extend unified-memory fallback to match any listed chip (case-insensitive); set unifiedMemory: true; compute totalMemoryMB from system RAM for unified devices; set spark only when name contains gb10; adjust returned name to first line of nvidia-smi output.
Test Coverage
test/nim-jetson.test.js
New tests verifying UNIFIED_MEMORY_CHIPS contents, substring matching behavior for Jetson/DGX name variants (GB10, Thor, Orin, Xavier), negative matches for discrete GPUs, spark derivation for GB10 only, and first-line name extraction.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as Node CLI
  participant lib as nim.detectGpu()
  participant nvsmi as nvidia-smi
  participant OS as System RAM

  CLI->>lib: invoke detectGpu()
  lib->>nvsmi: query name & memory
  alt nvidia-smi returns usable memory
    nvsmi-->>lib: memory, name
    lib-->>CLI: GPU descriptor (discrete)
  else nvidia-smi memory is [N/A]
    nvsmi-->>lib: name only
    lib->>lib: match name against UNIFIED_MEMORY_CHIPS
    alt match found
      lib->>OS: read total system RAM
      OS-->>lib: totalMemoryMB
      lib-->>CLI: GPU descriptor (unifiedMemory: true, spark if gb10)
    else no match
      lib-->>CLI: assume no GPU (CPU-only)
    end
  end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I hopped through names — GB10, Thor, Orin, Xavier,
now unified chips are spotted in the field.
From nvidia-smi whispers to system RAM's hum,
no more missed GPUs — I dance, I yield.
🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: extending GPU detection to recognize Jetson Thor and Orin unified memory GPUs, which is the core objective.
Linked Issues check ✅ Passed The PR implementation fully addresses issue #300 by extending unified-memory GPU detection beyond GB10 to include Thor, Orin, and Xavier chips with case-insensitive matching.
Out of Scope Changes check ✅ Passed All changes are scoped to the GPU detection feature: the constant definition, detection logic updates, and comprehensive test coverage with no unrelated modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
bin/lib/nim.js (1)

59-59: Consider case-insensitive matching for robustness.

The current substring check is case-sensitive. While nvidia-smi typically returns consistent casing, a case-insensitive match would be more defensive against unexpected output variations.

♻️ Suggested change for case-insensitive matching
-    if (nameOutput && UNIFIED_MEMORY_CHIPS.some((chip) => nameOutput.includes(chip))) {
+    if (nameOutput && UNIFIED_MEMORY_CHIPS.some((chip) => nameOutput.toLowerCase().includes(chip.toLowerCase()))) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` at line 59, The substring check against UNIFIED_MEMORY_CHIPS
is currently case-sensitive; update the condition that uses nameOutput and
UNIFIED_MEMORY_CHIPS (the if with nameOutput && UNIFIED_MEMORY_CHIPS.some(...))
to perform a case-insensitive comparison by normalizing both sides (e.g., call
toLowerCase() on nameOutput and on each chip string or use a case-insensitive
regex) before calling includes/some, and keep the existing null/undefined guard
for nameOutput.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@bin/lib/nim.js`:
- Line 59: The substring check against UNIFIED_MEMORY_CHIPS is currently
case-sensitive; update the condition that uses nameOutput and
UNIFIED_MEMORY_CHIPS (the if with nameOutput && UNIFIED_MEMORY_CHIPS.some(...))
to perform a case-insensitive comparison by normalizing both sides (e.g., call
toLowerCase() on nameOutput and on each chip string or use a case-insensitive
regex) before calling includes/some, and keep the existing null/undefined guard
for nameOutput.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4943f10e-5dba-4300-8e12-5ca21b084ca4

📥 Commits

Reviewing files that changed from the base of the PR and between 1e23347 and 0913f8e.

📒 Files selected for processing (2)
  • bin/lib/nim.js
  • test/nim-jetson.test.js

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/nim.js`:
- Line 71: The spark flag uses case-sensitive matching
(nameOutput.includes("GB10")) which is inconsistent with the earlier
case-insensitive chip detection that uses toLowerCase(); change spark to perform
a case-insensitive check (e.g., use nameOutput.toLowerCase().includes("gb10"))
so both unified memory detection and the spark flag behave consistently; update
the expression that sets spark accordingly (reference: the spark assignment and
the earlier toLowerCase() usage).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c29191fa-b7e3-4bfe-bfd5-b76d1f87465b

📥 Commits

Reviewing files that changed from the base of the PR and between 0913f8e and a148165.

📒 Files selected for processing (1)
  • bin/lib/nim.js

Comment thread bin/lib/nim.js Outdated
@wscurran wscurran added Platform: AGX Thor/Orin Support for Jetson AGX Thor and Orin enhancement: testing Use this label to identify requests to improve NemoClaw test coverage. labels Mar 18, 2026
@kagura-agent
Copy link
Copy Markdown
Contributor Author

Addressed both CodeRabbit review comments in 89e5cc6:

  1. Line 59 (nitpick) — case-insensitive chip detection was already applied in the initial commit (nameOutput.toLowerCase().includes(chip.toLowerCase())).
  2. Line 71 (spark flag) — now uses nameOutput.toLowerCase().includes("gb10") for consistent case-insensitive matching. Updated test to cover mixed-case variants (gb10, Gb10, etc.).

All 54 tests pass.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
bin/lib/nim.js (1)

26-27: Consider freezing the exported array to prevent accidental mutation.

Since UNIFIED_MEMORY_CHIPS is exported and could be used by external consumers, making it immutable prevents accidental modifications that could affect detection logic.

♻️ Optional: freeze the constant
 // Chip names that use unified memory (VRAM not separately queryable)
-const UNIFIED_MEMORY_CHIPS = ["GB10", "Thor", "Orin", "Xavier"];
+const UNIFIED_MEMORY_CHIPS = Object.freeze(["GB10", "Thor", "Orin", "Xavier"]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 26 - 27, UNIFIED_MEMORY_CHIPS is a mutable
exported array which could be accidentally modified; make it immutable by
freezing it where it's declared (e.g., replace the raw array with
Object.freeze([...]) so UNIFIED_MEMORY_CHIPS is assigned to a frozen array) to
prevent runtime mutations that would break chip detection logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@bin/lib/nim.js`:
- Around line 26-27: UNIFIED_MEMORY_CHIPS is a mutable exported array which
could be accidentally modified; make it immutable by freezing it where it's
declared (e.g., replace the raw array with Object.freeze([...]) so
UNIFIED_MEMORY_CHIPS is assigned to a frozen array) to prevent runtime mutations
that would break chip detection logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a5e3b7ec-584c-4caf-89d7-70256f3df3af

📥 Commits

Reviewing files that changed from the base of the PR and between a148165 and 89e5cc6.

📒 Files selected for processing (2)
  • bin/lib/nim.js
  • test/nim-jetson.test.js
✅ Files skipped from review due to trivial changes (1)
  • test/nim-jetson.test.js

Kagura Chen added 4 commits March 19, 2026 13:04
Closes NVIDIA#300

The unified-memory fallback in detectGpu() only checked for "GB10"
(DGX Spark). Jetson Thor and Orin report different chip names, causing
GPU detection to fail and fallback to cloud inference.

Add Thor, Orin, and Xavier to the unified-memory chip name list so all
current Jetson and DGX Spark platforms are detected correctly.
…tion

Address CodeRabbit review: nvidia-smi output casing may vary,
so normalize both sides with toLowerCase() for robustness.
Address CodeRabbit review comments:
- Use toLowerCase() for spark flag (consistent with chip detection)
- Update test to verify case-insensitive GB10 matching
@kagura-agent kagura-agent force-pushed the fix/jetson-gpu-detection branch from 89e5cc6 to 752f1af Compare March 19, 2026 05:06
@kagura-agent
Copy link
Copy Markdown
Contributor Author

Re: latest CodeRabbit review — the Object.freeze() suggestion for UNIFIED_MEMORY_CHIPS is already applied in the current code (line 27: const UNIFIED_MEMORY_CHIPS = Object.freeze(["GB10", "Thor", "Orin", "Xavier"])). ✅

@kagura-agent
Copy link
Copy Markdown
Contributor Author

Closing to reduce open PR count — I had too many PRs open, which adds review burden rather than helping. Happy to resubmit if this fix is still wanted.

mafueee pushed a commit to mafueee/NemoClaw that referenced this pull request Mar 28, 2026
kjw3 added a commit that referenced this pull request Mar 31, 2026
## Summary
- stop requiring `NVIDIA_API_KEY` for local-only `nemoclaw start` and
only gate the Telegram bridge when that bridge actually needs the key
- clean up the dashboard forward, `nemoclaw` gateway, and
`openshell-cluster-nemoclaw` Docker volumes when the last sandbox is
destroyed
- broaden unified-memory NVIDIA GPU detection beyond `GB10` while
keeping `spark: true` specific to GB10
- harden policy merge/retry behavior so truncated or error-like
current-policy reads rebuild from a clean `version: 1` scaffold instead
of producing malformed YAML

## Issue Mapping
Fixes #1191
Fixes #1160
Fixes #1182
Fixes #1162
Related #991

## Notes
- `#1188` was investigated but is not included in this PR.
- The current evidence still points to a deeper runtime / proxy
reachability problem on macOS + Colima rather than a bounded
NemoClaw-only fix.
- Keeping it out of this branch avoids speculative networking changes
without strong reproduction and cross-platform coverage.

## Validation
```bash
npx vitest run
npx eslint bin/nemoclaw.js bin/lib/nim.js bin/lib/policies.js test/cli.test.js test/nim.test.js test/policies.test.js test/service-env.test.js
npx tsc -p jsconfig.json --noEmit
```

## References Reviewed
- #1106
- #308
- #95
- #770

Signed-off-by: Kevin Jones <kejones@nvidia.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  - Core services can start without an NVIDIA API key.
- Enhanced unified‑memory GPU detection with more accurate capability
reporting.

* **Bug Fixes**
- Gateway and forwarded‑port cleanup only runs when the last sandbox is
removed and no live sandboxes remain.
- Telegram bridge now starts only when both required tokens are present;
clearer startup warnings.
- Policy parsing/merge more robust for metadata‑only or malformed
inputs; consistent version header formatting.

* **Tests**
- Added tests covering GPU detection, policy parsing/merge, CLI
sandbox/gateway flows, and service startup.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
laitingsheng pushed a commit that referenced this pull request Apr 2, 2026
## Summary
- stop requiring `NVIDIA_API_KEY` for local-only `nemoclaw start` and
only gate the Telegram bridge when that bridge actually needs the key
- clean up the dashboard forward, `nemoclaw` gateway, and
`openshell-cluster-nemoclaw` Docker volumes when the last sandbox is
destroyed
- broaden unified-memory NVIDIA GPU detection beyond `GB10` while
keeping `spark: true` specific to GB10
- harden policy merge/retry behavior so truncated or error-like
current-policy reads rebuild from a clean `version: 1` scaffold instead
of producing malformed YAML

## Issue Mapping
Fixes #1191
Fixes #1160
Fixes #1182
Fixes #1162
Related #991

## Notes
- `#1188` was investigated but is not included in this PR.
- The current evidence still points to a deeper runtime / proxy
reachability problem on macOS + Colima rather than a bounded
NemoClaw-only fix.
- Keeping it out of this branch avoids speculative networking changes
without strong reproduction and cross-platform coverage.

## Validation
```bash
npx vitest run
npx eslint bin/nemoclaw.js bin/lib/nim.js bin/lib/policies.js test/cli.test.js test/nim.test.js test/policies.test.js test/service-env.test.js
npx tsc -p jsconfig.json --noEmit
```

## References Reviewed
- #1106
- #308
- #95
- #770

Signed-off-by: Kevin Jones <kejones@nvidia.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  - Core services can start without an NVIDIA API key.
- Enhanced unified‑memory GPU detection with more accurate capability
reporting.

* **Bug Fixes**
- Gateway and forwarded‑port cleanup only runs when the last sandbox is
removed and no live sandboxes remain.
- Telegram bridge now starts only when both required tokens are present;
clearer startup warnings.
- Policy parsing/merge more robust for metadata‑only or malformed
inputs; consistent version header formatting.

* **Tests**
- Added tests covering GPU detection, policy parsing/merge, CLI
sandbox/gateway flows, and service startup.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
lakamsani pushed a commit to lakamsani/NemoClaw that referenced this pull request Apr 4, 2026
## Summary
- stop requiring `NVIDIA_API_KEY` for local-only `nemoclaw start` and
only gate the Telegram bridge when that bridge actually needs the key
- clean up the dashboard forward, `nemoclaw` gateway, and
`openshell-cluster-nemoclaw` Docker volumes when the last sandbox is
destroyed
- broaden unified-memory NVIDIA GPU detection beyond `GB10` while
keeping `spark: true` specific to GB10
- harden policy merge/retry behavior so truncated or error-like
current-policy reads rebuild from a clean `version: 1` scaffold instead
of producing malformed YAML

## Issue Mapping
Fixes NVIDIA#1191
Fixes NVIDIA#1160
Fixes NVIDIA#1182
Fixes NVIDIA#1162
Related NVIDIA#991

## Notes
- `NVIDIA#1188` was investigated but is not included in this PR.
- The current evidence still points to a deeper runtime / proxy
reachability problem on macOS + Colima rather than a bounded
NemoClaw-only fix.
- Keeping it out of this branch avoids speculative networking changes
without strong reproduction and cross-platform coverage.

## Validation
```bash
npx vitest run
npx eslint bin/nemoclaw.js bin/lib/nim.js bin/lib/policies.js test/cli.test.js test/nim.test.js test/policies.test.js test/service-env.test.js
npx tsc -p jsconfig.json --noEmit
```

## References Reviewed
- NVIDIA#1106
- NVIDIA#308
- NVIDIA#95
- NVIDIA#770

Signed-off-by: Kevin Jones <kejones@nvidia.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  - Core services can start without an NVIDIA API key.
- Enhanced unified‑memory GPU detection with more accurate capability
reporting.

* **Bug Fixes**
- Gateway and forwarded‑port cleanup only runs when the last sandbox is
removed and no live sandboxes remain.
- Telegram bridge now starts only when both required tokens are present;
clearer startup warnings.
- Policy parsing/merge more robust for metadata‑only or malformed
inputs; consistent version header formatting.

* **Tests**
- Added tests covering GPU detection, policy parsing/merge, CLI
sandbox/gateway flows, and service startup.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
gemini2026 pushed a commit to gemini2026/NemoClaw that referenced this pull request Apr 14, 2026
## Summary
- stop requiring `NVIDIA_API_KEY` for local-only `nemoclaw start` and
only gate the Telegram bridge when that bridge actually needs the key
- clean up the dashboard forward, `nemoclaw` gateway, and
`openshell-cluster-nemoclaw` Docker volumes when the last sandbox is
destroyed
- broaden unified-memory NVIDIA GPU detection beyond `GB10` while
keeping `spark: true` specific to GB10
- harden policy merge/retry behavior so truncated or error-like
current-policy reads rebuild from a clean `version: 1` scaffold instead
of producing malformed YAML

## Issue Mapping
Fixes NVIDIA#1191
Fixes NVIDIA#1160
Fixes NVIDIA#1182
Fixes NVIDIA#1162
Related NVIDIA#991

## Notes
- `NVIDIA#1188` was investigated but is not included in this PR.
- The current evidence still points to a deeper runtime / proxy
reachability problem on macOS + Colima rather than a bounded
NemoClaw-only fix.
- Keeping it out of this branch avoids speculative networking changes
without strong reproduction and cross-platform coverage.

## Validation
```bash
npx vitest run
npx eslint bin/nemoclaw.js bin/lib/nim.js bin/lib/policies.js test/cli.test.js test/nim.test.js test/policies.test.js test/service-env.test.js
npx tsc -p jsconfig.json --noEmit
```

## References Reviewed
- NVIDIA#1106
- NVIDIA#308
- NVIDIA#95
- NVIDIA#770

Signed-off-by: Kevin Jones <kejones@nvidia.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  - Core services can start without an NVIDIA API key.
- Enhanced unified‑memory GPU detection with more accurate capability
reporting.

* **Bug Fixes**
- Gateway and forwarded‑port cleanup only runs when the last sandbox is
removed and no live sandboxes remain.
- Telegram bridge now starts only when both required tokens are present;
clearer startup warnings.
- Policy parsing/merge more robust for metadata‑only or malformed
inputs; consistent version header formatting.

* **Tests**
- Added tests covering GPU detection, policy parsing/merge, CLI
sandbox/gateway flows, and service startup.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement: testing Use this label to identify requests to improve NemoClaw test coverage. Platform: AGX Thor/Orin Support for Jetson AGX Thor and Orin

Projects

None yet

Development

Successfully merging this pull request may close these issues.

nemoclaw fails to detect GPU on NVIDIA Jetson Thor and Orin

2 participants