
fix(gateway): replace dead nvidia-device-plugin Helm repo URL #278

Closed
kagura-agent wants to merge 7 commits into NVIDIA:main from
kagura-agent:fix/241-fix-gateway-replace-dead-nvidi

Conversation

@kagura-agent (Contributor) commented Mar 18, 2026

Fixes #241

Automated PR via GoGetAJob

Summary by CodeRabbit

  • New Features

    • Improved Ollama onboarding: detects reasoning-only models, offers or auto-creates non-reasoning chat variants, and improves model selection prompts (interactive and non-interactive).
    • Adds a GPU device-plugin repair script and automatic patching during onboarding for affected hosts.
  • Bug Fixes

    • Prevents blank responses from certain Ollama models and addresses GPU allocation failures during onboarding.
  • Documentation

    • Added GPU troubleshooting and Ollama reasoning-model guidance with workarounds.
  • Tests

    • Expanded CLI and onboarding test coverage (connect, logs, config, validation).

Kagura Chen added 3 commits March 18, 2026 11:05
…iant

Ollama reasoning models (e.g., nemotron-3-nano, deepseek-r1, qwq) put
their output into a 'reasoning' field instead of 'content' in the
OpenAI-compatible API response. OpenClaw reads 'content' only, so the
agent's response appears blank.

This commit:

1. Adds reasoning model detection during Ollama onboarding (both the
   legacy JS wizard and the new TS onboard command). Known reasoning
   model patterns are checked against the selected model name.

2. When a reasoning model is detected, the onboarding flow automatically
   creates a non-reasoning chat variant using 'ollama create' with a
   simple Modelfile wrapper. This variant writes to 'content' normally.

3. Lists locally available Ollama models during onboarding instead of
   hardcoding 'nemotron-3-nano', letting users pick from their pulled
   models and see which ones are reasoning models.

4. Documents the limitation and workaround in the inference provider
   switching guide.

Fixes NVIDIA#246
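The detection described in step 1 can be sketched as a simple pattern check against the tag-less model name. This is a hypothetical sketch: the actual pattern list and helper names in bin/lib/onboard.js may differ.

```typescript
// Hypothetical pattern list; the real KNOWN_REASONING_MODEL_PATTERNS may differ.
const KNOWN_REASONING_MODEL_PATTERNS: RegExp[] = [
  /^nemotron.*nano/i,
  /^deepseek-r1/i,
  /^qwq/i,
];

// Returns true when the model name (minus its tag) matches a known
// reasoning-model pattern, e.g. "deepseek-r1:7b" -> "deepseek-r1".
function isReasoningModel(name: string): boolean {
  const base = name.split(":")[0];
  return KNOWN_REASONING_MODEL_PATTERNS.some((p) => p.test(base));
}
```

A check like this runs once per model in the onboarding list, so users see which pulled models are flagged before selecting one.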
…board

The NVIDIA k8s-device-plugin Helm chart repository at
https://nvidia.github.io/k8s-device-plugin returns 404. This URL is
hardcoded in the gateway container image's k3s manifest, causing GPU
device plugin installation to fail on every GPU-enabled gateway startup.

This commit:

1. Adds scripts/fix-gpu-device-plugin.sh that downloads the chart tgz
   directly from the GitHub release and patches the HelmChart resource
   to use embedded chartContent (base64-encoded tgz), bypassing the
   dead Helm repo entirely.

2. Integrates the fix into the legacy onboard wizard (bin/lib/onboard.js)
   — when a GPU is detected, the patch is applied automatically after
   gateway startup, before sandbox creation.

3. Documents the issue and manual workaround in the GPU deployment guide.

The fix works for both online and air-gapped deployments since the chart
is embedded directly into the HelmChart resource.

Fixes NVIDIA#241
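The embedded-chart approach in step 1 boils down to base64-encoding the chart tgz and patching it into the HelmChart spec (the k3s HelmChart CRD's spec.chartContent field holds a base64-encoded chart archive). A minimal sketch with hypothetical helper names; the real script does this in bash:

```typescript
import { readFileSync } from "node:fs";

// Builds the JSON merge patch that swaps the dead repo reference for an
// embedded chart. Hypothetical helper names for illustration.
function chartContentPatch(chartBytes: Buffer): string {
  return JSON.stringify({
    spec: {
      repo: null,                                  // drop the dead Helm repo URL
      chart: null,                                 // drop the chart-by-name reference
      chartContent: chartBytes.toString("base64"), // embed the chart tgz directly
    },
  });
}

function patchFromFile(tgzPath: string): string {
  return chartContentPatch(readFileSync(tgzPath));
}
```

Applied with `kubectl patch ... --type merge`, this makes the manifest self-contained, which is why the same fix works for air-gapped deployments.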
coderabbitai bot (Contributor) commented Mar 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Detects Ollama "reasoning" models during onboarding, lists local Ollama models, can create non-reasoning chat variants, patches GPU device-plugin HelmChart via a helper script, updates docs with troubleshooting, and adds tests for CLI connect/logs and onboard config/validation.

Changes

  • Ollama onboarding & model handling (bin/lib/onboard.js, nemoclaw/src/commands/onboard.ts):
    Adds KNOWN_REASONING_MODEL_PATTERNS and isReasoningModel; lists local Ollama models; can create a chat-style non-reasoning variant via a temporary Modelfile and ollama create; prefers chat variants where available; logs warnings and provides interactive and non-interactive flows.
  • GPU device-plugin fix & scripts (scripts/fix-gpu-device-plugin.sh, docs/deployment/deploy-to-remote-gpu.md):
    Adds a script that patches the nvidia-device-plugin HelmChart by embedding a chart tgz (base64) and polling pod readiness; docs add troubleshooting for "no allocatable GPUs" and reference the workaround.
  • Docs: Ollama reasoning models (docs/inference/switch-inference-providers.md):
    New subsection describing Ollama reasoning-model behavior, warning about blank responses, and documenting the chat-variant workaround and example commands; notes onboard detection and guidance.
  • CLI command tests (nemoclaw/src/commands/connect.test.ts, nemoclaw/src/commands/logs.test.ts):
    Adds tests mocking child_process (spawn/exec) and PluginLogger; covers spawn behavior, exit codes/errors, flags (follow/lines), sandbox name resolution, and run/state display.
  • Onboard config & validation tests (nemoclaw/src/onboard/config.test.ts, nemoclaw/src/onboard/validate.test.ts):
    Adds in-memory-fs-backed tests for load/save/clear onboard config and comprehensive tests for validateApiKey and maskApiKey (HTTP status handling, timeout, error truncation, and masking rules).
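The masking rules those validation tests exercise can be illustrated with a sketch. This is illustrative only; the real maskApiKey in nemoclaw may use different boundaries.

```typescript
// Illustrative only: a typical masking rule of the kind the validate tests
// exercise. Short keys are fully masked; longer keys keep the first and last
// four characters visible.
function maskApiKey(key: string): string {
  if (key.length <= 8) return "*".repeat(key.length); // too short to expose anything
  return key.slice(0, 4) + "*".repeat(key.length - 8) + key.slice(-4);
}
```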

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Onboard as Onboard CLI
    participant Ollama as Ollama CLI/Daemon
    participant FS as Filesystem
    participant Logger
    User->>Onboard: start onboarding (interactive/non-interactive)
    Onboard->>Ollama: request list of local models
    Ollama-->>Onboard: returns model list (may include reasoning models)
    Onboard->>User: prompt or auto-select model
    alt selected model is reasoning
        Onboard->>FS: write temporary Modelfile for chat variant
        Onboard->>Ollama: `ollama create` chat-variant
        Ollama-->>Onboard: create success/failure
        Onboard->>Logger: log creation result and guidance
        Onboard->>Ollama: select chat-variant if created
    else non-reasoning model selected
        Onboard->>Ollama: select chosen model
    end
    Onboard->>Logger: surface final model choice and next steps

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I sniffed the models late at night,
Found reasoning whispers out of sight.
I wrote a Modelfile, gave ollama create a hop—
A chatty friend awakened, responses won’t stop. ✨🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The PR title clearly describes the main fix: replacing the dead NVIDIA device-plugin Helm repo URL in the gateway. It is specific, concise, and accurately reflects the primary change in the changeset.
  • Linked Issues check (✅ Passed): The PR addresses all core objectives from issue #241: replaces the dead Helm repo reference with a direct chart URL, provides a workaround script (fix-gpu-device-plugin.sh), adds documentation and onboarding enhancements, and includes comprehensive test coverage for related features.
  • Out of Scope Changes check (✅ Passed): All changes are directly related to fixing the GPU device plugin issue (#241) or supporting Ollama reasoning models during onboarding. No unrelated modifications detected; changes maintain focus on restoring GPU functionality and improving inference provider configuration.
  • Docstring Coverage (✅ Passed): Docstring coverage is 88.89%, which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 8

🧹 Nitpick comments (1)
bin/lib/onboard.js (1)

184-191: Consider scoping the GPU plugin patch to NVIDIA GPUs only.

The patch runs for any detected GPU (including Apple Silicon), but the NVIDIA device plugin is only relevant for NVIDIA GPUs. While ignoreError: true prevents failures, this adds unnecessary work on non-NVIDIA systems.

♻️ Proposed refinement
-  if (gpu) {
+  if (gpu && gpu.type === "nvidia") {
     console.log("  Patching GPU device plugin (workaround for dead Helm repo)...");
     run(`bash "${path.join(SCRIPTS, "fix-gpu-device-plugin.sh")}" 2>&1 || true`, { ignoreError: true });
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 184 - 191, The patch currently runs whenever
`gpu` is truthy, which can trigger on non-NVIDIA hardware; update the logic in
onboard.js so the GPU device-plugin patch only runs for NVIDIA GPUs by checking
for an NVIDIA vendor before invoking `run(...)`. Concretely, detect NVIDIA
presence (e.g., test for `nvidia-smi` on PATH or inspect vendor via existing GPU
detection code), and only call the `run(\`bash "${path.join(SCRIPTS,
"fix-gpu-device-plugin.sh")}" ...\`)` when that NVIDIA check succeeds; keep the
`gpu` guard but add the NVIDIA-specific condition so Apple Silicon or other GPUs
skip the patch. Use the existing symbols `gpu`, `run`, `SCRIPTS`, and
`"fix-gpu-device-plugin.sh"` to locate and modify the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 318-320: The fallback assignment of model = "nemotron-3-nano"
conflicts with the KNOWN_REASONING_MODEL_PATTERNS (e.g., /^nemotron.*nano/i) so
the chosen default is inadvertently classified as a reasoning model; change the
fallback to a non-reasoning model name (or make it configurable) — update the
else branch that sets model to use a different default string that does not
match KNOWN_REASONING_MODEL_PATTERNS or add an explicit override/flag to skip
reasoning-model detection for that fallback (refer to the variable model and the
constant KNOWN_REASONING_MODEL_PATTERNS to locate the logic).
- Around line 55-68: The createChatVariant function builds a shell command by
interpolating modelfile and variantName, creating a command-injection risk;
change it to use child_process.execFileSync (or execFileSync) with explicit
arguments instead of a shell string, pass the Modelfile content via stdin (e.g.,
supply modelfile as the first argument to the spawned process's stdin), and call
ollama with ['create', variantName] so modelfile and variantName are never
shell-interpolated; ensure you preserve the same stdio and timeout options and
return variantName on success or null on caught errors.

In `@docs/inference/switch-inference-providers.md`:
- Around line 84-91: Update the variant name to match onboarding's generated
convention: replace instances of "nemotron-nano-chat" used in the ollama create
command and the openshell instruction with the onboarding-style name
"nemotron-3-nano-chat" so the example uses the same "<base-without-tag>-chat"
pattern; ensure the ollama create line (echo "FROM ...") and the openshell
inference set --model value both reference the new "nemotron-3-nano-chat"
variant.

In `@nemoclaw/src/commands/onboard.ts`:
- Around line 404-410: The code falls back to DEFAULT_MODELS when no models are
pulled, which provides hosted NVIDIA IDs that break local Ollama; change the
branch that builds modelOptions (where DEFAULT_MODELS is mapped and promptSelect
is called to set model) to detect a local Ollama endpoint (e.g., check your
Ollama config/endpoint for localhost or a local socket) and, if local, do NOT
use DEFAULT_MODELS; instead present either an empty/interactive choice that
instructs the user to pull local models or prompt for a manual model ID (use a
text prompt alternative to promptSelect) so model remains valid for local Ollama
rather than defaulting to hosted IDs. Ensure this logic lives alongside the
existing model variable assignment and promptSelect usage so callers of model
behave the same.
- Around line 57-66: The execSync call in createOllamaChatVariant is vulnerable
to command injection because it interpolates baseModel and variantName into a
shell string; change it to call the Ollama CLI without shell interpolation (use
spawn/execFile or execFileSync with argument arrays and pass the model content
via stdin) and/or validate/strictly sanitize baseModel and variantName before
use. Specifically, modify the execSync(`echo '${modelfile}' | ollama create
${variantName}`) usage in createOllamaChatVariant to use a non-shell API (e.g.,
spawn/execFile with args ["create", variantName] and write modelfile to the
child's stdin) and ensure baseModel/variantName are restricted to an expected
pattern before invoking the process.

In `@nemoclaw/src/onboard/config.test.ts`:
- Around line 55-57: The test's beforeEach currently calls vi.resetAllMocks(),
which also resets the custom node:fs mock implementation; change this to
vi.clearAllMocks() so you only clear call history and preserve mock
implementations—update the beforeEach that references vi.resetAllMocks() to call
vi.clearAllMocks() and keep the node:fs mock factory intact.

In `@nemoclaw/src/onboard/validate.test.ts`:
- Around line 39-41: The test defines a realistic-looking API key in the
constant apiKey which triggers secret scanners; replace that value with a
clearly synthetic placeholder (e.g., "test-api-key-0000") so tests still run but
do not resemble real secrets—update the apiKey constant in validate.test.ts (and
any direct references to endpoint/apiKey in the same test) to the synthetic
token.

In `@scripts/fix-gpu-device-plugin.sh`:
- Around line 28-39: The script currently ignores curl failures and swallows
kubectl patch errors; update the curl invocation for CHART_URL to use -f (curl
-fsSL "$CHART_URL" -o "$CHART_TGZ") and immediately check its exit status so the
script exits on download HTTP errors, ensure base64 encoding for CHART_TGZ sets
CHART_B64 but also fails the script if base64 returns non-zero, and remove the
"2>/dev/null || true" suppression from the openshell doctor exec -- kubectl
patch helmchart nvidia-device-plugin call so the patch failure surfaces (or
explicitly test its exit code and exit non-zero with an explanatory message)
rather than continuing with invalid chart content.

---

Nitpick comments:
In `@bin/lib/onboard.js`:
- Around line 184-191: The patch currently runs whenever `gpu` is truthy, which
can trigger on non-NVIDIA hardware; update the logic in onboard.js so the GPU
device-plugin patch only runs for NVIDIA GPUs by checking for an NVIDIA vendor
before invoking `run(...)`. Concretely, detect NVIDIA presence (e.g., test for
`nvidia-smi` on PATH or inspect vendor via existing GPU detection code), and
only call the `run(\`bash "${path.join(SCRIPTS, "fix-gpu-device-plugin.sh")}"
...\`)` when that NVIDIA check succeeds; keep the `gpu` guard but add the
NVIDIA-specific condition so Apple Silicon or other GPUs skip the patch. Use the
existing symbols `gpu`, `run`, `SCRIPTS`, and `"fix-gpu-device-plugin.sh"` to
locate and modify the code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bd4deba8-9dea-44ab-b54a-edacc9e9580f

📥 Commits

Reviewing files that changed from the base of the PR and between 1e23347 and 28b86ee.

📒 Files selected for processing (9)
  • bin/lib/onboard.js
  • docs/deployment/deploy-to-remote-gpu.md
  • docs/inference/switch-inference-providers.md
  • nemoclaw/src/commands/connect.test.ts
  • nemoclaw/src/commands/logs.test.ts
  • nemoclaw/src/commands/onboard.ts
  • nemoclaw/src/onboard/config.test.ts
  • nemoclaw/src/onboard/validate.test.ts
  • scripts/fix-gpu-device-plugin.sh

…cript

- Eliminate shell injection in createChatVariant (JS) and
  createOllamaChatVariant (TS) by writing Modelfile to a temp file and
  using execFileSync('ollama', ['create', name, '-f', tmpFile])
- Scope GPU device plugin patch to NVIDIA GPUs only (gpu.type === 'nvidia')
- Fix default model conflict: nemotron-3-nano matches reasoning patterns,
  use nemotron-3-nano-chat variant as default with setup instructions
- Fix Ollama-with-no-models fallback in TS: suggest pulling local models
  instead of offering hosted NVIDIA cloud model IDs
- Add fail-fast error handling in fix-gpu-device-plugin.sh: check download
  success, verify non-empty file, report patch failures instead of swallowing
- Replace secret-shaped test token with synthetic placeholder to avoid
  Gitleaks scanner blocks
- Switch vi.resetAllMocks() to vi.clearAllMocks() in test files to preserve
  vi.mock() factory implementations
- Align docs variant name to nemotron-3-nano-chat (matches onboarding)
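The injection fix in the first bullet can be sketched as follows. This is a sketch under the assumptions stated in the commit message; actual helper names and options in bin/lib/onboard.js and onboard.ts may differ.

```typescript
import { execFileSync } from "node:child_process";
import { mkdtempSync, writeFileSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Derives the "<base-without-tag>-chat" variant name used during onboarding.
function chatVariantName(baseModel: string): string {
  return `${baseModel.split(":")[0]}-chat`;
}

// Injection-safe flow: the Modelfile goes through a temp file and the model
// names are passed as argv entries, never interpolated into a shell string.
function createChatVariant(baseModel: string): string | null {
  const dir = mkdtempSync(join(tmpdir(), "ollama-modelfile-"));
  const file = join(dir, "Modelfile");
  const variant = chatVariantName(baseModel);
  try {
    writeFileSync(file, `FROM ${baseModel}\n`);
    execFileSync("ollama", ["create", variant, "-f", file], {
      stdio: ["ignore", "pipe", "pipe"],
      timeout: 120_000,
    });
    return variant;
  } catch {
    return null; // ollama missing or create failed
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}
```

Because execFileSync takes an argument array, a hostile model name like `x; rm -rf /` stays a literal argument instead of being executed.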
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
nemoclaw/src/commands/onboard.ts (2)

416-421: Fallback to llama3.2 assumes the model will be pulled later.

When no Ollama models are found, the code defaults to llama3.2 and logs suggestions to pull a model. This is reasonable, but the user won't be prompted to actually pull the model before continuing. Consider whether onboarding should pause or provide clearer next steps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/commands/onboard.ts` around lines 416 - 421, The code in the
onboarding flow sets model = "llama3.2" when no Ollama models are found (the
else branch that logs suggestions), but never forces or prompts the user to
actually pull a model; update the onboarding logic in the same function (where
model is assigned) to instead halt progress and prompt the user to pull a model
(or auto-run the pull if interactive), e.g., present a clear prompt/confirmation
to run "ollama pull llama3.2" (or accept a different model name), validate the
pull succeeded before assigning model and continuing, and surface any pull
errors via logger so downstream code never proceeds with an unavailable model.

39-51: Consider using execFileSync instead of execSync with shell command.

The listOllamaModels function shells out to curl. While this is safe (no user input interpolated), using execFileSync with explicit arguments would be more consistent with the security-conscious approach in createOllamaChatVariant.

♻️ Suggested refactor
 function listOllamaModels(): string[] {
   try {
-    const raw = execSync("curl -sf http://localhost:11434/api/tags", {
+    const raw = execFileSync("curl", ["-sf", "http://localhost:11434/api/tags"], {
       encoding: "utf-8",
       stdio: ["pipe", "pipe", "pipe"],
       timeout: 5000,
-    });
+    }).toString();
     const data = JSON.parse(raw) as { models?: Array<{ name: string }> };
     return (data.models ?? []).map((m) => m.name);
   } catch {
     return [];
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/commands/onboard.ts` around lines 39 - 51, listOllamaModels
currently calls execSync with a shell string that invokes curl; change it to use
execFileSync to avoid shell interpretation (e.g., call execFileSync("curl",
["-sf", "http://localhost:11434/api/tags"], { encoding: "utf-8", stdio:
["pipe","pipe","pipe"], timeout: 5000 })) so the same options (encoding, stdio,
timeout) are preserved but arguments are passed explicitly; update the call
inside listOllamaModels and ensure any imports/types remain correct and error
handling (the catch returning []) is kept; follow the same pattern used by
createOllamaChatVariant for consistency.
scripts/fix-gpu-device-plugin.sh (1)

26-27: Use a unique temp file and cleanup trap for chart artifact.

The fixed /tmp filename is predictable and leaves artifacts behind. Prefer mktemp + trap cleanup.

♻️ Proposed refactor
-CHART_TGZ="/tmp/nvidia-device-plugin-${CHART_VERSION}.tgz"
+CHART_TGZ="$(mktemp -t nvidia-device-plugin-${CHART_VERSION}.XXXXXX.tgz)"
+trap 'rm -f "$CHART_TGZ"' EXIT
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/fix-gpu-device-plugin.sh` around lines 26 - 27, Replace the fixed
/tmp filename used by CHART_TGZ with a secure temporary file created via mktemp
and register a trap to remove that temp file on EXIT (and on signals) so the
chart artifact is unique and always cleaned up; update the CHART_TGZ assignment
and add a trap cleanup block referencing CHART_TGZ to ensure removal even on
errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/fix-gpu-device-plugin.sh`:
- Around line 52-59: The current success check only inspects .items[0] and exits
when that single pod is Running; change the loop to inspect all pods in the
nvidia-device-plugin namespace (use kubectl get pods -n nvidia-device-plugin -o
jsonpath='{.items[*].status.phase}' via openshell doctor exec) and require at
least one pod in Running AND verify node GPU allocatable by querying nodes
(kubectl get nodes -o jsonpath='{range
.items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}')
and only return success when a non-zero nvidia.com/gpu allocatable is present on
at least one node; keep retry loop/backoff logic and update the success message
to reflect both checks.

---

Nitpick comments:
In `@nemoclaw/src/commands/onboard.ts`:
- Around line 416-421: The code in the onboarding flow sets model = "llama3.2"
when no Ollama models are found (the else branch that logs suggestions), but
never forces or prompts the user to actually pull a model; update the onboarding
logic in the same function (where model is assigned) to instead halt progress
and prompt the user to pull a model (or auto-run the pull if interactive), e.g.,
present a clear prompt/confirmation to run "ollama pull llama3.2" (or accept a
different model name), validate the pull succeeded before assigning model and
continuing, and surface any pull errors via logger so downstream code never
proceeds with an unavailable model.
- Around line 39-51: listOllamaModels currently calls execSync with a shell
string that invokes curl; change it to use execFileSync to avoid shell
interpretation (e.g., call execFileSync("curl", ["-sf",
"http://localhost:11434/api/tags"], { encoding: "utf-8", stdio:
["pipe","pipe","pipe"], timeout: 5000 })) so the same options (encoding, stdio,
timeout) are preserved but arguments are passed explicitly; update the call
inside listOllamaModels and ensure any imports/types remain correct and error
handling (the catch returning []) is kept; follow the same pattern used by
createOllamaChatVariant for consistency.

In `@scripts/fix-gpu-device-plugin.sh`:
- Around line 26-27: Replace the fixed /tmp filename used by CHART_TGZ with a
secure temporary file created via mktemp and register a trap to remove that temp
file on EXIT (and on signals) so the chart artifact is unique and always cleaned
up; update the CHART_TGZ assignment and add a trap cleanup block referencing
CHART_TGZ to ensure removal even on errors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 16b70437-a4c4-4e69-b526-ff46b3b197bc

📥 Commits

Reviewing files that changed from the base of the PR and between 28b86ee and 555bf7f.

📒 Files selected for processing (6)
  • bin/lib/onboard.js
  • docs/inference/switch-inference-providers.md
  • nemoclaw/src/commands/onboard.ts
  • nemoclaw/src/onboard/config.test.ts
  • nemoclaw/src/onboard/validate.test.ts
  • scripts/fix-gpu-device-plugin.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • nemoclaw/src/onboard/config.test.ts

Comment on lines +52 to +59
# Wait for device plugin to come up
echo "Waiting for device plugin pods..."
for i in $(seq 1 30); do
  status=$(openshell doctor exec -- kubectl get pods -n nvidia-device-plugin -o jsonpath='{.items[0].status.phase}' 2>/dev/null || echo "Pending")
  if [ "$status" = "Running" ]; then
    echo "✓ nvidia-device-plugin is running"
    exit 0
  fi

⚠️ Potential issue | 🟠 Major

Success check is too weak: one Running pod does not prove GPU allocatable recovery.

Current logic checks only the first pod (.items[0]) and exits success as soon as that one is Running. This can miss real states in multi-node setups and can still report success before nvidia.com/gpu is allocatable on nodes.

🔧 Proposed fix
 echo "Waiting for device plugin pods..."
 for i in $(seq 1 30); do
-  status=$(openshell doctor exec -- kubectl get pods -n nvidia-device-plugin -o jsonpath='{.items[0].status.phase}' 2>/dev/null || echo "Pending")
-  if [ "$status" = "Running" ]; then
-    echo "✓ nvidia-device-plugin is running"
-    exit 0
-  fi
+  running_count=$(
+    openshell doctor exec -- kubectl get pods -n nvidia-device-plugin \
+      -o jsonpath='{range .items[*]}{.status.phase}{"\n"}{end}' 2>/dev/null \
+      | grep -c '^Running$' || true
+  )
+  if [ "$running_count" -gt 0 ]; then
+    gpu_nodes=$(
+      openshell doctor exec -- kubectl get nodes \
+        -o jsonpath='{range .items[*]}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' 2>/dev/null \
+        | awk '$1 != "" && $1 != "0" {count++} END {print count+0}'
+    )
+    if [ "$gpu_nodes" -gt 0 ]; then
+      echo "✓ nvidia-device-plugin is running and GPUs are allocatable on ${gpu_nodes} node(s)"
+      exit 0
+    fi
+  fi
   sleep 2
 done
Verification script:

#!/usr/bin/env bash
set -euo pipefail

echo "Pods in nvidia-device-plugin namespace:"
openshell doctor exec -- kubectl get pods -n nvidia-device-plugin \
  -o custom-columns=NAME:.metadata.name,PHASE:.status.phase --no-headers

echo
echo "Allocatable GPUs per node:"
openshell doctor exec -- kubectl get nodes \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'

Expected verification result: at least one pod in Running phase and at least one node with non-zero nvidia.com/gpu allocatable.

🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 54-54: i appears unused. Verify use (or export if used externally).

(SC2034)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/fix-gpu-device-plugin.sh` around lines 52 - 59, The current success
check only inspects .items[0] and exits when that single pod is Running; change
the loop to inspect all pods in the nvidia-device-plugin namespace (use kubectl
get pods -n nvidia-device-plugin -o jsonpath='{.items[*].status.phase}' via
openshell doctor exec) and require at least one pod in Running AND verify node
GPU allocatable by querying nodes (kubectl get nodes -o jsonpath='{range
.items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}')
and only return success when a non-zero nvidia.com/gpu allocatable is present on
at least one node; keep retry loop/backoff logic and update the success message
to reflect both checks.

…lback warnings

- Add early return in isReasoningModel() to skip names ending with '-chat'
- Add user guidance in experimental no-models fallback
- Log warning on non-interactive variant creation failure
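The early return in the first bullet keeps generated chat variants from being re-flagged by the same patterns that match their base models. A sketch with a hypothetical pattern list; the real one lives in onboard.ts and onboard.js:

```typescript
// Hypothetical pattern list for illustration.
const REASONING_PATTERNS: RegExp[] = [/^nemotron.*nano/i, /^deepseek-r1/i, /^qwq/i];

function isReasoningModel(name: string): boolean {
  const base = name.split(":")[0];
  // Chat variants are created specifically to write to 'content', so they
  // must never be re-classified as reasoning models.
  if (base.endsWith("-chat")) return false;
  return REASONING_PATTERNS.some((p) => p.test(base));
}
```

Without the guard, "nemotron-3-nano-chat" would still match /^nemotron.*nano/i and onboarding could loop on creating variants of variants.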
coderabbitai bot (Contributor) left a comment

🧹 Nitpick comments (1)
nemoclaw/src/commands/onboard.ts (1)

18-89: Consider extracting shared logic to avoid duplication.

The reasoning model patterns and helper functions (isReasoningModel, listOllamaModels, createOllamaChatVariant) are duplicated between this file and bin/lib/onboard.js. If these files share a runtime context, consider extracting to a shared module to keep the patterns in sync.

This is a minor observation and can be deferred if the duplication is intentional due to different build/runtime contexts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/commands/onboard.ts` around lines 18 - 89, Duplicate Ollama
helper logic (KNOWN_REASONING_MODEL_PATTERNS, isReasoningModel,
listOllamaModels, createOllamaChatVariant) exists in this file and
bin/lib/onboard.js; extract these into a single shared module (e.g.,
ollamaHelpers) exporting the patterns and functions, update this file to import
those exports instead of declaring them locally, and update bin/lib/onboard.js
to import the same module; ensure exported types/signatures match (string[]
return for listOllamaModels, PluginLogger param for createOllamaChatVariant) and
that any runtime/build differences (CommonJS vs ESM) are handled by adjusting
the import style or build config so both runtimes can consume the shared module.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@nemoclaw/src/commands/onboard.ts`:
- Around line 18-89: Duplicate Ollama helper logic
(KNOWN_REASONING_MODEL_PATTERNS, isReasoningModel, listOllamaModels,
createOllamaChatVariant) exists in this file and bin/lib/onboard.js; extract
these into a single shared module (e.g., ollamaHelpers) exporting the patterns
and functions, update this file to import those exports instead of declaring
them locally, and update bin/lib/onboard.js to import the same module; ensure
exported types/signatures match (string[] return for listOllamaModels,
PluginLogger param for createOllamaChatVariant) and that any runtime/build
differences (CommonJS vs ESM) are handled by adjusting the import style or build
config so both runtimes can consume the shared module.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a5b35224-827c-4d7b-9a9e-ab9a9caa3970

📥 Commits

Reviewing files that changed from the base of the PR and between 555bf7f and 6510b5e.

📒 Files selected for processing (2)
  • bin/lib/onboard.js
  • nemoclaw/src/commands/onboard.ts

…ktemp

- Align TS fallback model from llama3.2 to nemotron-3-nano-chat for
  consistency with JS implementation (prevents reasoning-model blank responses)
- Replace execSync('curl ...') with execFileSync('curl', [...]) in
  listOllamaModels to avoid shell interpolation
- Use mktemp + trap in fix-gpu-device-plugin.sh for secure temp file handling

Addresses CodeRabbit review comments on PR NVIDIA#278.
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
nemoclaw/src/commands/onboard.ts (1)

418-427: ⚠️ Potential issue | 🟠 Major

Avoid silently selecting a model that may not exist locally.

When no local models are found, setting model = "nemotron-3-nano-chat" can still produce a broken saved config and failed inference routing. Prefer prompting for an explicit local model ID (or aborting with instructions) instead of auto-selecting a likely-missing variant.

🛠️ Suggested adjustment
     } else {
-      // No models pulled yet — default to nemotron-3-nano-chat (non-reasoning variant)
-      // to avoid blank responses. The user still needs to pull the base model and
-      // create the variant after onboarding.
-      model = "nemotron-3-nano-chat";
-      logger.warn("No models found in local Ollama. Defaulting to nemotron-3-nano-chat.");
-      logger.warn("After onboarding, run:");
-      logger.warn("  ollama pull nemotron-3-nano");
-      logger.warn("  echo 'FROM nemotron-3-nano' | ollama create nemotron-3-nano-chat");
+      logger.warn("No models found in local Ollama.");
+      logger.warn("Run `ollama pull <model>` first, then enter a local model ID.");
+      model = await promptInput("Enter a local Ollama model ID");
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/commands/onboard.ts` around lines 418 - 427, The code currently
assigns model = "nemotron-3-nano-chat" in the no-local-models branch which can
create an invalid saved config; instead, change this branch to abort onboarding
(throw or return an error) and log clear, actionable instructions for the user
to pull/create a local model (use the existing logger.warn messages as
guidance), do not set model to a default value, and ensure the calling code does
not write or save a config when model is unset; locate the assignment to model
and the surrounding else branch and replace it with an error/exit path that
prevents saving a broken model ID.
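A minimal sketch of the abort-instead-of-default path this prompt describes. `chooseModel` and `OnboardingAbort` are hypothetical stand-ins for the real onboarding flow; the real code would prompt for a selection rather than take the first entry.

```typescript
class OnboardingAbort extends Error {}

// Pick a model from the locally available Ollama models, or abort so the
// caller never writes a config with a model ID that does not exist locally.
export function chooseModel(
  localModels: string[],
  logger: { warn: (msg: string) => void } = { warn: console.warn },
): string {
  if (localModels.length > 0) {
    return localModels[0]; // real flow: prompt the user to choose
  }
  logger.warn("No models found in local Ollama.");
  logger.warn("Run `ollama pull <model>` first, then re-run onboarding.");
  // Throwing prevents the caller from saving a broken model ID.
  throw new OnboardingAbort("no local Ollama models available");
}
```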
🧹 Nitpick comments (2)
scripts/fix-gpu-device-plugin.sh (1)

55-55: Unused loop variable i.

The loop counter i is only used to limit iterations, not within the loop body. You could suppress the shellcheck warning by prefixing with _ or adding a directive.

Optional fix
-for i in $(seq 1 30); do
+for _ in $(seq 1 30); do
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/fix-gpu-device-plugin.sh` at line 55, The for-loop uses an unused
loop variable `i` (for i in $(seq 1 30); do); change the loop to use a
deliberately ignored variable (e.g., `_` instead of `i`) or add a shellcheck
directive to suppress the warning so the counter only limits iterations without
triggering an unused-variable warning; update the loop declaration that
currently uses `i` accordingly.
nemoclaw/src/commands/onboard.ts (1)

41-47: Prefer Node HTTP APIs over spawning curl for model listing.

Spawning curl adds an external binary dependency and can fail even when Node networking works. Using fetch/http here would reduce platform coupling and simplify error handling.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/commands/onboard.ts` around lines 41 - 47, The function
listOllamaModels currently shells out to curl via execFileSync; replace that
with a native Node HTTP/fetch call (use global fetch or node: http/https) to GET
http://localhost:11434/api/tags inside listOllamaModels, handle JSON parsing,
network errors and timeouts via AbortController, and return the same string[]
result; ensure you propagate/log errors similarly to the current try/catch and
remove the execFileSync dependency and associated stdio/timeout options.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 293bf11f-88ad-49f6-a0ce-d0a19e6fee7b

📥 Commits

Reviewing files that changed from the base of the PR and between 6510b5e and a84bc20.

📒 Files selected for processing (2)
  • nemoclaw/src/commands/onboard.ts
  • scripts/fix-gpu-device-plugin.sh

Comment thread nemoclaw/src/commands/onboard.ts Outdated
Comment on lines +67 to +68
const tmpFile = join(tmpdir(), `nemoclaw-modelfile-${Date.now()}`);
writeFileSync(tmpFile, modelfile, "utf-8");

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n nemoclaw/src/commands/onboard.ts | head -100

Repository: NVIDIA/NemoClaw

Length of output: 4025


🏁 Script executed:

cat nemoclaw/package.json | grep -A 5 '"engines"'

Repository: NVIDIA/NemoClaw

Length of output: 154


Use a securely created temporary Modelfile path.

Lines 67–68 use a predictable filename (Date.now()) and non-exclusive write, creating a symlink/overwrite window on shared systems. Use mkdtempSync to create a temporary directory with cryptographic randomness and write exclusively using the "wx" flag. Clean up the directory atomically with rmSync.

Proposed fix
-import { writeFileSync, unlinkSync } from "node:fs";
+import { mkdtempSync, rmSync, writeFileSync } from "node:fs";
 import { tmpdir } from "node:os";
 import { join } from "node:path";
@@
-    const tmpFile = join(tmpdir(), `nemoclaw-modelfile-${Date.now()}`);
-    writeFileSync(tmpFile, modelfile, "utf-8");
+    const tmpDir = mkdtempSync(join(tmpdir(), "nemoclaw-modelfile-"));
+    const tmpFile = join(tmpDir, "Modelfile");
+    writeFileSync(tmpFile, modelfile, { encoding: "utf-8", flag: "wx", mode: 0o600 });
@@
-        unlinkSync(tmpFile);
+        rmSync(tmpDir, { recursive: true, force: true });
       } catch {
         // ignore cleanup errors
       }

Also applies to lines 75–80.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/commands/onboard.ts` around lines 67 - 68, Replace the
predictable tmp path and non-exclusive writes for Modelfile with a secure temp
directory: use mkdtempSync (e.g., mkdtempSync(join(tmpdir(), 'nemoclaw-'))) to
create a cryptographically-random temp directory, join that directory with a
stable filename instead of Date.now() (the variable tmpFile), and write the file
using writeFileSync(tmpFile, modelfile, { encoding: "utf-8", flag: "wx" }) so the
create is exclusive. After processing, remove the temp directory atomically with
rmSync(tempDir, { recursive: true, force: true }). Apply the same change to the
other occurrence around the write in the same file (lines referenced in the
review).

…l, fix lint

- Use mkdtempSync for cryptographically random temp directories
  instead of predictable Date.now()-based paths (TOCTOU fix)
- Prompt user for model ID when no local Ollama models found
  instead of silently defaulting to potentially non-existent model
- Fix unused loop variable in fix-gpu-device-plugin.sh
@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
nemoclaw/src/commands/onboard.ts (1)

18-36: Consider extracting Ollama helpers to a shared module.

KNOWN_REASONING_MODEL_PATTERNS, isReasoningModel, listOllamaModels, and createOllamaChatVariant are duplicated nearly identically in bin/lib/onboard.js. If these need to stay in sync, consider extracting them to a shared module to avoid maintenance drift.

♻️ Suggested structure

Create a shared module (e.g., shared/ollama-helpers.js or a TypeScript equivalent) that exports these functions, and import them in both files. This ensures any future pattern additions or bug fixes apply consistently.
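A sketch of what such a shared module could export. The pattern list here is illustrative, based on the models named in this PR, not the exact list from the duplicated code; `nemotron-3-nano-chat` deliberately does not match, since it is the non-reasoning variant.

```typescript
// shared/ollama-helpers.ts (hypothetical path)
export const KNOWN_REASONING_MODEL_PATTERNS: RegExp[] = [
  /nemotron-3-nano(?!-chat)/, // base model, but not the -chat variant
  /deepseek-r1/,
  /qwq/,
];

// A model is treated as reasoning-only if any known pattern matches its
// lowercased name (tags like ":7b" are allowed after the match).
export function isReasoningModel(name: string): boolean {
  const normalized = name.toLowerCase();
  return KNOWN_REASONING_MODEL_PATTERNS.some((p) => p.test(normalized));
}
```

Both `onboard.ts` and `bin/lib/onboard.js` would then import from this one module, so pattern additions land in a single place.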

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/commands/onboard.ts` around lines 18 - 36, The duplicated Ollama
helper logic (KNOWN_REASONING_MODEL_PATTERNS, isReasoningModel,
listOllamaModels, createOllamaChatVariant) should be moved into a single shared
module and imported where needed; create a new module that exports those
symbols, replace the local copies with imports, and update callers to use the
shared exports so future changes (pattern updates or bug fixes) are centralized
and no longer duplicated across onboard.ts and the bin/lib variant.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0d123abb-eb8a-445d-87ca-6d10796fb9c9

📥 Commits

Reviewing files that changed from the base of the PR and between a84bc20 and 5bc2a18.

📒 Files selected for processing (3)
  • bin/lib/onboard.js
  • nemoclaw/src/commands/onboard.ts
  • scripts/fix-gpu-device-plugin.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/fix-gpu-device-plugin.sh

@kagura-agent
Contributor Author

Closing — this PR overlaps with #279 (test files) and bundles unrelated changes (GPU device plugin, docs, onboard). I'll submit focused PRs for individual fixes.

@wscurran added the Local Models (running NemoClaw with local models) and NemoClaw CLI (NemoClaw command-line interface issues) labels on Mar 18, 2026
mafueee pushed a commit to mafueee/NemoClaw that referenced this pull request Mar 28, 2026