Fix/exclude venv from build context #1110
Conversation
- New `nemoclaw sandbox-init <name>` command that sets up workspace identity files (IDENTITY.md, SOUL.md, AGENTS.md, USER.md), applies network policies, configures GitHub credentials, and registers an agent entry in openclaw.json
- All steps are safe to re-run; skips duplicate entries
- Supports --agent-name, --agent-id, --parent-agent, --soul, --identity, --agents, --user, --policy, --no-github, --non-interactive flags
- 15 unit tests covering sandbox validation, agent registration, idempotency, subagent wiring, policy deduplication, and file resolution
- Dockerfile.sandbox-ai: CUDA 12.6 + PyTorch + Node 22 sandbox image
- scripts/post-onboard-gpu.sh: swap standard onboard sandbox for GPU image
- scripts/add-gpu-agent.sh: create additional GPU agents with time-slicing
- bin/lib/sandbox-resume.js: starts gateway, port forward, returns auth token
- bin/nemoclaw.js: wire resume into CLI dispatch
- test/sandbox-resume.test.js: 8 tests covering resume flow
- Dockerfiles: GPU-capable base and sandbox-ai images
- onboard.js: GPU onboarding flow
- sandbox-add-gpu-agent.js: add GPU agent to sandbox
- sandbox-init.js: sandbox initialization with GPU support
- sandbox-resume.js: resume flow updates
- nemoclaw.js: GPU agent CLI commands
- start-services.sh: docker-proxy service management
- mount-sandbox.sh, nemoclaw-start.sh, post-onboard-gpu.sh: GPU scripts
- extensions/docker-proxy: OpenClaw docker-proxy plugin
- scripts/docker-proxy.js: docker proxy server
- nemoclaw-blueprint: docker-proxy policy preset
- .agents/skills/nemoclaw-docker-proxy: agent skill
- add opts.model to addGpuAgent JSDoc typedef
- annotate ALLOWED_ROUTES as Array<[string, RegExp]>
- cast socket to net.Socket for setNoDelay call
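The Array<[string, RegExp]> shape for ALLOWED_ROUTES can be illustrated with a small sketch. The concrete routes and the version-prefix handling here are assumptions for illustration, not the actual docker-proxy.js list.

```javascript
/**
 * [HTTP method, path pattern] pairs; paths are matched after stripping
 * a /v1.47-style Docker API version prefix (hypothetical route list).
 * @type {Array<[string, RegExp]>}
 */
const ALLOWED_ROUTES = [
  ["GET", /^\/containers\/json$/],
  ["POST", /^\/containers\/create$/],
  ["POST", /^\/containers\/[^/]+\/(start|stop|wait)$/],
];

// Return true when the method/path combination is on the allowlist.
function isRouteAllowed(method, path) {
  const bare = path.replace(/^\/v[\d.]+/, "");
  return ALLOWED_ROUTES.some(([m, re]) => m === method && re.test(bare));
}

module.exports = { ALLOWED_ROUTES, isRouteAllowed };
```

The tuple type makes the intent explicit to the type checker: each entry is a method string paired with a compiled pattern, not a free-form array.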
- extensions/claw3d: OpenClaw plugin with tools for office list, office map, send message, and studio settings via the Claw3D REST API - nemoclaw-blueprint/policies/presets/claw3d.yaml: network policy preset allowing agent access to host.openshell.internal:3000 - start-services.sh, sandbox-resume.js: fix gateway relay container lookup to search openshell-cluster-* containers for the sandbox pod rather than assuming openshell-cluster-<sandboxName>
detectGpu() queries nvidia-smi for VRAM, which returns [N/A] on unified-memory architectures. The existing fallback matched GPU names containing "GB10" (DGX Spark) but missed Jetson AGX Thor and Orin, leaving those devices undetected. Broaden the name check to ["GB10", "Thor", "Orin"]. Align nim.js with the runner.runCapture() indirection already used in sandbox-add-gpu-agent.js to enable mocked test coverage of the fallback path. Five new tests exercise each device tag, a desktop GPU negative case, and the standard VRAM-queryable early return. Fixes NVIDIA#300
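The broadened fallback, with the runner indirection that makes it mockable, can be sketched as below. The function names and return shape are assumptions for illustration rather than the exact nim.js code.

```javascript
// Unified-memory devices whose nvidia-smi VRAM query returns [N/A].
const UNIFIED_MEMORY_TAGS = ["GB10", "Thor", "Orin"];

// The runner is injected so tests can mock nvidia-smi output, mirroring
// the runner.runCapture() indirection used in sandbox-add-gpu-agent.js.
function detectGpu(runner) {
  const vram = runner
    .runCapture("nvidia-smi --query-gpu=memory.total --format=csv,noheader")
    .trim();
  // Standard discrete GPUs report a VRAM figure and return early.
  if (vram !== "[N/A]") return { gpu: true, vramMiB: parseInt(vram, 10) };
  // Unified-memory architectures report [N/A]; fall back to a name check.
  const name = runner
    .runCapture("nvidia-smi --query-gpu=name --format=csv,noheader")
    .trim();
  return { gpu: UNIFIED_MEMORY_TAGS.some((tag) => name.includes(tag)), vramMiB: null };
}

module.exports = { detectGpu };
```

With a mocked runner, each device tag, the desktop-GPU early return, and the negative case can be covered without real hardware.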
nemoclaw-blueprint/ is copied into the Docker build context via cp -r during onboarding. If a developer has run uv sync locally, the resulting .venv directory (often hundreds of MB) is included in the staged context and baked into the sandbox image. Add rm -rf of nemoclaw-blueprint/.venv after staging in both onboard.js and setup.sh, matching the existing node_modules cleanup pattern. Add --exclude .venv to the rsync in nemoclaw.js. Also add .venv to .dockerignore and .gitignore as defense-in-depth. Fixes NVIDIA#774
Caution: review failed. The pull request was closed or merged during review.

📝 Walkthrough

This PR introduces GPU-accelerated sandbox support, Docker Engine API access for sandboxes, Claw3D 3D office integration, and sandbox lifecycle management (mount/backup/resume). It adds comprehensive tooling for multi-agent GPU deployment, post-reboot recovery, and local filesystem mounting via SSHFS, and fixes build-context hygiene by excluding `.venv` from the staged build context.

Changes
Sequence Diagrams

sequenceDiagram
participant User as User (CLI)
participant Nemoclaw as nemoclaw add-gpu-agent
participant Registry as Sandbox Registry
participant Gateway as OpenShell Gateway<br/>(k3s Cluster)
participant Containerd as k3s Containerd<br/>(Image Store)
participant Sandbox as New GPU<br/>Sandbox Pod
participant Parent as Parent Sandbox<br/>OpenClaw Config
User->>Nemoclaw: add-gpu-agent agent-name<br/>--parent parent-name
Nemoclaw->>Registry: Resolve parent sandbox
Registry-->>Nemoclaw: Parent metadata
Nemoclaw->>Gateway: Check allocatable<br/>nvidia.com/gpu count
alt GPU count ≤ 1
Nemoclaw->>Gateway: Apply nvidia-device-plugin<br/>ConfigMap with timeSlicing
Nemoclaw->>Gateway: Patch DaemonSet to<br/>use config file
Nemoclaw->>Gateway: Poll until GPU count > 1
end
Nemoclaw->>Containerd: Check if<br/>nemoclaw-sandbox-ai:v3<br/>exists
alt Image not present
Nemoclaw->>Containerd: Import GPU image<br/>from local Docker
end
Nemoclaw->>Gateway: Create sandbox from<br/>GPU image + --gpu
Gateway->>Sandbox: Provision pod<br/>with GPU
Nemoclaw->>Sandbox: Poll for Ready state<br/>(up to 60s)
Sandbox-->>Nemoclaw: Ready
Nemoclaw->>Sandbox: Start openclaw gateway<br/>via SSH proxy
Sandbox->>Sandbox: Gateway listening<br/>on port 18789
Nemoclaw->>Sandbox: Extract auth token
Sandbox-->>Nemoclaw: token from openclaw.json
Nemoclaw->>Parent: Update openclaw.json<br/>to register agent<br/>as subagent
Parent-->>Nemoclaw: Config patched
Nemoclaw->>Registry: Mark gpuEnabled: true,<br/>parentAgent reference
Registry-->>Nemoclaw: Updated
Nemoclaw-->>User: ✓ GPU agent created<br/>Dashboard URL + token
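The timeSlicing ConfigMap applied in the flow above follows the nvidia-device-plugin sharing config format. A minimal sketch of the manifest builder is below; the ConfigMap name, namespace, and replica count are assumptions, not values taken from the scripts.

```javascript
// Build an nvidia-device-plugin ConfigMap that exposes one physical GPU
// as multiple schedulable nvidia.com/gpu resources via time-slicing.
function timeSlicingConfigMap(replicas) {
  const pluginConfig = [
    "version: v1",
    "sharing:",
    "  timeSlicing:",
    "    resources:",
    "      - name: nvidia.com/gpu",
    `        replicas: ${replicas}`,
  ].join("\n");
  return {
    apiVersion: "v1",
    kind: "ConfigMap",
    metadata: { name: "nvidia-device-plugin-config", namespace: "kube-system" },
    data: { "config.yaml": pluginConfig },
  };
}

module.exports = { timeSlicingConfigMap };
```

After applying a manifest like this and patching the DaemonSet to read the config file, the allocatable nvidia.com/gpu count rises above 1, which is what the polling step in the diagram waits for.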
sequenceDiagram
participant User as User (CLI)
participant Resume as sandboxResume
participant Openshell as openshell CLI<br/>(Cluster API)
participant Sandbox as Sandbox Pod<br/>(OpenClaw)
participant GatewayRelay as gateway-relay.py<br/>(Host)
participant Dashboard as Dashboard<br/>Port 18789
User->>Resume: nemoclaw sandbox resume
Resume->>Openshell: sandbox list<br/>find target sandbox
Openshell-->>Resume: Sandbox metadata
Resume->>Sandbox: Check if gateway<br/>listening on 18789<br/>(kubectl exec ss -tlnp)
alt Gateway not listening
Resume->>Sandbox: Start openclaw gateway<br/>nohup with HOME=/sandbox
Sandbox->>Sandbox: Gateway starts<br/>and listens
Resume->>Resume: Poll until listening<br/>(max 30s)
end
Resume->>Openshell: Get sandbox container<br/>and IP
Openshell-->>Resume: Container info
Resume->>Openshell: Kill stale<br/>gateway-relay.py
Resume->>Openshell: Start kubectl port-forward<br/>inside cluster container<br/>18789 → all interfaces
Resume->>GatewayRelay: Start gateway-relay.py<br/>with cluster container IP
GatewayRelay->>Dashboard: Forward port 18789
Resume->>Sandbox: Extract auth token<br/>from openclaw.json<br/>(kubectl exec + python)
Sandbox-->>Resume: token value
Resume-->>User: ✓ Gateway started<br/>token, port forwarded
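The "poll until listening (max 30s)" step in the resume flow above can be sketched as a generic bounded poll. The timeout, interval, and probe shape are assumptions; the real code probes the sandbox via kubectl exec.

```javascript
// Poll an async probe until it reports success or the deadline passes,
// as in the resume flow's wait for the gateway to listen on port 18789.
async function pollUntil(probe, { timeoutMs = 30_000, intervalMs = 1_000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await probe()) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // caller decides whether a timeout is fatal
}

module.exports = { pollUntil };
```

Returning false instead of throwing lets the caller emit a targeted error ("gateway never started") rather than an opaque timeout stack trace.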
sequenceDiagram
participant Agent as Agent inside<br/>Sandbox
participant Proxy as docker-proxy.js<br/>(Host HTTP Server)<br/>Port 2376
participant Allowlist as Request Validation<br/>(Allowlist + Body Check)
participant Docker as Docker Engine<br/>Daemon (Socket/TCP)
Agent->>Proxy: POST /v1.47/containers/create<br/>DOCKER_HOST=tcp://host.openshell.internal:2376
Proxy->>Allowlist: Check method/path<br/>against allowlist
Allowlist-->>Proxy: ✓ POST /containers/**<br/>allowed
Proxy->>Allowlist: Buffer and parse<br/>JSON body
Allowlist->>Allowlist: Validate: no Privileged,<br/>no HostNetwork,<br/>no dangerous CapAdd,<br/>no blocked mounts
alt Validation fails
Allowlist-->>Proxy: ✗ Forbidden
Proxy-->>Agent: HTTP 403<br/>{ error: "..." }
else Validation succeeds
Allowlist-->>Proxy: ✓ Body valid
Proxy->>Docker: Forward validated<br/>request + body<br/>to upstream Docker
Docker-->>Proxy: Container created<br/>{ Id, Warnings }
Proxy-->>Agent: HTTP 201<br/>Response forwarded
end
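The body-validation step in the diagram above can be sketched as a pure function over the parsed create payload. The specific deny rules (capability list, blocked mount prefixes) are assumptions chosen for illustration, not the exact docker-proxy.js rule set.

```javascript
// Capabilities and host paths assumed dangerous for this sketch.
const DANGEROUS_CAPS = ["SYS_ADMIN", "NET_ADMIN", "SYS_PTRACE"];
const BLOCKED_MOUNT_PREFIXES = ["/var/run/docker.sock", "/etc", "/root"];

// Reject container-create bodies that would grant host-level access,
// per the proxy's allowlist-plus-body-check model.
function validateCreateBody(body) {
  const hc = body.HostConfig || {};
  if (hc.Privileged) return { ok: false, error: "Privileged containers are not allowed" };
  if (hc.NetworkMode === "host") return { ok: false, error: "Host networking is not allowed" };
  for (const cap of hc.CapAdd || []) {
    if (DANGEROUS_CAPS.includes(cap.replace(/^CAP_/, ""))) {
      return { ok: false, error: `CapAdd ${cap} is not allowed` };
    }
  }
  for (const bind of hc.Binds || []) {
    const src = bind.split(":")[0];
    if (BLOCKED_MOUNT_PREFIXES.some((p) => src === p || src.startsWith(p + "/"))) {
      return { ok: false, error: `Mounting ${src} is not allowed` };
    }
  }
  return { ok: true };
}

module.exports = { validateCreateBody };
```

Validating the buffered JSON body, rather than only the route, is what prevents an agent from using an allowed endpoint (POST /containers/create) to escape the sandbox.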
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~70 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Closing — discovered during rebase that #774 is already resolved on main via copyBuildContextDir() in onboard.js (which filters .venv) and clean-staged-tree.sh in setup.sh. No additional changes needed.
Summary
Prevent local Python virtual environments from being copied into the sandbox image build context. This aligns
`.venv` handling with the existing `node_modules` cleanup pattern across onboarding and setup flows, and adds exclusion rules so developer-local artifacts do not leak into staged builds.

Related Issue
Fixes #774
Changes
- Remove `nemoclaw-blueprint/.venv` after blueprint copy in interactive onboarding
- Remove `nemoclaw-blueprint/.venv` after blueprint copy in scripted setup
- Exclude `.venv` from the VM deploy `rsync` path
- Add `.venv` ignore coverage in `.dockerignore` and `.gitignore`

Type of Change
Testing
- `npx prek run --all-files` passes (or equivalently `make check`).
- `npm test` passes.
- `make docs` builds without warnings. (for doc-only changes)

Additional validation:
- `npx vitest run test/build-context-clean.test.js` passes
- `npx vitest run --project cli` shows no new failures relative to a clean `HEAD` (385 passed, 5 failed, matching the same 5 pre-existing failures on unmodified `main`)

Checklist
General
Code Changes
- `npx prek run --all-files` auto-fixes formatting (or `make format` for targeted runs).

Doc Changes
- Use the `update-docs` agent skill to draft changes while complying with the style guide. For example, prompt your agent with "/update-docs catch up the docs for the new changes I made in this PR."

Summary by CodeRabbit
Release Notes
New Features
Documentation
Chores