Fix/exclude venv from build context #1110

Closed
Junior00619 wants to merge 19 commits into NVIDIA:main from Junior00619:fix/exclude-venv-from-build-context

Conversation

@Junior00619
Contributor

@Junior00619 Junior00619 commented Mar 30, 2026

Summary

Prevent local Python virtual environments from being copied into the sandbox image build context. This aligns .venv handling with the existing node_modules cleanup pattern across onboarding and setup flows, and adds exclusion rules so developer-local artifacts do not leak into staged builds.

Related Issue

Fixes #774

Changes

  • remove staged nemoclaw-blueprint/.venv after blueprint copy in interactive onboarding
  • remove staged nemoclaw-blueprint/.venv after blueprint copy in scripted setup
  • exclude .venv from the VM deploy rsync path
  • add .venv ignore coverage in .dockerignore and .gitignore
  • add targeted test coverage to verify the exclusion is enforced in the relevant build-context paths
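
The staging cleanup described in the first two bullets can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in onboard.js or setup.sh: the helper name `stageBlueprint`, the temporary directory layout, and the `blueprint.yaml` file are hypothetical, and only Node's built-in `fs`, `os`, and `path` modules are used.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper (name and signature are illustrative, not from the PR):
// copy a blueprint directory into a staged build context, then delete
// developer-local directories so they are not baked into the image.
function stageBlueprint(sourceDir, buildCtx, localOnly = [".venv"]) {
  const staged = path.join(buildCtx, path.basename(sourceDir));
  fs.cpSync(sourceDir, staged, { recursive: true }); // rough equivalent of cp -r
  for (const name of localOnly) {
    // force: true makes this a no-op when the directory is absent, like rm -rf.
    fs.rmSync(path.join(staged, name), { recursive: true, force: true });
  }
  return staged;
}

// Demo against a throwaway tree that contains a fake .venv.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "stage-demo-"));
const src = path.join(tmp, "nemoclaw-blueprint");
fs.mkdirSync(path.join(src, ".venv", "lib"), { recursive: true });
fs.writeFileSync(path.join(src, "blueprint.yaml"), "name: demo\n");
const ctx = path.join(tmp, "ctx");
fs.mkdirSync(ctx);

const staged = stageBlueprint(src, ctx);
console.log(fs.readdirSync(staged)); // the staged copy no longer lists .venv
```

Only the staged copy is modified; the developer's own `.venv` in the source tree is left in place.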

Type of Change

  • Code change for a new feature, bug fix, or refactor.
  • Code change with doc updates.
  • Doc only. Prose changes without code sample modifications.
  • Doc only. Includes code sample changes.

Testing

  • npx prek run --all-files passes (or equivalently make check).
  • npm test passes.
  • make docs builds without warnings. (for doc-only changes)

Additional validation:

  • npx vitest run test/build-context-clean.test.js passes
  • npx vitest run --project cli shows no new failures relative to clean HEAD
    (385 passed, 5 failed, matching the same 5 pre-existing failures on unmodified main)

Checklist

General

Code Changes

  • Formatters applied — npx prek run --all-files auto-fixes formatting (or make format for targeted runs).
  • Tests added or updated for new or changed behavior.
  • No secrets, API keys, or credentials committed.
  • Doc pages updated for any user-facing behavior changes (new commands, changed defaults, new features, bug fixes that contradict existing docs).

Doc Changes

  • Follows the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md). Try running the update-docs agent skill to draft changes while complying with the style guide. For example, prompt your agent with "/update-docs catch up the docs for the new changes I made in this PR."
  • New pages include SPDX license header and frontmatter, if creating a new page.
  • Cross-references and links verified.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added GPU-accelerated agent sandbox support with NVIDIA time-slicing to run multiple agents on a single GPU
    • Added Docker container management tools accessible from within sandboxes
    • Added Claw3D 3D virtual office integration
    • Added sandbox filesystem mounting via SSHFS, backup/restore, and resume functionality after reboot
    • Added autostart setup for persistent sandbox lifecycle management
  • Documentation

    • Added deployment guide for running multiple GPU agents
    • Added comprehensive setup documentation for new integrations and features
  • Chores

    • Updated base environment with headless X11 tools and Docker CLI

blacksaturn1 and others added 19 commits March 25, 2026 22:27
- New `nemoclaw sandbox-init <name>` command that sets up workspace
  identity files (IDENTITY.md, SOUL.md, AGENTS.md, USER.md), applies
  network policies, configures GitHub credentials, and registers an
  agent entry in openclaw.json
- All steps are safe to re-run; skips duplicate entries
- Supports --agent-name, --agent-id, --parent-agent, --soul, --identity,
  --agents, --user, --policy, --no-github, --non-interactive flags
- 15 unit tests covering sandbox validation, agent registration,
  idempotency, subagent wiring, policy deduplication, and file resolution
- Dockerfile.sandbox-ai: CUDA 12.6 + PyTorch + Node 22 sandbox image
- scripts/post-onboard-gpu.sh: swap standard onboard sandbox for GPU image
- scripts/add-gpu-agent.sh: create additional GPU agents with time-slicing
- bin/lib/sandbox-resume.js: starts gateway, port forward, returns auth token
- bin/nemoclaw.js: wire resume into CLI dispatch
- test/sandbox-resume.test.js: 8 tests covering resume flow
- Dockerfiles: GPU-capable base and sandbox-ai images
- onboard.js: GPU onboarding flow
- sandbox-add-gpu-agent.js: add GPU agent to sandbox
- sandbox-init.js: sandbox initialization with GPU support
- sandbox-resume.js: resume flow updates
- nemoclaw.js: GPU agent CLI commands
- start-services.sh: docker-proxy service management
- mount-sandbox.sh, nemoclaw-start.sh, post-onboard-gpu.sh: GPU scripts
- extensions/docker-proxy: OpenClaw docker-proxy plugin
- scripts/docker-proxy.js: docker proxy server
- nemoclaw-blueprint: docker-proxy policy preset
- .agents/skills/nemoclaw-docker-proxy: agent skill
- add opts.model to addGpuAgent JSDoc typedef
- annotate ALLOWED_ROUTES as Array<[string, RegExp]>
- cast socket to net.Socket for setNoDelay call
- extensions/claw3d: OpenClaw plugin with tools for office list, office
  map, send message, and studio settings via the Claw3D REST API
- nemoclaw-blueprint/policies/presets/claw3d.yaml: network policy preset
  allowing agent access to host.openshell.internal:3000
- start-services.sh, sandbox-resume.js: fix gateway relay container
  lookup to search openshell-cluster-* containers for the sandbox pod
  rather than assuming openshell-cluster-<sandboxName>
detectGpu() queries nvidia-smi for VRAM, which returns [N/A] on
unified-memory architectures. The existing fallback matched GPU names
containing "GB10" (DGX Spark) but missed Jetson AGX Thor and Orin,
leaving those devices undetected.

Broaden the name check to ["GB10", "Thor", "Orin"]. Align nim.js with
the runner.runCapture() indirection already used in
sandbox-add-gpu-agent.js to enable mocked test coverage of the fallback
path.

Five new tests exercise each device tag, a desktop GPU negative case,
and the standard VRAM-queryable early return.

Fixes NVIDIA#300
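
The fallback described in this commit message could look roughly like the sketch below. It is an assumption-laden illustration, not the actual detectGpu() in nim.js: the function name `classifyGpu`, the one-line CSV input format, and the sample device strings are hypothetical. Only the three name tags come from the commit message.

```javascript
// Device-name tags listed in the commit message above.
const UNIFIED_MEMORY_TAGS = ["GB10", "Thor", "Orin"];

// Hypothetical parser for one CSV line of the form "<name>, <memory in MiB>",
// such as a line printed by
// `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits`.
function classifyGpu(line) {
  const [name = "", memory = ""] = line.split(",").map((s) => s.trim());
  const vramMiB = Number.parseInt(memory, 10);
  if (Number.isFinite(vramMiB)) {
    // Standard case: VRAM is queryable, so return early.
    return { name, vramMiB, unifiedMemory: false };
  }
  // VRAM was reported as "[N/A]": fall back to matching the device name.
  const unifiedMemory = UNIFIED_MEMORY_TAGS.some((tag) => name.includes(tag));
  return unifiedMemory ? { name, vramMiB: null, unifiedMemory: true } : null;
}

// Sample strings are illustrative only.
console.log(classifyGpu("NVIDIA GB10, [N/A]"));
console.log(classifyGpu("NVIDIA Thor, [N/A]"));
console.log(classifyGpu("NVIDIA GeForce RTX 4090, 24564"));
console.log(classifyGpu("Some Desktop GPU, [N/A]")); // null: not detected
```

Keeping the parser a pure function of its input string is one way to get the mocked test coverage the commit message mentions without invoking nvidia-smi.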
nemoclaw-blueprint/ is copied into the Docker build context via cp -r
during onboarding. If a developer has run uv sync locally, the resulting
.venv directory (often hundreds of MB) is included in the staged context
and baked into the sandbox image.

Add rm -rf of nemoclaw-blueprint/.venv after staging in both onboard.js
and setup.sh, matching the existing node_modules cleanup pattern. Add
--exclude .venv to the rsync in nemoclaw.js. Also add .venv to
.dockerignore and .gitignore as defense-in-depth.

Fixes NVIDIA#774
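
For the rsync part of this commit, a minimal sketch of building the argument list is shown below. The helper name `buildRsyncArgs` and the example paths are hypothetical, and the real call in nemoclaw.js may pass different flags; `-a` and `--exclude` are standard rsync options.

```javascript
// Hypothetical helper: build an rsync argument list that skips
// developer-local directories. Not the actual code from nemoclaw.js.
function buildRsyncArgs(source, destination, excludes = [".venv"]) {
  const args = ["-a"]; // archive mode: recurse and preserve metadata
  for (const pattern of excludes) {
    args.push("--exclude", pattern);
  }
  args.push(source, destination);
  return args;
}

// Example paths are placeholders.
const rsyncArgs = buildRsyncArgs(
  "./nemoclaw-blueprint/",
  "user@host:/srv/nemoclaw-blueprint/"
);
console.log(["rsync", ...rsyncArgs].join(" "));
```

Passing the arguments as an array, for example to `child_process.spawnSync("rsync", rsyncArgs)`, avoids shell quoting problems with unusual path names.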
@coderabbitai
Contributor

coderabbitai bot commented Mar 30, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

This PR introduces GPU-accelerated sandbox support, Docker Engine API access for sandboxes, Claw3D 3D office integration, and sandbox lifecycle management (mount/backup/resume). It adds comprehensive tooling for multi-agent GPU deployment, post-reboot recovery, local filesystem mounting via SSHFS, and fixes build-context hygiene by excluding .venv directories.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| GPU Sandbox Support | .agents/skills/docs/nemoclaw-gpu-sandbox/SKILL.md, Dockerfile.sandbox-ai, scripts/post-onboard-gpu.sh, scripts/add-gpu-agent.sh, bin/lib/sandbox-add-gpu-agent.js | Introduces CUDA-enabled GPU-accelerated sandbox image with PyTorch/ML libraries, GPU agent creation workflow with NVIDIA device-plugin time-slicing configuration, and automation for converting onboarded sandboxes to GPU-backed variants. |
| Docker Proxy Integration | .agents/skills/docs/nemoclaw-docker-proxy/SKILL.md, .agents/skills/nemoclaw-docker-proxy/SKILL.md, extensions/docker-proxy/*, nemoclaw-blueprint/policies/presets/docker-proxy.yaml, scripts/docker-proxy.js, scripts/start-services.sh | Implements host-side HTTP proxy that restricts Docker Engine API access from sandboxes; includes Docker client wrapper, allowlist validation, body inspection for privileged/host-network rejection, and service startup integration. |
| Claw3D Integration | extensions/claw3d/*, nemoclaw-blueprint/policies/presets/claw3d.yaml | Adds OpenClaw plugin for Claw3D 3D virtual office REST API with tools for office listing/getting, message sending, and studio settings management. |
| Sandbox Lifecycle Management | scripts/mount-sandbox.sh, scripts/backup-workspace.sh, scripts/resume.sh, scripts/resume-all-sandboxes.sh, scripts/setup-autostart.sh, bin/lib/sandbox-resume.js, .agents/skills/docs/nemoclaw-mount-filesystem/SKILL.md | Adds SSHFS mounting for local data access, extends backup/restore to handle broader /sandbox/.openclaw-data structure, implements resume-from-reboot with gateway restart and port forwarding, and autostart via systemd user service. |
| CLI and Onboarding Updates | bin/nemoclaw.js, bin/lib/onboard.js, bin/lib/sandbox-init.js, bin/lib/nim.js | Adds new global commands sandbox-init, add-gpu-agent, and resume with per-sandbox actions (mount/unmount/backup/restore); extends onboarding with FORCE_GPU_SANDBOX and SKIP_PROBE env flags; adds sandboxInit for agent registration and workspace bootstrap; expands unified-memory GPU detection for multiple NVIDIA devices. |
| Build Configuration and Context | Dockerfile, Dockerfile.base, .dockerignore, .gitignore, scripts/setup.sh | Expands gateway LAN access (10.200.0.1), increases model context/token limits, adds X11/Docker tooling to base image, excludes .venv from Docker and git, and removes .venv during build-context staging. |
| Documentation | docs/deployment/run-multiple-gpu-agents.md, docs/index.md | Adds guide for multi-GPU agent deployment on single physical GPU with time-slicing, and updates navigation index. |
| Tests | test/build-context-clean.test.js, test/cli.test.js, test/nim.test.js, test/sandbox-add-gpu-agent.test.js, test/sandbox-init.test.js, test/sandbox-resume.test.js | Adds test coverage for .venv exclusion hygiene, new CLI commands (mount/unmount/resume), unified-memory GPU detection, GPU agent creation (ConfigMap/DaemonSet patching, image import, sandbox lifecycle), agent sandbox initialization and parent registration, and sandbox resume workflows. |

Sequence Diagrams

sequenceDiagram
    participant User as User (CLI)
    participant Nemoclaw as nemoclaw add-gpu-agent
    participant Registry as Sandbox Registry
    participant Gateway as OpenShell Gateway<br/>(k3s Cluster)
    participant Containerd as k3s Containerd<br/>(Image Store)
    participant Sandbox as New GPU<br/>Sandbox Pod
    participant Parent as Parent Sandbox<br/>OpenClaw Config
    
    User->>Nemoclaw: add-gpu-agent agent-name<br/>--parent parent-name
    
    Nemoclaw->>Registry: Resolve parent sandbox
    Registry-->>Nemoclaw: Parent metadata
    
    Nemoclaw->>Gateway: Check allocatable<br/>nvidia.com/gpu count
    alt GPU count ≤ 1
        Nemoclaw->>Gateway: Apply nvidia-device-plugin<br/>ConfigMap with timeSlicing
        Nemoclaw->>Gateway: Patch DaemonSet to<br/>use config file
        Nemoclaw->>Gateway: Poll until GPU count > 1
    end
    
    Nemoclaw->>Containerd: Check if<br/>nemoclaw-sandbox-ai:v3<br/>exists
    alt Image not present
        Nemoclaw->>Containerd: Import GPU image<br/>from local Docker
    end
    
    Nemoclaw->>Gateway: Create sandbox from<br/>GPU image + --gpu
    Gateway->>Sandbox: Provision pod<br/>with GPU
    
    Nemoclaw->>Sandbox: Poll for Ready state<br/>(up to 60s)
    Sandbox-->>Nemoclaw: Ready
    
    Nemoclaw->>Sandbox: Start openclaw gateway<br/>via SSH proxy
    Sandbox->>Sandbox: Gateway listening<br/>on port 18789
    
    Nemoclaw->>Sandbox: Extract auth token
    Sandbox-->>Nemoclaw: token from openclaw.json
    
    Nemoclaw->>Parent: Update openclaw.json<br/>to register agent<br/>as subagent
    Parent-->>Nemoclaw: Config patched
    
    Nemoclaw->>Registry: Mark gpuEnabled: true,<br/>parentAgent reference
    Registry-->>Nemoclaw: Updated
    
    Nemoclaw-->>User: ✓ GPU agent created<br/>Dashboard URL + token
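
The "Apply nvidia-device-plugin ConfigMap with timeSlicing" step in the diagram above might produce an object along the lines of this sketch. The ConfigMap name, namespace, data key, and the helper `buildTimeSlicingConfigMap` are assumptions made for illustration; the PR page does not show the actual manifest. The inner `sharing.timeSlicing.resources` layout follows the NVIDIA device plugin's documented configuration format, and it is serialized as JSON here on the assumption that the plugin's YAML parser accepts it.

```javascript
// Hypothetical builder for a time-slicing ConfigMap; names are placeholders.
function buildTimeSlicingConfigMap(
  replicas,
  name = "nvidia-device-plugin-config",
  namespace = "kube-system"
) {
  if (!Number.isInteger(replicas) || replicas < 2) {
    throw new RangeError("replicas must be an integer >= 2 to share one GPU");
  }
  // Each physical GPU is advertised as `replicas` schedulable nvidia.com/gpu units.
  const pluginConfig = {
    version: "v1",
    sharing: {
      timeSlicing: {
        resources: [{ name: "nvidia.com/gpu", replicas }],
      },
    },
  };
  return {
    apiVersion: "v1",
    kind: "ConfigMap",
    metadata: { name, namespace },
    data: { "config.yaml": JSON.stringify(pluginConfig, null, 2) },
  };
}

const configMap = buildTimeSlicingConfigMap(4);
console.log(JSON.stringify(configMap, null, 2));
```

After such a ConfigMap is applied and the DaemonSet is pointed at it, the allocatable `nvidia.com/gpu` count would be expected to rise above 1, which is the condition the diagram polls for.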
sequenceDiagram
    participant User as User (CLI)
    participant Resume as sandboxResume
    participant Openshell as openshell CLI<br/>(Cluster API)
    participant Sandbox as Sandbox Pod<br/>(OpenClaw)
    participant GatewayRelay as gateway-relay.py<br/>(Host)
    participant Dashboard as Dashboard<br/>Port 18789
    
    User->>Resume: nemoclaw sandbox resume
    
    Resume->>Openshell: sandbox list<br/>find target sandbox
    Openshell-->>Resume: Sandbox metadata
    
    Resume->>Sandbox: Check if gateway<br/>listening on 18789<br/>(kubectl exec ss -tlnp)
    alt Gateway not listening
        Resume->>Sandbox: Start openclaw gateway<br/>nohup with HOME=/sandbox
        Sandbox->>Sandbox: Gateway starts<br/>and listens
        Resume->>Resume: Poll until listening<br/>(max 30s)
    end
    
    Resume->>Openshell: Get sandbox container<br/>and IP
    Openshell-->>Resume: Container info
    
    Resume->>Openshell: Kill stale<br/>gateway-relay.py
    
    Resume->>Openshell: Start kubectl port-forward<br/>inside cluster container<br/>18789 → all interfaces
    
    Resume->>GatewayRelay: Start gateway-relay.py<br/>with cluster container IP
    GatewayRelay->>Dashboard: Forward port 18789
    
    Resume->>Sandbox: Extract auth token<br/>from openclaw.json<br/>(kubectl exec + python)
    Sandbox-->>Resume: token value
    
    Resume-->>User: ✓ Gateway started<br/>token, port forwarded
sequenceDiagram
    participant Agent as Agent inside<br/>Sandbox
    participant Proxy as docker-proxy.js<br/>(Host HTTP Server)<br/>Port 2376
    participant Allowlist as Request Validation<br/>(Allowlist + Body Check)
    participant Docker as Docker Engine<br/>Daemon (Socket/TCP)
    
    Agent->>Proxy: POST /v1.47/containers/create<br/>DOCKER_HOST=tcp://host.openshell.internal:2376
    
    Proxy->>Allowlist: Check method/path<br/>against allowlist
    Allowlist-->>Proxy: ✓ POST /containers/**<br/>allowed
    
    Proxy->>Allowlist: Buffer and parse<br/>JSON body
    Allowlist->>Allowlist: Validate: no Privileged,<br/>no HostNetwork,<br/>no dangerous CapAdd,<br/>no blocked mounts
    alt Validation fails
        Allowlist-->>Proxy: ✗ Forbidden
        Proxy-->>Agent: HTTP 403<br/>{ error: "..." }
    else Validation succeeds
        Allowlist-->>Proxy: ✓ Body valid
        Proxy->>Docker: Forward validated<br/>request + body<br/>to upstream Docker
        Docker-->>Proxy: Container created<br/>{ Id, Warnings }
        Proxy-->>Agent: HTTP 201<br/>Response forwarded
    end
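
The two-stage check in the last diagram (route allowlist first, then body inspection) can be sketched as below. This is not the code in scripts/docker-proxy.js: the route list, the blocked capability and mount lists, and the function names are assumptions. The `HostConfig.Privileged`, `HostConfig.NetworkMode`, `HostConfig.CapAdd`, and `HostConfig.Binds` fields are part of the Docker Engine API's container-create body.

```javascript
// Illustrative allowlist of [method, path pattern] pairs. The real
// ALLOWED_ROUTES contents are not shown on this page.
const ALLOWED_ROUTES = [
  ["GET", /^\/(v[\d.]+\/)?containers\/json$/],
  ["POST", /^\/(v[\d.]+\/)?containers\/create$/],
];
const BLOCKED_CAPS = new Set(["ALL", "SYS_ADMIN", "NET_ADMIN"]); // assumed list
const BLOCKED_MOUNTS = ["/var/run/docker.sock", "/etc", "/root"]; // assumed list

function isRouteAllowed(method, urlPath) {
  return ALLOWED_ROUTES.some(([m, re]) => m === method && re.test(urlPath));
}

// Returns null when a create-container body is acceptable, else a reason.
// Only HostConfig.Binds is checked here; a real proxy would also need to
// inspect HostConfig.Mounts.
function rejectReason(body) {
  const host = body.HostConfig || {};
  if (host.Privileged === true) return "privileged containers are not allowed";
  if (host.NetworkMode === "host") return "host networking is not allowed";
  const caps = (host.CapAdd || []).map((c) =>
    String(c).toUpperCase().replace(/^CAP_/, "")
  );
  const badCap = caps.find((c) => BLOCKED_CAPS.has(c));
  if (badCap) return `capability ${badCap} is not allowed`;
  for (const bind of host.Binds || []) {
    const source = String(bind).split(":")[0];
    if (BLOCKED_MOUNTS.some((p) => source === p || source.startsWith(p + "/"))) {
      return `bind mount of ${source} is not allowed`;
    }
  }
  return null;
}

// urlPath is expected without a query string.
function decide(method, urlPath, rawBody) {
  if (!isRouteAllowed(method, urlPath)) {
    return { forward: false, status: 403, error: "route not allowed" };
  }
  if (method === "POST") {
    let body;
    try {
      body = JSON.parse(rawBody || "{}");
    } catch {
      return { forward: false, status: 400, error: "invalid JSON body" };
    }
    if (body === null || typeof body !== "object") {
      return { forward: false, status: 400, error: "body must be a JSON object" };
    }
    const reason = rejectReason(body);
    if (reason) return { forward: false, status: 403, error: reason };
  }
  return { forward: true };
}

console.log(decide("POST", "/v1.47/containers/create", '{"Image":"alpine"}'));
console.log(decide("POST", "/v1.47/containers/create", '{"HostConfig":{"Privileged":true}}'));
```

Buffering and parsing the body before anything is forwarded means a rejected request never reaches the Docker daemon, which matches the ordering shown in the diagram.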

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

Poem

🐰 A sandbox garden grows so bright—
With GPUs humming through the night!
We mount the data, resume with care,
And Docker doors now open there.
Three 3D offices dance in space,
While cleanup keeps the build-context clean and safe! 🏗️✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.55% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'Fix/exclude venv from build context' accurately describes the main objective of the changeset: preventing .venv from being included in the Docker build context during sandbox image creation. |
| Linked Issues check | ✅ Passed | All coding objectives from issue #774 are met: .venv is excluded via .dockerignore, .gitignore, and removal in onboard.js/setup.sh scripts; rsync exclusion added to nemoclaw.js; comprehensive tests validate exclusion across all relevant paths. |
| Out of Scope Changes check | ✅ Passed | While the PR includes significant scope beyond the core issue (#774), including multiple new skills documentation, GPU sandbox support, Docker proxy plugin, CLI commands, and numerous supporting scripts, these changes align with the broader NemoClaw enhancement roadmap and enable the GPU/Docker functionality referenced in related skill documentation. |

@Junior00619
Contributor Author

Closing — discovered during rebase that #774 is already resolved on main via copyBuildContextDir() in onboard.js (which filters .venv) and clean-staged-tree.sh in setup.sh. No additional changes needed.
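
A filtering copy of the kind this comment attributes to copyBuildContextDir() could be sketched as follows. The real function's signature and exclusion list are not shown on this page, so `copyFiltered`, `EXCLUDED_NAMES`, and the demo tree are hypothetical stand-ins. The difference from a copy-then-delete approach is that excluded directories are never written to the staged context in the first place.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

const EXCLUDED_NAMES = new Set([".venv"]); // assumed exclusion list

// Hypothetical stand-in for a filtering copy; not the actual
// copyBuildContextDir() from onboard.js.
function copyFiltered(sourceDir, destDir) {
  fs.cpSync(sourceDir, destDir, {
    recursive: true,
    // The filter runs for each entry. Returning false skips that entry and,
    // for a directory, everything beneath it.
    filter: (src) => !EXCLUDED_NAMES.has(path.basename(src)),
  });
}

// Demo against a throwaway tree that contains a fake .venv.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "filter-demo-"));
const from = path.join(root, "nemoclaw-blueprint");
fs.mkdirSync(path.join(from, ".venv", "bin"), { recursive: true });
fs.writeFileSync(path.join(from, ".venv", "bin", "python"), "");
fs.writeFileSync(path.join(from, "blueprint.yaml"), "name: demo\n");
const to = path.join(root, "staged");

copyFiltered(from, to);
console.log(fs.readdirSync(to)); // .venv is not copied
```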

Development

Successfully merging this pull request may close these issues.

Sandbox image build includes nemoclaw-blueprint/.venv from local tree (breaks build / risks leaking secrets)