fix: use CONNECT tunnel for WebSocket endpoints in Discord/Slack presets by BenediktSchackenberg · Pull Request #438 · NVIDIA/NemoClaw

BenediktSchackenberg · 2026-03-19T20:08:34Z

Summary

The Discord and Slack network policy presets configure WebSocket endpoints with protocol: rest + tls: terminate, which routes traffic through the egress proxy's HTTP layer. The proxy's hardcoded idle timeout (~2 minutes) kills the WebSocket connection, causing discord.js (and Slack Socket Mode) to enter a reconnect loop with ~50% uptime.

Root Cause

protocol: rest with tls: terminate means the proxy intercepts and inspects HTTP traffic. For WebSocket connections, this means the proxy sees an idle HTTP connection after the upgrade handshake and kills it after the idle timeout.

access: full creates a CONNECT tunnel — the proxy just forwards raw TCP bytes without HTTP-level inspection or timeouts. This is exactly what WebSocket connections need.

Changes

Discord preset:

gateway.discord.gg: changed from protocol: rest → access: full (WebSocket gateway)
discord.com: added PUT/PATCH/DELETE methods (message editing, reactions, channel management)
Added media.discordapp.net endpoint (attachment/media access)

Slack preset:

Added wss-primary.slack.com and wss-backup.slack.com with access: full (Socket Mode WebSocket endpoints)

Security Considerations

access: full bypasses HTTP-level inspection for the WebSocket endpoints only
REST API endpoints (discord.com, api.slack.com) remain under protocol: rest with method/path rules
This matches the existing pattern used for github.com and registry.npmjs.org in the baseline sandbox policy

What This Doesn't Fix

The hardcoded ~2-minute idle timeout in the openshell-sandbox binary still affects any protocol: rest endpoint with long-lived connections (SSE streams, etc.). That requires an upstream fix in OpenShell. This PR fixes the most common case: WebSocket-based messaging plugins.

Partially addresses #409. Related: #361 (WhatsApp Web, same root cause).

AI-assisted: Built with Claude, reviewed by human.

Summary by CodeRabbit

New Features
- Discord endpoint now supports PUT, PATCH, and DELETE in addition to GET/POST.
- Added Socket Mode WebSocket endpoints for Slack with full WebSocket access.
- Discord gateway now operates in full-access mode on port 443.
Updates
- CDN endpoint restricted to GET-only; media/attachment host switched to media.discordapp.net with TLS/enforcement.

coderabbitai · 2026-03-19T20:09:38Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3428c2f4-b743-430f-9fa1-b3352b10dc38

📥 Commits

Reviewing files that changed from the base of the PR and between 4bf83ea and 8ee9ec0.

📒 Files selected for processing (2)

nemoclaw-blueprint/policies/presets/discord.yaml
nemoclaw-blueprint/policies/presets/slack.yaml

✅ Files skipped from review due to trivial changes (1)

nemoclaw-blueprint/policies/presets/slack.yaml

🚧 Files skipped from review as they are similar to previous changes (1)

nemoclaw-blueprint/policies/presets/discord.yaml

📝 Walkthrough

Walkthrough

Updated two network policy presets: Discord broadened REST methods, changed gateway to full access and moved CDN from cdn.discordapp.com to media.discordapp.net; Slack added Socket Mode WebSocket endpoints (wss-primary/wss-backup) while keeping existing REST endpoints.

Changes

Cohort / File(s)	Summary
Discord Network Policy `nemoclaw-blueprint/policies/presets/discord.yaml`	Expanded `discord.com` rules for `path: "/**"` to include `PUT`, `PATCH`, `DELETE` in addition to `GET`/`POST`. Updated `gateway.discord.gg` to include `port: 443` and `access: full`. Removed `POST` from `cdn.discordapp.com` (now `GET` only) and replaced CDN host with `media.discordapp.net` retaining `port: 443`, `protocol: rest`, `enforcement: enforce`, `tls: terminate`, `GET`-only.
Slack Network Policy `nemoclaw-blueprint/policies/presets/slack.yaml`	Extended description to mention Socket Mode. Added WebSocket endpoints `wss-primary.slack.com:443` and `wss-backup.slack.com:443` with `access: full`, alongside existing REST endpoints (`slack.com`, `api.slack.com`, `hooks.slack.com`) permitting `GET`/`POST` on `/**`.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant NemoclawPolicy
    participant DiscordGateway as gateway.discord.gg
    participant DiscordAPI as discord.com
    participant MediaCDN as media.discordapp.net

    Client->>NemoclawPolicy: Initiate connection (WebSocket or REST)
    NemoclawPolicy->>DiscordGateway: Grant full access on port 443
    NemoclawPolicy->>DiscordAPI: Allow GET, POST, PUT, PATCH, DELETE on /**
    NemoclawPolicy->>MediaCDN: Allow GET (port 443, TLS terminate)
    DiscordGateway-->>Client: Gateway responses
    DiscordAPI-->>Client: REST responses
    MediaCDN-->>Client: Media GET responses

sequenceDiagram
    autonumber
    participant AppClient
    participant NemoclawPolicy
    participant SlackWSPrimary as wss-primary.slack.com
    participant SlackWSBackup as wss-backup.slack.com
    participant SlackAPI as api.slack.com

    AppClient->>NemoclawPolicy: Open Socket Mode WebSocket
    NemoclawPolicy->>SlackWSPrimary: Grant full access (port 443)
    NemoclawPolicy->>SlackWSBackup: Grant full access (port 443)
    AppClient->>SlackAPI: REST calls (GET/POST)
    SlackWSPrimary-->>AppClient: Socket Mode messages
    SlackAPI-->>AppClient: REST responses

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 I nibbled YAML lines with care,

Opened gates and changed the air,
Gateways hum on secure four-four-three,
Slack sockets sing and media hops free,
A little rabbit cheers these policy tweaks 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title describes using CONNECT tunnels for WebSocket endpoints, but the summary shows changes also include adding PUT/PATCH/DELETE methods for discord.com, media.discordapp.net endpoint, and Slack Socket Mode—not solely about CONNECT tunnel configuration.	Clarify whether the title should emphasize CONNECT tunnel usage specifically, or broaden it to capture all policy changes (e.g., 'Allow CONNECT tunnels and additional methods for Discord/Slack presets').

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

Updates the Discord and Slack sandbox network policy presets so WebSocket endpoints use a CONNECT-tunnel style egress (access: full) instead of being routed through the HTTP inspection layer, preventing long-lived WebSocket connections from being terminated by the proxy idle timeout.

Changes:

Slack: add Socket Mode WebSocket endpoints (wss-primary.slack.com, wss-backup.slack.com) with access: full.
Discord: route the gateway WebSocket endpoint (gateway.discord.gg) through access: full.
Discord: expand allowed REST methods for discord.com and add a media domain endpoint for attachments.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`nemoclaw-blueprint/policies/presets/slack.yaml`	Adds CONNECT-tunneled WebSocket endpoints for Slack Socket Mode; updates preset description.
`nemoclaw-blueprint/policies/presets/discord.yaml`	Uses CONNECT tunnel for Discord gateway; broadens REST methods on `discord.com`; adds media domain endpoint.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-19T20:11:25Z

+      # Media/attachment uploads use a separate domain
+      - host: media.discordapp.net
        port: 443
        protocol: rest
        enforcement: enforce


The comment says "Media/attachment uploads" but this endpoint only allows GET and is named media.discordapp.net (typically used for media/attachment access / proxying). Please either update the comment to match the read-only GET policy, or expand the allowed methods if uploads to this host are actually required.

Good eye — the comment was misleading. This endpoint is read-only (GET) for fetching media/attachments, not uploading. Updated the comment to say "Media/attachment access (read-only, proxied through Discord CDN)". Uploads go through discord.com REST API.

Copilot · 2026-03-19T20:11:26Z

+          - allow: { method: PUT, path: "/**" }
+          - allow: { method: PATCH, path: "/**" }
+          - allow: { method: DELETE, path: "/**" }
+      # WebSocket gateway — must use access:full (CONNECT tunnel) instead


Minor consistency nit: the comment mentions access:full, but the actual YAML key is access: full. Consider updating the comment to include the space to avoid confusion when readers search/compare against the config.

Suggested change

# WebSocket gateway — must use access:full (CONNECT tunnel) instead

# WebSocket gateway — must use access: full (CONNECT tunnel) instead

Fair point, added the space for consistency with the actual YAML syntax. Small thing but makes it easier to grep.

coderabbitai

🧹 Nitpick comments (2)

nemoclaw-blueprint/policies/presets/discord.yaml (2)

37-44: The inline comment says “uploads”, but the rule is GET-only.

Please reword this so the comment matches the policy; otherwise it reads like a missing write rule on media.discordapp.net.

✏️ Suggested wording

-      # Media/attachment uploads use a separate domain
+      # Media/attachment fetches may use a separate domain

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@nemoclaw-blueprint/policies/presets/discord.yaml` around lines 37 - 44,
Update the inline comment above the media.discordapp.net host block to reflect
that the policy only allows GET requests (reads) rather than uploads/writes:
change the wording referencing "Media/attachment uploads" to something like
"Media/attachment reads (served from a separate domain)" so the comment matches
the rule under host: media.discordapp.net which currently has a GET-only rule in
rules: - allow: { method: GET, path: "/**" }.

20-22: Limit the new write verbs to the API prefix for least-privilege access.

Discord's documented PUT/PATCH/DELETE endpoints for bots and webhooks all reside under /api/v{version}/, so these rules can be tightened to /api/** without losing functionality. This reduces the blast radius of inadvertent misconfigurations.

Suggested tightening

-          - allow: { method: PUT, path: "/**" }
-          - allow: { method: PATCH, path: "/**" }
-          - allow: { method: DELETE, path: "/**" }
+          - allow: { method: PUT, path: "/api/**" }
+          - allow: { method: PATCH, path: "/api/**" }
+          - allow: { method: DELETE, path: "/api/**" }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@nemoclaw-blueprint/policies/presets/discord.yaml` around lines 20 - 22, The
three policy rules that currently allow write verbs globally (the entries "-
allow: { method: PUT, path: "/**" }", "- allow: { method: PATCH, path: "/**" }",
and "- allow: { method: DELETE, path: "/**" }") should be tightened to only
allow those methods under the API prefix; change each path from "/**" to
"/api/**" so PUT, PATCH and DELETE are constrained to Discord API endpoints.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@nemoclaw-blueprint/policies/presets/discord.yaml`:
- Around line 37-44: Update the inline comment above the media.discordapp.net
host block to reflect that the policy only allows GET requests (reads) rather
than uploads/writes: change the wording referencing "Media/attachment uploads"
to something like "Media/attachment reads (served from a separate domain)" so
the comment matches the rule under host: media.discordapp.net which currently
has a GET-only rule in rules: - allow: { method: GET, path: "/**" }.
- Around line 20-22: The three policy rules that currently allow write verbs
globally (the entries "- allow: { method: PUT, path: "/**" }", "- allow: {
method: PATCH, path: "/**" }", and "- allow: { method: DELETE, path: "/**" }")
should be tightened to only allow those methods under the API prefix; change
each path from "/**" to "/api/**" so PUT, PATCH and DELETE are constrained to
Discord API endpoints.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: aa59af84-644c-4ad8-9500-b4818e92a388

📥 Commits

Reviewing files that changed from the base of the PR and between 6ca7d37 and b926698.

📒 Files selected for processing (2)

nemoclaw-blueprint/policies/presets/discord.yaml
nemoclaw-blueprint/policies/presets/slack.yaml

coderabbitai

🧹 Nitpick comments (1)

nemoclaw-blueprint/policies/presets/discord.yaml (1)

20-22: Scope new write methods to narrower API paths when feasible.

On Line 20–Line 22, allowing PUT/PATCH/DELETE with path: "/**" increases destructive API surface. If preset behavior allows it, prefer path-scoped rules for least privilege.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@nemoclaw-blueprint/policies/presets/discord.yaml` around lines 20 - 22, The
current preset grants overly broad write permissions by allowing methods PUT,
PATCH, and DELETE for path "/**"; narrow these rules to least-privilege by
replacing or augmenting the allow entries for methods PUT, PATCH, DELETE with
path-scoped rules that target only the specific API resources that require
destructive actions (e.g., /users/**, /documents/**, /settings/**) rather than
the global "/**"; update the relevant entries (the allow entries with method:
PUT, method: PATCH, method: DELETE) to use the appropriate narrower paths or
remove them if not needed by the preset behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@nemoclaw-blueprint/policies/presets/discord.yaml`:
- Around line 20-22: The current preset grants overly broad write permissions by
allowing methods PUT, PATCH, and DELETE for path "/**"; narrow these rules to
least-privilege by replacing or augmenting the allow entries for methods PUT,
PATCH, DELETE with path-scoped rules that target only the specific API resources
that require destructive actions (e.g., /users/**, /documents/**, /settings/**)
rather than the global "/**"; update the relevant entries (the allow entries
with method: PUT, method: PATCH, method: DELETE) to use the appropriate narrower
paths or remove them if not needed by the preset behavior.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 406f313a-343d-4738-99e6-d4c50c86abe2

📥 Commits

Reviewing files that changed from the base of the PR and between b926698 and 48947d5.

📒 Files selected for processing (1)

nemoclaw-blueprint/policies/presets/discord.yaml

BenediktSchackenberg · 2026-03-19T20:19:08Z

Re: CodeRabbit's nitpick about scoping PUT/PATCH/DELETE to narrower paths —

I considered this but intentionally kept /**. Discord's REST API uses these methods across dozens of endpoints:

PATCH /api/v10/channels/{id}/messages/{id} (edit message)
DELETE /api/v10/channels/{id}/messages/{id} (delete message)
PUT /api/v10/channels/{id}/messages/{id}/reactions/{emoji}/@me (add reaction)
PATCH /api/v10/guilds/{id} (update guild)
DELETE /api/v10/invites/{code}
... and many more

Scoping to specific paths would create a fragile allowlist that breaks whenever Discord adds or restructures API routes. The existing presets in this repo (telegram, slack) also use /** for their allowed methods — this is consistent with the established pattern.

The real security boundary here is the host restriction to discord.com:443 — only traffic to Discord's API server is allowed regardless of method or path.

The egress proxy's HTTP idle timeout (~2 min) kills long-lived WebSocket connections when endpoints are configured with protocol:rest + tls:terminate. Switch WebSocket endpoints to access:full (CONNECT tunnel) which bypasses HTTP-level timeouts entirely. Discord: - gateway.discord.gg → access:full (WebSocket gateway) - Add PUT/PATCH/DELETE methods for discord.com (message editing, reactions) - Add media.discordapp.net for attachment access Slack: - Add wss-primary.slack.com and wss-backup.slack.com → access:full (Socket Mode WebSocket endpoints) Partially addresses NVIDIA#409 — the policy-level fix enables WebSocket connections to survive. The hardcoded 2-min timeout in openshell-sandbox still affects any protocol:rest endpoints with long-lived connections. Related: NVIDIA#361 (WhatsApp Web, same root cause)

BenediktSchackenberg · 2026-03-26T13:42:46Z

@cv any chance this could get a quick look? Small change (2 files), just switches the Discord/Slack WebSocket endpoints from HTTP proxy to CONNECT tunnel so the idle timeout stops killing the connections.

@brandonpelfrey

* fix: improve gateway lifecycle recovery (NVIDIA#953) * fix: improve gateway lifecycle recovery * docs: fix readme markdown list spacing * fix: tighten gateway lifecycle review follow-ups * fix: simplify tokenized control ui output * fix: restore chat route in control ui urls * refactor: simplify ansi stripping in onboard * fix: shorten control ui url output * fix: move control ui below cli next steps * fix: swap hard/soft ulimit settings in start script (NVIDIA#951) Fixes NVIDIA#949 Co-authored-by: KJ <kejones@nvidia.com> * chore: add cyclomatic complexity lint rule (NVIDIA#875) * chore: add cyclomatic complexity rule (ratchet from 95) Add ESLint complexity rule to bin/ and scripts/ to prevent new functions from accumulating excessive branching. Starting threshold is 95 (current worst offender: setupNim in onboard.js). Ratchet plan: 95 → 40 → 25 → 15. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: ratchet complexity to 20, suppress existing violations Suppress 6 functions that exceed the threshold with eslint-disable comments so we can start enforcing at 20 instead of 95: - setupNim (95), setupPolicies (41), setupInference (22) in onboard.js - deploy (22), main IIFE (27) in nemoclaw.js - applyPreset (24) in policies.js Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: suppress complexity for 3 missed functions preflight (23), getReconciledSandboxGatewayState (25), sandboxStatus (27) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add host-side config and state file locations to README (NVIDIA#903) Signed-off-by: peteryuqin <peter.yuqin@gmail.com> * chore: add tsconfig.cli.json, root execa, TS coverage ratchet (NVIDIA#913) * chore: add tsconfig.cli.json, root execa, TS coverage ratchet Foundation for the CLI TypeScript migration (PR 0 of the shell consolidation plan). No runtime changes — config, tooling, and dependency only. - tsconfig.cli.json: strict TS type-checking for bin/ and scripts/ (noEmit, module: preserve — tsx handles the runtime) - scripts/check-coverage-ratchet.ts: pure TS replacement for the bash+python coverage ratchet script (same logic, same tolerance) - execa ^9.6.1 added to root devDependencies (used by PR 1+) - pr.yaml: coverage ratchet step now runs the TS version via tsx - .pre-commit-config.yaml: SPDX headers cover scripts/*.ts, new tsc-check-cli pre-push hook - CONTRIBUTING.md: document typecheck:cli task and CLI pre-push hook - Delete scripts/check-coverage-ratchet.sh Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Apply suggestion from @brandonpelfrey * chore: address PR feedback — use types_or, add tsx devDep - Use `types_or: [ts, tsx]` instead of file glob for tsc-check-cli hook per @brandonpelfrey's suggestion. - Add `tsx` to devDependencies so CI doesn't re-fetch it on every run per CodeRabbit's suggestion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): ignore GitHub "Apply suggestion" commits in commitlint * fix(ci): lint only PR title since repo is squash-merge only Reverts the commitlint ignores rule from the previous commit and instead removes the per-commit lint step entirely. Individual commit messages are discarded at merge time — only the squash-merged PR title lands in main and drives changelog generation. Drop the per-commit lint, keep the PR title check, and remove the now-unnecessary fetch-depth: 0. * Revert "fix(ci): lint only PR title since repo is squash-merge only" This reverts commit 1257a47. * Revert "fix(ci): ignore GitHub "Apply suggestion" commits in commitlint" This reverts commit c395657. * docs: fix markdownlint MD032 in README (blank line before list) * refactor: make coverage ratchet script idiomatic TypeScript - Wrap in main() with process.exitCode instead of scattered process.exit() - Replace mutable flags with .map()/.some() over typed MetricResult[] - Separate pure logic (checkMetrics) from formatting (formatReport) - Throw with { cause } chaining instead of exit-in-helpers - Derive CoverageThresholds from METRICS tuple (single source of truth) - Exhaustive switch on CheckStatus discriminated union * refactor: remove duplication in coverage ratchet script - Drop STATUS_LABELS map; inline labels in exhaustive switch - Extract common 'metric coverage is N%' preamble in formatResult - Simplify ratchetedThresholds: use results directly (already in METRICS order) instead of re-scanning with .find() per metric - Compute 'failed' once in main, pass into formatReport to avoid duplicate .some() scan * refactor: simplify coverage ratchet with FP patterns - Extract classify() as a named pure function (replaces nested ternary) - loadJSON takes repo-relative paths, eliminating THRESHOLD_PATH and SUMMARY_PATH constants (DRY the join-with-REPO_ROOT pattern) - Drop CoverageMetric/CoverageSummary interfaces (only pct is read); use structural type at the call site instead - Inline ratchetedThresholds (one-liner, used once) - formatReport derives fail/improved from results instead of taking a pre-computed boolean (let functions derive from data, don't thread derived state) - sections.join("\n\n") replaces manual empty-string pushing - Shorter type names (Thresholds, Status, Result) — no ambiguity in a single-purpose script * refactor: strip coverage ratchet to failure-only output prek hides output from commands that exit 0, so ok/improved reporting was dead code. Remove Status, Result, classify, formatResult, formatReport, and the ratcheted-thresholds suggestion block. The script now just filters for regressions and prints actionable errors on failure. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Brandon Pelfrey <bpelfrey@nvidia.com> * fix: use CONNECT tunnel for WebSocket endpoints in Discord/Slack presets (NVIDIA#438) * fix: use CONNECT tunnel for WebSocket endpoints in Discord/Slack presets The egress proxy's HTTP idle timeout (~2 min) kills long-lived WebSocket connections when endpoints are configured with protocol:rest + tls:terminate. Switch WebSocket endpoints to access:full (CONNECT tunnel) which bypasses HTTP-level timeouts entirely. Discord: - gateway.discord.gg → access:full (WebSocket gateway) - Add PUT/PATCH/DELETE methods for discord.com (message editing, reactions) - Add media.discordapp.net for attachment access Slack: - Add wss-primary.slack.com and wss-backup.slack.com → access:full (Socket Mode WebSocket endpoints) Partially addresses NVIDIA#409 — the policy-level fix enables WebSocket connections to survive. The hardcoded 2-min timeout in openshell-sandbox still affects any protocol:rest endpoints with long-lived connections. Related: NVIDIA#361 (WhatsApp Web, same root cause) * fix: correct comment wording for media endpoint and YAML formatting * fix: standardize Node.js minimum version to 22.16 (NVIDIA#840) * fix: remove unused RECOMMENDED_NODE_MAJOR from scripts/install.sh Shellcheck flagged it as unused after the min/recommended merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: enforce full semver >=22.16.0 in installer scripts The runtime checks only compared the major Node.js version, allowing 22.0–22.15 to pass despite package.json requiring >=22.16.0. Use the version_gte() helper for full semver comparison in both installers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden version_gte and align fallback message Guard version_gte() against prerelease suffixes (e.g. "22.16.0-rc.1") that would crash bash arithmetic. Also update the manual-install fallback message to reference MIN_NODE_VERSION instead of hardcoded "22". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: update test stubs for Node.js 22.16 minimum and add Node 20 rejection test - Bump node stub in 'succeeds with acceptable Node.js' from v20.0.0 to v22.16.0 - Bump node stub in buildCurlPipeEnv from v22.14.0 to v22.16.0 - Add new test asserting Node.js 20 is rejected by ensure_supported_runtime --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Signed-off-by: peteryuqin <peter.yuqin@gmail.com> Co-authored-by: KJ <kejones@nvidia.com> Co-authored-by: Emily Wilkins <80470879+epwilkins@users.noreply.github.com> Co-authored-by: Carlos Villela <cvillela@nvidia.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Peter <peter.yuqin@gmail.com> Co-authored-by: Brandon Pelfrey <bpelfrey@nvidia.com> Co-authored-by: Benedikt Schackenberg <6381261+BenediktSchackenberg@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

@brandonpelfrey

* fix: improve gateway lifecycle recovery (NVIDIA#953) * fix: improve gateway lifecycle recovery * docs: fix readme markdown list spacing * fix: tighten gateway lifecycle review follow-ups * fix: simplify tokenized control ui output * fix: restore chat route in control ui urls * refactor: simplify ansi stripping in onboard * fix: shorten control ui url output * fix: move control ui below cli next steps * fix: swap hard/soft ulimit settings in start script (NVIDIA#951) Fixes NVIDIA#949 Co-authored-by: KJ <kejones@nvidia.com> * chore: add cyclomatic complexity lint rule (NVIDIA#875) * chore: add cyclomatic complexity rule (ratchet from 95) Add ESLint complexity rule to bin/ and scripts/ to prevent new functions from accumulating excessive branching. Starting threshold is 95 (current worst offender: setupNim in onboard.js). Ratchet plan: 95 → 40 → 25 → 15. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: ratchet complexity to 20, suppress existing violations Suppress 6 functions that exceed the threshold with eslint-disable comments so we can start enforcing at 20 instead of 95: - setupNim (95), setupPolicies (41), setupInference (22) in onboard.js - deploy (22), main IIFE (27) in nemoclaw.js - applyPreset (24) in policies.js Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: suppress complexity for 3 missed functions preflight (23), getReconciledSandboxGatewayState (25), sandboxStatus (27) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add host-side config and state file locations to README (NVIDIA#903) Signed-off-by: peteryuqin <peter.yuqin@gmail.com> * chore: add tsconfig.cli.json, root execa, TS coverage ratchet (NVIDIA#913) * chore: add tsconfig.cli.json, root execa, TS coverage ratchet Foundation for the CLI TypeScript migration (PR 0 of the shell consolidation plan). No runtime changes — config, tooling, and dependency only. - tsconfig.cli.json: strict TS type-checking for bin/ and scripts/ (noEmit, module: preserve — tsx handles the runtime) - scripts/check-coverage-ratchet.ts: pure TS replacement for the bash+python coverage ratchet script (same logic, same tolerance) - execa ^9.6.1 added to root devDependencies (used by PR 1+) - pr.yaml: coverage ratchet step now runs the TS version via tsx - .pre-commit-config.yaml: SPDX headers cover scripts/*.ts, new tsc-check-cli pre-push hook - CONTRIBUTING.md: document typecheck:cli task and CLI pre-push hook - Delete scripts/check-coverage-ratchet.sh Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Apply suggestion from @brandonpelfrey * chore: address PR feedback — use types_or, add tsx devDep - Use `types_or: [ts, tsx]` instead of file glob for tsc-check-cli hook per @brandonpelfrey's suggestion. - Add `tsx` to devDependencies so CI doesn't re-fetch it on every run per CodeRabbit's suggestion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): ignore GitHub "Apply suggestion" commits in commitlint * fix(ci): lint only PR title since repo is squash-merge only Reverts the commitlint ignores rule from the previous commit and instead removes the per-commit lint step entirely. Individual commit messages are discarded at merge time — only the squash-merged PR title lands in main and drives changelog generation. Drop the per-commit lint, keep the PR title check, and remove the now-unnecessary fetch-depth: 0. * Revert "fix(ci): lint only PR title since repo is squash-merge only" This reverts commit 1257a47. * Revert "fix(ci): ignore GitHub "Apply suggestion" commits in commitlint" This reverts commit c395657. * docs: fix markdownlint MD032 in README (blank line before list) * refactor: make coverage ratchet script idiomatic TypeScript - Wrap in main() with process.exitCode instead of scattered process.exit() - Replace mutable flags with .map()/.some() over typed MetricResult[] - Separate pure logic (checkMetrics) from formatting (formatReport) - Throw with { cause } chaining instead of exit-in-helpers - Derive CoverageThresholds from METRICS tuple (single source of truth) - Exhaustive switch on CheckStatus discriminated union * refactor: remove duplication in coverage ratchet script - Drop STATUS_LABELS map; inline labels in exhaustive switch - Extract common 'metric coverage is N%' preamble in formatResult - Simplify ratchetedThresholds: use results directly (already in METRICS order) instead of re-scanning with .find() per metric - Compute 'failed' once in main, pass into formatReport to avoid duplicate .some() scan * refactor: simplify coverage ratchet with FP patterns - Extract classify() as a named pure function (replaces nested ternary) - loadJSON takes repo-relative paths, eliminating THRESHOLD_PATH and SUMMARY_PATH constants (DRY the join-with-REPO_ROOT pattern) - Drop CoverageMetric/CoverageSummary interfaces (only pct is read); use structural type at the call site instead - Inline ratchetedThresholds (one-liner, used once) - formatReport derives fail/improved from results instead of taking a pre-computed boolean (let functions derive from data, don't thread derived state) - sections.join("\n\n") replaces manual empty-string pushing - Shorter type names (Thresholds, Status, Result) — no ambiguity in a single-purpose script * refactor: strip coverage ratchet to failure-only output prek hides output from commands that exit 0, so ok/improved reporting was dead code. Remove Status, Result, classify, formatResult, formatReport, and the ratcheted-thresholds suggestion block. The script now just filters for regressions and prints actionable errors on failure. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Brandon Pelfrey <bpelfrey@nvidia.com> * fix: use CONNECT tunnel for WebSocket endpoints in Discord/Slack presets (NVIDIA#438) * fix: use CONNECT tunnel for WebSocket endpoints in Discord/Slack presets The egress proxy's HTTP idle timeout (~2 min) kills long-lived WebSocket connections when endpoints are configured with protocol:rest + tls:terminate. Switch WebSocket endpoints to access:full (CONNECT tunnel) which bypasses HTTP-level timeouts entirely. Discord: - gateway.discord.gg → access:full (WebSocket gateway) - Add PUT/PATCH/DELETE methods for discord.com (message editing, reactions) - Add media.discordapp.net for attachment access Slack: - Add wss-primary.slack.com and wss-backup.slack.com → access:full (Socket Mode WebSocket endpoints) Partially addresses NVIDIA#409 — the policy-level fix enables WebSocket connections to survive. The hardcoded 2-min timeout in openshell-sandbox still affects any protocol:rest endpoints with long-lived connections. Related: NVIDIA#361 (WhatsApp Web, same root cause) * fix: correct comment wording for media endpoint and YAML formatting * fix: standardize Node.js minimum version to 22.16 (NVIDIA#840) * fix: remove unused RECOMMENDED_NODE_MAJOR from scripts/install.sh Shellcheck flagged it as unused after the min/recommended merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: enforce full semver >=22.16.0 in installer scripts The runtime checks only compared the major Node.js version, allowing 22.0–22.15 to pass despite package.json requiring >=22.16.0. Use the version_gte() helper for full semver comparison in both installers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden version_gte and align fallback message Guard version_gte() against prerelease suffixes (e.g. "22.16.0-rc.1") that would crash bash arithmetic. Also update the manual-install fallback message to reference MIN_NODE_VERSION instead of hardcoded "22". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: update test stubs for Node.js 22.16 minimum and add Node 20 rejection test - Bump node stub in 'succeeds with acceptable Node.js' from v20.0.0 to v22.16.0 - Bump node stub in buildCurlPipeEnv from v22.14.0 to v22.16.0 - Add new test asserting Node.js 20 is rejected by ensure_supported_runtime --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden installer and onboard resiliency (NVIDIA#961) * fix: harden installer and onboard resiliency * fix: address installer and debug review follow-ups * fix: harden onboard resume across later setup steps * test: simplify payload extraction in onboard tests * test: keep onboard payload extraction target-compatible * chore: align onboard session lint with complexity rule * fix: harden onboard session safety and lock handling * fix: tighten onboard session redaction and metadata handling * fix(security): strip credentials from migration snapshots and enforce blueprint digest (NVIDIA#769) Reconciles NVIDIA#156 and NVIDIA#743 into a single comprehensive solution: - Filter auth-profiles.json at copy time via cpSync filter (from NVIDIA#743) - Recursive stripCredentials() with pattern-based field detection for deep config sanitization (from NVIDIA#156: CREDENTIAL_FIELDS set + CREDENTIAL_FIELD_PATTERN regex) - Remove gateway config section (contains auth tokens) from sandbox openclaw.json - Blueprint digest verification (SHA-256): recorded at snapshot time, validated on restore, empty/missing digest is a hard failure - computeFileDigest() throws when blueprint file is missing instead of silently returning null - Sanitize both snapshot-level and sandbox-bundle openclaw.json copies - Backward compatible: old snapshots without blueprintDigest skip validation - Bump SNAPSHOT_VERSION 2 → 3 Supersedes NVIDIA#156 and NVIDIA#743. * fix(sandbox): export proxy env vars with full NO_PROXY and persist across reconnects (NVIDIA#1025) * fix(sandbox): export proxy env vars with full NO_PROXY and persist across reconnects OpenShell injects NO_PROXY=127.0.0.1,localhost,::1 into the sandbox, missing inference.local and the gateway IP (10.200.0.1). This causes LLM inference requests to route through the egress proxy instead of going direct, and the proxy gateway IP itself gets proxied. Add proxy configuration block to nemoclaw-start.sh that: - Exports HTTP_PROXY, HTTPS_PROXY, and NO_PROXY with inference.local and the gateway IP included - Persists via /etc/profile.d/nemoclaw-proxy.sh (root) or ~/.profile (non-root fallback) so values survive OpenShell reconnect injection - Supports NEMOCLAW_PROXY_HOST / NEMOCLAW_PROXY_PORT overrides The non-root fallback ensures the fix works in environments like Brev where containers run without root privileges. Tested on DGX Spark (ARM64) and Brev VM (x86_64). Verified NO_PROXY contains inference.local and 10.200.0.1 inside the live sandbox after connect. Ref: NVIDIA#626, NVIDIA#704 Ref: NVIDIA#704 (comment) * fix(sandbox): write proxy config to ~/.bashrc for interactive reconnect sessions OpenShell's `sandbox connect` spawns `/bin/bash -i` (interactive, non-login), which sources ~/.bashrc — not ~/.profile or /etc/profile.d/*. The previous approach wrote to ~/.profile and /etc/profile.d/, neither of which is sourced by `bash -i`, so the narrow OpenShell-injected NO_PROXY persisted in live interactive sessions. Changes: - Write proxy snippet to ~/.bashrc (primary) and ~/.profile (login fallback) - Export both uppercase and lowercase proxy variants (NO_PROXY + no_proxy, HTTP_PROXY + http_proxy, etc.) — Node.js undici prefers lowercase no_proxy over uppercase NO_PROXY when both are set - Add idempotency guard to prevent duplicate blocks on container restart - Update tests: verify .bashrc writing, idempotency, bash -i override behavior, and lowercase variant correctness Tested on DGX Spark (ARM64) and Brev VM (x86_64) with full destroy + re-onboard + live `env | grep proxy` verification inside the sandbox shell via `openshell sandbox connect`. Ref: NVIDIA#626 * fix(sandbox): replace stale proxy values on restart with begin/end markers Use begin/end markers in .bashrc/.profile proxy snippet so _write_proxy_snippet replaces the block when PROXY_HOST/PORT change instead of silently keeping stale values. Adds test coverage for the replacement path. Addresses CodeRabbit review feedback on idempotency gap. * fix(sandbox): resolve sandbox user home dynamically when running as root When the entrypoint runs as root, $HOME is /root — the proxy snippet was written to /root/.bashrc instead of the sandbox user's home. Use getent passwd to look up the sandbox user's home when running as UID 0; fall back to /sandbox if the user entry is missing. Addresses CodeRabbit review feedback on _SANDBOX_HOME resolution. --------- Co-authored-by: Carlos Villela <cvillela@nvidia.com> * fix(policies): preset application for versionless policies (Fixes NVIDIA#35) (NVIDIA#101) * fix(policies): allow preset application for versionless policies (Fixes NVIDIA#35) Fixes NVIDIA#35 Signed-off-by: Deepak Jain <deepujain@gmail.com> * fix: remove stale complexity suppression in policies --------- Signed-off-by: Deepak Jain <deepujain@gmail.com> Co-authored-by: Kevin Jones <kejones@nvidia.com> * fix: restore routed inference and connect UX (NVIDIA#1037) * fix: restore routed inference and connect UX * fix: simplify detected local inference hint * fix: remove stale local inference hint * test: relax connect forward assertion --------- Signed-off-by: peteryuqin <peter.yuqin@gmail.com> Signed-off-by: Deepak Jain <deepujain@gmail.com> Co-authored-by: KJ <kejones@nvidia.com> Co-authored-by: Emily Wilkins <80470879+epwilkins@users.noreply.github.com> Co-authored-by: Carlos Villela <cvillela@nvidia.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Peter <peter.yuqin@gmail.com> Co-authored-by: Brandon Pelfrey <bpelfrey@nvidia.com> Co-authored-by: Benedikt Schackenberg <6381261+BenediktSchackenberg@users.noreply.github.com> Co-authored-by: Lucas Wang <lucas_wang@lucas-futures.com> Co-authored-by: senthilr-nv <senthilr@nvidia.com> Co-authored-by: Deepak Jain <deepujain@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

…ets (#438) * fix: use CONNECT tunnel for WebSocket endpoints in Discord/Slack presets The egress proxy's HTTP idle timeout (~2 min) kills long-lived WebSocket connections when endpoints are configured with protocol:rest + tls:terminate. Switch WebSocket endpoints to access:full (CONNECT tunnel) which bypasses HTTP-level timeouts entirely. Discord: - gateway.discord.gg → access:full (WebSocket gateway) - Add PUT/PATCH/DELETE methods for discord.com (message editing, reactions) - Add media.discordapp.net for attachment access Slack: - Add wss-primary.slack.com and wss-backup.slack.com → access:full (Socket Mode WebSocket endpoints) Partially addresses #409 — the policy-level fix enables WebSocket connections to survive. The hardcoded 2-min timeout in openshell-sandbox still affects any protocol:rest endpoints with long-lived connections. Related: #361 (WhatsApp Web, same root cause) * fix: correct comment wording for media endpoint and YAML formatting

…ets (NVIDIA#438) * fix: use CONNECT tunnel for WebSocket endpoints in Discord/Slack presets The egress proxy's HTTP idle timeout (~2 min) kills long-lived WebSocket connections when endpoints are configured with protocol:rest + tls:terminate. Switch WebSocket endpoints to access:full (CONNECT tunnel) which bypasses HTTP-level timeouts entirely. Discord: - gateway.discord.gg → access:full (WebSocket gateway) - Add PUT/PATCH/DELETE methods for discord.com (message editing, reactions) - Add media.discordapp.net for attachment access Slack: - Add wss-primary.slack.com and wss-backup.slack.com → access:full (Socket Mode WebSocket endpoints) Partially addresses NVIDIA#409 — the policy-level fix enables WebSocket connections to survive. The hardcoded 2-min timeout in openshell-sandbox still affects any protocol:rest endpoints with long-lived connections. Related: NVIDIA#361 (WhatsApp Web, same root cause) * fix: correct comment wording for media endpoint and YAML formatting

Copilot AI review requested due to automatic review settings March 19, 2026 20:08

BenediktSchackenberg mentioned this pull request Mar 19, 2026

[Bug] openshell-sandbox egress proxy kills WebSocket connections after ~2 minutes — Discord (and other WS-based plugins) cannot run inside sandbox #409

Open

Copilot started reviewing on behalf of BenediktSchackenberg March 19, 2026 20:09 View session

Copilot AI reviewed Mar 19, 2026

View reviewed changes

coderabbitai bot reviewed Mar 19, 2026

View reviewed changes

psorensen-nvidia added the status: triage For new items that haven't been reviewed yet. label Mar 19, 2026

psorensen-nvidia self-assigned this Mar 19, 2026

psorensen-nvidia added the priority: medium Issue that should be addressed in upcoming releases label Mar 19, 2026

psorensen-nvidia removed their assignment Mar 19, 2026

psorensen-nvidia added bug Something isn't working enhancement: provider Use this label to identify requests to add a new AI provider to NemoClaw. labels Mar 19, 2026

wscurran added Integration: Discord Use this label to identify Discord bot integration issues with NemoClaw. Integration: Telegram Use this label to identify Telegram bot integration issues with NemoClaw. labels Mar 20, 2026

BenediktSchackenberg force-pushed the fix/discord-websocket-policy branch from 48947d5 to 4bf83ea Compare March 24, 2026 20:30

BenediktSchackenberg added 2 commits March 25, 2026 20:31

fix: correct comment wording for media endpoint and YAML formatting

8ee9ec0

BenediktSchackenberg force-pushed the fix/discord-websocket-policy branch from 4bf83ea to 8ee9ec0 Compare March 25, 2026 20:31

Merge branch 'main' into fix/discord-websocket-policy

6d5a76d

Merge branch 'main' into fix/discord-websocket-policy

2d8cd7a

cv approved these changes Mar 26, 2026

View reviewed changes

cv merged commit 6d44def into NVIDIA:main Mar 26, 2026
8 checks passed

BenediktSchackenberg mentioned this pull request Apr 14, 2026

CONNECT tunnel blocked (403) for policy-allowlisted hosts #1798

Open

2 tasks

	# WebSocket gateway — must use access:full (CONNECT tunnel) instead
	# WebSocket gateway — must use access: full (CONNECT tunnel) instead

Conversation

BenediktSchackenberg commented Mar 19, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause

Changes

Security Considerations

What This Doesn't Fix

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 inconclusive)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

BenediktSchackenberg Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

BenediktSchackenberg Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

BenediktSchackenberg commented Mar 19, 2026

Uh oh!

BenediktSchackenberg commented Mar 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

BenediktSchackenberg commented Mar 19, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 19, 2026 •

edited

Loading