feat: auto-start OpenClaw gateway and port forwarding in nemoclaw start #310

Closed
cluster2600 wants to merge 1 commit into NVIDIA:main from cluster2600:fix/gateway-auto-start

Conversation

@cluster2600
Contributor

@cluster2600 cluster2600 commented Mar 18, 2026

Fixes #311

Summary

nemoclaw start now automatically starts the OpenClaw gateway inside the sandbox and forwards port 18789 to the host. This enables external dashboards (e.g. Mission Control) to connect via WebSocket without manual setup.

See #311 for the full problem description.

Changes

  • scripts/start-services.sh: add openclaw-gateway and gateway-forward as managed services
  • Status display updated to show gateway and port forward state
  • Uses StrictHostKeyChecking=accept-new (not =no) for sandbox SSH
  • nemoclaw stop cleans up gateway and forward alongside existing services

Test plan

  • nemoclaw start starts gateway and port forward
  • nemoclaw status shows gateway and forward status
  • nemoclaw stop stops all services including gateway
  • Port 18789 reachable from host after start
  • Mission Control connects via ws://localhost:18789

Summary by CodeRabbit

  • New Features

    • OpenClaw gateway with host-accessible dashboard via sandbox port forwarding; startup is idempotent and supports token-based auth.
    • Gateway and port-forwarding report health as “healthy on port” in status and startup banner.
    • Start/stop now reliably orchestrate gateway and port-forwarding inside the sandbox and persist sandbox selection for consistent stops.
  • Chores

    • Improved initialization and graceful fallback so other services continue when no sandbox is available; status reflects gateway/port-forwarding state.

@coderabbitai
Contributor

coderabbitai bot commented Mar 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds OpenClaw gateway and gateway-forward management to the startup script: sandbox resolution and persistence, name validation, in-sandbox gateway run/stop via SSH ProxyCommand with token, remote HTTP health probes for gateway, idempotent port-forward setup, and status/banner updates reporting gateway health and forwarding state.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Gateway & startup script (scripts/start-services.sh) | Adds SANDBOX_STATE_FILE, resolve_sandbox() and validate_name(); implements openclaw-gateway and gateway-forward service handling: in-sandbox openclaw gateway run/stop via openshell ssh-proxy ProxyCommand, remote HTTP health checks for dashboard port, idempotent start/stop, persists resolved sandbox name, updates status/banner, and expands stop/start orchestration. |

Sequence Diagram(s)

sequenceDiagram
participant Host as Host (nemoclaw)
participant Openshell as Openshell (ssh-proxy)
participant Sandbox as Sandbox (guest)
participant Gateway as OpenClaw Gateway
participant Client as Dashboard

Host->>Openshell: establish ProxyCommand (ssh-proxy)
Host->>Openshell: ssh ProxyCommand -> run "openclaw gateway run --token <TOKEN>"
Openshell->>Sandbox: deliver command
Sandbox->>Gateway: start gateway (listen 18789)
Host->>Openshell: start SSH port-forward host:PORT -> sandbox:18789
Client->>Host: connect to host:PORT
Host->>Openshell: forward connection
Openshell->>Gateway: relay client requests
Host->>Openshell: probe http://127.0.0.1:PORT/ for health
Openshell->>Host: return health status

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I nudge the tunnel, whisper the token bright,

Loopback paths align beneath the night,
SSH paws pat the sandbox door,
Gateway hums and carrots pour,
I hop — the dashboard beams alight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: auto-starting OpenClaw gateway and port forwarding in nemoclaw start, which aligns directly with the changeset. |
| Linked Issues check | ✅ Passed | The PR fully implements the three coding objectives from issue #311: auto-start gateway in nemoclaw start, display gateway/forward state in nemoclaw status, and cleanup in nemoclaw stop. |
| Out of Scope Changes check | ✅ Passed | All changes in scripts/start-services.sh are directly scoped to implementing gateway auto-start, port forwarding, and health checks as specified in issue #311, with no extraneous modifications. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/start-services.sh`:
- Around line 151-162: The gateway token is being inlined into the remote SSH
command via the gw_token variable in the start_service call (openclaw-gateway)
which leaks it to the host process list and risks shell re-parsing; instead
change start_service invocation so the token is delivered securely (e.g. write
the token to a protected file inside the sandbox or pipe it over stdin) and have
the remote command read it from that protected location or from stdin before
starting the gateway; update the ssh invocation that currently sets
"OPENCLAW_GATEWAY_TOKEN=$gw_token openclaw gateway run" to a form that does not
interpolate gw_token into the remote command (use a secure file transfer then
remote read, or ssh with stdin and remote "read token && export
OPENCLAW_GATEWAY_TOKEN=$token && openclaw gateway run") while still using the
same sandbox variable and service name (openclaw-gateway / start_service).
- Around line 148-150: The current snippet always picks the first sandbox from
openshell instead of honoring a requested sandbox; change the logic around the
sandbox variable so that if a sandbox was explicitly provided (via the --sandbox
CLI option or the NEMOCLAW_SANDBOX env var) you use that value, otherwise fall
back to auto-discovery with sandbox="$(openshell sandbox list --names
2>/dev/null | head -1)". Update any code that references the sandbox variable
(the local variable named sandbox and the openshell invocation) to only call
openshell when sandbox is empty/unset so PID files and status remain keyed to
the requested sandbox.
- Around line 169-177: The SSH tunnel in the start_service gateway-forward call
binds the local forwarded port to 0.0.0.0 (exposing it on all interfaces);
change the -L argument in the gateway-forward ssh invocation from
"0.0.0.0:$DASHBOARD_PORT:127.0.0.1:$DASHBOARD_PORT" to bind to loopback (e.g.
"127.0.0.1:$DASHBOARD_PORT:127.0.0.1:$DASHBOARD_PORT") so the dashboard is only
accessible via localhost to match the documented default URLs; if external
access is required, leave a comment and use the existing cloudflared tunnel path
instead.
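
A minimal sketch of the loopback-only binding described in the last item above. The helper name `forward_spec` and its port validation are hypothetical and not taken from the PR; only the `bind_address:port:host:hostport` form of `ssh -L` is standard OpenSSH syntax.

```shell
# Hypothetical helper: build an ssh -L spec that binds the local side to
# loopback only, so the forwarded dashboard port is not exposed on all
# interfaces.
forward_spec() {
  port="$1"
  # Reject anything that is not a plain decimal number.
  case "$port" in
    ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
  esac
  if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "port out of range: $port" >&2
    return 1
  fi
  # bind_address:port:host:hostport, with both ends on 127.0.0.1
  printf '127.0.0.1:%s:127.0.0.1:%s\n' "$port" "$port"
}

DASHBOARD_PORT="${DASHBOARD_PORT:-18789}"
spec="$(forward_spec "$DASHBOARD_PORT")"
echo "ssh -N -L $spec ..."
```

Keeping the bind address explicit, rather than relying on the ssh default, makes the intent visible in the script and in the process list.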

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a1a8067b-23ca-40ab-b609-ea7c0d29a9d4

📥 Commits

Reviewing files that changed from the base of the PR and between 1e23347 and 42ae578.

📒 Files selected for processing (1)
  • scripts/start-services.sh

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
scripts/start-services.sh (1)

179-184: ⚠️ Potential issue | 🟠 Major

Avoid interpolating gw_token inside the bash -c command string.

At Line 179, echo '$gw_token' is still shell-interpolated; a token containing ' can break quoting and execute unintended local shell fragments.

Proposed fix
-        start_service openclaw-gateway \
-          bash -c "echo '$gw_token' | ssh -o 'ProxyCommand=$proxy_cmd' \
+        start_service openclaw-gateway \
+          env GW_TOKEN="$gw_token" bash -lc "printf '%s\n' \"\$GW_TOKEN\" | ssh -o 'ProxyCommand=$proxy_cmd' \
               -o StrictHostKeyChecking=accept-new \
               -o UserKnownHostsFile=/dev/null \
               -o LogLevel=ERROR \
               sandbox@openshell-$sandbox \
-              'read token && export OPENCLAW_GATEWAY_TOKEN=\$token && exec openclaw gateway run'"
+              'read -r token && export OPENCLAW_GATEWAY_TOKEN=\$token && exec openclaw gateway run'"
#!/bin/bash
set -euo pipefail

# 1) Confirm current interpolation pattern in repo
rg -n "echo '\\$gw_token' \\| ssh" scripts/start-services.sh

# 2) Harmless PoC demonstrating quote-break risk from a single quote in token
gw_token="abc' ; printf 'INJECTION_TRIGGERED\n' ; : '"
cmd="echo '$gw_token' | cat >/dev/null"
printf 'Constructed command:\n%s\n' "$cmd"
bash -c "$cmd"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/start-services.sh` around lines 179 - 184, The current bash -c line
interpolates gw_token into the command string (echo '$gw_token') which can be
broken by a single quote in the token; fix by not embedding the token in the
bash -c string and instead pipe the token from the local shell into ssh (for
example use printf '%s' "$gw_token" | ssh ...), keep the remote command as a
single-quoted literal 'read token && export OPENCLAW_GATEWAY_TOKEN=$token &&
exec openclaw gateway run' and ensure ProxyCommand is passed safely (use -o
ProxyCommand="$proxy_cmd" rather than embedding it inside single quotes) so no
local interpolation of gw_token happens.
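
The stdin handover suggested above can be tried locally. In this sketch `sh -c` stands in for the remote side of the ssh call, and the helper name `send_token` is hypothetical. It only illustrates that a token passed over a pipe is never re-parsed as shell syntax; it assumes a shell where `printf` is a builtin (bash, dash), so the token does not appear in any external process's argv.

```shell
# Stand-in for the remote side: in the PR this string would be the command
# given to ssh; here `sh -c` plays that role so the sketch runs locally.
remote_cmd='IFS= read -r token && printf "%s" "$token"'

# Hypothetical helper: the token travels over stdin and is never spliced
# into a command string, so quotes inside it are inert.
send_token() {
  printf '%s\n' "$1" | sh -c "$remote_cmd"
}

# A token containing a single quote and shell metacharacters.
gw_token="abc' ; echo INJECTED ; : '"
received="$(send_token "$gw_token")"
[ "$received" = "$gw_token" ] && echo "token delivered intact"
```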
🧹 Nitpick comments (1)
scripts/start-services.sh (1)

63-66: Gateway health detection is tightly coupled to tunnel state.

Because Line 64 probes 127.0.0.1:$DASHBOARD_PORT, a broken gateway-forward makes openclaw-gateway look stopped and can trigger unnecessary gateway restarts in Line 175-186.

Consider decoupling checks (e.g., probe gateway directly in-sandbox via openshell SSH, or ensure forward is up before using local health probe for start decisions).

Also applies to: 175-202

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/start-services.sh` around lines 63 - 66, The gateway health probe in
start-services.sh currently curls 127.0.0.1:$DASHBOARD_PORT (inside the
openclaw-gateway branch) which ties health to the local tunnel state; modify the
check so it either 1) probes the gateway from inside its sandbox using the
openshell/SSH helper (run the curl/nc against 127.0.0.1:$DASHBOARD_PORT inside
the gateway container) or 2) first verify the gateway-forward tunnel is
established (e.g., check the gateway-forward status/port-forwarding
process/socket) before using the local curl; update the openclaw-gateway branch
that uses DASHBOARD_PORT to perform the in-sandbox probe or the tunnel-existence
guard so gateway restarts are not triggered by a broken forward.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/start-services.sh`:
- Around line 180-182: The SSH options currently set "-o
StrictHostKeyChecking=accept-new" while also using "-o
UserKnownHostsFile=/dev/null", which prevents persistence of host keys; update
the script (scripts/start-services.sh) to replace UserKnownHostsFile=/dev/null
with a persistent per-service known_hosts file (e.g., a path derived from the
service name or a temp dir per service) so that changed-key detection is
preserved, keep "-o StrictHostKeyChecking=accept-new" and "-o LogLevel=ERROR"
as-is, and ensure the script creates the per-service known_hosts file with
correct permissions before invoking SSH.
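
One way to prepare such a file, as a sketch: the helper name `ensure_known_hosts` and the `<service>-known_hosts` naming are assumptions for illustration, not the PR's code.

```shell
# Hypothetical helper: create a private, persistent known_hosts file for one
# service so accept-new can pin the first key and detect later changes.
ensure_known_hosts() {
  dir="$1"
  service="$2"
  file="$dir/$service-known_hosts"
  mkdir -p "$dir" || return 1
  if [ ! -f "$file" ]; then
    # The subshell keeps the restrictive umask from leaking into the caller.
    ( umask 077 && : > "$file" ) || return 1
  fi
  chmod 600 "$file" || return 1
  printf '%s\n' "$file"
}

state_dir="$(mktemp -d)"
known_hosts_file="$(ensure_known_hosts "$state_dir" openclaw-gateway)"
echo "ssh -o StrictHostKeyChecking=accept-new -o UserKnownHostsFile=$known_hosts_file ..."
```

Because the file survives between runs, a changed host key on a later connection is rejected instead of being silently accepted again.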

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 15a88475-c58f-4af2-9a39-2e6bcd505433

📥 Commits

Reviewing files that changed from the base of the PR and between 42ae578 and c3af06d.

📒 Files selected for processing (1)
  • scripts/start-services.sh

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
scripts/start-services.sh (1)

179-185: ⚠️ Potential issue | 🟠 Major

bash -c still exposes and misquotes the gateway token.

Embedding echo '$gw_token' inside the bash -c string still leaves the secret visible in the host process argv, and a token containing ' will break the generated shell command. Feed ssh from stdin or a protected file descriptor without interpolating the token into a shell string.

#!/bin/bash
set -euo pipefail

safe_token='secret-123'
bash -c "printf '%s\n' '$safe_token' | sleep 5" &
pid=$!
trap 'kill "$pid" 2>/dev/null || true; wait "$pid" 2>/dev/null || true' EXIT

echo "Process args still contain the token:"
ps -o pid=,args= -p "$pid"

echo
quoted_token=$'abc'\''def'
cmd="printf '%s\n' '$quoted_token' | cat >/dev/null"
echo "Generated command:"
printf '%s\n' "$cmd"
echo
echo "Parser result:"
if bash -n <<<"$cmd"; then
  echo "unexpected: parsed cleanly"
else
  echo "expected: single quote in token breaks the command"
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/start-services.sh` around lines 179 - 185, The current approach
embeds the secret into a bash -c string (start_service openclaw-gateway using
echo '$gw_token') exposing it in process argv and breaking on quotes; instead
pipe the token into ssh's stdin so the remote side reads it (e.g., use printf
'%s\n' "$gw_token" | ssh -o "ProxyCommand=$proxy_cmd" -o
StrictHostKeyChecking=accept-new -o UserKnownHostsFile="$known_hosts_file" -o
LogLevel=ERROR sandbox@openshell-$sandbox 'read -r token; export
OPENCLAW_GATEWAY_TOKEN="$token"; exec openclaw gateway run'), removing the bash
-c wrapper and any interpolation of $gw_token into the local command string.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/start-services.sh`:
- Around line 133-134: The current stop flow only kills the local wrapper PID
(stop_service calls) so a daemonized sandbox process started by "openclaw
gateway run" can remain alive; change stop_service and the shutdown sequence for
"gateway-forward" and "openclaw-gateway" to either (A) persist and read a
sandbox-side pid/lock file (e.g., /var/run/openclaw-gateway.pid) over SSH and
kill that PID on the sandbox, or (B) invoke the native "openclaw gateway stop"
command on the sandbox via SSH to perform a clean shutdown; apply the same fix
to the other shutdown block referenced (the block covering lines 175-186) so
both uses ensure the sandbox daemon itself is stopped rather than only removing
the local wrapper PID.
- Around line 63-65: The health check for openclaw-gateway currently probes host
localhost and can be misled on multi-sandbox hosts; change the check inside the
if [ "$name" = "openclaw-gateway" ] branch to run the HTTP probe through the
selected sandbox (i.e., invoke curl via the repository's sandbox
runner/namespace command used elsewhere so the request targets the sandboxed
network namespace) against the dashboard endpoint on $DASHBOARD_PORT, and leave
the host-level port probe only in the gateway-forward logic; update the command
that currently calls curl -s -o /dev/null ... to the sandbox-scoped equivalent
so gateway health reflects the chosen sandbox.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 21c4bddd-6061-4059-b6a9-52cb3ac76209

📥 Commits

Reviewing files that changed from the base of the PR and between c3af06d and 3d4f5e9.

📒 Files selected for processing (1)
  • scripts/start-services.sh

Contributor

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/start-services.sh (1)

77-79: ⚠️ Potential issue | 🟡 Minor

already running message may fail for gateway service.

When is_running "openclaw-gateway" returns true (HTTP check passes), line 78 still tries to cat "$PIDDIR/$name.pid". Since the SSH wrapper process exits after launching the gateway, the PID file may be stale or reference a dead process. This causes a confusing/incorrect "already running (PID ...)" message.

Consider special-casing the message for openclaw-gateway:

🔧 Suggested fix
 start_service() {
   local name="$1"
   shift
   if is_running "$name"; then
-    info "$name already running (PID $(cat "$PIDDIR/$name.pid"))"
+    if [ "$name" = "openclaw-gateway" ]; then
+      info "$name already running (port $DASHBOARD_PORT healthy)"
+    else
+      info "$name already running (PID $(cat "$PIDDIR/$name.pid"))"
+    fi
     return 0
   fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/start-services.sh` around lines 77 - 79, The "already running (PID
...)" message can be wrong for the openclaw-gateway because its SSH wrapper
exits and the PID file may be stale; update the block guarded by is_running to
special-case the gateway: if name == "openclaw-gateway" print a simple
"openclaw-gateway already running" message (or omit the PID), otherwise continue
to read and display "$PIDDIR/$name.pid"; alternatively first verify the PID file
exists and the PID refers to a live process before using cat to include it in
the info message (use is_running and PIDDIR/$name.pid to validate).
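
The alternative mentioned at the end of that prompt, validating the PID file before quoting it, might look like the sketch below. `pid_alive` and `running_message` are hypothetical names, not functions from the script.

```shell
# Hypothetical helper: succeed only if the PID file exists, holds a number,
# and that process can currently be signalled by this user.
pid_alive() {
  pidfile="$1"
  [ -f "$pidfile" ] || return 1
  pid="$(cat "$pidfile")"
  case "$pid" in
    ''|*[!0-9]*) return 1 ;;
  esac
  # kill -0 delivers no signal; it only checks that signalling is possible.
  kill -0 "$pid" 2>/dev/null
}

# Hypothetical message builder: include the PID only when it is trustworthy.
running_message() {
  name="$1"
  pidfile="$2"
  if pid_alive "$pidfile"; then
    printf '%s already running (PID %s)\n' "$name" "$(cat "$pidfile")"
  else
    printf '%s already running\n' "$name"
  fi
}

tmp="$(mktemp -d)"
echo "$$" > "$tmp/live.pid"
running_message example-service "$tmp/live.pid"
running_message openclaw-gateway "$tmp/missing.pid"
```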
🧹 Nitpick comments (1)
scripts/start-services.sh (1)

278-286: Minor: Banner alignment assumes 5-digit port.

The hardcoded spacing in lines 279 and 285 assumes $DASHBOARD_PORT is always 5 digits (like 18789). If a user sets a 4-digit port (e.g., 8080), the box alignment will be off by one character.

This is cosmetic and low priority, but could be addressed with printf and field widths if desired.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/start-services.sh` around lines 278 - 286, The banner hardcodes
spacing for $DASHBOARD_PORT causing misalignment for ports with fewer than 5
digits; update the echo lines that print the gateway and port-forward status
(the blocks using is_running, the literal strings showing "Gateway:     healthy
(port $DASHBOARD_PORT)" and "Port fwd:    $DASHBOARD_PORT → sandbox") to use
printf with a fixed field width or compute padding dynamically so the port is
printed in a fixed-width column (e.g., "%5s" or similar) ensuring the
surrounding box characters stay aligned regardless of DASHBOARD_PORT length.
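
A small sketch of the fixed-width approach. The helper name `banner_row`, the 40-character width, and the ASCII borders are illustrative choices; ASCII is used here because `printf` pads by bytes in many shells, so multibyte box characters or arrows inside the padded field can skew the width.

```shell
# Hypothetical helper: print one banner row with the text padded to a fixed
# width, so the right-hand border stays put for any port length.
banner_row() {
  # %-40s left-justifies and pads with spaces up to 40 characters.
  printf '| %-40s |\n' "$1"
}

for port in 8080 18789; do
  banner_row "Gateway:     healthy (port $port)"
  banner_row "Port fwd:    $port -> sandbox"
done
```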
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cbab9ca6-c03d-4d48-9596-cd0c4b1d0531

📥 Commits

Reviewing files that changed from the base of the PR and between 3d4f5e9 and 15f3a6f.

📒 Files selected for processing (1)
  • scripts/start-services.sh

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
scripts/start-services.sh (1)

63-65: ⚠️ Potential issue | 🟠 Major

Make openclaw-gateway health sandbox-scoped, not host-port-scoped.

Line 64 probes host localhost, so any process listening on 18789 can falsely mark the gateway healthy and suppress startup.

Suggested fix
 is_running() {
   local name="$1"
   local pidfile="$PIDDIR/$name.pid"

   # For openclaw-gateway, check the actual port instead of PID.
   # The SSH session that started the gateway exits after launch,
   # but the gateway process inside the sandbox stays alive.
   if [ "$name" = "openclaw-gateway" ]; then
-    curl -s -o /dev/null -w '' --max-time 2 "http://127.0.0.1:$DASHBOARD_PORT/" 2>/dev/null
-    return $?
+    local sandbox gateway_name proxy_cmd
+    sandbox="$(resolve_sandbox)"
+    [ -n "$sandbox" ] || return 1
+    gateway_name="$(openshell gateway info 2>/dev/null | grep -oP 'Gateway:\s+\K\S+' || echo 'openshell')"
+    printf -v proxy_cmd 'openshell ssh-proxy --gateway-name %q --name %q' "$gateway_name" "$sandbox"
+    ssh -o "ProxyCommand=$proxy_cmd" \
+        -o StrictHostKeyChecking=accept-new \
+        -o UserKnownHostsFile="$PIDDIR/openshell-known_hosts" \
+        -o LogLevel=ERROR \
+        -o ConnectTimeout=5 \
+        sandbox@"openshell-$sandbox" \
+        "curl -sf --max-time 2 http://127.0.0.1:$DASHBOARD_PORT/ >/dev/null" 2>/dev/null
+    return $?
   fi

Also applies to: 202-203, 282-286

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/start-services.sh` around lines 63 - 65, The health check for the
openclaw-gateway currently probes 127.0.0.1 which is host-scoped and can be
satisfied by any local service; change the curl target to the
sandbox/network-scoped address instead (do not use 127.0.0.1). Locate the
openclaw-gateway branch where curl uses "$DASHBOARD_PORT" and replace the host
literal with the sandbox interface variable (e.g. use a SANDBOX_IP or container
network hostname variable) so the probe targets the gateway inside its sandbox;
apply the same change to the other identical curl probes referenced (the ones
around lines 202-203 and 282-286) to ensure all gateway health checks are
sandbox-scoped. Ensure the variable you use resolves to the sandbox/container IP
in the runtime environment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/start-services.sh`:
- Around line 144-148: The script is resolving the sandbox differently in two
places leading to start/stop mismatch; after computing the resolved value
(variable sandbox from the SANDBOX_NAME check), persist that resolved name back
into an environment variable or a well-known temporary file (e.g., overwrite
SANDBOX_NAME or write to .current_sandbox) and have the later block (the other
occurrence around the 190-194 region) read and reuse that persisted value
instead of re-running openshell sandbox list | head -1; update the code paths
that currently set or read the sandbox variable so they always use the single
persisted resolved value.
- Around line 152-153: The ssh -o "ProxyCommand=..." uses unquoted $gateway_name
and $sandbox allowing shell metacharacter injection; fix by safely shell-quoting
or validating these vars before interpolation (e.g., build the ProxyCommand
using printf %q or otherwise escape values and/or whitelist allowed characters)
and pass the resulting fully-quoted string to the -o ProxyCommand option; update
every ssh invocation that interpolates $gateway_name and $sandbox (the
occurrences around the ssh -o "ProxyCommand=openshell ssh-proxy --gateway-name
$gateway_name --name $sandbox" lines and the other instances noted) so that
values are either validated against an allowed pattern or escaped with printf %q
before insertion into the ProxyCommand.
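
The whitelist option from the second item could be sketched as follows. The PR's actual `validate_name()` is not shown in this thread, so the body and the allowed character set here are assumptions.

```shell
# Sketch of a whitelist check; the allowed character set is an assumption.
validate_name() {
  case "$1" in
    # Empty, or containing any character outside letters, digits, '.', '_', '-'.
    ''|*[!A-Za-z0-9._-]*) return 1 ;;
  esac
  return 0
}

for candidate in my-sandbox 'box; rm -rf /' '$(id)' ''; do
  if validate_name "$candidate"; then
    echo "ok:       $candidate"
  else
    echo "rejected: $candidate"
  fi
done
```

Validating up front and escaping at the point of interpolation are complementary: the whitelist rejects hostile names early, while quoting protects against anything the whitelist missed.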

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 34150500-6240-40fa-aae4-11495183e890

📥 Commits

Reviewing files that changed from the base of the PR and between 15f3a6f and 42119d5.

📒 Files selected for processing (1)
  • scripts/start-services.sh

@wscurran wscurran added the enhancement: feature Use this label to identify requests for new capabilities in NemoClaw. label Mar 18, 2026
@cluster2600 cluster2600 force-pushed the fix/gateway-auto-start branch from 5387b13 to 0f24bd1 Compare March 24, 2026 11:08
mafueee pushed a commit to mafueee/NemoClaw that referenced this pull request Mar 28, 2026
@cluster2600
Contributor Author

@coderabbitai review

Branch is stable — this has been waiting on review for a couple of weeks. Happy to address any feedback.

@coderabbitai
Contributor

coderabbitai bot commented Mar 31, 2026

@cluster2600 Thanks for the heads-up — kicking off a review of PR #310 now. I'll go through the changes in scripts/start-services.sh and post detailed feedback shortly.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

When `nemoclaw start` is run, the script now also:

1. Starts the OpenClaw gateway inside the sandbox (via SSH)
2. Forwards port 18789 from the sandbox to the host with keepalive

This means external dashboards like Mission Control can connect to
the gateway WebSocket without manual setup.

The gateway and port forward are managed as services alongside
the existing Telegram bridge and cloudflared tunnel, with proper
PID tracking, status display, and stop handling.

Additional hardening:
- validate_name() rejects identifiers with shell metacharacters
- resolve_sandbox() checks --sandbox flag, persisted state, then
  auto-detect (in that order)
- printf %q for safe quoting in ProxyCommand strings
- Sandbox-scoped gateway health check via SSH probe
- Dynamic banner alignment with printf %-40s
- StrictHostKeyChecking=accept-new with persistent known_hosts

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
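
The resolution order listed in the commit message (explicit `--sandbox` flag, then persisted state, then auto-detect) could be sketched roughly as below. The real `resolve_sandbox()` is not shown in this thread; the function name `resolve_sandbox_sketch`, its three parameters, and the choice to persist on every call are assumptions, and `fake_list` stands in for the sandbox listing command.

```shell
# Sketch only. Arguments stand in for the --sandbox flag value, the state
# file path, and a command that lists sandbox names (hypothetical parameters).
resolve_sandbox_sketch() {
  flag_value="$1"
  state_file="$2"
  list_cmd="$3"
  if [ -n "$flag_value" ]; then
    # 1. An explicit request wins.
    resolved="$flag_value"
  elif [ -s "$state_file" ]; then
    # 2. Otherwise reuse the name persisted by an earlier start.
    resolved="$(head -n 1 "$state_file")"
  else
    # 3. Otherwise auto-detect: take the first listed sandbox.
    resolved="$($list_cmd 2>/dev/null | head -n 1)"
  fi
  [ -n "$resolved" ] || return 1
  # Persist so that a later stop targets the same sandbox.
  printf '%s\n' "$resolved" > "$state_file"
  printf '%s\n' "$resolved"
}

fake_list() { printf 'alpha\nbeta\n'; }
state="$(mktemp)"
resolve_sandbox_sketch "" "$state" fake_list      # auto-detects and persists
resolve_sandbox_sketch "" "$state" fake_list      # reads the persisted name
resolve_sandbox_sketch "beta" "$state" fake_list  # explicit value overrides
```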
@cluster2600 cluster2600 force-pushed the fix/gateway-auto-start branch from c3c4794 to 877781a Compare April 14, 2026 06:51
@cluster2600
Contributor Author

Closing — the gateway auto-start and diagnostics were reworked through the onboard flow (#1605, #1812) rather than via start-services.sh. The script has diverged enough that rebasing this would essentially be a rewrite. Happy to revisit if there's still a gap in the current flow.

Labels

enhancement: feature Use this label to identify requests for new capabilities in NemoClaw.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenClaw gateway inside sandbox requires manual start after every reboot

2 participants