Agent Diagnostic
- Loaded the repo's openshell-cli skill and used the OpenShell CLI directly for sandbox creation, SSH config generation, policy updates, and live repro.
- Reviewed the GitHub bug-report template, CONTRIBUTING.md, SECURITY.md, and the relevant architecture and implementation for the sandbox proxy's TLS-terminating path.
- Reproduced the issue in a fresh throwaway sandbox using equivalent protocol: rest policies for two ordinary public HTTPS sites.
- Confirmed the failure is specific to the TLS-terminating proxy path: in the concrete repro below, example.com fails after HTTP/1.1 200 Connection Established, while time.com succeeds under the same policy shape.
- Ran a separate tls: skip control for example.com; that changed the failure mode to a client trust issue, which suggests the main bug is not simple reachability or policy matching.
- Compared the sandbox proxy's upstream trust behavior against the vendored webpki_roots::TLS_SERVER_ROOTS bundle it uses for upstream TLS.
- The investigation did not produce a CLI or policy-only fix. The evidence points to the sandbox proxy's upstream TLS trust behavior in the TLS-terminating path.
Description
Actual behavior: Under equivalent protocol: rest TLS-terminating policies, the sandbox proxy resets some ordinary public HTTPS sites after CONNECT instead of returning a normal HTTP response. In the concrete repro below, example.com fails while time.com succeeds.
Expected behavior: Once an ordinary public HTTPS site is allowed by policy, the sandbox proxy's TLS-terminating path should handle it consistently and should not reset the connection after HTTP/1.1 200 Connection Established.
Summary
| Stage | Host | curl --http1.1 | curl --http1.1 --cacert /etc/openshell-tls/openshell-ca.pem | Node fetch() | Node fetch() with NODE_EXTRA_CA_CERTS=/etc/openshell-tls/ca-bundle.pem |
| --- | --- | --- | --- | --- | --- |
| Fresh sandbox, no host-specific policy | example.com | HTTP/1.1 403 Forbidden then CONNECT tunnel failed, response 403 | same | fetch failed, Request was cancelled | same |
| Fresh sandbox, no host-specific policy | time.com | blocked in the same way | blocked in the same way | blocked in the same way | blocked in the same way |
| After example.com REST policy | example.com | HTTP/1.1 200 Connection Established then Recv failure: Connection reset by peer | same | fetch failed, UND_ERR_SOCKET other side closed | same |
| After time.com REST policy | time.com | HTTP/1.1 200 OK | HTTP/1.1 200 OK | status 200 | status 200 |
This looks like a host-specific problem in the OpenShell sandbox proxy's TLS-terminating path for example.com.
I also tested other ordinary sites with the same general policy shape, including google.com and bing.com, and they worked; time.com is just the concrete control example used below. So far, example.com is the only site I have found that shows this reset behavior.
I am using the OpenShell openclaw community sandbox image only because it includes both curl and node, which makes the repro short. I do not think the bug is NemoClaw-specific.
The only material change between the failing and succeeding cases below is the allowed hostname under the same protocol: rest TLS-terminating policy shape. time.com works through the terminating REST path, while example.com fails after CONNECT and never reaches a normal HTTP response. That points to host-specific breakage in the OpenShell CONNECT, TLS-termination, or HTTP relay path for example.com, rather than a simple policy-authoring mistake.
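The three outcomes in the summary table can be told apart mechanically from the combined stdout and stderr of a curl probe. The following is a minimal sketch, not part of OpenShell: the function name and the outcome labels are hypothetical, and the match strings are taken from the curl output quoted in this report.

```shell
# Hypothetical helper: classify one curl probe by the strings seen in this report.
# Reads the probe's combined stdout+stderr on stdin and prints a single label.
classify_probe() {
  local out
  out="$(cat)"
  case "$out" in
    # Proxy refused the CONNECT: no policy allows this host yet.
    *"CONNECT tunnel failed, response 403"*) echo blocked_by_policy ;;
    # CONNECT was accepted, then the tunnel was reset (the bug reported here).
    *"Connection reset by peer"*)            echo reset_after_connect ;;
    # A normal upstream response made it back to the client.
    *"HTTP/1.1 200 OK"*)                     echo ok ;;
    *)                                       echo unknown ;;
  esac
}
```

To use it, pipe a probe into the function with stderr merged, for example by appending `2>&1 | classify_probe` to one of the ssh curl commands in the reproduction steps.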
Reproduction Steps
Main repro
- Create a fresh throwaway sandbox and generate an SSH config for it.
SANDBOX_NAME=rest-proxy-repro
openshell sandbox create --name "$SANDBOX_NAME" --from openclaw --no-auto-providers --no-tty -- echo sandbox-ready
openshell sandbox ssh-config "$SANDBOX_NAME" > /tmp/$SANDBOX_NAME-ssh.conf
SSH_CONFIG=/tmp/$SANDBOX_NAME-ssh.conf
SSH_HOST=openshell-$SANDBOX_NAME
- In the fresh sandbox, test example.com before adding any host-specific policy.
ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 -sS -D - https://example.com -o /dev/null"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 --cacert /etc/openshell-tls/openshell-ca.pem -sS -D - https://example.com -o /dev/null"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "node -e \"fetch('https://example.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""
ssh -F "$SSH_CONFIG" "$SSH_HOST" "NODE_EXTRA_CA_CERTS=/etc/openshell-tls/ca-bundle.pem node -e \"fetch('https://example.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""
Observed: as expected, both curl commands return HTTP/1.1 403 Forbidden and CONNECT tunnel failed, response 403. Both Node commands fail with fetch failed and Request was cancelled.
- Still in the fresh sandbox, test time.com before adding any host-specific policy.
ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 -sS -D - https://time.com -o /dev/null"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 --cacert /etc/openshell-tls/openshell-ca.pem -sS -D - https://time.com -o /dev/null"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "node -e \"fetch('https://time.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""
ssh -F "$SSH_CONFIG" "$SSH_HOST" "NODE_EXTRA_CA_CERTS=/etc/openshell-tls/ca-bundle.pem node -e \"fetch('https://time.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""
Observed: as expected, same blocked behavior as example.com.
- Create and apply an example.com TLS-terminating REST policy.
cat > /tmp/example-rest.yaml <<'YAML'
version: 1
filesystem_policy:
  include_workdir: true
  read_only: [/usr, /lib, /proc, /dev/urandom, /app, /etc, /var/log]
  read_write: [/sandbox, /tmp, /dev/null]
landlock:
  compatibility: best_effort
process:
  run_as_user: sandbox
  run_as_group: sandbox
network_policies:
  issue_site:
    name: issue_site
    endpoints:
      - host: example.com
        port: 443
        protocol: rest
        enforcement: enforce
        access: read-only
    binaries:
      - { path: /usr/bin/node }
      - { path: /usr/bin/curl }
YAML
openshell policy set --policy /tmp/example-rest.yaml --wait "$SANDBOX_NAME"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 -sS -D - https://example.com -o /dev/null"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 --cacert /etc/openshell-tls/openshell-ca.pem -sS -D - https://example.com -o /dev/null"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "node -e \"fetch('https://example.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""
ssh -F "$SSH_CONFIG" "$SSH_HOST" "NODE_EXTRA_CA_CERTS=/etc/openshell-tls/ca-bundle.pem node -e \"fetch('https://example.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""
Observed: both curl commands return HTTP/1.1 200 Connection Established and then fail with Recv failure: Connection reset by peer. Both Node commands fail with fetch failed and UND_ERR_SOCKET other side closed.
- Create and apply an equivalent time.com TLS-terminating REST policy.
cat > /tmp/time-rest.yaml <<'YAML'
version: 1
filesystem_policy:
  include_workdir: true
  read_only: [/usr, /lib, /proc, /dev/urandom, /app, /etc, /var/log]
  read_write: [/sandbox, /tmp, /dev/null]
landlock:
  compatibility: best_effort
process:
  run_as_user: sandbox
  run_as_group: sandbox
network_policies:
  issue_site:
    name: issue_site
    endpoints:
      - host: time.com
        port: 443
        protocol: rest
        enforcement: enforce
        access: read-only
    binaries:
      - { path: /usr/bin/node }
      - { path: /usr/bin/curl }
YAML
openshell policy set --policy /tmp/time-rest.yaml --wait "$SANDBOX_NAME"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 -sS -D - https://time.com -o /dev/null"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 --cacert /etc/openshell-tls/openshell-ca.pem -sS -D - https://time.com -o /dev/null"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "node -e \"fetch('https://time.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""
ssh -F "$SSH_CONFIG" "$SSH_HOST" "NODE_EXTRA_CA_CERTS=/etc/openshell-tls/ca-bundle.pem node -e \"fetch('https://time.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""
Observed: both curl commands succeed with HTTP/1.1 200 OK. Both Node commands succeed with status 200.
- Delete the throwaway sandbox.
openshell sandbox delete "$SANDBOX_NAME"
rm -f "$SSH_CONFIG" /tmp/example-rest.yaml /tmp/time-rest.yaml
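The main repro repeats the same policy file and the same four client probes for each host. As a convenience, the repetition could be folded into two shell functions. This is only a sketch under stated assumptions: write_rest_policy and probe_host are hypothetical names, the YAML nesting is my reading of the policy shown in the steps above, and SSH_CONFIG and SSH_HOST are expected to be set as in the first step.

```shell
# Hypothetical helper: write the TLS-terminating REST policy for one host.
# The nesting below is an assumption based on the policy in the repro steps.
write_rest_policy() {
  local host="$1" out="$2"
  cat > "$out" <<YAML
version: 1
filesystem_policy:
  include_workdir: true
  read_only: [/usr, /lib, /proc, /dev/urandom, /app, /etc, /var/log]
  read_write: [/sandbox, /tmp, /dev/null]
landlock:
  compatibility: best_effort
process:
  run_as_user: sandbox
  run_as_group: sandbox
network_policies:
  issue_site:
    name: issue_site
    endpoints:
      - host: ${host}
        port: 443
        protocol: rest
        enforcement: enforce
        access: read-only
    binaries:
      - { path: /usr/bin/node }
      - { path: /usr/bin/curl }
YAML
}

# Hypothetical helper: run the four client probes from the repro against one host.
# Uses SSH_CONFIG and SSH_HOST from the first repro step; failures do not abort.
probe_host() {
  local host="$1"
  local url="https://${host}"
  local js="fetch('${url}').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})"
  local cmd
  for cmd in \
    "curl --http1.1 -sS -D - ${url} -o /dev/null" \
    "curl --http1.1 --cacert /etc/openshell-tls/openshell-ca.pem -sS -D - ${url} -o /dev/null" \
    "node -e \"${js}\"" \
    "NODE_EXTRA_CA_CERTS=/etc/openshell-tls/ca-bundle.pem node -e \"${js}\""
  do
    ssh -F "$SSH_CONFIG" "$SSH_HOST" "$cmd" || true
  done
}
```

With these defined, each policy step reduces to `write_rest_policy example.com /tmp/example-rest.yaml`, the same `openshell policy set` command as above, and `probe_host example.com`.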
Separate control: disable TLS termination for example.com
As a separate control, with TLS termination disabled for example.com (tls: skip), the host does load once Node is pointed at the merged CA bundle.
cat > /tmp/example-direct.yaml <<'YAML'
version: 1
filesystem_policy:
  include_workdir: true
  read_only: [/usr, /lib, /proc, /dev/urandom, /app, /etc, /var/log]
  read_write: [/sandbox, /tmp, /dev/null]
landlock:
  compatibility: best_effort
process:
  run_as_user: sandbox
  run_as_group: sandbox
network_policies:
  issue_site:
    name: issue_site
    endpoints:
      - host: example.com
        port: 443
        tls: skip
    binaries:
      - { path: /usr/bin/node }
      - { path: /usr/bin/curl }
YAML
openshell policy set --policy /tmp/example-direct.yaml --wait "$SANDBOX_NAME"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "node -e \"fetch('https://example.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""
ssh -F "$SSH_CONFIG" "$SSH_HOST" "NODE_EXTRA_CA_CERTS=/etc/openshell-tls/ca-bundle.pem node -e \"fetch('https://example.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""
Observed: the first Node command fails with UNABLE_TO_GET_ISSUER_CERT_LOCALLY, while the second succeeds with status 200.
That appears to be a separate trust-store issue rather than the main bug above. OpenShell writes two TLS files into the sandbox: /etc/openshell-tls/openshell-ca.pem and /etc/openshell-tls/ca-bundle.pem. In this direct-TLS control, Node succeeds when pointed at the merged bundle at /etc/openshell-tls/ca-bundle.pem, which suggests the earlier reset is specific to the TLS-terminating path, not to basic reachability of example.com.
Additional evidence for likely root cause
There is also a host-side result that appears to explain the difference.
The OpenShell sandbox proxy's upstream TLS client is built from webpki_roots::TLS_SERVER_ROOTS, not from the host OS trust store. The live /etc/openshell-tls/* files are written for sandbox clients, not for the proxy's own upstream trust, so I extracted a PEM bundle from the exact webpki-roots 1.0.6 source OpenShell links against. With that bundle, host-side curl failed on example.com with unable to get local issuer certificate, while time.com succeeded. When I forced curl to use Ubuntu's standard CA bundle instead, example.com succeeded.
mkdir -p /tmp/empty-ca-dir
awk '/\* -----BEGIN CERTIFICATE-----/ { in_cert=1; print "-----BEGIN CERTIFICATE-----"; next } /\* -----END CERTIFICATE-----/ { print "-----END CERTIFICATE-----"; in_cert=0; next } in_cert { sub(/^.*\* /, ""); print }' ~/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/webpki-roots-1.0.6/src/lib.rs > /tmp/webpki-roots-1.0.6.pem
curl --http1.1 --cacert /tmp/webpki-roots-1.0.6.pem --capath /tmp/empty-ca-dir -sS -D - https://example.com -o /dev/null
curl --http1.1 --cacert /tmp/webpki-roots-1.0.6.pem --capath /tmp/empty-ca-dir -sS -D - https://time.com -o /dev/null
curl --http1.1 --cacert /etc/ssl/certs/ca-certificates.crt --capath /tmp/empty-ca-dir -sS -D - https://example.com -o /dev/null
Observed:
- example.com with the extracted webpki-roots 1.0.6 bundle fails with SSL certificate problem: unable to get local issuer certificate
- time.com with the same extracted bundle succeeds with HTTP/1.1 200 OK
- example.com with Ubuntu's standard CA bundle succeeds with HTTP/1.1 200 OK
The current server chains also differ in a way that matches this result:
- example.com currently chains through AAA Certificate Services
- time.com currently chains through Starfield Root Certificate Authority - G2
I could find Starfield Root Certificate Authority - G2 in webpki-roots 1.0.6, but not AAA Certificate Services. If that is the operative difference, then the main reset above is probably happening when the OpenShell sandbox proxy tries to open its upstream TLS connection to example.com using the vendored webpki-roots trust set.
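The extraction and the root lookup described above can be sketched as two small shell functions. This is an illustration, not OpenShell code: both function names are hypothetical, the awk program is the same logic as the one-liner above split across lines, and the lookup assumes the openssl CLI is installed. It matches on the subject name only, so it does not account for cross-signed roots or alternative chains.

```shell
# Hypothetical helper: pull PEM blocks out of "* "-prefixed comment lines,
# using the same awk logic as the one-liner in this report.
extract_pem_from_comments() {
  awk '
    /\* -----BEGIN CERTIFICATE-----/ { in_cert = 1; print "-----BEGIN CERTIFICATE-----"; next }
    /\* -----END CERTIFICATE-----/   { print "-----END CERTIFICATE-----"; in_cert = 0; next }
    in_cert                          { sub(/^.*\* /, ""); print }
  ' "$1"
}

# Hypothetical helper: succeed if any certificate in the PEM bundle has a
# subject line containing NAME (plain substring match, not chain validation).
bundle_has_subject() {
  local bundle="$1" name="$2"
  openssl crl2pkcs7 -nocrl -certfile "$bundle" \
    | openssl pkcs7 -print_certs -noout \
    | grep '^subject' \
    | grep -qF -- "$name"
}
```

Running `bundle_has_subject /tmp/webpki-roots-1.0.6.pem "AAA Certificate Services"` and the same check for "Starfield Root Certificate Authority - G2" would be one way for a maintainer to confirm or refute the difference described above.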
I’m happy to test suggested diagnostics or a candidate fix, and I’m open to input on where this can best be addressed.
Environment
- OS: Ubuntu 20.04.6 LTS
- Docker: Docker version 29.3.1, build c2be9cc
- OpenShell CLI: openshell 0.0.26-dev.11+g208a149a
- OpenShell repo commit: 208a149a
- Sandbox image used for repro: openclaw via openshell sandbox create --from openclaw
Logs
example.com under protocol: rest
HTTP/1.1 200 Connection Established
curl: (56) Recv failure: Connection reset by peer
Node fetch()
fetch failed
UND_ERR_SOCKET other side closed
Agent-First Checklist
debug-openshell-cluster,debug-inference,openshell-cli)