
Bug(proxy): is the sandbox proxy’s upstream TLS CA bundle too limited for some ordinary public HTTPS sites? Example repros included #813

@cosmicnet

Agent Diagnostic

  • Loaded the repo's openshell-cli skill and used the OpenShell CLI directly for sandbox creation, SSH config generation, policy updates, and live repro.
  • Reviewed the GitHub bug-report template, CONTRIBUTING.md, SECURITY.md, and the relevant architecture and implementation for the sandbox proxy's TLS-terminating path.
  • Reproduced the issue in a fresh throwaway sandbox using equivalent protocol: rest policies for two ordinary public HTTPS sites.
  • Confirmed the failure is specific to the TLS-terminating proxy path: in the concrete repro below, example.com fails after HTTP/1.1 200 Connection Established, while time.com succeeds under the same policy shape.
  • Ran a separate tls: skip control for example.com; that changed the failure mode to a client trust issue, which suggests the main bug is not simple reachability or policy matching.
  • Compared the sandbox proxy's upstream trust behavior against the vendored webpki_roots::TLS_SERVER_ROOTS bundle it uses for upstream TLS.
  • The investigation did not produce a CLI or policy-only fix. The evidence points to the sandbox proxy's upstream TLS trust behavior in the TLS-terminating path.

Description

Actual behavior: Under equivalent protocol: rest TLS-terminating policies, the sandbox proxy resets some ordinary public HTTPS sites after CONNECT instead of returning a normal HTTP response. In the concrete repro below, example.com fails while time.com succeeds.

Expected behavior: Once an ordinary public HTTPS site is allowed by policy, the sandbox proxy's TLS-terminating path should handle it consistently and should not reset the connection after HTTP/1.1 200 Connection Established.

Summary

| Stage | Host | curl --http1.1 | curl --http1.1 --cacert /etc/openshell-tls/openshell-ca.pem | Node fetch() | Node fetch() with NODE_EXTRA_CA_CERTS=/etc/openshell-tls/ca-bundle.pem |
|---|---|---|---|---|---|
| Fresh sandbox, no host-specific policy | example.com | HTTP/1.1 403 Forbidden, then CONNECT tunnel failed, response 403 | same | fetch failed, Request was cancelled | same |
| Fresh sandbox, no host-specific policy | time.com | blocked in the same way | blocked in the same way | blocked in the same way | blocked in the same way |
| After example.com REST policy | example.com | HTTP/1.1 200 Connection Established, then Recv failure: Connection reset by peer | same | fetch failed, UND_ERR_SOCKET other side closed | same |
| After time.com REST policy | time.com | HTTP/1.1 200 OK | HTTP/1.1 200 OK | status 200 | status 200 |

This looks like a host-specific problem in the OpenShell sandbox proxy's TLS-terminating path for example.com.
I also tested other ordinary sites with the same general policy shape, including google.com and bing.com, and they worked; time.com is just the concrete control example used below. So far, example.com is the only site I have found that shows this reset behavior.
I am using the OpenShell openclaw community sandbox image only because it includes both curl and node, which makes the repro short. I do not think the bug is NemoClaw-specific.

The only material change between the failing and succeeding cases below is the allowed hostname under the same protocol: rest TLS-terminating policy shape. time.com works through the terminating REST path, while example.com fails after CONNECT and never reaches a normal HTTP response. That points to host-specific breakage in the OpenShell CONNECT, TLS-termination, or HTTP relay path for example.com, rather than a simple policy-authoring mistake.
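Because the failing and succeeding policies share one shape and differ only in the endpoint host, both REST policy files used in the reproduction steps can be generated from a single template. The sketch below is a hypothetical bash helper, write_rest_policy, which is not part of OpenShell; it only reproduces the YAML shown in the repro with the host substituted.

```bash
# Hypothetical helper (not part of OpenShell): writes the same REST policy shape
# used in the repro, with only the endpoint host varying.
# Usage: write_rest_policy <host> <output-file>
write_rest_policy() {
  local host=$1 out=$2
  # Unquoted heredoc delimiter so ${host} is expanded; the YAML has no other $ signs.
  cat > "$out" <<YAML
version: 1

filesystem_policy:
  include_workdir: true
  read_only: [/usr, /lib, /proc, /dev/urandom, /app, /etc, /var/log]
  read_write: [/sandbox, /tmp, /dev/null]
landlock:
  compatibility: best_effort
process:
  run_as_user: sandbox
  run_as_group: sandbox
network_policies:
  issue_site:
    name: issue_site
    endpoints:
      - host: ${host}
        port: 443
        protocol: rest
        enforcement: enforce
        access: read-only
    binaries:
      - { path: /usr/bin/node }
      - { path: /usr/bin/curl }
YAML
}
```

For example, running write_rest_policy example.com /tmp/example-rest.yaml and write_rest_policy time.com /tmp/time-rest.yaml, then diff /tmp/example-rest.yaml /tmp/time-rest.yaml, should report only the host: line as different.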

Reproduction Steps

Main repro

  1. Create a fresh throwaway sandbox and generate an SSH config for it.
SANDBOX_NAME=rest-proxy-repro
openshell sandbox create --name "$SANDBOX_NAME" --from openclaw --no-auto-providers --no-tty -- echo sandbox-ready
openshell sandbox ssh-config "$SANDBOX_NAME" > /tmp/$SANDBOX_NAME-ssh.conf
SSH_CONFIG=/tmp/$SANDBOX_NAME-ssh.conf
SSH_HOST=openshell-$SANDBOX_NAME
  2. In the fresh sandbox, test example.com before adding any host-specific policy.
ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 -sS -D - https://example.com -o /dev/null"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 --cacert /etc/openshell-tls/openshell-ca.pem -sS -D - https://example.com -o /dev/null"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "node -e \"fetch('https://example.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""
ssh -F "$SSH_CONFIG" "$SSH_HOST" "NODE_EXTRA_CA_CERTS=/etc/openshell-tls/ca-bundle.pem node -e \"fetch('https://example.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""

Observed: as expected, both curl commands return HTTP/1.1 403 Forbidden and CONNECT tunnel failed, response 403. Both Node commands fail with fetch failed and Request was cancelled.

  3. Still in the fresh sandbox, test time.com before adding any host-specific policy.
ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 -sS -D - https://time.com -o /dev/null"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 --cacert /etc/openshell-tls/openshell-ca.pem -sS -D - https://time.com -o /dev/null"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "node -e \"fetch('https://time.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""
ssh -F "$SSH_CONFIG" "$SSH_HOST" "NODE_EXTRA_CA_CERTS=/etc/openshell-tls/ca-bundle.pem node -e \"fetch('https://time.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""

Observed: as expected, the same blocked behavior as for example.com.

  4. Create and apply an example.com TLS-terminating REST policy.
cat > /tmp/example-rest.yaml <<'YAML'
version: 1

filesystem_policy:
  include_workdir: true
  read_only: [/usr, /lib, /proc, /dev/urandom, /app, /etc, /var/log]
  read_write: [/sandbox, /tmp, /dev/null]
landlock:
  compatibility: best_effort
process:
  run_as_user: sandbox
  run_as_group: sandbox
network_policies:
  issue_site:
    name: issue_site
    endpoints:
      - host: example.com
        port: 443
        protocol: rest
        enforcement: enforce
        access: read-only
    binaries:
      - { path: /usr/bin/node }
      - { path: /usr/bin/curl }
YAML

openshell policy set --policy /tmp/example-rest.yaml --wait "$SANDBOX_NAME"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 -sS -D - https://example.com -o /dev/null"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 --cacert /etc/openshell-tls/openshell-ca.pem -sS -D - https://example.com -o /dev/null"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "node -e \"fetch('https://example.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""
ssh -F "$SSH_CONFIG" "$SSH_HOST" "NODE_EXTRA_CA_CERTS=/etc/openshell-tls/ca-bundle.pem node -e \"fetch('https://example.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""

Observed: both curl commands return HTTP/1.1 200 Connection Established and then fail with Recv failure: Connection reset by peer. Both Node commands fail with fetch failed and UND_ERR_SOCKET other side closed.

  5. Create and apply an equivalent time.com TLS-terminating REST policy.
cat > /tmp/time-rest.yaml <<'YAML'
version: 1

filesystem_policy:
  include_workdir: true
  read_only: [/usr, /lib, /proc, /dev/urandom, /app, /etc, /var/log]
  read_write: [/sandbox, /tmp, /dev/null]
landlock:
  compatibility: best_effort
process:
  run_as_user: sandbox
  run_as_group: sandbox
network_policies:
  issue_site:
    name: issue_site
    endpoints:
      - host: time.com
        port: 443
        protocol: rest
        enforcement: enforce
        access: read-only
    binaries:
      - { path: /usr/bin/node }
      - { path: /usr/bin/curl }
YAML

openshell policy set --policy /tmp/time-rest.yaml --wait "$SANDBOX_NAME"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 -sS -D - https://time.com -o /dev/null"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 --cacert /etc/openshell-tls/openshell-ca.pem -sS -D - https://time.com -o /dev/null"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "node -e \"fetch('https://time.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""
ssh -F "$SSH_CONFIG" "$SSH_HOST" "NODE_EXTRA_CA_CERTS=/etc/openshell-tls/ca-bundle.pem node -e \"fetch('https://time.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""

Observed: both curl commands succeed with HTTP/1.1 200 OK. Both Node commands succeed with status 200.

  6. Delete the throwaway sandbox.
openshell sandbox delete "$SANDBOX_NAME"
rm -f "$SSH_CONFIG" /tmp/example-rest.yaml /tmp/time-rest.yaml
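The main repro repeats the same four probes (two curl variants and two Node fetch() variants) for each host and policy state, with only the hostname changing. A hypothetical pair of bash helpers can run them in one call. They assume SSH_CONFIG and SSH_HOST are set as in the sandbox setup commands, and the Node one-liner uses double quotes instead of single quotes so it can be wrapped in single quotes for the remote shell. Neither function is part of OpenShell.

```bash
# Hypothetical helpers (not part of OpenShell) wrapping the four probes from the main repro.
# Assumes SSH_CONFIG and SSH_HOST are already set, as in the sandbox setup commands.

# Print a Node one-liner that logs status/url on success, or the error chain on failure.
# It contains only double quotes, so callers can wrap it in single quotes remotely.
node_fetch_js() {
  printf '%s' "fetch(\"$1\").then(r=>{console.log(\"status\",r.status);console.log(\"url\",r.url)}).catch(e=>{console.error(\"ERR\",e&&e.name,e&&e.code,e&&e.message,e&&e.cause&&e.cause.code,e&&e.cause&&e.cause.message);process.exit(1)})"
}

# Run the same four probes against https://<host>, printing each probe's exit
# status on failure instead of stopping.
probe_host() {
  local url="https://$1" js
  js=$(node_fetch_js "$url")
  echo "== curl, no --cacert"
  ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 -sS -D - $url -o /dev/null" || echo "-> exit $?"
  echo "== curl, --cacert openshell-ca.pem"
  ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 --cacert /etc/openshell-tls/openshell-ca.pem -sS -D - $url -o /dev/null" || echo "-> exit $?"
  echo "== node fetch(), no NODE_EXTRA_CA_CERTS"
  ssh -F "$SSH_CONFIG" "$SSH_HOST" "node -e '$js'" || echo "-> exit $?"
  echo "== node fetch(), NODE_EXTRA_CA_CERTS=ca-bundle.pem"
  ssh -F "$SSH_CONFIG" "$SSH_HOST" "NODE_EXTRA_CA_CERTS=/etc/openshell-tls/ca-bundle.pem node -e '$js'" || echo "-> exit $?"
}
```

After each openshell policy set, probe_host example.com or probe_host time.com should produce the same four results recorded in the corresponding step.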

Separate control: disable TLS termination for example.com

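As a separate control, if TLS termination is removed for example.com, the host does load. This control reuses $SANDBOX_NAME, $SSH_CONFIG, and $SSH_HOST, so run it before the cleanup step above, or recreate the sandbox and SSH config first.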

cat > /tmp/example-direct.yaml <<'YAML'
version: 1

filesystem_policy:
  include_workdir: true
  read_only: [/usr, /lib, /proc, /dev/urandom, /app, /etc, /var/log]
  read_write: [/sandbox, /tmp, /dev/null]
landlock:
  compatibility: best_effort
process:
  run_as_user: sandbox
  run_as_group: sandbox
network_policies:
  issue_site:
    name: issue_site
    endpoints:
      - host: example.com
        port: 443
        tls: skip
    binaries:
      - { path: /usr/bin/node }
      - { path: /usr/bin/curl }
YAML

openshell policy set --policy /tmp/example-direct.yaml --wait "$SANDBOX_NAME"
ssh -F "$SSH_CONFIG" "$SSH_HOST" "node -e \"fetch('https://example.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""
ssh -F "$SSH_CONFIG" "$SSH_HOST" "NODE_EXTRA_CA_CERTS=/etc/openshell-tls/ca-bundle.pem node -e \"fetch('https://example.com').then(r=>{console.log('status',r.status); console.log('url',r.url)}).catch(e=>{console.error('ERR', e && e.name, e && e.code, e && e.message, e && e.cause && e.cause.code, e && e.cause && e.cause.message); process.exit(1)})\""

Observed: the first Node command fails with UNABLE_TO_GET_ISSUER_CERT_LOCALLY, while the second succeeds with status 200.

That appears to be a separate trust-store issue rather than the main bug above. OpenShell writes two TLS files into the sandbox: /etc/openshell-tls/openshell-ca.pem and /etc/openshell-tls/ca-bundle.pem. In this direct-TLS control, Node succeeds when pointed at the merged bundle at /etc/openshell-tls/ca-bundle.pem, which suggests the earlier reset is specific to the TLS-terminating path, not to basic reachability of example.com.

Additional evidence for likely root cause

There is also a host-side result that appears to explain the difference.

The OpenShell sandbox proxy's upstream TLS client is built from webpki_roots::TLS_SERVER_ROOTS, not from the host OS trust store. The live /etc/openshell-tls/* files are written for sandbox clients, not for the proxy's own upstream trust, so I extracted a PEM bundle from the exact webpki-roots 1.0.6 source OpenShell links against. With that bundle, host-side curl failed on example.com with unable to get local issuer certificate, while time.com succeeded. When I forced curl to use Ubuntu's standard CA bundle instead, example.com succeeded.

mkdir -p /tmp/empty-ca-dir
awk '/\* -----BEGIN CERTIFICATE-----/ { in_cert=1; print "-----BEGIN CERTIFICATE-----"; next } /\* -----END CERTIFICATE-----/ { print "-----END CERTIFICATE-----"; in_cert=0; next } in_cert { sub(/^.*\* /, ""); print }' ~/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/webpki-roots-1.0.6/src/lib.rs > /tmp/webpki-roots-1.0.6.pem

curl --http1.1 --cacert /tmp/webpki-roots-1.0.6.pem --capath /tmp/empty-ca-dir -sS -D - https://example.com -o /dev/null
curl --http1.1 --cacert /tmp/webpki-roots-1.0.6.pem --capath /tmp/empty-ca-dir -sS -D - https://time.com -o /dev/null
curl --http1.1 --cacert /etc/ssl/certs/ca-certificates.crt --capath /tmp/empty-ca-dir -sS -D - https://example.com -o /dev/null

Observed:

  • example.com with the extracted webpki-roots 1.0.6 bundle fails with SSL certificate problem: unable to get local issuer certificate
  • time.com with the same extracted bundle succeeds with HTTP/1.1 200 OK
  • example.com with Ubuntu's standard CA bundle succeeds with HTTP/1.1 200 OK

The current server chains also differ in a way that matches this result:

  • example.com currently chains through AAA Certificate Services
  • time.com currently chains through Starfield Root Certificate Authority - G2
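The served chains can be inspected from the host with openssl s_client. The sketch below uses hypothetical helpers (not part of OpenShell or the original investigation) that print the subject and issuer of every certificate a server sends; when the server omits the root, as is typical, the issuer of the last certificate names the root the chain is anchored to.

```bash
# Hypothetical helpers (not part of OpenShell): list the certificates a server presents.

# Print "subject=" and "issuer=" lines for every PEM certificate found on stdin.
pem_subjects() {
  local tmp
  tmp=$(mktemp)
  # Keep only the PEM blocks; s_client output also contains handshake details.
  sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p' > "$tmp"
  # Wrap all certificates in one PKCS#7 structure so pkcs7 can list them in order.
  openssl crl2pkcs7 -nocrl -certfile "$tmp" | openssl pkcs7 -print_certs -noout
  rm -f "$tmp"
}

# Fetch and summarize the chain that <host>:443 serves, with SNI set to <host>.
show_chain() {
  openssl s_client -connect "$1:443" -servername "$1" -showcerts </dev/null 2>/dev/null | pem_subjects
}
```

Running show_chain example.com and show_chain time.com should make the AAA Certificate Services versus Starfield Root Certificate Authority - G2 difference visible in the issuer lines, assuming the servers still present the chains described above.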

I could find Starfield Root Certificate Authority - G2 in webpki-roots 1.0.6, but not AAA Certificate Services. If that is the operative difference, then the main reset above is probably happening when the OpenShell sandbox proxy tries to open its upstream TLS connection to example.com using the vendored webpki-roots trust set.
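The membership check against the extracted bundle can be scripted. This is a hypothetical helper, not part of OpenShell; it does a plain substring match on certificate subjects, which is enough to look for a root by its common name but is not a full distinguished-name comparison.

```bash
# Hypothetical helper (not part of OpenShell): succeed if any certificate in a PEM
# bundle has a subject containing the given text.
# Returns 0 if found, 1 if not found, 2 if the bundle could not be parsed.
bundle_has_subject() {
  local bundle=$1 name=$2 listing line
  # List subject/issuer lines for every certificate in the bundle.
  listing=$(openssl crl2pkcs7 -nocrl -certfile "$bundle" | openssl pkcs7 -print_certs -noout) || return 2
  while IFS= read -r line; do
    # Only subject lines count, so a root that merely appears as an issuer does not match.
    if [[ $line == subject=* && $line == *"$name"* ]]; then
      return 0
    fi
  done <<< "$listing"
  return 1
}
```

For example, bundle_has_subject /tmp/webpki-roots-1.0.6.pem "Starfield Root Certificate Authority - G2" should return 0, and the same call with "AAA Certificate Services" should return 1 if that root is indeed absent from webpki-roots 1.0.6.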

I’m happy to test suggested diagnostics or a candidate fix, and I’m open to input on where this can best be addressed.

Environment

  • OS: Ubuntu 20.04.6 LTS
  • Docker: Docker version 29.3.1, build c2be9cc
  • OpenShell CLI: openshell 0.0.26-dev.11+g208a149a
  • OpenShell repo commit: 208a149a
  • Sandbox image used for repro: openclaw via openshell sandbox create --from openclaw

Logs

example.com under protocol: rest
HTTP/1.1 200 Connection Established
curl: (56) Recv failure: Connection reset by peer

Node fetch()
fetch failed
UND_ERR_SOCKET other side closed
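To narrow down where in the exchange the reset happens, the curl probe can be rerun with -v and its verbose log summarized. The helper below is hypothetical, and its marker strings are assumptions: "Connection Established" comes from the proxy's CONNECT response shown above, and "SSL connection using" is the line curl's OpenSSL backend typically logs once a TLS handshake completes; other curl builds may word it differently.

```bash
# Hypothetical helper (not part of OpenShell): read a `curl -v` log on stdin and
# report which stages of a proxied HTTPS request it reached.
# Marker strings assume curl's OpenSSL backend; other TLS backends may differ.
curl_stage() {
  local log
  log=$(cat)
  if [[ $log == *"200 Connection Established"* ]]; then
    echo "proxy CONNECT: accepted"
  fi
  if [[ $log == *"SSL connection using"* ]]; then
    echo "client TLS handshake with proxy: completed"
  else
    echo "client TLS handshake with proxy: not completed"
  fi
  if [[ $log == *"reset by peer"* ]]; then
    echo "connection: reset by peer"
  fi
}
```

For example, piping the output of ssh -F "$SSH_CONFIG" "$SSH_HOST" "curl --http1.1 -v --cacert /etc/openshell-tls/openshell-ca.pem https://example.com -o /dev/null" 2>&1 into curl_stage would distinguish a reset during the client-side TLS handshake with the proxy from one that happens after it.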

Agent-First Checklist

  • I pointed my agent at the repo and had it investigate this issue
  • I loaded relevant skills (e.g., debug-openshell-cluster, debug-inference, openshell-cli)
  • My agent could not resolve this — the diagnostic above explains why
