
bug: link-local / loopback entries in allowed_ips are silently accepted but never enforced #814

@vnicolici


Agent Diagnostic

Loaded skills: openshell-cli (for openshell sandbox, openshell policy, openshell logs commands). No sandbox-specific debug skill was applicable — debug-openshell-cluster is for cluster-start failures and debug-inference is for inference routing, neither matched this symptom.

Investigation sequence:

  1. Static trace of the enforcement path. Followed the forward-proxy code path at crates/openshell-sandbox/src/proxy.rs:1936 (handle_forward_proxy). Traced the OPA → query_allowed_ips → implicit_allowed_ips_for_ip_host fallback → resolve_and_check_allowed_ips chain. Read the Rego endpoint-matching rules at crates/openshell-sandbox/data/sandbox-policy.rego (exact-host match at lines 95–101, hostless-plus-allowed_ips at 117–123, extended-config gate at 346–356). Confirmed that PR #570, fix(sandbox): treat literal IP in policy host as implicit allowed_ips (commit 3f1917a7), is present in v0.0.26, so for most literal-IP scenarios the code path is sound on paper.

  2. Found the hard block. is_always_blocked_ip at crates/openshell-sandbox/src/proxy.rs:1530 returns true for IPv4 loopback, link-local (169.254.0.0/16), and unspecified ranges — and the equivalent IPv6 ranges including ::1, fe80::/10, and v4-mapped variants. It is consulted before the allowlist membership check inside resolve_and_check_allowed_ips at crates/openshell-sandbox/src/proxy.rs:1580-1587. Test test_forward_link_local_always_blocked_even_with_allowed_ips at crates/openshell-sandbox/src/proxy.rs:3257 explicitly locks the behaviour in as intentional (cloud-metadata SSRF hardening). The CONNECT path at crates/openshell-sandbox/src/proxy.rs:482-524 and the forward-proxy path at crates/openshell-sandbox/src/proxy.rs:2218-2259 both flow through the same helper, so the block applies uniformly.

  3. Traced the loader and approval paths to look for upstream validation. parse_allowed_ips at crates/openshell-sandbox/src/proxy.rs:1623 only emits a warning for very broad CIDRs (prefixes shorter than /16) and never rejects entries that fall inside the always-blocked ranges. The server-side policy ingest and the TUI draft-approval handler (handle_approve_draft_chunk → merge_chunk_into_policy at crates/openshell-server/src/grpc/policy.rs:999 and :1624) perform no such validation either: they happily persist a draft chunk whose host or allowed_ips names an un-enforceable range, mark it approved, and push a revision through sandbox_watch_bus.notify. From the operator's seat, the TUI reports success while enforcement silently no-ops.

  4. Traced the denial analyzer. mechanistic_mapper::generate_proposals at crates/openshell-sandbox/src/mechanistic_mapper.rs:107 builds a NetworkPolicyRule from each denial summary, calls resolve_allowed_ips_if_private to populate allowed_ips when the resolved IP is RFC1918-ish, and emits a proposal. It has no notion of "always-blocked class" — so for denials whose destination IP is structurally un-allowable, it keeps generating fresh proposals on every flush cycle (every ~10s in my repro).

  5. Live repro on sandbox theseus. Dumped the loaded policy with openshell policy get theseus --full (rev 9, hash d47e757…). User had a clean sandbox-level aws_iam rule that sets both the IP in host: and a matching allowed_ips: entry — policy loaded without warning. Reproduced the 403 myself from inside the sandbox with openshell sandbox exec -n theseus -- curl -v -X PUT http://169.254.169.254/latest/api/token — the client connected via http_proxy=http://10.200.0.1:3128 (the sandbox forward proxy) and got HTTP/1.1 403 Forbidden. The corresponding denial event appeared in openshell logs theseus tagged [policy:aws_iam] (see Logs section below). That tag is emitted at crates/openshell-sandbox/src/proxy.rs:434:

    let policy_str = matched_policy.as_deref().unwrap_or("-");

    so [policy:aws_iam] proves OPA matched the rule cleanly at both the endpoint-allowed and binary-allowed layers — the 403 is strictly downstream at the SSRF / always-blocked layer, not an OPA miss.

  6. Checked the OCSF JSONL sink to see status_detail. The shorthand OCSF text format drops the status_detail field, which is where the explanatory phrase "resolves to always-blocked address" lives (set at crates/openshell-sandbox/src/proxy.rs:508, 543, 584 for CONNECT and :2244, :2282 for forward). The JSONL file at /var/log/openshell-ocsf.2026-04-11.log inside the sandbox was 0 bytes in my repro, so there was no higher-fidelity log to fall back on. This is a meaningful contributor to the bug symptom — the system is not lying to the operator, it is just not telling them enough: the shorthand denial line reads identically to a plain allowlist miss.
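The ordering found in steps 2 and 5 can be illustrated with a small sketch. It is a hypothetical re-creation built only from the ranges listed above, not the code in proxy.rs: check_resolved and CheckError are invented names, the allowlist is simplified to exact addresses instead of CIDRs, and the sketch assumes every resolved address must pass. Because the always-blocked test runs first, an allowed_ips entry for 169.254.169.254 is never even consulted.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// IPv4 part of the classification: 127.0.0.0/8, 169.254.0.0/16, 0.0.0.0.
fn is_blocked_v4(v4: Ipv4Addr) -> bool {
    v4.is_loopback() || v4.is_link_local() || v4.is_unspecified()
}

/// Hypothetical re-creation of is_always_blocked_ip (not the proxy.rs source).
pub fn is_always_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => {
            // ::ffff:a.b.c.d is judged by its embedded IPv4 address.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_blocked_v4(v4);
            }
            // ::1, ::, and fe80::/10 (top 10 bits are 1111111010).
            v6.is_loopback() || v6.is_unspecified() || (v6.segments()[0] & 0xffc0) == 0xfe80
        }
    }
}

/// Hypothetical error type that keeps the two 403 causes distinct.
#[derive(Debug, PartialEq)]
pub enum CheckError {
    AlwaysBlocked(IpAddr),
    NotInAllowlist(IpAddr),
}

/// Hypothetical stand-in for resolve_and_check_allowed_ips.
/// Assumes callers have already rejected an empty resolution.
pub fn check_resolved(resolved: &[IpAddr], allowed: &[IpAddr]) -> Result<(), CheckError> {
    for &ip in resolved {
        // The hard block is consulted first, so no allowlist entry can override it.
        if is_always_blocked_ip(ip) {
            return Err(CheckError::AlwaysBlocked(ip));
        }
        if !allowed.contains(&ip) {
            return Err(CheckError::NotInAllowlist(ip));
        }
    }
    Ok(())
}
```

With 169.254.169.254 in both the resolved set and the allowlist, check_resolved returns Err(CheckError::AlwaysBlocked(..)), matching the 403 seen in the repro even though the rule itself matched.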

What could not be resolved by the agent: The block itself is intentional (covered by a locked-in test) and removing it would regress cloud-metadata SSRF hardening, which is a deliberate security invariant. The fix has to go into validation, messaging, and proposal filtering — see "Suggested fixes" below — none of which I applied because the user wants maintainer review first.

Description

Actual behaviour: In v0.0.26, sandbox policies that whitelist an IP inside any always-blocked range (127.0.0.0/8, 169.254.0.0/16, 0.0.0.0, ::1, fe80::/10, v4-mapped equivalents) are accepted without warning by the policy loader and by the TUI draft-approval path, but produce 403s on every request at runtime. OPA matches the rule (denial events carry the matched policy name in the log), and the rejection happens downstream at is_always_blocked_ip inside resolve_and_check_allowed_ips. The shorthand OCSF log line for the denial does not surface the reason, so the operator cannot distinguish "my allowlist is wrong" from "my allowlist can never take effect." The denial analyzer keeps generating fresh draft proposals for the same destination on every flush cycle, so the TUI notification repeatedly returns after approval, reinforcing the illusion that the rule is being processed.

Expected behaviour: At least one of the following — ideally all four:

  1. The policy loader (and the TUI-approval path on the server) should reject entries whose IP or CIDR overlaps any always-blocked range, with an explicit error identifying the range and the SSRF-hardening rationale, so un-enforceable rules never get persisted in the first place.
  2. The runtime denial message should clearly distinguish "allowlist miss" from "destination is structurally un-allowable" so log readers understand what happened.
  3. The denial analyzer should skip proposals whose resolved IP set is entirely inside always-blocked ranges, so the TUI does not keep re-surfacing notifications for un-fixable situations.
  4. The OCSF JSONL sink inside the sandbox image should actually be populated by default — or the shorthand log format should surface status_detail for denials — so operators have a path to the full event context.
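Item 3 could take roughly the following shape. This is a minimal sketch under stated assumptions: DenialSummary and fixable_summaries are hypothetical names (the analyzer's real types are not shown in this issue), and is_always_blocked_ip here is a simplified stand-in covering the ranges listed above. A summary is dropped only when every resolved address is always-blocked; an empty resolution is kept so nothing is silently hidden.

```rust
use std::net::IpAddr;

/// Hypothetical, simplified denial summary (the real analyzer type is not shown).
pub struct DenialSummary {
    pub host: String,
    pub resolved: Vec<IpAddr>,
}

/// Simplified stand-in for the always-blocked check: loopback, link-local,
/// unspecified, and their IPv4-mapped forms.
fn is_always_blocked_ip(ip: IpAddr) -> bool {
    // Normalise ::ffff:a.b.c.d to its IPv4 form first.
    let ip = match ip {
        IpAddr::V6(v6) => v6.to_ipv4_mapped().map(IpAddr::V4).unwrap_or(ip),
        v4 => v4,
    };
    match ip {
        IpAddr::V4(v4) => v4.is_loopback() || v4.is_link_local() || v4.is_unspecified(),
        IpAddr::V6(v6) => {
            v6.is_loopback() || v6.is_unspecified() || (v6.segments()[0] & 0xffc0) == 0xfe80
        }
    }
}

/// Keep only summaries that an allowed_ips change could actually fix.
pub fn fixable_summaries(summaries: Vec<DenialSummary>) -> Vec<DenialSummary> {
    summaries
        .into_iter()
        .filter(|s| {
            // An empty resolution is kept rather than treated as "all blocked".
            s.resolved.is_empty() || s.resolved.iter().any(|&ip| !is_always_blocked_ip(ip))
        })
        .collect()
}
```

Filtering before proposal generation, rather than after approval, is what would stop the ~10 s re-proposal loop described above; the dropped summaries could still feed a one-off warning event.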

Scope clarification: The hard block on link-local / loopback / unspecified destinations is correct and should stay. This issue is about the silent failure mode around it, not the block itself.

Reproduction Steps

  1. Set a sandbox policy that names an always-blocked IP — AWS IMDS is the canonical case:

    aws_iam:
      name: aws_iam
      endpoints:
      - host: '169.254.169.254'
        port: 80
        allowed_ips:
        - '169.254.169.254'
      binaries:
      - path: /usr/bin/curl

    Apply it with openshell policy set <sandbox> --policy policy.yaml.

  2. Verify it loaded: openshell policy get <sandbox> --full. The rule appears verbatim, no warnings, status Loaded.

  3. From inside the sandbox, hit the IMDSv2 token endpoint:

    openshell sandbox exec -n <sandbox> -- curl -v -X PUT http://169.254.169.254/latest/api/token

    Result: HTTP/1.1 403 Forbidden from the sandbox forward proxy (http_proxy=http://10.200.0.1:3128).

  4. Check the logs: openshell logs <sandbox> --source sandbox shows the denial tagged with the matched policy name — proving OPA matched and the block is downstream:

    HTTP:PUT [MED] DENIED /usr/bin/curl(1618) -> PUT http://169.254.169.254/latest/api/token [policy:aws_iam]
    
  5. (Optional) Leave it running and observe Flushed denial analysis to gateway proposals=1 repeating in the logs every ~10 s — the analyzer keeps re-proposing the same un-enforceable rule.

Environment

Logs

# Live denial events on sandbox `theseus` (shorthand OCSF format from `openshell logs`):
[1775932282.846] [sandbox] [OCSF ] [ocsf] HTTP:PUT [MED] DENIED /usr/local/bin/node(1014) -> PUT http://169.254.169.254/latest/api/token [policy:aws_iam]
[1775932282.854] [sandbox] [OCSF ] [ocsf] HTTP:GET [MED] DENIED /usr/local/bin/node(1014) -> GET http://169.254.169.254/latest/meta-data/iam/security-credentials/ [policy:aws_iam]
[1775932288.821] [sandbox] [OCSF ] [ocsf] HTTP:PUT [MED] DENIED /usr/local/bin/node(1040) -> PUT http://169.254.169.254/latest/api/token [policy:aws_iam]
[1775932288.828] [sandbox] [OCSF ] [ocsf] HTTP:GET [MED] DENIED /usr/local/bin/node(1040) -> GET http://169.254.169.254/latest/meta-data/iam/security-credentials/ [policy:aws_iam]
# ... repeats every ~6s with a fresh node pid as OpenClaw spawns new children

# Reproduced the same denial from a manual curl inside the sandbox:
[1775932413.437] [sandbox] [OCSF ] [ocsf] HTTP:PUT [MED] DENIED /usr/bin/curl(1618) -> PUT http://169.254.169.254/latest/api/token [policy:aws_iam]

# Denial analyzer re-proposing the same rule on every flush:
[1775932313.643] [sandbox] [INFO ] [openshell_sandbox] Flushed denial analysis to gateway proposals=1 sandbox_name=theseus summaries=1
[1775932323.642] [sandbox] [INFO ] [openshell_sandbox] Flushed denial analysis to gateway proposals=1 sandbox_name=theseus summaries=1

# The OCSF JSONL sink inside the sandbox — where `status_detail` would live — was empty:
$ openshell sandbox exec -n theseus -- stat -c '%s %n' /var/log/openshell-ocsf.2026-04-11.log
0 /var/log/openshell-ocsf.2026-04-11.log

The [policy:aws_iam] tag at the end of each denial line is the key signal: it is emitted at crates/openshell-sandbox/src/proxy.rs:434 from matched_policy.as_deref().unwrap_or("-"), so it can only be set when OPA has returned NetworkAction::Allow { matched_policy: Some(...) }. The 403 therefore cannot be an OPA miss — it is happening later, inside resolve_and_check_allowed_ips at crates/openshell-sandbox/src/proxy.rs:1580, where is_always_blocked_ip rejects the destination before the allowlist is consulted.
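To make that reasoning concrete, here is a sketch of how a shorthand denial line could carry both the matched policy and the reason. NetworkAction is trimmed to the two shapes this issue mentions; DenyReason, shorthand_denial, the [reason:...] suffix, and the "not in allowed_ips" wording are hypothetical. Only the "resolves to always-blocked address" phrase comes from the status_detail described earlier. The binary/pid field is omitted for brevity.

```rust
/// Hypothetical, trimmed-down OPA decision (only the shapes this issue names).
pub enum NetworkAction {
    Allow { matched_policy: Option<String> },
    Deny,
}

/// Hypothetical reason codes for a downstream 403.
#[derive(Clone, Copy)]
pub enum DenyReason {
    NotInAllowlist,
    AlwaysBlocked,
}

impl DenyReason {
    fn status_detail(self) -> &'static str {
        match self {
            // Invented wording for a plain allowlist miss.
            DenyReason::NotInAllowlist => "resolved address not in allowed_ips",
            // Phrase quoted in the issue as the existing status_detail.
            DenyReason::AlwaysBlocked => "resolves to always-blocked address",
        }
    }
}

/// Sketch of a shorthand denial line that keeps the reason (binary/pid omitted).
pub fn shorthand_denial(method: &str, url: &str, action: &NetworkAction, reason: DenyReason) -> String {
    // Same rule as matched_policy.as_deref().unwrap_or("-"): a policy name
    // only exists when OPA returned Allow with Some(name).
    let policy = match action {
        NetworkAction::Allow { matched_policy } => matched_policy.as_deref().unwrap_or("-"),
        NetworkAction::Deny => "-",
    };
    format!(
        "HTTP:{method} [MED] DENIED -> {method} {url} [policy:{policy}] [reason:{}]",
        reason.status_detail()
    )
}
```

A named policy tag can only appear on the Allow path, and an appended reason would let operators tell the two 403 classes apart without needing the JSONL sink.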

Suggested fixes (for maintainer review)

  1. Validate at load time. In parse_allowed_ips (crates/openshell-sandbox/src/proxy.rs:1623), reject any parsed IpNet that overlaps an always-blocked range, with an error naming the offending entry and explaining the metadata-SSRF rationale. The same check can be shared with a fn is_always_blocked_net(net: IpNet) -> bool helper next to is_always_blocked_ip.
  2. Validate at ingest/approval time. Mirror the same check in the server-side policy ingest and in merge_chunk_into_policy / handle_approve_draft_chunk at crates/openshell-server/src/grpc/policy.rs:999 and :1624, so the TUI surfaces the rejection synchronously instead of silently persisting a no-op rule and clearing the notification.
  3. Disambiguate the denial message. Split the "FORWARD blocked: allowed_ips check failed for <host>:<port>" message at crates/openshell-sandbox/src/proxy.rs:505-506 and :2242-2244 into two distinct messages, one for "not in allowlist" and one for "destination is in always-blocked class", or include status_detail in the shorthand OCSF format for denials so the reason travels with the event.
  4. Filter un-fixable proposals. In mechanistic_mapper::generate_proposals (crates/openshell-sandbox/src/mechanistic_mapper.rs:107), skip destinations whose resolved IPs fall entirely within the always-blocked ranges, so the analyzer stops re-proposing rules that cannot take effect. Optionally emit a single structured warning event for operators so the denial is still visible but does not re-surface the TUI notification loop.
  5. Separately investigate the JSONL sink. /var/log/openshell-ocsf.<date>.log was 0 bytes inside the sandbox in this repro. Either the default image is missing the JSONL tracing layer or it is pointed elsewhere — worth a separate spike. Having a working JSONL sink would have reduced the severity of this bug because an operator could have grepped status_detail for always-blocked and diagnosed it in one step.
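Fixes 1 and 2 both reduce to one overlap test between a parsed entry and the always-blocked ranges, so a single helper could in principle be shared by parse_allowed_ips and the server-side approval path. The sketch below is hypothetical: it uses a minimal hand-rolled Cidr type instead of IpNet so it stays dependency-free, validate_allowed_ips is an invented name, and the IPv4-mapped CIDRs (::ffff:127.0.0.0/104 and so on) are an assumed encoding of the "v4-mapped equivalents".

```rust
use std::net::IpAddr;

/// Minimal CIDR type for this sketch (the real code reportedly uses IpNet).
#[derive(Clone, Copy, Debug)]
pub struct Cidr {
    pub addr: IpAddr,
    pub prefix: u8,
}

/// Assumed CIDR encoding of the always-blocked ranges, incl. v4-mapped forms.
const ALWAYS_BLOCKED: &[&str] = &[
    "127.0.0.0/8", "169.254.0.0/16", "0.0.0.0/32",
    "::1/128", "::/128", "fe80::/10",
    "::ffff:127.0.0.0/104", "::ffff:169.254.0.0/112", "::ffff:0.0.0.0/128",
];

/// Parse "a.b.c.d", "a.b.c.d/n", or the IPv6 equivalents.
pub fn parse_cidr(s: &str) -> Result<Cidr, String> {
    let (ip, prefix) = match s.split_once('/') {
        Some((ip, p)) => (ip, Some(p)),
        None => (s, None),
    };
    let addr: IpAddr = ip.parse().map_err(|e| format!("{s}: {e}"))?;
    let max: u8 = if addr.is_ipv4() { 32 } else { 128 };
    let prefix = match prefix {
        Some(p) => p.parse::<u8>().map_err(|e| format!("{s}: {e}"))?,
        None => max,
    };
    if prefix > max {
        return Err(format!("{s}: prefix /{prefix} exceeds /{max}"));
    }
    Ok(Cidr { addr, prefix })
}

/// Address bits widened to u128, plus the family's bit width.
fn bits(addr: IpAddr) -> (u128, u32) {
    match addr {
        IpAddr::V4(v4) => (u128::from(u32::from(v4)), 32),
        IpAddr::V6(v6) => (u128::from(v6), 128),
    }
}

/// Two same-family networks overlap iff they agree on the shorter prefix.
fn overlaps(a: Cidr, b: Cidr) -> bool {
    let (x, wa) = bits(a.addr);
    let (y, wb) = bits(b.addr);
    if wa != wb {
        return false; // v4 vs v6: the v4-mapped entries cover that case
    }
    let p = u32::from(a.prefix.min(b.prefix));
    if p == 0 {
        return true; // a /0 covers the whole family
    }
    let shift = wa - p;
    (x >> shift) == (y >> shift)
}

/// Hypothetical load/ingest-time check: reject any entry overlapping a blocked range.
pub fn validate_allowed_ips(entries: &[&str]) -> Result<Vec<Cidr>, String> {
    let blocked: Vec<Cidr> = ALWAYS_BLOCKED
        .iter()
        .map(|s| parse_cidr(s).expect("static list is valid"))
        .collect();
    entries
        .iter()
        .map(|e| {
            let net = parse_cidr(e)?;
            if let Some(b) = blocked.iter().find(|b| overlaps(net, **b)) {
                return Err(format!(
                    "allowed_ips entry '{e}' overlaps always-blocked range {}/{}; loopback, \
                     link-local and unspecified destinations are hard-blocked \
                     (cloud-metadata SSRF hardening) and can never be allowlisted",
                    b.addr, b.prefix
                ));
            }
            Ok(net)
        })
        .collect()
}
```

Under a strict "overlaps" rule, a wide entry such as 0.0.0.0/0 is also rejected because it contains 127.0.0.0/8; whether that case should be an error or only a warning is a maintainer call.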

Design note

One design question worth flagging: the one always-blocked range with a real use case is 169.254.0.0/16, because 169.254.169.254 is where AWS/GCP/Azure serve instance metadata and IAM credential bootstrap, and every AWS SDK tries IMDSv2 there first on a cloud host. The current hard block pushes operators off instance-role auth and onto long-lived static keys, which is a regression in credential hygiene. The counterargument is also real: this range is item #1 on every SSRF hardening checklist because of the 2019 Capital One breach, and IMDSv2's PUT-then-GET handshake does not help inside a sandbox where the attacker can trivially issue PUTs. The block's real job here is preventing a prompt-injected agent from stealing the host's EC2 role credentials. A blanket relaxation would be wrong, but a narrowly-scoped opt-in for the specific metadata endpoints (as a distinct top-level policy field, not something reached via allowed_ips:) might deserve a design look. This is orthogonal to the validation and messaging items above; I am flagging it only because the investigation surfaced the question.

Related

Agent-First Checklist

  • I pointed my agent at the repo and had it investigate this issue
  • I loaded relevant skills (openshell-cli for the live repro; no sandbox-specific debug skill was applicable)
  • My agent could not resolve this — the diagnostic above explains why (the block is intentional and locked in by a test; the remediation is loader/UX/analyzer validation, which needs maintainer review before modification)

Metadata


Assignees

Labels

state:triage-needed (Opened without agent diagnostics and needs triage)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
