bug: link-local / loopback entries in allowed_ips are silently accepted but never enforced #814

## Agent Diagnostic

Loaded skills: `openshell-cli` (for `openshell sandbox`, `openshell policy`, `openshell logs` commands). No sandbox-specific debug skill was applicable — `debug-openshell-cluster` is for cluster-start failures and `debug-inference` is for inference routing, neither matched this symptom.

**Investigation sequence:**

1. **Static trace of the enforcement path.** Followed the forward-proxy code path at `crates/openshell-sandbox/src/proxy.rs:1936` (`handle_forward_proxy`). Traced the OPA → `query_allowed_ips` → `implicit_allowed_ips_for_ip_host` fallback → `resolve_and_check_allowed_ips` chain. Read the Rego endpoint-matching rules at `crates/openshell-sandbox/data/sandbox-policy.rego` (exact-host match at lines 95–101, hostless-plus-allowed_ips at 117–123, extended-config gate at 346–356). Confirmed PR #570 (`3f1917a7 fix(sandbox): treat literal IP in policy host as implicit allowed_ips`) is present in v0.0.26, so for most literal-IP scenarios the code path is sound on paper.

2. **Found the hard block.** `is_always_blocked_ip` at `crates/openshell-sandbox/src/proxy.rs:1530` returns true for IPv4 loopback, link-local (`169.254.0.0/16`), and unspecified ranges — and the equivalent IPv6 ranges including `::1`, `fe80::/10`, and v4-mapped variants. It is consulted **before** the allowlist membership check inside `resolve_and_check_allowed_ips` at `crates/openshell-sandbox/src/proxy.rs:1580-1587`. Test `test_forward_link_local_always_blocked_even_with_allowed_ips` at `crates/openshell-sandbox/src/proxy.rs:3257` explicitly locks the behaviour in as intentional (cloud-metadata SSRF hardening). The CONNECT path at `crates/openshell-sandbox/src/proxy.rs:482-524` and the forward-proxy path at `crates/openshell-sandbox/src/proxy.rs:2218-2259` both flow through the same helper, so the block applies uniformly.

3. **Traced the loader and approval paths to look for upstream validation.** `parse_allowed_ips` at `crates/openshell-sandbox/src/proxy.rs:1623` only emits a warning for very broad CIDRs (`< /16`) and never rejects entries that fall inside the always-blocked ranges. The server-side policy ingest and the TUI draft-approval handler (`handle_approve_draft_chunk` → `merge_chunk_into_policy` at `crates/openshell-server/src/grpc/policy.rs:999` and `:1624`) also perform no such validation — they happily persist a draft chunk whose `host` or `allowed_ips` names an un-enforceable range, mark it approved, and push a revision through `sandbox_watch_bus.notify`. From the operator's seat, the TUI reports success while enforcement silently no-ops.

4. **Traced the denial analyzer.** `mechanistic_mapper::generate_proposals` at `crates/openshell-sandbox/src/mechanistic_mapper.rs:107` builds a `NetworkPolicyRule` from each denial summary, calls `resolve_allowed_ips_if_private` to populate `allowed_ips` when the resolved IP is RFC1918-ish, and emits a proposal. It has no notion of "always-blocked class" — so for denials whose destination IP is structurally un-allowable, it keeps generating fresh proposals on every flush cycle (every ~10s in my repro).

5. **Live repro on sandbox `theseus`.** Dumped the loaded policy with `openshell policy get theseus --full` (rev 9, hash `d47e757…`). The user had a clean sandbox-level `aws_iam` rule that sets both the IP in `host:` and a matching `allowed_ips:` entry, and the policy loaded without warning. Reproduced the 403 myself from inside the sandbox with `openshell sandbox exec -n theseus -- curl -v -X PUT http://169.254.169.254/latest/api/token`: the client connected via `http_proxy=http://10.200.0.1:3128` (the sandbox forward proxy) and got `HTTP/1.1 403 Forbidden`. The corresponding denial event appeared in `openshell logs theseus` tagged `[policy:aws_iam]` (see Logs section below). That tag is emitted at `crates/openshell-sandbox/src/proxy.rs:434`:

    ```rust
    let policy_str = matched_policy.as_deref().unwrap_or("-");
    ```

    so `[policy:aws_iam]` proves OPA matched the rule cleanly at both the endpoint-allowed and binary-allowed layers — the 403 is strictly downstream at the SSRF / always-blocked layer, not an OPA miss.

6. **Checked the OCSF JSONL sink to see `status_detail`.** The shorthand OCSF text format drops the `status_detail` field, which is where the explanatory phrase `"resolves to always-blocked address"` lives (set at `crates/openshell-sandbox/src/proxy.rs:508, 543, 584` for CONNECT and `:2244, :2282` for forward). The JSONL file at `/var/log/openshell-ocsf.2026-04-11.log` inside the sandbox was 0 bytes in my repro, so there was no higher-fidelity log to fall back on. This is a meaningful contributor to the bug symptom — the system is not lying to the operator, it is just not telling them enough: the shorthand denial line reads identically to a plain allowlist miss.

**What could not be resolved by the agent:** The block itself is intentional (covered by a locked-in test) and removing it would regress cloud-metadata SSRF hardening, which is a deliberate security invariant. The fix has to go into validation, messaging, and proposal filtering — see "Suggested fixes" below — none of which I applied because the user wants maintainer review first.
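
To make the ordering from step 2 concrete, here is a minimal Rust sketch of a destination check that consults the hard block before the allowlist. It is not the code in `proxy.rs`: `always_blocked_sketch`, `check_destination`, and the `Denial` enum are hypothetical names, and the range list only mirrors the classes named in this report.

```rust
use std::net::IpAddr;

/// Why a destination was refused. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
pub enum Denial {
    /// The address belongs to a class that no policy entry can unlock.
    AlwaysBlocked,
    /// The address is allowable in principle but is not listed.
    NotInAllowlist,
}

/// Stand-in for `is_always_blocked_ip`: loopback, link-local, unspecified,
/// plus the IPv4-mapped IPv6 forms of those addresses.
pub fn always_blocked_sketch(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4.is_loopback() || v4.is_link_local() || v4.is_unspecified(),
        IpAddr::V6(v6) => {
            // ::ffff:a.b.c.d is classified by its embedded IPv4 address.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return always_blocked_sketch(IpAddr::V4(v4));
            }
            // fe80::/10 means the top ten bits are 1111 1110 10.
            let link_local = (v6.segments()[0] & 0xffc0) == 0xfe80;
            v6.is_loopback() || v6.is_unspecified() || link_local
        }
    }
}

/// The hard block runs first, so the allowlist is never consulted for
/// always-blocked destinations.
pub fn check_destination(ip: IpAddr, allowed: &[IpAddr]) -> Result<(), Denial> {
    if always_blocked_sketch(ip) {
        return Err(Denial::AlwaysBlocked);
    }
    if allowed.contains(&ip) {
        Ok(())
    } else {
        Err(Denial::NotInAllowlist)
    }
}
```

With this ordering, `check_destination` returns `Denial::AlwaysBlocked` for `169.254.169.254` even when that address is the only `allowed` entry, which matches the behaviour seen in the repro. Returning a distinct variant per cause is one way the two cases could be told apart in logs.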

## Description

**Actual behaviour:** In v0.0.26, sandbox policies that allowlist an IP inside any always-blocked range (`127.0.0.0/8`, `169.254.0.0/16`, `0.0.0.0`, `::1`, `fe80::/10`, v4-mapped equivalents) are accepted without warning by the policy loader and by the TUI draft-approval path, but every matching request gets a 403 at runtime. OPA matches the rule (denial events carry the matched policy name in the log), and the rejection happens downstream at `is_always_blocked_ip` inside `resolve_and_check_allowed_ips`. The shorthand OCSF log line for the denial does not surface the reason, so the operator cannot distinguish "my allowlist is wrong" from "my allowlist can never take effect." The denial analyzer keeps generating fresh draft proposals for the same destination on every flush cycle, so the TUI notification keeps returning after approval, reinforcing the illusion that the rule is being processed.

**Expected behaviour:** At least one of the following — ideally all four:

1. The policy loader (and the TUI-approval path on the server) should reject entries whose IP or CIDR overlaps any always-blocked range, with an explicit error identifying the range and the SSRF-hardening rationale, so un-enforceable rules never get persisted in the first place.
2. The runtime denial message should clearly distinguish "allowlist miss" from "destination is structurally un-allowable" so log readers understand what happened.
3. The denial analyzer should skip proposals whose resolved IP set is entirely inside always-blocked ranges, so the TUI does not keep re-surfacing notifications for un-fixable situations.
4. The OCSF JSONL sink inside the sandbox image should actually be populated by default — or the shorthand log format should surface `status_detail` for denials — so operators have a path to the full event context.

**Scope clarification:** The hard block on link-local / loopback / unspecified destinations is correct and should stay. This issue is about the *silent* failure mode around it, not the block itself.
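
Expected behaviour 1 amounts to an overlap test between each `allowed_ips` entry and the always-blocked ranges. The sketch below shows that test for IPv4 only, using a hand-rolled `V4Net` type so it runs on the standard library alone. The real loader works with `IpNet`; `V4Net`, `validate_allowed_ip`, and the error wording are all assumptions made for illustration.

```rust
use std::net::Ipv4Addr;

/// Minimal IPv4 CIDR, standing in for the `IpNet` values the loader parses.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct V4Net {
    base: u32,
    prefix: u8,
}

/// Netmask for a prefix length in 0..=32.
fn mask_for(prefix: u8) -> u32 {
    // checked_shl returns None for a shift of 32, which is the /0 case.
    u32::MAX.checked_shl(32 - u32::from(prefix)).unwrap_or(0)
}

impl V4Net {
    /// Parses "a.b.c.d" or "a.b.c.d/len"; a bare address is treated as /32.
    pub fn parse(s: &str) -> Option<V4Net> {
        let (addr, prefix) = match s.split_once('/') {
            Some((a, p)) => (a, p.parse::<u8>().ok()?),
            None => (s, 32),
        };
        if prefix > 32 {
            return None;
        }
        let ip: Ipv4Addr = addr.parse().ok()?;
        Some(V4Net { base: u32::from(ip) & mask_for(prefix), prefix })
    }

    fn contains(&self, addr: u32) -> bool {
        (addr & mask_for(self.prefix)) == self.base
    }

    /// Two CIDR blocks overlap exactly when one contains the other's base.
    pub fn overlaps(&self, other: &V4Net) -> bool {
        self.contains(other.base) || other.contains(self.base)
    }
}

/// IPv4 ranges named in this report as always blocked.
const ALWAYS_BLOCKED_V4: [&str; 3] = ["127.0.0.0/8", "169.254.0.0/16", "0.0.0.0/32"];

/// Rejects an entry that can never take effect at runtime.
pub fn validate_allowed_ip(entry: &str) -> Result<V4Net, String> {
    let net = V4Net::parse(entry)
        .ok_or_else(|| format!("invalid allowed_ips entry: {entry}"))?;
    for blocked in ALWAYS_BLOCKED_V4.iter() {
        let blocked_net = V4Net::parse(blocked).expect("constant CIDR is valid");
        if net.overlaps(&blocked_net) {
            return Err(format!(
                "allowed_ips entry {entry} overlaps always-blocked range {blocked} \
                 (SSRF hardening) and can never take effect"
            ));
        }
    }
    Ok(net)
}
```

An overlap test also rejects broad entries such as `169.0.0.0/8` that merely contain a blocked range. Whether those should be rejected outright or only warned about is a maintainer call.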

## Reproduction Steps

1. Set a sandbox policy that names an always-blocked IP — AWS IMDS is the canonical case:

    ```yaml
    aws_iam:
      name: aws_iam
      endpoints:
      - host: '169.254.169.254'
        port: 80
        allowed_ips:
        - '169.254.169.254'
      binaries:
      - path: /usr/bin/curl
    ```

    Apply it with `openshell policy set <sandbox> --policy policy.yaml`.

2. Verify it loaded: `openshell policy get <sandbox> --full`. The rule appears verbatim, no warnings, status `Loaded`.

3. From inside the sandbox, hit the IMDSv2 token endpoint:

    ```shell
    openshell sandbox exec -n <sandbox> -- curl -v -X PUT http://169.254.169.254/latest/api/token
    ```

    Result: `HTTP/1.1 403 Forbidden` from the sandbox forward proxy (`http_proxy=http://10.200.0.1:3128`).

4. Check the logs: `openshell logs <sandbox> --source sandbox` shows the denial tagged with the matched policy name — proving OPA matched and the block is downstream:

    ```
    HTTP:PUT [MED] DENIED /usr/bin/curl(1618) -> PUT http://169.254.169.254/latest/api/token [policy:aws_iam]
    ```

5. (Optional) Leave it running and observe `Flushed denial analysis to gateway proposals=1` repeating in the logs every ~10s: the analyzer keeps re-proposing the same un-enforceable rule.

## Environment

- OS: Linux 6.17.0-1014-nvidia
- OpenShell: 0.0.26 (`openshell --version`)
- Deployment: local gateway, sandbox-level policy (not gateway-global)
- External trigger: OpenClaw running inside the sandbox initialising AWS SDK credentials — see openclaw/openclaw#64891 for the retry-loop behaviour that makes this very noisy in practice, though the OpenShell bug is independent of OpenClaw.

## Logs

```shell
# Live denial events on sandbox `theseus` (shorthand OCSF format from `openshell logs`):
[1775932282.846] [sandbox] [OCSF ] [ocsf] HTTP:PUT [MED] DENIED /usr/local/bin/node(1014) -> PUT http://169.254.169.254/latest/api/token [policy:aws_iam]
[1775932282.854] [sandbox] [OCSF ] [ocsf] HTTP:GET [MED] DENIED /usr/local/bin/node(1014) -> GET http://169.254.169.254/latest/meta-data/iam/security-credentials/ [policy:aws_iam]
[1775932288.821] [sandbox] [OCSF ] [ocsf] HTTP:PUT [MED] DENIED /usr/local/bin/node(1040) -> PUT http://169.254.169.254/latest/api/token [policy:aws_iam]
[1775932288.828] [sandbox] [OCSF ] [ocsf] HTTP:GET [MED] DENIED /usr/local/bin/node(1040) -> GET http://169.254.169.254/latest/meta-data/iam/security-credentials/ [policy:aws_iam]
# ... repeats every ~6s with a fresh node pid as OpenClaw spawns new children

# Reproduced the same denial from a manual curl inside the sandbox:
[1775932413.437] [sandbox] [OCSF ] [ocsf] HTTP:PUT [MED] DENIED /usr/bin/curl(1618) -> PUT http://169.254.169.254/latest/api/token [policy:aws_iam]

# Denial analyzer re-proposing the same rule on every flush:
[1775932313.643] [sandbox] [INFO ] [openshell_sandbox] Flushed denial analysis to gateway proposals=1 sandbox_name=theseus summaries=1
[1775932323.642] [sandbox] [INFO ] [openshell_sandbox] Flushed denial analysis to gateway proposals=1 sandbox_name=theseus summaries=1

# The OCSF JSONL sink inside the sandbox — where `status_detail` would live — was empty:
$ openshell sandbox exec -n theseus -- stat -c '%s %n' /var/log/openshell-ocsf.2026-04-11.log
0 /var/log/openshell-ocsf.2026-04-11.log
```

The `[policy:aws_iam]` tag at the end of each denial line is the key signal: it is emitted at `crates/openshell-sandbox/src/proxy.rs:434` from `matched_policy.as_deref().unwrap_or("-")`, so it can only be set when OPA has returned `NetworkAction::Allow { matched_policy: Some(...) }`. The 403 therefore cannot be an OPA miss — it is happening later, inside `resolve_and_check_allowed_ips` at `crates/openshell-sandbox/src/proxy.rs:1580`, where `is_always_blocked_ip` rejects the destination before the allowlist is consulted.
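
For illustration, the tag logic can be isolated as below. Only the `as_deref().unwrap_or("-")` expression comes from the source; the `policy_tag` wrapper and the exact bracket formatting are assumptions inferred from the log lines above.

```rust
/// Renders the trailing tag of a shorthand denial line.
/// Hypothetical wrapper around the expression quoted from `proxy.rs:434`.
pub fn policy_tag(matched_policy: Option<String>) -> String {
    // `as_deref` turns Option<String> into Option<&str>; a missing policy
    // name falls back to the "-" placeholder.
    let policy_str = matched_policy.as_deref().unwrap_or("-");
    format!("[policy:{policy_str}]")
}
```

Under these assumptions a denial with no matched policy would render as `[policy:-]`, so a named tag such as `[policy:aws_iam]` implies that a policy name was present when the line was written.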

## Suggested fixes (for maintainer review)

1. **Validate at load time.** In `parse_allowed_ips` (`crates/openshell-sandbox/src/proxy.rs:1623`), reject any parsed `IpNet` that overlaps an always-blocked range, with an error naming the offending entry and explaining the metadata-SSRF rationale. The check can be shared through a helper such as `fn is_always_blocked_net(net: IpNet) -> bool`, placed next to `is_always_blocked_ip`.
2. **Validate at ingest/approval time.** Mirror the same check in the server-side policy ingest and in `merge_chunk_into_policy` / `handle_approve_draft_chunk` at `crates/openshell-server/src/grpc/policy.rs:999` and `:1624`, so the TUI surfaces the rejection synchronously instead of silently persisting a no-op rule and clearing the notification.
3. **Disambiguate the denial message.** Split the `FORWARD blocked: allowed_ips check failed for <host>:<port>` top-line at `crates/openshell-sandbox/src/proxy.rs:505-506` and `:2242-2244` into two distinct messages — one for "not in allowlist" and one for "destination is in always-blocked class" — or include `status_detail` in the shorthand OCSF format for denials so the reason travels with the event.
4. **Filter un-fixable proposals.** In `mechanistic_mapper::generate_proposals` (`crates/openshell-sandbox/src/mechanistic_mapper.rs:107`), skip destinations whose resolved IPs fall entirely within the always-blocked ranges, so the analyzer stops re-proposing rules that cannot take effect. Optionally emit a single structured warning event for operators so the denial is still visible but does not re-surface the TUI notification loop.
5. **Separately investigate the JSONL sink.** `/var/log/openshell-ocsf.<date>.log` was 0 bytes inside the sandbox in this repro. Either the default image is missing the JSONL tracing layer or it is pointed elsewhere — worth a separate spike. Having a working JSONL sink would have reduced the severity of this bug because an operator could have grepped `status_detail` for `always-blocked` and diagnosed it in one step.
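
Fix 4 could be as small as a predicate applied before a proposal is emitted. A minimal sketch, assuming the analyzer has the resolved addresses in hand: `worth_proposing` is a hypothetical name, and the always-blocked classifier is passed in as a closure so the sketch does not guess at the signature of `is_always_blocked_ip`.

```rust
use std::net::IpAddr;

/// Returns true when a proposal is worth emitting, i.e. at least one
/// resolved address lies outside the always-blocked classes.
pub fn worth_proposing<F>(resolved: &[IpAddr], is_always_blocked: F) -> bool
where
    F: Fn(IpAddr) -> bool,
{
    // Assumption: with nothing resolved there is no basis for suppressing
    // the proposal, so the decision is left to the existing logic.
    if resolved.is_empty() {
        return true;
    }
    // Suppress only when every resolved address is structurally un-allowable.
    !resolved.iter().all(|ip| is_always_blocked(*ip))
}
```

A destination that resolves only to `169.254.169.254` would be skipped, while one that resolves to a mix of link-local and private addresses would still produce a proposal.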

## Design note

One design question worth flagging: the one always-blocked range with a real use case is `169.254.0.0/16` — `169.254.169.254` is where AWS/GCP/Azure serve instance metadata and IAM credential bootstrap, and every AWS SDK tries IMDSv2 there first on a cloud host. The current hard block pushes operators off instance-role auth and onto long-lived static keys, which is a regression in credential hygiene. The counter is also real: this range is item #1 on every SSRF hardening checklist because of the 2019 Capital One breach, and IMDSv2's PUT-then-GET handshake does not help inside a sandbox where the attacker can trivially issue PUTs — the block's real job here is preventing a prompt-injected agent from stealing the *host's* EC2 role credentials. A blanket relaxation would be wrong, but a narrowly-scoped opt-in for the specific metadata endpoints (as a distinct top-level policy field, not something reached via `allowed_ips:`) might deserve a design look. Orthogonal to the validation and messaging items above; flagging it only because the investigation surfaced the question.

## Related

- openclaw/openclaw#64891 — OpenClaw retry-loops IMDSv2 on 403. Amplifies the log spam in this repro (fresh node pid every ~6s) but is not itself an OpenShell bug.

## Agent-First Checklist

- [x] I pointed my agent at the repo and had it investigate this issue
- [x] I loaded relevant skills (`openshell-cli` for the live repro; no sandbox-specific debug skill was applicable)
- [x] My agent could not resolve this — the diagnostic above explains why (the block is intentional and locked in by a test; the remediation is loader/UX/analyzer validation, which needs maintainer review before modification)

