Foundation: introduce weaver-sandbox for CLI/daemon isolation by leynos · Pull Request #33 · leynos/weaver

leynos · 2025-12-03T20:35:00Z

Summary

Adds a new weaver-sandbox crate that provides Foundation-level sandboxing for Weaver CLI/daemon subprocesses. It wraps birdcage to enforce Linux namespaces, seccomp filters, and strict resource isolation by default.
Exposes a safe, high-level API to configure whitelists (executables, paths), environment and network policies, and to spawn sandboxed commands.
Includes environment guard, error handling, runtime helpers, and a test harness with unit and behavioural tests.
Updates to the workspace, docs, and test tooling to reflect the new sandbox foundation.

Changes

New crate: weaver-sandbox
- Modules: env_guard, error, profile, runtime, sandbox, plus a compact lib.rs that exposes the public API.
- SandboxPolicy and Builders
  - SandboxProfile: configure read-only/read-write executable paths, environment policy, and networking policy.
  - EnvironmentPolicy: Isolated, AllowList, InheritAll (default Isolated).
  - NetworkPolicy: Deny (default), Allow.
- Runtime helpers
  - linux_runtime_roots(): canonical list of standard Linux library roots for read-only access by default.
  - thread_count(): determine active thread count (Linux: reads /proc/self/status).
- Sandbox orchestration
  - Sandbox, SandboxCommand, SandboxChild, SandboxOutput, SandboxError (exported types)
  - spawn() enforces single-threaded context, canonicalises program path, validates whitelist, and applies preflight exceptions via birdcage.
  - EnvGuard captures/restores the parent environment to keep host state clean when sandboxing.
  - collect_exceptions() builds a list of path, environment, and network restrictions to apply in the sandbox.
Integration and build
- workspace Cargo.toml updated to include crates/weaver-sandbox and wire in birdcage as a dependency.
- Makefile adjusted to run tests with RUST_TEST_THREADS=1 to align with single-threaded launcher expectations.
Tests
- New test suites under crates/weaver-sandbox: unit.rs, tests, and behavioural tests in tests/{behaviour, support} mirroring a BDD style.
- Features-based sandbox scenarios added (fixtures, whitelist checks, environment inheritance).
Documentation and roadmap
- docs/users-guide.md updated to describe Sandbox defaults and behaviours (isolation, read-only roots, absolute path whitelisting).
- docs/weaver-design.md includes a section on the initial weaver-sandbox implementation and its expectations.
- docs/roadmap.md marks the sandbox feature as implemented (x).

Why

Provides a secure, predictable foundation for CLI/daemon workflows by isolating subprocesses, controlling inherited environment, restricting filesystem access, and blocking network access by default. This is essential for a zero-trust style of operation in Weaver's CLI/daemon components.

How to test

Run workspace tests:
- cargo test --workspace --all-targets --all-features
Focused sandbox tests:
- cargo test -p weaver-sandbox --tests
Behavioural tests verify:
- Allowed executable access is honoured
- Forbidden files cannot be read
- Environment inheritance is restricted by default and can be whitelisted
- Multi-threaded processes are rejected with a MultiThreaded error
Validate documentation updates by reading the new Sandbox defaults section in the user guide.

Migration / Notes

This is the foundation for CLI/daemon isolation and is Linux-centric (thread_count and /proc-based checks are Linux-specific). Non-Linux hosts will have restricted defaults (e.g., empty linux_runtime_roots and thread_count returning 1).
The new weaver-sandbox crate is added as a workspace member and is designed to be extended with additional policies and policy-driven exceptions in future iterations.

Risk & rollback

Minimal risk as this is a new crate with guarded public API surface. If needed, we can toggle sandbox defaults or disable Linux-specific preflights behind feature flags.

🌿 Generated by Terry

ℹ️ Tag @terragon-labs to ask questions and address PR feedback

📎 Task: https://www.terragonlabs.com/task/e978d32d-903a-4ae9-b14d-f0f7a4f21136

Summary by Sourcery

Introduce a new weaver-sandbox crate that wraps the birdcage library to provide opinionated sandboxing for Weaver subprocesses, update workspace configuration and docs, and add focused tests for the new sandbox behaviour.

New Features:

Add the weaver-sandbox crate exposing Sandbox, SandboxProfile, and related types to launch subprocesses inside a restrictive birdcage-based sandbox.
Provide configurable environment, networking, and filesystem access policies for sandboxed commands with safe builder-style APIs.
Include runtime helpers to detect Linux runtime library roots and current thread count for sandbox defaults and preflight checks.

Enhancements:

Guard and restore the parent process environment around sandbox activation to avoid leaking environment changes from birdcage.
Add detailed error types for sandbox preflight and activation failures, including non-absolute paths, missing paths, and multi-threaded callers.

Build:

Register weaver-sandbox as a workspace member and add birdcage as a shared workspace dependency.
Adjust the Makefile test target to run weaver-sandbox tests single-threaded and then run the remaining workspace tests separately.

Documentation:

Document sandbox defaults and behaviour in the user guide and design docs, including environment, networking, and path-whitelisting expectations.
Update the project roadmap to mark the initial weaver-sandbox implementation as complete.

Tests:

Add unit and BDD-style behavioural tests for sandbox profiles, environment guarding, preflight validation, and end-to-end sandbox behaviour using fixtures.

Introduces the `weaver-sandbox` crate to provide a restrictive, policy-driven sandbox wrapper around the birdcage library (v0.8.1). This crate enforces Weaver's zero-trust sandbox defaults including: - Absolute and whitelisted executables only - Networking disabled by default - Environment variables isolated unless explicitly allowed - Standard Linux runtime library paths whitelisted for read-only access - Single-threaded enforcement with explicit error reporting Adds comprehensive profile and sandbox implementations, error types, environment guard to restore environment after sandboxing, runtime helpers, and tests using `rstest-bdd`. Updates workspace and dependencies accordingly. Includes documentation updates describing sandbox behavior and roadmap progress. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

coderabbitai · 2025-12-03T20:35:05Z

Warning

Rate limit exceeded

@leynos has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 9 minutes and 5 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 485ba1c and 955221d.

📒 Files selected for processing (1)

crates/weaver-sandbox/src/sandbox.rs (1 hunks)

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Summary by CodeRabbit

Release Notes

New Features
- Introduced sandboxing capability enabling restricted process execution with configurable file access, environment variable controls, and network policies.
Documentation
- Completed sandboxing implementation in project roadmap.
- Added sandbox configuration documentation to user guide.
- Included sandbox architecture details and design specifications.
Tests
- Added comprehensive behavioural test suite for sandbox functionality.
- Included unit tests covering sandbox configuration and error handling.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Walkthrough

Introduce a new crate crates/weaver-sandbox that wraps birdcage v0.8.1 to provide sandboxing: add SandboxProfile with policies, EnvGuard for environment snapshot/restore, runtime preflight checks (thread count, canonical paths), a Sandbox API to spawn restricted processes, tests, BDD features and documentation.

Changes

Cohort / File(s)	Summary
Workspace configuration `Cargo.toml`, `Makefile`	Add workspace member `crates/weaver-sandbox` and workspace dependency `birdcage = "0.8.1"`; modify Makefile test target to run `weaver-sandbox` with `RUST_TEST_THREADS=1` then run workspace tests excluding `weaver-sandbox`.
Sandbox crate manifest `crates/weaver-sandbox/Cargo.toml`	Add new package `weaver-sandbox` (workspace edition/version) and declare workspace-based dependencies and dev-dependencies (`birdcage`, `once_cell`, `thiserror`, `rstest*`, `tempfile`).
Core sandbox implementation `crates/weaver-sandbox/src/env_guard.rs`, `crates/weaver-sandbox/src/error.rs`, `crates/weaver-sandbox/src/lib.rs`, `crates/weaver-sandbox/src/profile.rs`, `crates/weaver-sandbox/src/runtime.rs`, `crates/weaver-sandbox/src/sandbox.rs`	Add EnvGuard to capture/restore environment; introduce SandboxError enum; expose crate API and re-export `birdcage::process`; implement EnvironmentPolicy, NetworkPolicy and SandboxProfile builder; add linux_runtime_roots() and thread_count() helpers; implement Sandbox enforcing single-threaded use, canonicalisation, whitelist validation, exception collection and Birdcage activation to spawn sandboxed commands.
Test infrastructure `crates/weaver-sandbox/src/tests/mod.rs`, `crates/weaver-sandbox/src/tests/behaviour.rs`, `crates/weaver-sandbox/src/tests/support/mod.rs`, `crates/weaver-sandbox/src/tests/unit.rs`, `crates/weaver-sandbox/src/tests/env_guard.rs`, `crates/weaver-sandbox/src/tests/sandbox_unit.rs`, `crates/weaver-sandbox/tests/features/sandbox.feature`	Add unit and behavioural tests, BDD feature file, and TestWorld test harness with EnvHandle and global mutex for serialising environment changes; include fixtures and assertions for sandbox behaviour, environment handling and spawn preflight errors.
Test support primitives `crates/weaver-sandbox/src/tests/support/env.rs`	Add global `ENV_MUTEX` and `lock_env()` helper returning a mutex guard for serialising environment mutations in tests.
Documentation `docs/roadmap.md`, `docs/users-guide.md`, `docs/weaver-design.md`	Mark initial weaver-sandbox roadmap item complete; add "Sandbox defaults" to user guide (inserted in two locations); add "Initial weaver-sandbox implementation" section to design doc describing behaviour, canonicalisation strategy and test guidance.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Sandbox
    participant EnvGuard
    participant Profile
    participant Birdcage
    participant ChildProcess

    Client->>Sandbox: construct(Profile)
    Client->>Sandbox: spawn(Command)

    rect rgb(245,245,220)
        Sandbox->>Sandbox: ensure_single_threaded()
        Sandbox->>Sandbox: canonical_program()
        Sandbox->>Sandbox: ensure_program_whitelisted()
    end

    rect rgb(220,245,245)
        Sandbox->>EnvGuard: capture()
        EnvGuard-->>Sandbox: EnvGuard handle
    end

    rect rgb(220,245,220)
        Sandbox->>Profile: query policies & paths
        Sandbox->>Sandbox: collect_exceptions()
    end

    Sandbox->>Birdcage: activate(exceptions, command)
    Birdcage->>ChildProcess: spawn
    Birdcage-->>Sandbox: SandboxChild
    Sandbox-->>Client: Result<SandboxChild>

    Note right of EnvGuard: On Drop -> restore environment

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Inspect src/sandbox.rs for path canonicalisation, exception collection, Birdcage activation and error conversions.
Review src/env_guard.rs for unsafe environment mutation and correctness of restore logic.
Validate src/runtime.rs Linux-specific parsing of /proc/self/status and non-Linux fallbacks.
Verify profile builder semantics, defaults and EnvironmentPolicy→exceptions mapping in src/profile.rs.
Run and inspect behavioural/integration tests and TestWorld setup/teardown in tests/support/mod.rs.

Poem

🏰 A sandbox forms, secure and tight,
🔐 Where Weaver's tasks run out of sight—
📋 Profiles penned and policies clear,
✨ Birdcage shields the atmosphere,
🎯 One thread to rule them, pure and right!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarises the main change: introducing a new weaver-sandbox crate for CLI/daemon isolation. It is concise, specific, and directly related to the changeset.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset. It explains the purpose, key changes, implementation details, testing approach, and migration notes for the new weaver-sandbox crate.
Docstring Coverage	✅ Passed	Docstring coverage is 97.62% which is sufficient. The required threshold is 80.00%.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

sourcery-ai · 2025-12-03T20:35:06Z

Reviewer's Guide

Introduces a new weaver-sandbox crate that wraps the birdcage library to provide a policy-driven sandbox for Weaver subprocesses, with strict defaults for filesystem, environment, networking, and threading; wires it into the workspace, adds targeted tests, and documents the design and usage.

Sequence diagram for Sandbox::spawn sandboxed command execution

sequenceDiagram
    actor Caller
    participant Sandbox
    participant Runtime as RuntimeHelpers
    participant Profile as SandboxProfile
    participant EnvGuard
    participant Birdcage as BirdcageSandbox

    Caller->>Sandbox: spawn(command: SandboxCommand)
    activate Sandbox

    Sandbox->>Runtime: thread_count()
    Runtime-->>Sandbox: Result~usize, io::Error~
    alt thread_count > 1
        Sandbox-->>Caller: Err(SandboxError::MultiThreaded)
    else single-threaded
        Sandbox->>Sandbox: canonical_program(command.get_program())
        alt program not absolute or missing
            Sandbox-->>Caller: Err(SandboxError::ProgramNotAbsolute | MissingPath | CanonicalisationFailed)
        else program canonicalised
            Sandbox->>Profile: executable_paths()
            Profile-->>Sandbox: &[PathBuf]
            Sandbox->>Sandbox: ensure_program_whitelisted(program)
            alt not whitelisted
                Sandbox-->>Caller: Err(SandboxError::ExecutableNotAuthorised)
            else whitelisted
                Sandbox->>EnvGuard: capture()
                EnvGuard-->>Sandbox: EnvGuard
                Sandbox->>Profile: read_only_paths(), read_write_paths(), executable_paths(), environment_policy(), network_policy()
                Profile-->>Sandbox: paths and policies
                Sandbox->>Sandbox: collect_exceptions(program)
                Sandbox->>Birdcage: new()
                Birdcage-->>Sandbox: BirdcageSandbox
                loop for each Exception
                    Sandbox->>Birdcage: add_exception(exception)
                    Birdcage-->>Sandbox: Result~(), BirdcageError~
                end
                alt activation error
                    Sandbox-->>Caller: Err(SandboxError::Activation)
                else activation ok
                    Sandbox->>Birdcage: spawn(command)
                    Birdcage-->>Sandbox: SandboxChild
                    Sandbox->>EnvGuard: drop (restore env)
                    Sandbox-->>Caller: Ok(SandboxChild)
                end
            end
        end
    end
    deactivate Sandbox

Class diagram for the new weaver-sandbox crate

classDiagram
    class Sandbox {
        - SandboxProfile profile
        + new(profile: SandboxProfile) Sandbox
        + spawn(command: SandboxCommand) Result~SandboxChild, SandboxError~
        - ensure_single_threaded() Result~(), SandboxError~
        - ensure_program_whitelisted(program: Path) Result~(), SandboxError~
        - collect_exceptions(program: Path) Result~Vec~Exception~, SandboxError~
        - canonical_program(program: Path) Result~PathBuf, SandboxError~
    }

    class SandboxProfile {
        - Vec~PathBuf~ read_only_paths
        - Vec~PathBuf~ read_write_paths
        - Vec~PathBuf~ executable_paths
        - EnvironmentPolicy environment
        - NetworkPolicy network
        + new() SandboxProfile
        + allow_executable(path: PathBuf) SandboxProfile
        + allow_read_path(path: PathBuf) SandboxProfile
        + allow_read_write_path(path: PathBuf) SandboxProfile
        + allow_environment_variable(key: String) SandboxProfile
        + allow_full_environment() SandboxProfile
        + allow_networking() SandboxProfile
        + read_only_paths() &[PathBuf]
        + read_write_paths() &[PathBuf]
        + executable_paths() &[PathBuf]
        + environment_policy() &EnvironmentPolicy
        + network_policy() NetworkPolicy
    }

    class EnvironmentPolicy {
        <<enum>>
        Isolated
        AllowList(keys: BTreeSet~String~)
        InheritAll
        + with_allowed(key: String) EnvironmentPolicy
        + to_exceptions() Vec~Exception~
    }

    class NetworkPolicy {
        <<enum>>
        Deny
        Allow
        + is_denied() bool
    }

    class EnvGuard {
        - HashMap~OsString, OsString~ original
        + capture() EnvGuard
        + restore() void
        - drop() void
    }

    class SandboxError {
        <<enum>>
        ProgramNotAbsolute(path: PathBuf)
        ExecutableNotAuthorised(program: PathBuf)
        MissingPath(path: PathBuf)
        CanonicalisationFailed(path: PathBuf, source: io::Error)
        MultiThreaded(thread_count: usize)
        ThreadCountUnavailable(source: io::Error)
        Activation(source: BirdcageError)
    }

    class RuntimeHelpers {
        + linux_runtime_roots() Vec~PathBuf~
        + thread_count() Result~usize, io::Error~
    }

    class BirdcageSandbox {
        <<external>>
        + new() BirdcageSandbox
        + add_exception(exception: Exception) Result~(), BirdcageError~
        + spawn(command: SandboxCommand) Result~SandboxChild, BirdcageError~
    }

    class BirdcageProcessTypes {
        <<external>>
        SandboxCommand
        SandboxChild
        SandboxOutput
    }

    class BirdcageException {
        <<external enum>>
        Read(path: PathBuf)
        WriteAndRead(path: PathBuf)
        ExecuteAndRead(path: PathBuf)
        Environment(key: String)
        FullEnvironment
        Networking
    }

    Sandbox --> SandboxProfile : uses
    Sandbox --> EnvGuard : uses
    Sandbox --> BirdcageSandbox : wraps
    Sandbox --> BirdcageException : builds
    Sandbox --> SandboxError : returns
    Sandbox --> RuntimeHelpers : uses

    SandboxProfile --> EnvironmentPolicy : has
    SandboxProfile --> NetworkPolicy : has
    SandboxProfile --> RuntimeHelpers : uses linux_runtime_roots

    EnvironmentPolicy --> BirdcageException : to_exceptions builds

    SandboxError --> BirdcageSandbox : wraps BirdcageError

    BirdcageSandbox --> BirdcageProcessTypes : spawns

    RuntimeHelpers --> BirdcageSandbox : preflight for spawn

File-Level Changes

Change	Details	Files
Add weaver-sandbox crate providing policy configuration and sandbox orchestration over birdcage.	Define EnvironmentPolicy and NetworkPolicy enums and SandboxProfile builder with explicit allowlists for executables, paths, environment, and networking. Implement Sandbox wrapper that enforces single-threaded preflight, requires absolute/canonicalised whitelisted executables, builds birdcage exceptions for FS/env/network, and maps failures to SandboxError. Expose a minimal public API via lib.rs re-exporting sandbox types, policies, error type, and birdcage process primitives.	`crates/weaver-sandbox/src/profile.rs` `crates/weaver-sandbox/src/sandbox.rs` `crates/weaver-sandbox/src/error.rs` `crates/weaver-sandbox/src/runtime.rs` `crates/weaver-sandbox/src/env_guard.rs` `crates/weaver-sandbox/src/lib.rs` `crates/weaver-sandbox/Cargo.toml`
Introduce environment snapshot/restore guard to keep host env clean across sandbox activation and tests.	Implement EnvGuard that captures the full environment, restores previous values, and removes variables created while active. Use EnvGuard in Sandbox::spawn to ensure child env mutations don’t leak to the parent. Add dedicated env_guard tests plus a shared EnvHandle/OnceLock-based mutex to serialize env-mutation tests.	`crates/weaver-sandbox/src/env_guard.rs` `crates/weaver-sandbox/src/tests/env_guard.rs` `crates/weaver-sandbox/src/tests/support/mod.rs`
Provide runtime helpers for Linux runtime roots and thread-count detection used by sandbox defaults and preflights.	Implement linux_runtime_roots() to collect existing standard library directories and use them as default read-only paths in SandboxProfile::new(). Implement thread_count() reading /proc/self/status on Linux and falling back to 1 on non-Linux hosts, used to enforce single-threaded sandbox spawning.	`crates/weaver-sandbox/src/runtime.rs` `crates/weaver-sandbox/src/profile.rs` `crates/weaver-sandbox/src/sandbox.rs`
Add unit and BDD-style behavioural tests for sandbox policies, env handling, and error paths.	Add unit tests for SandboxProfile defaults, environment/network policies, and basic path recording, plus sandbox_unit tests for preflight errors (relative paths, missing programs, unwhitelisted executables). Add EnvGuard unit tests verifying restoration of modified/removed variables and cleanup of newly created variables. Introduce BDD-style behavioural tests with rstest-bdd covering allowed/denied file access and environment inheritance/isolation, backed by TestWorld fixtures and sandbox.feature scenarios.	`crates/weaver-sandbox/src/tests/unit.rs` `crates/weaver-sandbox/src/tests/sandbox_unit.rs` `crates/weaver-sandbox/src/tests/env_guard.rs` `crates/weaver-sandbox/src/tests/behaviour.rs` `crates/weaver-sandbox/src/tests/support/mod.rs` `crates/weaver-sandbox/tests/features/sandbox.feature`
Wire weaver-sandbox into the workspace, adjust test execution, and document sandbox design and defaults.	Register crates/weaver-sandbox as a workspace member and add birdcage as a workspace dependency. Update Makefile test target to run weaver-sandbox tests single-threaded (RUST_TEST_THREADS=1) before running the rest of the workspace, excluding weaver-sandbox from the second run. Extend weaver-design.md with a section describing the initial weaver-sandbox implementation, update users-guide.md with sandbox defaults and behaviour, and mark the sandbox roadmap item as completed.	`Cargo.toml` `Cargo.lock` `Makefile` `docs/weaver-design.md` `docs/users-guide.md` `docs/roadmap.md`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey there - I've reviewed your changes and found some issues that need to be addressed.

In EnvGuard, the use of unsafe for env::set_var and env::remove_var appears unnecessary since the standard APIs are already safe; consider removing the unsafe blocks and relying on the safe versions or adding a strong justification if there's a concrete invariants issue you're guarding against.
The canonicalise/MissingPath check in the sandbox profile prevents whitelisting paths that don't yet exist (e.g., output files that will be created by the sandboxed command); if you expect that use-case, consider allowing non-existent paths when their parent directory is authorised, or making this behaviour configurable.
Setting RUST_TEST_THREADS=1 globally in the top-level Makefile forces the entire workspace test suite to run single-threaded; you might want to scope this to only the weaver-sandbox crate (e.g., via a dedicated test target or per-crate env) to avoid slowing unrelated tests.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `EnvGuard`, the use of `unsafe` for `env::set_var` and `env::remove_var` appears unnecessary since the standard APIs are already safe; consider removing the `unsafe` blocks and relying on the safe versions or adding a strong justification if there's a concrete invariants issue you're guarding against.
- The `canonicalise`/`MissingPath` check in the sandbox profile prevents whitelisting paths that don't yet exist (e.g., output files that will be created by the sandboxed command); if you expect that use-case, consider allowing non-existent paths when their parent directory is authorised, or making this behaviour configurable.
- Setting `RUST_TEST_THREADS=1` globally in the top-level `Makefile` forces the entire workspace test suite to run single-threaded; you might want to scope this to only the `weaver-sandbox` crate (e.g., via a dedicated test target or per-crate env) to avoid slowing unrelated tests.

## Individual Comments

### Comment 1
<location> `crates/weaver-sandbox/src/env_guard.rs:32-35` </location>
<code_context>
+        // Remove variables introduced while the guard was active.
+        for (key, _) in env::vars_os() {
+            if !expected_keys.contains(&key) {
+                // SAFETY: keys originate from the host OS and were previously
+                // present in the environment, so removal cannot violate
+                // invariants expected by `std::env`.
+                unsafe { env::remove_var(&key) };
+            }
+        }
</code_context>

<issue_to_address>
**suggestion:** The `unsafe` blocks around `env::remove_var`/`env::set_var` are unnecessary and could be removed to simplify the code.

These functions are explicitly safe and support `OsString` keys/values, so the `unsafe` blocks and `SAFETY` comments don’t add protection and instead increase audit noise. Removing them simplifies the implementation without changing semantics.

Suggested implementation:

```rust
        // Remove variables introduced while the guard was active.
        for (key, _) in env::vars_os() {
            if !expected_keys.contains(&key) {
                env::remove_var(&key);
            }
        }

```

1. Search the rest of `crates/weaver-sandbox/src/env_guard.rs` for any other `unsafe` blocks surrounding `env::set_var` or `env::remove_var`.
2. Apply the same pattern: remove the `unsafe` block and any corresponding `SAFETY` comment, and call `env::set_var` / `env::remove_var` directly as safe functions.
</issue_to_address>

### Comment 2
<location> `crates/weaver-sandbox/src/sandbox.rs:133-140` </location>
<code_context>
+    Ok(set)
+}
+
+fn canonicalise(path: &Path) -> Result<PathBuf, SandboxError> {
+    if !path.exists() {
+        return Err(SandboxError::MissingPath {
+            path: path.to_path_buf(),
+        });
+    }
+
+    fs::canonicalize(path).map_err(|source| SandboxError::CanonicalisationFailed {
+        path: path.to_path_buf(),
+        source,
</code_context>

<issue_to_address>
**suggestion (bug_risk):** The explicit `exists()` check before `canonicalize` introduces a TOCTTOU window and might be better folded into the error handling.

Because `path.exists()` and `fs::canonicalize(path)` are separate operations, the filesystem can change in between, so you may see different behavior depending on timing. Instead, you can rely on `fs::canonicalize` alone and map `io::ErrorKind::NotFound` to `SandboxError::MissingPath`, which preserves behavior while avoiding both the TOCTTOU window and the extra syscall.
</issue_to_address>

### Comment 3
<location> `crates/weaver-sandbox/src/tests/unit.rs:36` </location>
<code_context>
+}
+
+#[test]
+fn network_is_denied_by_default() {
+    let profile = SandboxProfile::new();
+    assert_eq!(profile.network_policy(), NetworkPolicy::Deny);
</code_context>

<issue_to_address>
**suggestion (testing):** Extend unit tests to cover `allow_networking` and full-environment inheritance behaviour

Current tests cover the default network policy and environment allowlist deduplication. To complete coverage of `SandboxProfile`’s public API, please also add tests that:

- Use `SandboxProfile::new().allow_networking()` and assert `network_policy() == NetworkPolicy::Allow` and `NetworkPolicy::Allow.is_denied()` is `false`.
- Call `allow_full_environment()` and assert `environment_policy() == EnvironmentPolicy::InheritAll`.
- Optionally confirm the transition `Isolated` → `AllowList` → `InheritAll` (e.g. adding allowlist entries after `allow_full_environment` has no effect).

This will exercise the positive/opt-in paths for networking and environment inheritance.

Suggested implementation:

```rust
#[test]
fn network_is_denied_by_default() {
    let profile = SandboxProfile::new();
    assert_eq!(profile.network_policy(), NetworkPolicy::Deny);
}

#[test]
fn network_can_be_allowed() {
    let profile = SandboxProfile::new().allow_networking();

    assert_eq!(profile.network_policy(), NetworkPolicy::Allow);
    assert!(!NetworkPolicy::Allow.is_denied());
}

#[test]
fn full_environment_can_be_inherited() {
    let profile = SandboxProfile::new().allow_full_environment();

    assert!(matches!(
        profile.environment_policy(),
        EnvironmentPolicy::InheritAll
    ));
}

#[test]
fn read_write_paths_are_recorded() {

```

If your `SandboxProfile` API exposes methods for adding to the environment allowlist (e.g. `allow_env_var`, `allow_environment_key`, etc.), consider adding an additional test to assert that calling that method *after* `allow_full_environment()` does not change the policy away from `EnvironmentPolicy::InheritAll`. The skeleton would be:

```rust
#[test]
fn environment_allowlist_has_no_effect_after_full_inherit() {
    let profile = SandboxProfile::new()
        .allow_full_environment()
        .allow_environment_key("SHOULD_BE_IGNORED");

    assert!(matches!(
        profile.environment_policy(),
        EnvironmentPolicy::InheritAll
    ));
}
```

You will need to replace `allow_environment_key` with the actual allowlist API used elsewhere in this test module to ensure the transition `Isolated → AllowList → InheritAll` behaves as expected and that post-`allow_full_environment` allowlist mutations have no effect.
</issue_to_address>

### Comment 4
<location> `crates/weaver-sandbox/src/tests/behaviour.rs:47-48` </location>
<code_context>
+    env::set_var("DROP_ME", "remove-me");
+}
+
+#[given("the sandbox allows only KEEP_ME to be inherited")]
+fn given_environment_allowlist(world: &RefCell<TestWorld>) {
+    let mut world = world.borrow_mut();
+    world.configure_env_reader();
</code_context>

<issue_to_address>
**suggestion (testing):** Behavioural tests only cover the allowlist environment policy; add scenarios for default isolation and full inheritance

Currently only the allowlist mode is exercised via the `KEEP_ME` / `DROP_ME` scenario. Please also add BDD scenarios for:

- **Default isolation**: no environment policy configured; the env-reading command should see neither `KEEP_ME` nor `DROP_ME`.
- **Full inheritance**: using `allow_full_environment()`; both `KEEP_ME` and `DROP_ME` should be visible inside the sandbox.

You can reuse the same step definitions with different Given/When combinations so that all environment policies are behaviourally covered.

Suggested implementation:

```rust
        .allow_executable(world.command.as_ref().unwrap().get_program())
        .allow_read_path(&world.allowed_file);
}

#[given("environment variables KEEP_ME and DROP_ME are set")]
fn given_environment_variables(_world: &RefCell<TestWorld>) {
    env::set_var("KEEP_ME", "present");
    env::set_var("DROP_ME", "remove-me");
}

#[given("the sandbox allows only KEEP_ME to be inherited")]
fn given_environment_allowlist(world: &RefCell<TestWorld>) {
    let mut world = world.borrow_mut();
    world.configure_env_reader();
    world.profile = world
        .profile
        .clone()
        .allow_environment_variable("KEEP_ME");
}

#[given("the sandbox uses the default environment isolation")]
fn given_environment_default_isolation(world: &RefCell<TestWorld>) {
    // Configure the env-reading command but do not enable any environment
    // inheritance policy; this exercises the default isolation behaviour.
    let mut world = world.borrow_mut();
    world.configure_env_reader();
    // Intentionally do not modify `world.profile` here.
}

#[given("the sandbox inherits the full environment")]
fn given_environment_full_inheritance(world: &RefCell<TestWorld>) {
    // Configure the env-reading command and enable full environment
    // inheritance so that all variables (KEEP_ME and DROP_ME) are visible.
    let mut world = world.borrow_mut();
    world.configure_env_reader();
    world.profile = world.profile.clone().allow_full_environment();
}

#[when("the sandbox launches the command")]
fn when_launch(world: &RefCell<TestWorld>) {
    world.borrow_mut().launch();
}

```

To fully exercise these new step definitions, add corresponding scenarios in the relevant `.feature` file, for example:
1. A scenario using:
   - `Given environment variables KEEP_ME and DROP_ME are set`
   - `And the sandbox uses the default environment isolation`
   - `When the sandbox launches the command`
   - `Then the environment inside the sandbox does not contain KEEP_ME or DROP_ME` (reusing your existing Then-steps for negative checks).
2. A scenario using:
   - `Given environment variables KEEP_ME and DROP_ME are set`
   - `And the sandbox inherits the full environment`
   - `When the sandbox launches the command`
   - `Then the environment inside the sandbox contains KEEP_ME and DROP_ME`.

If your existing Then-steps assert only the allowlist behaviour, you will need to add/adjust them so that they can assert both presence and absence of individual variables for these new scenarios.
</issue_to_address>

### Comment 5
<location> `crates/weaver-sandbox/src/tests/behaviour.rs:113-114` </location>
<code_context>
+    );
+}
+
+#[then("environment markers are cleaned up")]
+fn then_environment_cleaned(_world: &RefCell<TestWorld>) {
+    env::remove_var("KEEP_ME");
+    env::remove_var("DROP_ME");
</code_context>

<issue_to_address>
**suggestion (testing):** Consider making environment cleanup more robust in behavioural tests

This step only runs if the scenario reaches it; if a step fails earlier, `KEEP_ME` and `DROP_ME` can leak into later tests. To avoid cross-test contamination, consider wrapping each scenario in an env-guard style helper/fixture that snapshots and restores `std::env` (or an `rstest` fixture with `Drop`-based cleanup) so cleanup is guaranteed regardless of step ordering or panics.

Suggested implementation:

```rust
#[then("environment markers are cleaned up")]
fn then_environment_cleaned(_world: &RefCell<TestWorld>) {
    // This step asserts that the environment has been cleaned up by the
    // scenario-level env guard / fixture, rather than performing cleanup itself.
    assert!(
        env::var("KEEP_ME").is_err(),
        "KEEP_ME was expected to be unset after scenario cleanup"
    );
    assert!(
        env::var("DROP_ME").is_err(),
        "DROP_ME was expected to be unset after scenario cleanup"
    );
}

```

To fully implement robust environment cleanup (so it happens even if steps fail or panic), you should also:

1. Introduce an environment guard that snapshots and restores the process environment:
   - Define a helper like:
     ```rust
     struct EnvGuard {
         snapshot: std::collections::HashMap<std::ffi::OsString, std::ffi::OsString>,
     }

     impl EnvGuard {
         fn new() -> Self {
             let snapshot = std::env::vars_os().collect();
             Self { snapshot }
         }
     }

     impl Drop for EnvGuard {
         fn drop(&mut self) {
             use std::collections::HashSet;
             use std::env;

             let before = &self.snapshot;
             let after: std::collections::HashMap<_, _> = env::vars_os().collect();

             // Remove any variables that were added after the snapshot.
             let before_keys: HashSet<_> = before.keys().cloned().collect();
             for (k, _) in &after {
                 if !before_keys.contains(k) {
                     env::remove_var(k);
                 }
             }

             // Restore original values.
             for (k, v) in before {
                 env::set_var(k, v);
             }
         }
     }
     ```
   - Place this near your test helpers (either in this file or a shared test module), and add the necessary imports (`HashMap`, `HashSet`, `OsString`, etc.) at the top of the file/module.

2. Tie the guard to the scenario/world lifecycle so it is always dropped:
   - Option A: Add an `EnvGuard` field to `TestWorld` and construct it in `TestWorld`'s constructor/`Default` implementation. That way, when the world is dropped at the end of a scenario, the guard’s `Drop` runs and restores the environment.
   - Option B: If your cucumber setup supports per-scenario fixtures, create the `EnvGuard` at the start of each scenario and keep it alive for the scenario’s duration (e.g. in a wrapper around `TestWorld` or a dedicated per-scenario struct).

3. Ensure that any steps that mutate `std::env` rely on this guard rather than local ad‑hoc cleanup, so environment isolation is guaranteed even when a step panics or the scenario aborts early.
</issue_to_address>

### Comment 6
<location> `crates/weaver-sandbox/src/tests/support/mod.rs:15` </location>
<code_context>
+use crate::process::Stdio;
+
+/// Shared state for behavioural sandbox tests.
+pub struct TestWorld {
+    pub profile: SandboxProfile,
+    pub command: Option<SandboxCommand>,
</code_context>

<issue_to_address>
**suggestion (testing):** Add tests for `EnvGuard` behaviour to ensure environment restoration is correct and future changes remain safe

Current behavioural tests only exercise `EnvGuard` indirectly via environment mutation. Because `EnvGuard::restore` uses `unsafe`, it would be safer to add focused tests that: (1) snapshot the environment, perform add/change/remove operations, drop the guard, and assert the environment is exactly restored; and (2) confirm that variables created during the guard’s lifetime are removed while pre-existing ones retain their original values. Placing these in a dedicated `env_guard_tests` module alongside the other unit tests would help ensure the restoration logic and `unsafe` usage remain correct over time.

Suggested implementation:

```rust
// Sandbox test support.
use crate::sandbox::{Sandbox, SandboxChild, SandboxCommand, SandboxOutput};
use crate::process::Stdio;

#[cfg(test)]
mod env_guard_tests {
    use std::env;

    use crate::sandbox::EnvGuard;

    /// Ensures that the environment is fully restored after a mix of
    /// additions, changes, and removals while an EnvGuard is active.
    #[test]
    fn restores_modified_and_removed_environment_variables() {
        const EXISTING: &str = "WEAVER_ENV_GUARD_EXISTING";
        const ANOTHER: &str = "WEAVER_ENV_GUARD_ANOTHER";
        const EPHEMERAL: &str = "WEAVER_ENV_GUARD_EPHEMERAL";

        // Establish a known baseline outside the guard.
        env::set_var(EXISTING, "original");
        env::set_var(ANOTHER, "keep");

        // Snapshot the environment.
        let guard = EnvGuard::new().expect("failed to create EnvGuard");

        // Mutate environment in various ways while the guard is active.
        env::set_var(EXISTING, "changed");      // modify existing
        env::remove_var(ANOTHER);               // remove existing
        env::set_var(EPHEMERAL, "ephemeral");   // add new

        // Dropping the guard must restore the environment to its original state.
        drop(guard);

        // Pre-existing variables must be restored to their original values.
        assert_eq!(env::var(EXISTING).as_deref(), Ok("original"));
        assert_eq!(env::var(ANOTHER).as_deref(), Ok("keep"));

        // Variables created during the guard's lifetime must not remain.
        assert!(env::var(EPHEMERAL).is_err());

        // Cleanup baseline variables for other tests.
        env::remove_var(EXISTING);
        env::remove_var(ANOTHER);
    }

    /// Ensures that variables created only during the guard's lifetime
    /// are removed, while pre-existing variables retain their value.
    #[test]
    fn removes_variables_created_during_guard_lifetime() {
        const PRE_EXISTING: &str = "WEAVER_ENV_GUARD_PRE_EXISTING";
        const CREATED: &str = "WEAVER_ENV_GUARD_CREATED";

        // Set up an initial variable before we create the guard.
        env::set_var(PRE_EXISTING, "value");

        let guard = EnvGuard::new().expect("failed to create EnvGuard");

        // Create a variable that should not survive after the guard is dropped.
        env::set_var(CREATED, "temporary");

        // Sanity-check that both variables are visible while the guard is active.
        assert_eq!(env::var(PRE_EXISTING).as_deref(), Ok("value"));
        assert_eq!(env::var(CREATED).as_deref(), Ok("temporary"));

        // Dropping the guard should restore the environment snapshot.
        drop(guard);

        // The pre-existing variable must still have its original value.
        assert_eq!(env::var(PRE_EXISTING).as_deref(), Ok("value"));

        // The variable created during the guard's lifetime must be removed.
        assert!(env::var(CREATED).is_err());

        // Cleanup baseline variable.
        env::remove_var(PRE_EXISTING);
    }
}

```

These tests assume that:

1. `EnvGuard` is exported from `crate::sandbox` and is constructible via `EnvGuard::new() -> Result<EnvGuard, _>`.
2. Dropping `EnvGuard` restores the environment snapshot (i.e. `Drop` calls the `unsafe` `restore` implementation).

If the actual API differs, you should adjust the tests accordingly, for example:
- If the constructor is named differently (e.g. `EnvGuard::capture()`, `EnvGuard::snapshot()`, or a plain `EnvGuard::new()` without `Result`), update the creation call.
- If `EnvGuard` lives in a different module (e.g. `crate::process::EnvGuard`), fix the import path.
- If restoration is triggered by an explicit `restore()` rather than `Drop`, keep the tests’ logic the same but call that method instead of `drop(guard)`.
</issue_to_address>

### Comment 7
<location> `crates/weaver-sandbox/src/env_guard.rs:9` </location>
<code_context>
+
+/// Restores the parent process environment after `birdcage` strips variables.
+#[derive(Debug)]
+pub struct EnvGuard {
+    original: Vec<(OsString, OsString)>,
+}
</code_context>

<issue_to_address>
**issue (complexity):** Consider simplifying `EnvGuard::restore` to work on in-memory snapshots using a temporary `HashSet` so you can avoid rebuilding a `BTreeSet` and remove the unnecessary `unsafe` blocks around environment mutation.

You can simplify `EnvGuard` and remove both the `BTreeSet` and the `unsafe` blocks by doing all diffs in memory on snapshots.

Two focused changes:

1. **Avoid rebuilding a `BTreeSet` every restore**

You don’t need a sorted set, and you don’t need to recompute it on every `restore`. Either compute a `HashSet` once in `capture`, or just build one locally from `self.original` inside `restore`.

For example, keep the struct as-is and simplify `restore`:

```rust
use std::collections::HashSet;
// ...

fn restore(&self) {
    // Snapshot current environment before mutating it.
    let current: Vec<(OsString, OsString)> = env::vars_os().collect();

    let expected_keys: HashSet<&OsString> =
        self.original.iter().map(|(k, _)| k).collect();

    // Remove vars that were introduced while the guard was active.
    for (key, _) in &current {
        if !expected_keys.contains(key) {
            env::remove_var(key);
        }
    }

    // Restore the original variables.
    for (key, value) in &self.original {
        env::set_var(key, value);
    }
}
```

That lets you:

- Drop `original_keys` entirely.
- Replace `BTreeSet` with a simpler `HashSet<&OsString>` created on the fly.
- Avoid extra allocations per key (no cloning of `OsString` just to test membership).

2. **Remove unnecessary `unsafe`**

Wrapping `env::remove_var`/`env::set_var` in `unsafe` doesn’t buy anything; they are safe functions and the real issue is mutating while iterating the environment. By snapshotting first (`current: Vec<_> = env::vars_os().collect()`), you never mutate during iteration, so all `unsafe` blocks can be removed without changing behavior.

With these adjustments, `EnvGuard` stays functionally identical but is simpler, purely safe, and avoids the extra data structure/extra passes.
</issue_to_address>

### Comment 8
<location> `crates/weaver-sandbox/src/lib.rs:1` </location>
<code_context>
+//! Sandboxing utilities for Weaver processes.
+//!
+//! The `weaver-sandbox` crate wraps the [`birdcage`] library with policy
</code_context>

<issue_to_address>
**issue (review_instructions):** Add both behavioural and unit tests that exercise the new weaver-sandbox crate’s functionality.

Add unit tests (e.g. in `src/tests/unit.rs`) that validate core behaviours such as profile defaults, environment policies, path canonicalisation and authorisation logic, and error cases (e.g. `ProgramNotAbsolute`, `MissingPath`, `MultiThreaded`). Add behavioural tests (e.g. in `src/tests/behaviour.rs` and the BDD feature you stubbed in `tests/features/sandbox.feature`) that launch real processes under the sandbox and verify the documented guarantees: absolute-path enforcement, whitelist enforcement, environment isolation, default network denial, and successful execution of whitelisted binaries.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
For any new feature or change to an existing feature, both behavioural and unit tests are required.

</details>
</issue_to_address>

### Comment 9
<location> `crates/weaver-sandbox/src/env_guard.rs:35` </location>
<code_context>
+                // SAFETY: keys originate from the host OS and were previously
+                // present in the environment, so removal cannot violate
+                // invariants expected by `std::env`.
+                unsafe { env::remove_var(&key) };
+            }
+        }
</code_context>

<issue_to_address>
**issue (review_instructions):** `unsafe` is used here even though `std::env::remove_var` is a safe API, which violates the guideline to avoid `unsafe` unless absolutely necessary.

The call to `env::remove_var` is already safe and does not require an `unsafe` block. Please remove the `unsafe` here (and below for `set_var`) unless there is a specific unsafe API being invoked that cannot be expressed safely.

If there is a subtle invariants-related reason this must be unsafe, it should be documented explicitly and ideally wrapped in a narrowly scoped helper so that other code can remain safe.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Avoid `unsafe` unless absolutely necessary.

</details>
</issue_to_address>

### Comment 10
<location> `crates/weaver-sandbox/src/env_guard.rs:43` </location>
<code_context>
+            // SAFETY: keys and values were captured from the process
+            // environment before sandboxing mutated it, so restoring them
+            // preserves the prior state without introducing invalid data.
+            unsafe { env::set_var(key, value) };
+        }
+    }
</code_context>

<issue_to_address>
**issue (review_instructions):** `unsafe` is also used around `std::env::set_var`, which is a safe function and does not justify an unsafe block under the stated guidelines.

Similar to the `remove_var` call above, `env::set_var` is safe to call and does not require an `unsafe` block. Please drop the `unsafe` here unless you are actually calling an unsafe API, in which case the reason should be clearly justified and isolated.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Avoid `unsafe` unless absolutely necessary.

</details>
</issue_to_address>

### Comment 11
<location> `crates/weaver-sandbox/src/profile.rs:92` </location>
<code_context>
+    /// Whitelists an environment variable for inheritance.
+    #[must_use]
+    pub fn allow_environment_variable(mut self, key: impl Into<String>) -> Self {
+        match &mut self.environment {
+            EnvironmentPolicy::Isolated => {
+                let mut allow = BTreeSet::new();
</code_context>

<issue_to_address>
**issue (review_instructions):** This `match` over `EnvironmentPolicy` has three branches but is implemented inline instead of being moved into a dedicated predicate/helper as required.

The guideline asks for conditionals with more than two branches to be moved into a predicate (or helper) function. Here you could, for example, add a method on `EnvironmentPolicy` to return an updated policy when allowing a new variable, and call that from `allow_environment_variable` instead of inlining the three-way `match`.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Move conditionals with >2 branches into a predicate function.

</details>
</issue_to_address>

### Comment 12
<location> `crates/weaver-sandbox/src/sandbox.rs:98` </location>
<code_context>
+
+        exceptions.push(Exception::ExecuteAndRead(program.to_path_buf()));
+
+        match self.profile.environment_policy() {
+            EnvironmentPolicy::Isolated => {}
+            EnvironmentPolicy::AllowList(keys) => {
</code_context>

<issue_to_address>
**issue (review_instructions):** This three-branch `match` over `EnvironmentPolicy` is another multi-branch conditional that is not factored into a predicate/helper as requested.

To align with the instruction to move conditionals with more than two branches into a predicate function, consider extracting this `EnvironmentPolicy` handling into a helper (e.g. a method that returns a collection of environment-related `Exception`s) and calling that from here. That keeps the branching logic encapsulated and re-usable.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Move conditionals with >2 branches into a predicate function.

</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

leynos · 2025-12-04T10:06:38Z

@coderabbitai review

coderabbitai · 2025-12-04T10:06:54Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (4)

crates/weaver-sandbox/src/env_guard.rs (1)

22-24: Simplify key tracking to avoid unnecessary allocations.

The original_keys method clones every key into a BTreeSet, but restoration only needs membership testing and doesn't require sorting. Build a HashSet<&OsString> directly in restore from self.original to eliminate allocations and improve performance.

Apply this diff:
-    fn original_keys(&self) -> BTreeSet<OsString> {
-        self.original.iter().map(|(key, _)| key.clone()).collect()
-    }
-
     fn restore(&self) {
-        let expected_keys = self.original_keys();
+        let expected_keys: std::collections::HashSet<&OsString> =
+            self.original.iter().map(|(k, _)| k).collect();

         // Remove variables introduced whilst the guard was active.
Remove the unused import:
-use std::collections::BTreeSet;
+use std::collections::HashSet;

crates/weaver-sandbox/src/profile.rs (1)

89-104: Extract multi-branch match into a helper method.

The three-branch match over EnvironmentPolicy violates the guideline requiring conditionals with more than two branches to be moved into a predicate or helper function. Extract this logic into a method on EnvironmentPolicy, such as with_variable(self, key: String) -> Self, to improve readability and adhere to project standards.

Apply this diff to profile.rs:

 impl EnvironmentPolicy {
+    /// Returns a new policy with the given variable allowed.
+    fn with_variable(self, key: String) -> Self {
+        match self {
+            Self::Isolated => {
+                let mut allow = BTreeSet::new();
+                allow.insert(key);
+                Self::AllowList(allow)
+            }
+            Self::AllowList(mut keys) => {
+                let _ = keys.insert(key);
+                Self::AllowList(keys)
+            }
+            Self::InheritAll => self,
+        }
+    }
 }

 impl SandboxProfile {
     // ... other methods ...

     /// Whitelists an environment variable for inheritance.
     #[must_use]
     pub fn allow_environment_variable(mut self, key: impl Into<String>) -> Self {
-        match &mut self.environment {
-            EnvironmentPolicy::Isolated => {
-                let mut allow = BTreeSet::new();
-                allow.insert(key.into());
-                self.environment = EnvironmentPolicy::AllowList(allow);
-            }
-            EnvironmentPolicy::AllowList(keys) => {
-                let _ = keys.insert(key.into());
-            }
-            EnvironmentPolicy::InheritAll => {}
-        }
+        self.environment = self.environment.clone().with_variable(key.into());
         self
     }

As per coding guidelines, conditionals with more than two branches must be extracted.

crates/weaver-sandbox/src/sandbox.rs (2)

98-106: Extract the EnvironmentPolicy match into a helper method.

This three-branch match violates the guideline to move conditionals with more than two branches into a predicate or helper function. Extract this into a method such as fn environment_exceptions(&self) -> Vec<Exception> to encapsulate the branching logic and improve readability.

+    fn environment_exceptions(&self) -> Vec<Exception> {
+        match self.profile.environment_policy() {
+            EnvironmentPolicy::Isolated => Vec::new(),
+            EnvironmentPolicy::AllowList(keys) => {
+                keys.iter().map(|key| Exception::Environment(key.clone())).collect()
+            }
+            EnvironmentPolicy::InheritAll => vec![Exception::FullEnvironment],
+        }
+    }

Then call exceptions.extend(self.environment_exceptions()); in collect_exceptions.

133-144: Eliminate the TOCTTOU window by removing the redundant exists() check.

The explicit path.exists() check before fs::canonicalize() introduces a time-of-check-to-time-of-use race condition. Since fs::canonicalize() returns io::ErrorKind::NotFound when the path does not exist, rely on that alone and map the error appropriately.

 fn canonicalise(path: &Path) -> Result<PathBuf, SandboxError> {
-    if !path.exists() {
-        return Err(SandboxError::MissingPath {
-            path: path.to_path_buf(),
-        });
-    }
-
-    fs::canonicalize(path).map_err(|source| SandboxError::CanonicalisationFailed {
-        path: path.to_path_buf(),
-        source,
-    })
+    fs::canonicalize(path).map_err(|source| {
+        if source.kind() == std::io::ErrorKind::NotFound {
+            SandboxError::MissingPath {
+                path: path.to_path_buf(),
+            }
+        } else {
+            SandboxError::CanonicalisationFailed {
+                path: path.to_path_buf(),
+                source,
+            }
+        }
+    })
 }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 312246b and 0382b07.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (17)

Cargo.toml (2 hunks)
Makefile (1 hunks)
crates/weaver-sandbox/Cargo.toml (1 hunks)
crates/weaver-sandbox/src/env_guard.rs (1 hunks)
crates/weaver-sandbox/src/error.rs (1 hunks)
crates/weaver-sandbox/src/lib.rs (1 hunks)
crates/weaver-sandbox/src/profile.rs (1 hunks)
crates/weaver-sandbox/src/runtime.rs (1 hunks)
crates/weaver-sandbox/src/sandbox.rs (1 hunks)
crates/weaver-sandbox/src/tests/behaviour.rs (1 hunks)
crates/weaver-sandbox/src/tests/mod.rs (1 hunks)
crates/weaver-sandbox/src/tests/support/mod.rs (1 hunks)
crates/weaver-sandbox/src/tests/unit.rs (1 hunks)
crates/weaver-sandbox/tests/features/sandbox.feature (1 hunks)
docs/roadmap.md (1 hunks)
docs/users-guide.md (1 hunks)
docs/weaver-design.md (1 hunks)

🧰 Additional context used

📓 Path-based instructions (4)

Cargo.toml

📄 CodeRabbit inference engine (AGENTS.md)

Cargo.toml: Use explicit version ranges in Cargo.toml and keep dependencies up-to-date.
All crate versions specified in Cargo.toml must use SemVer-compatible caret requirements (e.g., some-crate = "1.2.3"); wildcard (*) or open-ended inequality (>=) version requirements are strictly forbidden; tilde requirements (~) should only be used with documented reason.

Files:

Cargo.toml

**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run make check-fmt, make lint, and make test before committing, which execute cargo fmt --workspace -- --check, cargo clippy --workspace --all-targets --all-features -- -D warnings, and cargo test --workspace respectively.
Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and behavioural tests for new functionality; run both before and after making any change.
Every module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings.
Handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared setup in Rust tests.
Replace duplicated tests with #[rstest(...)] parameterised cases in Rust.
Prefer mockall for mocks/stubs in Rust.
Prefer .expect() over .unwrap() in Rust.
Use concat!() to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line function definitions in Rust where appropriate (e.g., `pub fn new(id: u64) -> Self { ...

Files:

crates/weaver-sandbox/src/tests/unit.rs
crates/weaver-sandbox/src/tests/mod.rs
crates/weaver-sandbox/src/env_guard.rs
crates/weaver-sandbox/src/lib.rs
crates/weaver-sandbox/src/runtime.rs
crates/weaver-sandbox/src/error.rs
crates/weaver-sandbox/src/tests/support/mod.rs
crates/weaver-sandbox/src/profile.rs
crates/weaver-sandbox/src/tests/behaviour.rs
crates/weaver-sandbox/src/sandbox.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

Adhere to single responsibility and CQRS

Place function attributes after doc comments.

Do not use return in single-line functions.

Move conditionals with >2 branches into a predicate function.

Avoid unsafe unless absolutely necessary.

Every module must begin with a //! doc comment that explains the module's purpose and utility.

Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar

Lints must not be silenced except as a last resort.

#[allow] is forbidden.

Only narrowly scoped #[expect(lint, reason = "...")] is allowed.

No lint groups, no blanket or file-wide suppression.

Include FIXME: with link if a fix is expected.

Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.

Use rstest fixtures for shared setup and to avoid repetition between tests.

Replace duplicated tests with #[rstest(...)] parameterised cases.

Prefer mockall for mocks/stubs.

Prefer .expect() over .unwrap() in tests.

.expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.

Ensure that any API or behavioural changes are reflected in the documentation in docs/

Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/

Files must not exceed 400 lines in length

Large modules must be decomposed

Long match statements or dispatch tables should be decomposed by domain and collocated with targets

Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.

Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such

For testing of functionality depending upon environment variables, dependency injection and...

Files:

crates/weaver-sandbox/src/tests/unit.rs
crates/weaver-sandbox/src/tests/mod.rs
crates/weaver-sandbox/src/env_guard.rs
crates/weaver-sandbox/src/lib.rs
crates/weaver-sandbox/src/runtime.rs
crates/weaver-sandbox/src/error.rs
crates/weaver-sandbox/src/tests/support/mod.rs
crates/weaver-sandbox/src/profile.rs
crates/weaver-sandbox/src/tests/behaviour.rs
crates/weaver-sandbox/src/sandbox.rs

docs/**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Documentation must use en-GB-oxendict ("-ize" / "-yse" / "-our") spelling and grammar, except for the "LICENSE" file which is left unchanged for community consistency.

docs/**/*.md: Use British English based on Oxford English Dictionary (en-GB-oxendict), including: -ize suffix (realize, organization), -lyse suffix (analyse, paralyse), -our suffix (colour, behaviour), -re suffix (calibre, centre), double 'l' (cancelled, counsellor), maintain 'e' (likeable, liveable), -ogue suffix (analogue, catalogue)
Use US spelling when used in an API, for example color
Use the Oxford comma in documentation: 'ships, planes, and hovercraft' where it aids comprehension
Treat company names as collective nouns in documentation, for example: 'Lille Industries are expanding'
Write headings in sentence case using Markdown headings (#, ##, ###) in order without skipping levels
Follow markdownlint recommendations for Markdown consistency
Provide code blocks and lists using standard Markdown syntax, always providing a language identifier for fenced code blocks (use plaintext for non-code text)
Use - as the first level bullet and renumber lists when items change in documentation
Prefer inline links using [text](url) or angle brackets around the URL in documentation
Ensure blank lines before and after bulleted lists and fenced blocks in documentation
Ensure tables have a delimiter line below the header row in documentation
Expand any uncommon acronym on first use in documentation, for example, Continuous Integration (CI)
Wrap paragraphs at 80 columns in documentation
Wrap code at 120 columns in documentation
Do not wrap tables in documentation
Use footnotes referenced with [^label] in documentation
Include Mermaid diagrams where they add clarity to documentation
When embedding figures in documentation, use ![alt text](path/to/image) and provide brief alt text describing the content
Add a short description before each Mermaid diagram so screen readers can understand it
Define roa...

Files:

docs/roadmap.md
docs/users-guide.md
docs/weaver-design.md

**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

**/*.md: Validate Markdown files using make markdownlint.
Run make fmt after any documentation changes to format all Markdown files and fix table markup.
Validate Mermaid diagrams in Markdown files by running make nixie.
Markdown paragraphs and bullet points must be wrapped at 80 columns.
Code blocks in Markdown must be wrapped at 120 columns.
Tables and headings in Markdown must not be wrapped.
Use dashes (-) for list bullets in Markdown.
Use GitHub-flavoured Markdown footnotes ([^1]) for references and footnotes.

Files:

docs/roadmap.md
docs/users-guide.md
docs/weaver-design.md

⚙️ CodeRabbit configuration file

**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")

Use en-GB-oxendict (-ize / -yse / -our) spelling and grammar

Headings must not be wrapped.

Documents must start with a level 1 heading

Headings must correctly increase or decrease by no more than one level at a time

Use GitHub-flavoured Markdown style for footnotes and endnotes.

Numbered footnotes must be numbered by order of appearance in the document.

Files:

docs/roadmap.md
docs/users-guide.md
docs/weaver-design.md

🧠 Learnings (2)

📓 Common learnings

Learnt from: CR
Repo: leynos/weaver PR: 0
File: docs/roadmap.md:0-0
Timestamp: 2025-12-03T19:40:08.780Z
Learning: Implement `weaver-sandbox` crate using `birdcage` for focused scope, prioritising robust Linux support via namespaces and seccomp-bpf

📚 Learning: 2025-12-03T19:40:08.780Z

Learnt from: CR
Repo: leynos/weaver PR: 0
File: docs/roadmap.md:0-0
Timestamp: 2025-12-03T19:40:08.780Z
Learning: Implement `weaver-sandbox` crate using `birdcage` for focused scope, prioritising robust Linux support via namespaces and seccomp-bpf

Applied to files:

docs/roadmap.md
crates/weaver-sandbox/src/lib.rs
docs/weaver-design.md

🧬 Code graph analysis (6)

crates/weaver-sandbox/src/env_guard.rs (1)

crates/weaver-config/tests/configuration_failfast.rs (1)

set_var (21-32)

crates/weaver-sandbox/src/runtime.rs (3)

crates/weaver-sandbox/src/profile.rs (1)

new (58-66)

crates/weaver-sandbox/src/sandbox.rs (1)

new (31-33)

crates/weaver-sandbox/src/tests/support/mod.rs (1)

new (26-43)

crates/weaver-sandbox/src/tests/support/mod.rs (2)

crates/weaver-sandbox/src/profile.rs (1)

new (58-66)

crates/weaver-sandbox/src/sandbox.rs (1)

new (31-33)

crates/weaver-sandbox/src/profile.rs (3)

crates/weaver-sandbox/src/runtime.rs (1)

linux_runtime_roots (9-37)

crates/weaver-sandbox/src/sandbox.rs (1)

new (31-33)

crates/weaver-sandbox/src/tests/support/mod.rs (1)

new (26-43)

crates/weaver-sandbox/src/tests/behaviour.rs (4)

crates/weaver-sandbox/src/profile.rs (1)

new (58-66)

crates/weaver-sandbox/src/sandbox.rs (1)

new (31-33)

crates/weaver-sandbox/src/tests/support/mod.rs (1)

new (26-43)

crates/weaver-config/tests/configuration_failfast.rs (1)

set_var (21-32)

crates/weaver-sandbox/src/sandbox.rs (3)

crates/weaver-sandbox/src/runtime.rs (1)

thread_count (40-60)

crates/weaver-sandbox/src/profile.rs (2)

new (58-66)

network_policy (142-144)

crates/weaver-sandbox/src/env_guard.rs (2)

capture (16-20)

drop (49-51)

🪛 LanguageTool

docs/users-guide.md

[misspelling] ~149-~149: This word is normally spelled as one.
Context: ...wlist before launch; requests made from multi-threaded contexts return a MultiThreaded error...

(EN_COMPOUNDS_MULTI_THREADED)

🔍 Remote MCP Deepwiki, Ref

Context Summary for PR Review

Based on my research, here is the relevant context for reviewing this pull request:

Project Context

The weaver project provides a code intelligence platform for AI agents, enabling them to interact with codebases at a semantic level. It employs a client-daemon architecture where the weaverd daemon is a long-running background process that starts and supervises language servers. The new weaver-sandbox crate is being introduced to secure subprocess execution, which is essential because weaver wraps project-specific build and test commands as subprocesses and executes them to parse their output into diagnostic objects.

Birdcage Library Context

The PR wraps birdcage 0.8.1, which provides a cross-platform embedded sandbox API for macOS and Linux. Birdcage's Exception type defines resource access rules including: Read (read-only path access), WriteAndRead (read-write access), ExecuteAndRead (execute and read access), Environment (specific environment variable access), FullEnvironment (all env vars), and Networking. The sandbox operates on a zero-trust model where by default everything is prohibited unless explicitly allowed via exceptions.

Security Approach

The implementation aligns with weaver's existing security strategy. Weaver emphasizes safe code modification through transient edits (in-memory file overlays), atomic file operations, and dry-run flags. The weaver-sandbox crate extends this by adding process-level isolation for external tool execution.

Key Implementation Details

The PR adds:

Single-threaded enforcement: Birdcage requires single-threaded contexts, with the implementation enforcing this via thread_count() checks and returning MultiThreaded errors when violations occur. This is reflected in the Makefile change to set RUST_TEST_THREADS=1.
Path canonicalization: Executable paths are canonicalized to prevent symlink escapes, and all paths are validated against whitelists.
Environment isolation: Default Isolated policy prevents environment variable leakage, with allowlist and inherit-all options available.
Zero-trust defaults: Networking is denied by default; read-only filesystem access restricted to standard library roots.

Testing Strategy

The PR uses rstest-bdd for behavior-driven testing with feature scenarios covering fixture access controls, environment handling, and multi-threaded rejection. Tests run with RUST_TEST_THREADS=1 to satisfy birdcage's single-threaded requirement.,

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Sourcery review

🔇 Additional comments (26)

Makefile (1)

20-20: LGTM! Single-threaded test execution enforced correctly.

The RUST_TEST_THREADS=1 setting aligns with birdcage's single-threaded requirement and the sandbox's preflight checks that reject multi-threaded contexts.

crates/weaver-sandbox/src/runtime.rs (1)

39-60: LGTM! Thread count parsing is robust.

The implementation correctly reads /proc/self/status, parses the Threads: field with descriptive errors for missing or malformed data, and provides a sensible fallback for non-Linux platforms.

crates/weaver-sandbox/src/tests/mod.rs (1)

1-5: LGTM! Test module structure is clear.

The test suite organisation (behaviour, support, unit) follows standard Rust conventions.

docs/roadmap.md (1)

56-58: LGTM! Roadmap status accurately reflects the implementation.

The weaver-sandbox crate has been implemented with birdcage 0.8.1, Linux namespace support, and seccomp-bpf, as documented in the PR.

Based on learnings, this implementation uses birdcage for focused scope and prioritises robust Linux support.

Cargo.toml (1)

7-7: LGTM! Workspace configuration is correct.

The weaver-sandbox member and birdcage dependency are properly configured with appropriate SemVer versioning.

Also applies to: 31-31

crates/weaver-sandbox/src/tests/unit.rs (1)

7-55: LGTM! Unit tests cover core sandbox configuration behaviour.

The tests verify platform-specific runtime roots, environment variable deduplication, default network denial, and path recording. The test assertions are appropriate and follow best practices.

A previous review comment suggests extending coverage to include allow_networking() and allow_full_environment() transitions. Whilst the current tests are sufficient for the MVP, those additions would strengthen the test suite.

crates/weaver-sandbox/tests/features/sandbox.feature (1)

1-25: LGTM! BDD scenarios comprehensively cover sandbox isolation.

The feature scenarios test the critical behaviours: whitelist-based file access, denial of unauthorised paths, and environment variable isolation. The Given/When/Then structure is clear and aligns with the sandbox's zero-trust defaults.

docs/users-guide.md (1)

142-153: LGTM! Sandbox documentation is clear and accurate.

The section correctly describes the weaver-sandbox wrapper, zero-trust defaults, and the MultiThreaded error behaviour. The hyphenation in "multi-threaded" is correct for British English compound adjectives.

Note: The static analysis hint about "multi-threaded" is a false positive for en-GB-oxendict style.

crates/weaver-sandbox/src/env_guard.rs (4)

1-11: Module documentation and struct definition are clear.

The module-level comment succinctly describes the purpose, and the EnvGuard structure is appropriately minimal.

13-20: Capture method is straightforward.

The #[must_use] annotation correctly signals that callers should not discard the guard, ensuring restoration occurs.

26-45: Environment mutation requires unsafe per Rust 2024 guidelines.

The unsafe blocks wrapping env::remove_var and env::set_var are required. The coding guidelines explicitly state: "Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such." The SAFETY comments correctly justify the invariants, and the implementation aligns with the pattern used elsewhere in the codebase (see crates/weaver-config/tests/configuration_failfast.rs:20-31).

Note: Past review comments claiming the unsafe blocks are unnecessary contradict the project's coding guidelines and should be disregarded.

Based on learnings and as per coding guidelines.

48-52: Drop implementation correctly enforces restoration.

The Drop implementation ensures environment restoration occurs automatically when the guard leaves scope.

docs/weaver-design.md (1)

983-999: Documentation accurately describes the sandbox implementation.

The new section 5.3 clearly articulates the weaver-sandbox implementation strategy, including the birdcage wrapper, absolute-path requirements, whitelist canonicalisation, single-threaded enforcement, and test constraints. The content aligns with the codebase changes and uses appropriate British spelling.

Based on learnings, birdcage is used for focused scope with Linux namespace/seccomp-bpf support.

crates/weaver-sandbox/Cargo.toml (1)

1-14: Cargo manifest correctly declares dependencies.

The package uses workspace-managed dependencies appropriately, with birdcage and thiserror for production code, and testing frameworks (rstest, rstest-bdd, tempfile) scoped to dev-dependencies.

crates/weaver-sandbox/src/lib.rs (1)

1-51: Library module provides clear API documentation and appropriate exports.

The module-level documentation succinctly explains the zero-trust design, includes a working example, and correctly warns about single-threaded requirements. The public API surface is well-composed, re-exporting the necessary types whilst keeping internal modules private.

crates/weaver-sandbox/src/error.rs (1)

1-39: Error types are well-defined with clear messages.

The SandboxError enum comprehensively covers all failure modes with actionable error messages. The use of thiserror for derivation and the #[from] conversion for BirdcageError follow best practices.

crates/weaver-sandbox/src/profile.rs (2)

1-88: Policy enums and profile builders are clearly defined.

The EnvironmentPolicy and NetworkPolicy enums provide explicit, type-safe configuration options. The SandboxProfile builder methods use #[must_use] appropriately and initialize with secure defaults (isolated environment, no network).

120-159: Accessor methods and trait implementations are appropriate.

The crate-visible accessors provide controlled access to profile internals, and the Default delegation to new() maintains a single source of truth for default configuration. The is_denied helper on NetworkPolicy is a clear predicate.

crates/weaver-sandbox/src/tests/behaviour.rs (3)

1-40: BDD test structure and initial step definitions are sound.

The use of rstest-bdd with a RefCell<TestWorld> fixture provides clean test orchestration. The given/when/then steps correctly configure the sandbox profile and commands for behavioural scenarios.

46-112: Test assertions correctly validate sandbox behaviour.

The step definitions for environment allowlist, process success/failure, and stdout content verification appropriately exercise the sandbox's enforcement of policies and capture expected outcomes.

119-122: Scenario binding correctly links feature file.

The #[scenario] attribute correctly binds the step definitions to tests/features/sandbox.feature.

crates/weaver-sandbox/src/sandbox.rs (5)

1-14: Module structure and imports are well-organised.

The module-level doc comment explains the purpose. Imports are minimal and appropriately grouped. Using BTreeSet provides deterministic iteration order for reproducible sandbox configurations.

28-33: Constructor implementation is correct.

The #[must_use] attribute is appropriate. Per coding guidelines, this could be condensed to a single-line form, though the current multi-line version is acceptable.

41-57: Spawn logic is sound with appropriate preflight checks.

The sequence—single-thread check, path canonicalisation, whitelist validation, exception collection, then spawn—is well-structured. The explicit drop(env_guard) documents intent clearly, though RAII would handle it anyway on early returns via ?.

59-78: Helper methods are clean and follow single responsibility.

ensure_single_threaded and ensure_program_whitelisted are focused, well-named, and use the early-return pattern appropriately. Spelling of "authorised" is correct per en-GB-oxendict.

115-122: Path validation is correct.

The absolute path check followed by canonicalisation is the appropriate sequence. Returning SandboxError::ProgramNotAbsolute gives a clear error when the contract is violated.

…l tests Add a mutex-based mechanism to serialize environment variable changes during sandbox behavioural tests, preventing interference between tests. Replace direct std::env calls with controlled modifications in TestWorld to ensure environment isolation and cleanup. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

…d drop Changed the environment cleanup from clearing it to restoring it in tests. Renamed `clear_env` to `restore_env` and ensured it is called automatically when `TestWorld` is dropped to reliably clean up environment markers. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

…ce EnvGuard - Introduce the ability to fully inherit the environment in sandbox profiles with `allow_full_environment` and supporting logic. - Refactor `EnvGuard` to use a HashMap for efficiency and improve environment restoration semantics. - Extend environment policy handling to convert policies into birdcage exceptions. - Update sandbox builder to include new environment exceptions. - Improve canonicalization logic to handle non-existent paths by reconstructing from existing ancestors. - Add comprehensive unit and integration tests for environment restoration, canonicalization, and sandbox behavior regarding environment inheritance. - Adjust Makefile to isolate testing for weaver-sandbox crate. This enhances environment isolation control, allowing sandboxes to optionally inherit the full environment, improving flexibility and correctness. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 5

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0382b07 and 970c172.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (3)

crates/weaver-sandbox/Cargo.toml (1 hunks)
crates/weaver-sandbox/src/tests/behaviour.rs (1 hunks)
crates/weaver-sandbox/src/tests/support/mod.rs (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run make check-fmt, make lint, and make test before committing, which execute cargo fmt --workspace -- --check, cargo clippy --workspace --all-targets --all-features -- -D warnings, and cargo test --workspace respectively.
Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and behavioural tests for new functionality; run both before and after making any change.
Every module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings.
Handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared setup in Rust tests.
Replace duplicated tests with #[rstest(...)] parameterised cases in Rust.
Prefer mockall for mocks/stubs in Rust.
Prefer .expect() over .unwrap() in Rust.
Use concat!() to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line function definitions in Rust where appropriate (e.g., `pub fn new(id: u64) -> Self { ...

Files:

crates/weaver-sandbox/src/tests/support/mod.rs
crates/weaver-sandbox/src/tests/behaviour.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

Adhere to single responsibility and CQRS

Place function attributes after doc comments.

Do not use return in single-line functions.

Move conditionals with >2 branches into a predicate function.

Avoid unsafe unless absolutely necessary.

Every module must begin with a //! doc comment that explains the module's purpose and utility.

Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar

Lints must not be silenced except as a last resort.

#[allow] is forbidden.

Only narrowly scoped #[expect(lint, reason = "...")] is allowed.

No lint groups, no blanket or file-wide suppression.

Include FIXME: with link if a fix is expected.

Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.

Use rstest fixtures for shared setup and to avoid repetition between tests.

Replace duplicated tests with #[rstest(...)] parameterised cases.

Prefer mockall for mocks/stubs.

Prefer .expect() over .unwrap() in tests.

.expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.

Ensure that any API or behavioural changes are reflected in the documentation in docs/

Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/

Files must not exceed 400 lines in length

Large modules must be decomposed

Long match statements or dispatch tables should be decomposed by domain and collocated with targets

Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.

Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such

For testing of functionality depending upon environment variables, dependency injection and...

Files:

crates/weaver-sandbox/src/tests/support/mod.rs
crates/weaver-sandbox/src/tests/behaviour.rs

🧠 Learnings (7)

📓 Common learnings

Learnt from: CR
Repo: leynos/weaver PR: 0
File: docs/roadmap.md:0-0
Timestamp: 2025-12-03T19:40:08.813Z
Learning: Implement `weaver-sandbox` crate using `birdcage` for focused scope, prioritising robust Linux support via namespaces and seccomp-bpf

📚 Learning: 2025-08-24T13:22:19.937Z

Learnt from: leynos
Repo: leynos/ortho-config PR: 101
File: ortho_config/src/subcommand/paths.rs:0-0
Timestamp: 2025-08-24T13:22:19.937Z
Learning: Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such. For testing of functionality depending upon environment variables, dependency injection and the mockable crate are the preferred option. If mockable cannot be used, env mutations in tests MUST be wrapped in shared guards and mutexes placed in a shared test_utils or test_helpers crate. Direct environment mutation is FORBIDDEN in tests.

Applied to files:

crates/weaver-sandbox/src/tests/behaviour.rs

📚 Learning: 2025-11-27T20:15:36.198Z

Learnt from: CR
Repo: leynos/lille PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T20:15:36.198Z
Learning: Applies to **/*.rs : For testing of functionality depending upon environment variables, dependency injection and the `mockable` crate are the preferred option. If mockable cannot be used, env mutations in tests MUST be wrapped in shared guards and mutexes placed in a shared `test_utils` or `test_helpers` crate. Direct environment mutation is FORBIDDEN in tests.

Applied to files:

crates/weaver-sandbox/src/tests/behaviour.rs

📚 Learning: 2025-11-26T20:29:06.669Z

Learnt from: CR
Repo: leynos/ortho-config PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-26T20:29:06.669Z
Learning: Applies to **/*.rs : If `mockable` cannot be used for Rust environment variable testing, env mutations in tests MUST be wrapped in shared guards and mutexes placed in a shared `test_utils` or `test_helpers` crate. Direct environment mutation is FORBIDDEN.

Applied to files:

crates/weaver-sandbox/src/tests/behaviour.rs

📚 Learning: 2025-12-02T23:47:57.677Z

Learnt from: CR
Repo: leynos/mriya PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-02T23:47:57.677Z
Learning: Applies to **/*.rs : If mockable cannot be used for environment variable testing in Rust, env mutations in tests must be wrapped in shared guards and mutexes in a shared `test_utils` or `test_helpers` crate. Direct environment mutation is forbidden.

Applied to files:

crates/weaver-sandbox/src/tests/behaviour.rs

📚 Learning: 2025-11-24T20:16:05.202Z

Learnt from: CR
Repo: leynos/rstest-bdd PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-24T20:16:05.202Z
Learning: Applies to **/*.rs : If `mockable` cannot be used in Rust tests, env mutations MUST be wrapped in shared guards and mutexes placed in a shared `test_utils` or `test_helpers` crate. Direct environment mutation is FORBIDDEN in tests.

Applied to files:

crates/weaver-sandbox/src/tests/behaviour.rs

📚 Learning: 2025-11-26T19:12:50.040Z

Learnt from: leynos
Repo: leynos/netsuke PR: 240
File: src/stdlib/which/lookup/tests.rs:188-221
Timestamp: 2025-11-26T19:12:50.040Z
Learning: In the netsuke codebase (Rust), environment variable mutation in tests must use the test_support crate helpers: test_support::env::VarGuard::set(key, value) for RAII-based env var setting/restoration with automatic synchronisation, test_support::PathGuard::new(original) for specialised PATH restoration, or test_support::env::{set_var, remove_var} for explicit mutations with EnvLock synchronisation. Never use unsafe std::env::set_var or std::env::remove_var directly in tests.

Applied to files:

crates/weaver-sandbox/src/tests/behaviour.rs

🧬 Code graph analysis (2)

crates/weaver-sandbox/src/tests/support/mod.rs (3)

crates/weaver-sandbox/src/profile.rs (1)

new (58-66)

crates/weaver-sandbox/src/sandbox.rs (1)

new (31-33)

crates/weaver-sandbox/src/env_guard.rs (1)

drop (49-51)

crates/weaver-sandbox/src/tests/behaviour.rs (4)

crates/weaver-sandbox/src/tests/support/mod.rs (1)

new (65-83)

crates/weaver-sandbox/src/profile.rs (1)

new (58-66)

crates/weaver-sandbox/src/sandbox.rs (1)

new (31-33)

crates/weaver-cli/src/lifecycle/controller.rs (1)

status (87-142)

🔍 Remote MCP Deepwiki

Additional context relevant to reviewing PR #33

Repo documentation available (site index) — confirms docs updates accompany the new crate.
Project-level design docs and roadmap already mention sandboxing goals and now include concrete weaver-sandbox details (weaver-design.md, roadmap.md, users-guide.md). These describe: birdcage v0.8.1 wrapper, absolute-path requirement, canonicalisation to avoid symlink escapes, zero-trust defaults (env isolated, network denied), and single-thread enforcement.
Build/test changes: Makefile runs tests with RUST_TEST_THREADS=1 to satisfy birdcage’s single-thread requirement; workspace adds birdcage = "0.8.1" and new member crates/weaver-sandbox.
Platform focus and runtime helpers: runtime module is Linux-centric — linux_runtime_roots() returns canonicalised standard library roots on Linux and empty Vec on non-Linux; thread_count() reads /proc/self/status on Linux and returns Ok(1) on non-Linux. This explains why many checks and tests assume Linux behavior.
API and behavior summary from docs: Sandbox enforces single-threaded preflight (errors with MultiThreaded), canonicalises and whitelists program paths, collects birdcage Exceptions (Read, WriteAndRead, ExecuteAndRead, Environment, FullEnvironment, Networking) from SandboxProfile, uses EnvGuard to capture/restore parent environment, and spawns birdcage with aggregated exceptions.
Testing strategy: new unit tests, BDD behavioural tests (rstest-bdd) and feature scenarios exercise whitelist file access, environment allowlist/inheritance, and multi-thread rejection; test harness includes TempDir fixtures and EnvHandle/EnvGuard patterns to restore env across tests.
Packaging/workflow: crate added to workspace and documented in design & user docs; CI/test tooling (pyproject/Makefile) already enforces timeouts and strict checks — relevant for test reliability when adding native sandboxing behavior.

Sources used: Deepwiki repository index and page contents for leynos/weaver.,

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Sourcery review

🔇 Additional comments (4)

crates/weaver-sandbox/Cargo.toml (3)

1-4: Package declaration and workspace inheritance are correctly configured.

The manifest properly inherits edition and version from the workspace root, ensuring consistency across all crates.

6-8: Validate root workspace declares all dependencies with stable versions.

Root workspace Cargo.toml correctly declares birdcage = "0.8.1" and thiserror = "2.0" in [workspace.dependencies], with weaver-sandbox properly referencing these through .workspace = true attributes. No version conflicts exist across workspace members.

6-15: No action required. Birdcage 0.8.1 and all declared workspace dependencies have no known security advisories in RustSec, CVE databases, or other vulnerability tracking systems.

crates/weaver-sandbox/src/tests/support/mod.rs (1)

52-83: Keep the TestWorld harness and env guard pattern; it matches the env-mutation guidelines

The TestWorld fixture, plus the EnvHandle guarded via ENV_MUTEX, gives each scenario an isolated temp directory, deterministic fixtures, and RAII-backed environment mutation confined to tests. restore_env being called from Drop guarantees cleanup even when a scenario panics, which resolves the earlier direct env::set_var / env::remove_var violations.

Also applies to: 108-147

leynos · 2025-12-04T12:16:36Z

@coderabbitai Have the following now been resolved?

The canonicalise/MissingPath check in the sandbox profile prevents whitelisting paths that don't yet exist (e.g., output files that will be created by the sandboxed command); if you expect that use-case, consider allowing non-existent paths when their parent directory is authorised, or making this behaviour configurable.
Setting RUST_TEST_THREADS=1 globally in the top-level Makefile forces the entire workspace test suite to run single-threaded; you might want to scope this to only the weaver-sandbox crate (e.g., via a dedicated test target or per-crate env) to avoid slowing unrelated tests.

coderabbitai · 2025-12-04T12:17:33Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the canonicalise function implementation in sandbox.rs
echo "=== Checking canonicalise function in sandbox.rs ==="
rg -n -A10 "fn canonicalise" crates/weaver-sandbox/src/sandbox.rs

echo ""
echo "=== Checking MissingPath error usage ==="
rg -n "MissingPath" crates/weaver-sandbox/src/

echo ""
echo "=== Checking Makefile test target ==="
cat Makefile | grep -A5 "^test:"

Length of output: 1615

🏁 Script executed:

#!/bin/bash
# Check the rebuild_from_existing_ancestor function
echo "=== Checking rebuild_from_existing_ancestor function ==="
rg -n -A20 "fn rebuild_from_existing_ancestor" crates/weaver-sandbox/src/sandbox.rs

echo ""
echo "=== Full context around MissingPath usage in sandbox.rs ==="
rg -n -B5 -A5 "SandboxError::MissingPath" crates/weaver-sandbox/src/sandbox.rs

Length of output: 1153

@leynos Yes, both issues have been resolved:

Issue 1: Non-existent path whitelisting ✅ RESOLVED

The canonicalise function now handles non-existent paths gracefully. When a path doesn't exist, it calls rebuild_from_existing_ancestor, which:

Walks up the directory tree to find the first existing ancestor
Canonicalises that existing ancestor
Rebuilds the full path by appending the missing components back

This allows you to whitelist output files or directories that will be created by the sandboxed command, provided their parent directory exists. The MissingPath error now only occurs in the extreme edge case where no ancestor exists at all (essentially if the root filesystem is missing).

Issue 2: Global RUST_TEST_THREADS=1 ✅ RESOLVED

The Makefile now uses two separate test commands:

RUST_TEST_THREADS=1 $(CARGO) test ... -p weaver-sandbox
$(CARGO) test ... --workspace --exclude weaver-sandbox

This scopes the single-threaded requirement to only the weaver-sandbox crate, whilst the rest of the workspace tests run normally in parallel. This avoids the performance penalty for unrelated tests.

Both concerns have been addressed effectively.

…c::OnceLock Replaced all uses of once_cell::sync::Lazy with the standard library's std::sync::OnceLock for lazily initialized static mutexes in tests and support code. Removed once_cell from the dependencies to streamline crates/weaver-sandbox dependency list. This modernizes the code and reduces external dependencies. Additional minor improvements include: - Added AArch64 runtime root paths for Linux. - Improved unsafety comments around environment mutation. - Cleaned up environment variable test assertions. - Minor formatting and error handling cleanups. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

Add notes explaining that environment variable mutations must be contained within unsafe blocks on Rust 2024 toolchains, clarifying the rationale for wrapping std APIs within the env_guard module to comply with testing and safety policies. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

sourcery-ai

Hey there - I've reviewed your changes and found some issues that need to be addressed.

In Sandbox::collect_exceptions the explicit ExecuteAndRead(program) exception will duplicate any matching entry from the canonicalised executable set; consider avoiding adding the program twice to keep the exception list minimal and easier to reason about.
canonicalised_set is called on every spawn for read-only, read-write, and executable paths, which re-canonicalises the same static paths each time; if sandboxes are used frequently, consider caching the canonicalised paths in SandboxProfile to avoid repeated filesystem lookups.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `Sandbox::collect_exceptions` the explicit `ExecuteAndRead(program)` exception will duplicate any matching entry from the canonicalised executable set; consider avoiding adding the program twice to keep the exception list minimal and easier to reason about.
- `canonicalised_set` is called on every spawn for read-only, read-write, and executable paths, which re-canonicalises the same static paths each time; if sandboxes are used frequently, consider caching the canonicalised paths in `SandboxProfile` to avoid repeated filesystem lookups.

## Individual Comments

### Comment 1
<location> `crates/weaver-sandbox/src/sandbox.rs:41` </location>
<code_context>
+    /// profile. When more than one thread exists in the current process the
+    /// function returns [`SandboxError::MultiThreaded`] to avoid triggering the
+    /// single-thread assertion enforced by `birdcage`.
+    pub fn spawn(&self, command: SandboxCommand) -> Result<SandboxChild, SandboxError> {
+        self.ensure_single_threaded()?;
+        let program = Self::canonical_program(Path::new(command.get_program()))?;
</code_context>

<issue_to_address>
**issue (review_instructions):** Add both behavioural and unit test coverage for the new `weaver-sandbox` crate APIs; the current changes introduce a new feature without any implemented tests.

Add unit tests that exercise `Sandbox::spawn` and its error paths (`ProgramNotAbsolute`, `ExecutableNotAuthorised`, `MissingPath`, `CanonicalisationFailed`, `MultiThreaded`, and `ThreadCountUnavailable`), as well as the behaviour of `SandboxProfile`, `EnvironmentPolicy::to_exceptions`, `linux_runtime_roots`, `thread_count`, and `EnvGuard`.

In addition, add behavioural tests that validate the end-to-end behaviour of sandboxed commands (e.g. successfully running an allowed executable, rejecting a non-whitelisted executable, enforcing absolute paths, denying vs allowing networking, and environment isolation vs allowlists). Populate the currently empty test modules under `crates/weaver-sandbox/src/tests/` and the `tests/features/sandbox.feature` file so that both unit and higher-level behavioural expectations of this new crate are verified in CI.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
For any new feature or change to an existing feature, both behavioural and unit tests are required.

</details>
</issue_to_address>

### Comment 2
<location> `crates/weaver-sandbox/src/env_guard.rs:32` </location>
<code_context>
+                // Safety: project policy requires env mutation to be wrapped in
+                // `unsafe` until the std APIs settle for Rust 2024. We mutate
+                // only after snapshotting to avoid iterator invalidation.
+                unsafe { env::remove_var(key) };
+            }
+        }
</code_context>

<issue_to_address>
**issue (review_instructions):** `unsafe` is being used around `env::remove_var` / `env::set_var`, but these standard-library functions are safe and do not require `unsafe`, which conflicts with the "avoid unsafe" instruction.

`std::env::remove_var` and `std::env::set_var` are safe APIs; wrapping them in an `unsafe` block doesn’t provide additional guarantees and directly contradicts the guideline to avoid `unsafe` unless absolutely necessary.

Unless there is some external FFI or other genuinely unsafe invariant being relied on here (which doesn’t appear to be the case), these `unsafe` blocks should be removed and the calls made directly in safe Rust.

If there *is* a non-obvious safety requirement, it would be better to encapsulate that in a small, well-documented unsafe helper function explaining the invariant, and reference that rationale here.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Avoid `unsafe` unless absolutely necessary.

</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai

Hey there - I've reviewed your changes and found some issues that need to be addressed.

In Sandbox::collect_exceptions and related helpers you canonicalise and rebuild BTreeSets on every spawn; if profiles are reused across many spawns it would be more efficient and predictable to canonicalise and deduplicate paths once when building the SandboxProfile and store them in canonical form.
The EnvHandle::drop implementation calls drop(&self.guard), which only drops a reference to the MutexGuard rather than the guard itself; this line is redundant and can be removed to make it clearer that locking is managed purely by the struct’s lifetime.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `Sandbox::collect_exceptions` and related helpers you canonicalise and rebuild `BTreeSet`s on every spawn; if profiles are reused across many spawns it would be more efficient and predictable to canonicalise and deduplicate paths once when building the `SandboxProfile` and store them in canonical form.
- The `EnvHandle::drop` implementation calls `drop(&self.guard)`, which only drops a reference to the `MutexGuard` rather than the guard itself; this line is redundant and can be removed to make it clearer that locking is managed purely by the struct’s lifetime.

## Individual Comments

### Comment 1
<location> `crates/weaver-sandbox/src/sandbox.rs:92-96` </location>
<code_context>
+        for path in read_write {
+            exceptions.push(Exception::WriteAndRead(path));
+        }
+        for path in executables {
+            exceptions.push(Exception::ExecuteAndRead(path));
+        }
+
+        exceptions.push(Exception::ExecuteAndRead(program.to_path_buf()));
+
+        exceptions.extend(self.profile.environment_policy().to_exceptions());
</code_context>

<issue_to_address>
**suggestion:** The program path is added as `ExecuteAndRead` twice, which is redundant.

Since `collect_exceptions` already iterates over `executables` and pushes `ExecuteAndRead(path)` for each entry, adding `ExecuteAndRead(program)` again guarantees at least one duplicate for authorised programs. This is unnecessary work and may confuse debugging/logging. Consider either ensuring `program` is included in `executables` and removing the extra push, or checking for its presence before pushing it again.
</issue_to_address>

### Comment 2
<location> `crates/weaver-sandbox/src/tests/behaviour.rs:1` </location>
<code_context>
+//! Behavioural tests for sandbox spawning using `rstest-bdd`.
+
+use std::cell::RefCell;
</code_context>

<issue_to_address>
**issue (testing):** Behavioural tests will panic on non-Linux hosts due to `resolve_binary`’s `#[cfg(not(target_os = "linux"))]` panic path; consider gating this module to Linux.

Because `tests/support/mod.rs` defines `resolve_binary` as a `panic!()` on non-Linux targets, and this module’s tests are always compiled and run, any non-Linux CI runner will panic before actually exercising sandbox behaviour. To avoid spurious failures while still validating Linux behaviour, gate this module (or at least the `sandbox_behaviour` scenario/fixture) with `#[cfg(target_os = "linux")]` so it only runs where the support code is valid.
</issue_to_address>

### Comment 3
<location> `crates/weaver-sandbox/src/tests/sandbox_unit.rs:1` </location>
<code_context>
+//! Unit tests covering sandbox spawn preflight errors.
+
+use std::path::PathBuf;
</code_context>

<issue_to_address>
**suggestion (testing):** Add a unit test exercising the `MultiThreaded` error path when more than one thread is active.

Current tests don’t exercise the single-thread preflight (`SandboxError::MultiThreaded`). Please add a test that starts an extra thread (e.g. a short-lived `std::thread::spawn` waiting on a barrier), then calls `Sandbox::spawn` and asserts it returns `SandboxError::MultiThreaded { thread_count: n }` so this safety check is covered and regressions in the thread-count logic are caught.

Suggested implementation:

```rust
use std::path::PathBuf;
use std::sync::{Arc, Barrier};
use std::thread;

```

```rust
#[test]
fn spawn_rejects_when_multiple_threads_are_active() {
    // Ensure there's at least one additional thread alive while we call `Sandbox::spawn`.
    let ready = Arc::new(Barrier::new(2));
    let release = Arc::new(Barrier::new(2));

    let ready_thread = Arc::clone(&ready);
    let release_thread = Arc::clone(&release);

    let handle = thread::spawn(move || {
        // Signal that the helper thread is up and running.
        ready_thread.wait();
        // Stay alive until the main thread lets us go, so we are
        // definitely counted in the thread preflight.
        release_thread.wait();
    });

    // Wait for the helper thread to start and block on `release`.
    ready.wait();

    let sandbox = sandbox_with_profile(SandboxProfile::default());

    // Use a simple, valid command; the important part of this test is
    // exercising the MultiThreaded preflight error, not command execution.
    let command = SandboxCommand::new(PathBuf::from("/bin/true"));

    let error = sandbox
        .spawn(command)
        .expect_err("expected SandboxError::MultiThreaded when multiple threads are active");

    // Allow the helper thread to exit and join it.
    release.wait();
    handle.join().expect("helper thread should exit cleanly");

    match error {
        SandboxError::MultiThreaded { thread_count } => {
            assert!(
                thread_count >= 2,
                "expected at least 2 threads to be observed, got {thread_count}"
            );
        }
        other => panic!("expected SandboxError::MultiThreaded, got {other:?}"),
    }
}

#[test]
fn rejects_relative_program_paths() {

```

The test assumes:
1. `SandboxProfile::default()` constructs a usable profile. If not, use the appropriate constructor (e.g. `SandboxProfile::new(...)` or a specific test profile helper).
2. `SandboxCommand::new(PathBuf)` exists. If the API differs (e.g. `SandboxCommand::from(PathBuf)` or includes args/env), adjust the `command` construction accordingly.
3. `Sandbox::spawn` has the signature `fn spawn(&self, cmd: SandboxCommand) -> Result<_, SandboxError>`. If it’s `&mut self` or returns a different result type, adapt the call while keeping the `expect_err` and `match` on `SandboxError::MultiThreaded`.
</issue_to_address>

### Comment 4
<location> `crates/weaver-sandbox/src/tests/unit.rs:1` </location>
<code_context>
+//! Unit tests for sandbox configuration helpers.
+
+use std::path::PathBuf;
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding tests for `EnvironmentPolicy::to_exceptions` to verify the mapping to birdcage exceptions.

Existing tests cover environment policy construction but not its conversion into `birdcage` exceptions. Please add a few focused tests that assert:
- `EnvironmentPolicy::Isolated.to_exceptions()` returns an empty vector
- `EnvironmentPolicy::AllowList(["FOO", "BAR"])` produces the corresponding `birdcage::Exception::Environment("FOO")`/`"BAR"`
- `EnvironmentPolicy::InheritAll` yields a single `Exception::FullEnvironment`
This will help catch regressions if the policy enum or exception wiring changes.

Suggested implementation:

```rust
 //! Unit tests for sandbox configuration helpers.

use std::path::PathBuf;

use birdcage::Exception;
use crate::profile::{EnvironmentPolicy, NetworkPolicy, SandboxProfile};

```

```rust
use crate::profile::{EnvironmentPolicy, NetworkPolicy, SandboxProfile};

#[test]
fn environment_policy_to_exceptions_isolated_is_empty() {
    let policy = EnvironmentPolicy::Isolated;
    let exceptions = policy.to_exceptions();

    assert!(
        exceptions.is_empty(),
        "Isolated environment policy should not yield any environment exceptions"
    );
}

#[test]
fn environment_policy_to_exceptions_allow_list_maps_to_environment_exceptions() {
    let policy = EnvironmentPolicy::AllowList(vec!["FOO".into(), "BAR".into()]);
    let exceptions = policy.to_exceptions();

    assert_eq!(
        exceptions,
        vec![
            Exception::Environment("FOO".into()),
            Exception::Environment("BAR".into())
        ],
        "AllowList should map each allowed variable to an Exception::Environment entry"
    );
}

#[test]
fn environment_policy_to_exceptions_inherit_all_maps_to_full_environment_exception() {
    let policy = EnvironmentPolicy::InheritAll;
    let exceptions = policy.to_exceptions();

    assert_eq!(
        exceptions,
        vec![Exception::FullEnvironment],
        "InheritAll should map to a single Exception::FullEnvironment entry"
    );
}

#[test]
fn profile_whitelists_linux_runtime_roots() {

```

If the `birdcage::Exception` type is re-exported under a different path in your codebase (for example `crate::birdcage::Exception` or `crate::sandbox::birdcage::Exception`), adjust the `use birdcage::Exception;` import accordingly.

If `EnvironmentPolicy::AllowList` takes a different inner type than `Vec<String>` (for example `Vec<&'static str>` or a slice), update the construction in `environment_policy_to_exceptions_allow_list_maps_to_environment_exceptions` to match its actual signature.
</issue_to_address>

### Comment 5
<location> `crates/weaver-sandbox/src/tests/support/mod.rs:1` </location>
<code_context>
+//! Shared fixtures for sandbox behavioural tests.
+
+use std::fs;
</code_context>

<issue_to_address>
**suggestion (testing):** The shared test world doesn’t currently exercise networking policy; consider adding a scenario around `NetworkPolicy::Allow`.

Current behavioural tests through `TestWorld` cover filesystem and environment policy but not networking. Right now `NetworkPolicy` is only checked via unit tests that validate enum values, not that `Exception::Networking` is actually wired through. Please add coverage that verifies a profile with `allow_networking()` results in the networking exception being applied (either via a `TestWorld` scenario or a crate-internal helper that inspects the exception set for a profile). This will confirm the flag properly flows into the underlying birdcage exceptions.

Suggested implementation:

```rust
use std::fs;
use std::io::Write;
use std::path::{Path, PathBuf};
use std::sync::{Mutex, MutexGuard};

use std::sync::OnceLock;
use tempfile::TempDir;

use crate::error::SandboxError;
use crate::env_guard::EnvGuard;
use crate::process::Stdio;
use crate::policy::NetworkPolicy;
use crate::profile::Profile;
use crate::sandbox::Exception;

/// Assert that a profile with `NetworkPolicy::Allow` has the networking exception wired through
/// to the underlying sandbox implementation.
///
/// This is intended to be used from behavioural tests that build a profile via `TestWorld`
/// and call `.allow_networking()` (or equivalent) on the profile builder.
pub fn assert_networking_exception_applied(profile: &Profile) {
    assert!(
        matches!(profile.network_policy(), NetworkPolicy::Allow),
        "assert_networking_exception_applied must be called with a profile whose networking policy is Allow"
    );

    let exceptions = profile.exceptions();
    assert!(
        exceptions.contains(&Exception::Networking),
        "expected networking exception when NetworkPolicy::Allow is enabled; got {exceptions:?}"
    );
}

```

To fully implement the requested behavioural coverage, you should also:

1. **Add/extend a behavioural test using `TestWorld`:**
   - In the appropriate test module (e.g. `crates/weaver-sandbox/src/tests/behaviour.rs` or similar), create a scenario that:
     - Constructs a `TestWorld`.
     - Builds a sandbox profile via the usual helper / builder in `TestWorld`.
     - Enables networking via the existing API (e.g. `.allow_networking()` on the profile builder, or setting `NetworkPolicy::Allow` explicitly).
     - Obtains the resulting `Profile` instance.
     - Calls `assert_networking_exception_applied(&profile)` to verify that the `Exception::Networking` flag is present.

   Example sketch (adjust names/paths to match your codebase):

   ```rust
   #[test]
   fn profile_with_allow_networking_sets_networking_exception() {
       let mut world = TestWorld::new();
       let profile = world
           .profile_builder()
           .allow_networking() // or equivalent
           .build()
           .expect("failed to build profile");

       crate::tests::support::assert_networking_exception_applied(&profile);
   }
   ```

2. **Align type/module paths if they differ:**
   - If `NetworkPolicy`, `Profile`, or `Exception` live under slightly different modules (e.g. `crate::policy::network::NetworkPolicy`, `crate::sandbox::profile::Profile`, `crate::sandbox::exceptions::Exception`), update the `use` statements in `mod.rs` accordingly.
   - If `Profile` exposes exceptions or network policy via differently-named methods (e.g. `profile.exception_set()` or `profile.networking()`), adjust `assert_networking_exception_applied` to use the correct accessors.

3. **Ensure the `Exception::Networking` variant name matches:**
   - If the variant is named differently (`NetworkingAllowed`, `Network`, etc.), update the helper to match the actual enum variant.

These additional changes will ensure the behavioural tests actually exercise that `NetworkPolicy::Allow` flows through into the underlying birdcage exceptions, not just the enum wiring.
</issue_to_address>

### Comment 6
<location> `crates/weaver-sandbox/src/lib.rs:1` </location>
<code_context>
+//! Sandboxing utilities for Weaver processes.
+//!
+//! The `weaver-sandbox` crate wraps the [`birdcage`] library with policy
</code_context>

<issue_to_address>
**issue (review_instructions):** Add both unit and behavioural tests for the new `weaver-sandbox` crate; the newly added test files and feature file must contain concrete tests, not placeholders.

You introduce a new crate (`weaver-sandbox`) with non-trivial behaviour (profiling, path canonicalisation, environment restoration, network policy, thread-count checks), but the newly added test modules under `crates/weaver-sandbox/src/tests/` and the `tests/features/sandbox.feature` file are empty in this diff.

Implement unit tests that exercise `SandboxProfile`, `EnvironmentPolicy::to_exceptions`, `EnvGuard`, `canonicalise`/`rebuild_from_existing_ancestor`, `thread_count`, and the error-mapping logic in `Sandbox::spawn` and the helper functions. Also implement behavioural tests (e.g. BDD-style using the `sandbox.feature` and associated step definitions) that demonstrate end-to-end behaviour: rejecting non-absolute paths, enforcing the executable allowlist, restoring the environment after `birdcage` activation, default network denial vs. `allow_networking`, and the `MultiThreaded` error path.

Populate the existing test files with these tests rather than leaving them empty so that this new feature is covered by both unit and behavioural tests as required.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.

</details>
</issue_to_address>

### Comment 7
<location> `crates/weaver-sandbox/src/env_guard.rs:32` </location>
<code_context>
+                // Safety: project policy requires env mutation to be wrapped in
+                // `unsafe` until the std APIs settle for Rust 2024. We mutate
+                // only after snapshotting to avoid iterator invalidation.
+                unsafe { env::remove_var(key) };
+            }
+        }
</code_context>

<issue_to_address>
**issue (review_instructions):** `unsafe` is used here even though the underlying APIs are safe; this violates the requirement to avoid `unsafe` unless it is absolutely necessary.

`env::remove_var` and `env::set_var` are safe functions, so wrapping them in an `unsafe` block is not required by Rust itself. Given the instruction to avoid `unsafe` unless absolutely necessary, these calls should be made directly without `unsafe`. If there is a strong reason to keep `unsafe` (e.g. project-wide policy), it should be documented and aligned with this repository’s guidelines, but as written this looks avoidable.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Avoid `unsafe` unless absolutely necessary.

</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai

Hey there - I've reviewed your changes and found some issues that need to be addressed.

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `crates/weaver-sandbox/src/profile.rs:91-92` </location>
<code_context>
+
+    /// Whitelists an environment variable for inheritance.
+    #[must_use]
+    pub fn allow_environment_variable(mut self, key: impl Into<String>) -> Self {
+        self.environment = self.environment.clone().with_allowed(key.into());
+        self
+    }
</code_context>

<issue_to_address>
**suggestion (performance):** Avoid cloning the entire `EnvironmentPolicy` enum on every `allow_environment_variable` call.

Taking `self` by value forces a full clone of `self.environment` just to call `with_allowed`, which can be costly as the allow-list grows. Consider either (a) changing `with_allowed` to take `&mut self` and match on `self` directly, or (b) having `allow_environment_variable` take `&mut self` and mutate in place (optionally via an internal `allow_environment_variable_mut`). Both options avoid the extra clone while preserving the builder-style API.

Suggested implementation:

```rust
    /// Whitelists an environment variable for inheritance.
    ///
    /// This method preserves the builder-style API while mutating the internal
    /// environment policy in place to avoid cloning on each call.
    #[must_use]
    pub fn allow_environment_variable(mut self, key: impl Into<String>) -> Self {
        self.allow_environment_variable_mut(key);
        self
    }

    /// Mutable variant of [`allow_environment_variable`] that updates the
    /// environment policy in place.
    pub fn allow_environment_variable_mut(&mut self, key: impl Into<String>) {
        // `with_allowed` is expected to mutate the existing policy in place.
        self.environment.with_allowed(key.into());
    }

/// Network access policy applied to sandboxed processes.

```

To fully avoid the clone in `allow_environment_variable`, you also need to update the environment policy type (likely `EnvironmentPolicy`) and its `with_allowed` method:

1. Change the signature of `with_allowed` from something like:
   ```rust
   impl EnvironmentPolicy {
       pub fn with_allowed(self, key: String) -> Self {
           match self {
               // ...
           }
       }
   }
   ```
   to:
   ```rust
   impl EnvironmentPolicy {
       pub fn with_allowed(&mut self, key: String) {
           match self {
               // mutate `self` in place, pushing `key` into the allow-list, etc.
           }
       }
   }
   ```

2. Update all call sites of `with_allowed` to pass `&mut` instead of using the returned value. For example, replace:
   ```rust
   self.environment = self.environment.clone().with_allowed(key);
   ```
   with:
   ```rust
   self.environment.with_allowed(key);
   ```

These changes ensure `allow_environment_variable` no longer clones the entire `EnvironmentPolicy` on each call while preserving the existing builder-style API.
</issue_to_address>

### Comment 2
<location> `crates/weaver-sandbox/src/tests/behaviour.rs:1-3` </location>
<code_context>
+//! Behavioural tests for sandbox spawning using `rstest-bdd`.
+
+use std::cell::RefCell;
</code_context>

<issue_to_address>
**suggestion (testing):** Behavioural tests should be gated to Linux to avoid panics from `resolve_binary` on non-Linux hosts

This suite depends on `resolve_binary`, which panics on non-Linux targets. As-is, `behaviour.rs` builds everywhere and will cause macOS/Windows test runs to panic instead of skip. Please gate these tests with `#[cfg(target_os = "linux")]` (either at module level or on the `sandbox_behaviour` scenario, possibly combined with `#[cfg(test)]`) so they are only built and run where supported.

```suggestion
 //! Behavioural tests for sandbox spawning using `rstest-bdd`.
 //! 
 //! These tests are only built and run on Linux, since they rely on sandbox
 //! functionality that is only available there.

#![cfg(all(test, target_os = "linux"))]

use std::cell::RefCell;
```
</issue_to_address>

### Comment 3
<location> `crates/weaver-sandbox/src/tests/env_guard.rs:17` </location>
<code_context>
+        .expect("env mutex poisoned")
+}
+
+#[test]
+fn restores_modified_and_removed_environment_variables() {
+    const EXISTING: &str = "WEAVER_ENV_GUARD_EXISTING";
</code_context>

<issue_to_address>
**suggestion (testing):** Add a test that relies on `EnvGuard`’s `Drop` implementation rather than explicit `restore`

Current tests only assert behavior when `snapshot.restore()` is called explicitly, but `Sandbox::spawn` relies on `EnvGuard`’s `Drop` to restore the environment. Please add a test that creates an `EnvGuard` with `capture()`, mutates the environment, then lets it go out of scope without calling `restore()`, and finally asserts the environment is restored. This ensures regressions in the `Drop` path are caught.

Suggested implementation:

```rust
fn restores_modified_and_removed_environment_variables() {

```

I can only see the beginning of the existing test function, so I can’t safely anchor an insertion point in the current code. Please **add the following new test function** (for example, just after `restores_modified_and_removed_environment_variables`) to exercise the `Drop` behavior of `EnvGuard`:

```rust
#[test]
fn restores_environment_on_drop_without_explicit_restore() {
    const EXISTING: &str = "WEAVER_ENV_GUARD_DROP_EXISTING";
    const EPHEMERAL: &str = "WEAVER_ENV_GUARD_DROP_EPHEMERAL";

    let _guard = lock_env();

    // Set up initial environment state.
    unsafe { std::env::set_var(EXISTING, "original") };

    {
        // Capture the environment.
        let _snapshot = EnvGuard::capture();

        // Mutate existing and add new variables while the snapshot is alive.
        unsafe { std::env::set_var(EXISTING, "modified") };
        unsafe { std::env::set_var(EPHEMERAL, "temporary") };

        // No explicit call to `_snapshot.restore()`; we rely on `Drop`.
    }

    // After `_snapshot` is dropped, the environment must be restored.
    unsafe {
        assert_eq!(std::env::var(EXISTING).as_deref(), Ok("original"));
        assert!(std::env::var(EPHEMERAL).is_err(), "ephemeral var should be removed on drop");
    }
}
```

This test:

1. Uses `lock_env()` to serialize environment access, matching existing conventions.
2. Calls `EnvGuard::capture()`, mutates the environment, and **does not** call `restore()`.
3. Relies on `EnvGuard`’s `Drop` implementation to restore the environment, then asserts:
   - the original value of `EXISTING` is restored, and
   - the ephemeral variable is removed.

Place this function alongside the other `EnvGuard` tests in `crates/weaver-sandbox/src/tests/env_guard.rs`.
</issue_to_address>

### Comment 4
<location> `crates/weaver-sandbox/src/tests/sandbox_unit.rs:12` </location>
<code_context>
+        .expect("env mutex poisoned")
+}
+
+#[test]
+fn restores_modified_and_removed_environment_variables() {
+    const EXISTING: &str = "WEAVER_ENV_GUARD_EXISTING";
</code_context>

<issue_to_address>
**suggestion (testing):** Missing unit coverage for the `MultiThreaded` and `ThreadCountUnavailable` sandbox errors

Current tests cover paths and whitelisting but not the `thread_count()` preflight behavior. Please add a test that:
- starts an extra thread that stays alive,
- calls `Sandbox::spawn`, and
- asserts it returns `SandboxError::MultiThreaded { thread_count: .. }`.

If practical, also add a test for the `ThreadCountUnavailable` branch by injecting or mocking a failing `thread_count()` (e.g., via a wrapper or feature flag in tests) so that IO errors from `/proc/self/status` are verified to surface as `SandboxError::ThreadCountUnavailable`.
</issue_to_address>

### Comment 5
<location> `crates/weaver-sandbox/src/tests/unit.rs:7` </location>
<code_context>
+        .expect("env mutex poisoned")
+}
+
+#[test]
+fn restores_modified_and_removed_environment_variables() {
+    const EXISTING: &str = "WEAVER_ENV_GUARD_EXISTING";
</code_context>

<issue_to_address>
**suggestion (testing):** The `canonicalises_nonexistent_child_when_parent_exists` test name suggests behaviour it doesn’t actually validate

This test only checks that the future file path is recorded in `read_write_paths`; it never exercises the canonicalisation logic (`canonicalise`/`rebuild_from_existing_ancestor`) used when spawning a sandboxed command. As a result, the name is misleading and `rebuild_from_existing_ancestor` remains untested. Either rename the test to reflect that it covers only profile recording, or (preferably) add a sandbox-level test that drives `Sandbox::spawn`/`collect_exceptions` with a non-existent child path under a temp dir and asserts that the resulting exception path matches the expected rebuilt canonical path.
</issue_to_address>

### Comment 6
<location> `docs/weaver-design.md:989-990` </location>
<code_context>
+v0.8.1 with Weaver defaults. Executables must be supplied as absolute paths and
+whitelisted explicitly; the wrapper canonicalises the whitelist before launch
+to prevent symlink escapes. Standard Linux library roots (`/lib`, `/lib64`,
+`/usr/lib`, and their architecture-specific variants) are whitelisted for
+read-only access by default so dynamically linked binaries remain functional
+without exposing the wider filesystem. Network access remains disabled unless
+requested, and the environment is isolated by default with an allowlist for
</code_context>

<issue_to_address>
**nitpick (typo):** Consider adding a comma before "so" for clarity in this result clause.

In this sentence, consider changing to: "...read-only access by default, so dynamically linked binaries remain functional...". The comma makes it clearer that the second part describes the consequence of the default behavior.

```suggestion
`/usr/lib`, and their architecture-specific variants) are whitelisted for
read-only access by default, so dynamically linked binaries remain functional
```
</issue_to_address>

### Comment 7
<location> `docs/users-guide.md:145-146` </location>
<code_context>
+
+External tools launched by the daemon now run inside the `weaver-sandbox`
+wrapper around `birdcage` 0.8.1. Linux namespaces and `seccomp-bpf` filters are
+applied automatically, networking is disabled by default, and only a small set
+of standard library directories are readable to keep dynamically linked
+executables functioning. Commands must be provided as absolute paths and added
</code_context>

<issue_to_address>
**suggestion (typo):** The sentence here is a bit of a comma splice; consider adjusting punctuation or conjunctions.

In the phrase “are applied automatically, networking is disabled by default,” two independent clauses are joined only by a comma. Consider changing that comma to “and” or a semicolon to avoid the comma splice.

```suggestion
wrapper around `birdcage` 0.8.1. Linux namespaces and `seccomp-bpf` filters are
applied automatically; networking is disabled by default, and only a small set
```
</issue_to_address>

### Comment 8
<location> `crates/weaver-sandbox/src/sandbox.rs:80` </location>
<code_context>
+        })
+    }
+
+    fn collect_exceptions(&self, program: &Path) -> Result<Vec<Exception>, SandboxError> {
+        let mut exceptions = Vec::new();
+        let read_only = canonicalised_set(self.profile.read_only_paths())?;
</code_context>

<issue_to_address>
**issue (complexity):** Consider unifying program handling within `collect_exceptions` and simplifying `canonicalised_set` to avoid special cases and make the sandbox path logic easier to follow.

You can simplify the “double treatment” of the program and make the path handling a bit easier to follow without changing behaviour.

### 1. Unify program handling in `collect_exceptions`

Right now `program` is:

- canonicalised in `canonical_program`
- checked against `executable_paths` via `canonicalised_set`
- and then pushed again as a bespoke exception in `collect_exceptions`.

You can fold the program into the executable set up front and let the generic logic handle it, removing the special case and the need to reason about “did we already include this path or not?”.

```rust
fn collect_exceptions(&self, program: &Path) -> Result<Vec<Exception>, SandboxError> {
    let mut exceptions = Vec::new();
    let read_only = canonicalised_set(self.profile.read_only_paths())?;
    let read_write = canonicalised_set(self.profile.read_write_paths())?;

    // Start with the canonicalised executable paths.
    let mut executables = canonicalised_set(self.profile.executable_paths())?;
    // Ensure the canonicalised program is in the executable set as well.
    executables.insert(program.to_path_buf());

    for path in read_only {
        exceptions.push(Exception::Read(path));
    }
    for path in read_write {
        exceptions.push(Exception::WriteAndRead(path));
    }
    for path in executables {
        exceptions.push(Exception::ExecuteAndRead(path));
    }

    exceptions.extend(self.profile.environment_policy().to_exceptions());

    if matches!(self.profile.network_policy(), NetworkPolicy::Allow) {
        exceptions.push(Exception::Networking);
    }

    Ok(exceptions)
}
```

This keeps all functionality intact (the program is still required to be whitelisted and still gets an `ExecuteAndRead` exception) but removes one branch and the bespoke push at the bottom.

### 2. Trim `canonicalised_set` to make behaviour more obvious

You can make `canonicalised_set` more self‑contained and readable by expressing it in terms of iterator combinators. Behaviour stays identical, but the “canonicalise → dedupe” story is more obvious:

```rust
fn canonicalised_set(paths: &[PathBuf]) -> Result<BTreeSet<PathBuf>, SandboxError> {
    let mut set = BTreeSet::new();
    for path in paths {
        set.insert(canonicalise(path)?);
    }
    Ok(set)
}
```

or, if you prefer a more functional style without changing semantics:

```rust
fn canonicalised_set(paths: &[PathBuf]) -> Result<BTreeSet<PathBuf>, SandboxError> {
    paths
        .iter()
        .map(|p| canonicalise(p))
        .collect()
}
```

(Using `FromIterator` for `Result<BTreeSet<_>, _>`.)

These changes don’t touch the ancestor‑rebuild semantics, but they reduce branching and bespoke handling in the hot path, making the code easier to audit.
</issue_to_address>

### Comment 9
<location> `crates/weaver-sandbox/src/lib.rs:1` </location>
<code_context>
+//! Sandboxing utilities for Weaver processes.
+//!
+//! The `weaver-sandbox` crate wraps the [`birdcage`] library with policy
</code_context>

<issue_to_address>
**issue (review_instructions):** Add both unit and behavioural tests covering the new `weaver-sandbox` crate, including its error paths and sandbox policies.

The `weaver-sandbox` crate introduces a substantial new feature surface (profiles, env guard, runtime helpers, sandbox orchestration), but the added test files under `crates/weaver-sandbox/src/tests` and `crates/weaver-sandbox/tests/features/sandbox.feature` are empty in this diff. Implement unit tests for modules like `EnvGuard`, `SandboxProfile`, `Sandbox`, and `runtime::thread_count/linux_runtime_roots`, and add behavioural/feature tests that exercise end-to-end sandboxed command execution and validation of policies (e.g. absolute path requirement, whitelist enforcement, environment isolation, networking toggle, multi-threaded error). Ensure these tests are committed as part of this change.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
For any new feature or change to an existing feature, both behavioural and unit tests are required.

</details>
</issue_to_address>

### Comment 10
<location> `crates/weaver-sandbox/src/env_guard.rs:32` </location>
<code_context>
+                // Safety: project policy requires env mutation to be wrapped in
+                // `unsafe` until the std APIs settle for Rust 2024. We mutate
+                // only after snapshotting to avoid iterator invalidation.
+                unsafe { env::remove_var(key) };
+            }
+        }
</code_context>

<issue_to_address>
**issue (review_instructions):** `env::remove_var` is a safe API and does not require an `unsafe` block, so this use of `unsafe` violates the "avoid unsafe" guideline.

The standard library guarantees that `std::env::remove_var` is safe to call; wrapping it in `unsafe` does not provide additional correctness and conflicts with the project rule to avoid `unsafe` unless strictly necessary. Please remove the `unsafe` block here and call `env::remove_var` directly (keeping the surrounding logic unchanged).

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Avoid `unsafe` unless absolutely necessary.

</details>
</issue_to_address>

### Comment 11
<location> `crates/weaver-sandbox/src/env_guard.rs:38` </location>
<code_context>
+
+        for (key, value) in &self.original {
+            // Safety: see note above regarding env mutation policy.
+            unsafe { env::set_var(key, value) };
+        }
+    }
</code_context>

<issue_to_address>
**issue (review_instructions):** `env::set_var` is also safe and does not justify an `unsafe` block under the current guidelines.

Similarly to `remove_var`, `std::env::set_var` is a safe function and does not need to be wrapped in `unsafe`. Please drop the `unsafe` block and call `env::set_var(key, value)` directly so that the implementation adheres to the project’s restriction on `unsafe` usage.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Avoid `unsafe` unless absolutely necessary.

</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai

Hey there - I've reviewed your changes and found some issues that need to be addressed.

In Sandbox::collect_exceptions you always push an ExecuteAndRead exception for program even if it is already present in the canonicalised executable set, which can be trivially de-duplicated by checking membership before pushing to avoid redundant exceptions.
Both the behavioural test support (tests/support/mod.rs) and env_guard tests define their own OnceLock<Mutex<()>> for environment serialisation; consider centralising this environment lock in a single helper to avoid duplication and ensure all env-mutating tests share the same synchronisation primitive.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `Sandbox::collect_exceptions` you always push an `ExecuteAndRead` exception for `program` even if it is already present in the canonicalised executable set, which can be trivially de-duplicated by checking membership before pushing to avoid redundant exceptions.
- Both the behavioural test support (`tests/support/mod.rs`) and `env_guard` tests define their own `OnceLock<Mutex<()>>` for environment serialisation; consider centralising this environment lock in a single helper to avoid duplication and ensure all env-mutating tests share the same synchronisation primitive.

## Individual Comments

### Comment 1
<location> `crates/weaver-sandbox/src/sandbox.rs:80-105` </location>
<code_context>
+        let mut exceptions = Vec::new();
+        let read_only = canonicalised_set(self.profile.read_only_paths())?;
+        let read_write = canonicalised_set(self.profile.read_write_paths())?;
+        let executables = canonicalised_set(self.profile.executable_paths())?;
+
+        for path in read_only {
+            exceptions.push(Exception::Read(path));
+        }
+        for path in read_write {
+            exceptions.push(Exception::WriteAndRead(path));
+        }
+        for path in executables {
+            exceptions.push(Exception::ExecuteAndRead(path));
+        }
+
+        exceptions.push(Exception::ExecuteAndRead(program.to_path_buf()));
+
+        exceptions.extend(self.profile.environment_policy().to_exceptions());
</code_context>

<issue_to_address>
**suggestion:** Avoid adding a duplicate `ExecuteAndRead` exception when the program is already whitelisted as executable.

In `collect_exceptions`, you convert all profile `executable_paths` into `ExecuteAndRead` exceptions, then always add another `ExecuteAndRead(program)`. If the canonicalised `program` is already in the profile, this duplicates the exception. To avoid this, only add the extra entry when it’s not already in the `executables` set:

```rust
let executables = canonicalised_set(self.profile.executable_paths())?;

for path in &executables {
    exceptions.push(Exception::ExecuteAndRead(path.clone()));
}

if !executables.contains(program) {
    exceptions.push(Exception::ExecuteAndRead(program.to_path_buf()));
}
```

This preserves behavior while avoiding duplicate entries.

```suggestion
    fn collect_exceptions(&self, program: &Path) -> Result<Vec<Exception>, SandboxError> {
        let mut exceptions = Vec::new();
        let read_only = canonicalised_set(self.profile.read_only_paths())?;
        let read_write = canonicalised_set(self.profile.read_write_paths())?;
        let executables = canonicalised_set(self.profile.executable_paths())?;

        for path in read_only {
            exceptions.push(Exception::Read(path));
        }
        for path in read_write {
            exceptions.push(Exception::WriteAndRead(path));
        }
        for path in &executables {
            exceptions.push(Exception::ExecuteAndRead(path.clone()));
        }

        if !executables.contains(program) {
            exceptions.push(Exception::ExecuteAndRead(program.to_path_buf()));
        }

        exceptions.extend(self.profile.environment_policy().to_exceptions());

        if matches!(self.profile.network_policy(), NetworkPolicy::Allow) {
            exceptions.push(Exception::Networking);
        }

        Ok(exceptions)
    }
```
</issue_to_address>

### Comment 2
<location> `crates/weaver-sandbox/src/tests/unit.rs:85-86` </location>
<code_context>
+}
+
+#[test]
+fn canonicalises_nonexistent_child_when_parent_exists() {
+    let base = tempfile::tempdir().expect("tempdir");
+    let target = base.path().join("future_dir").join("file.txt");
+
+    let profile = SandboxProfile::new().allow_read_write_path(&target);
+
+    let set = profile.read_write_paths();
+    assert!(
+        set.iter().any(|p| p.ends_with("file.txt")),
</code_context>

<issue_to_address>
**suggestion (testing):** This test name suggests canonicalisation behaviour, but it currently only asserts that the path is recorded.

Here the test exercises only that `SandboxProfile` records a path ending in `file.txt`; the actual non-existent-child canonicalisation happens in `sandbox.rs` during spawn/exception collection, not when adding paths to the profile.

To make this clearer and improve coverage, either:
- Rename the test to something like `records_future_child_paths_when_parent_exists`, or
- Add/extend a test that goes through the full sandbox flow: create a temp dir with an existing parent but no leaf file, add the path via `allow_read_write_path`, trigger `collect_exceptions` (e.g. by spawning a whitelisted no-op program), and assert that the resulting `Exception::WriteAndRead` path matches the expected canonicalised form.

```suggestion
#[test]
fn records_future_child_paths_when_parent_exists() {
```
</issue_to_address>

### Comment 3
<location> `crates/weaver-sandbox/src/tests/behaviour.rs:126-127` </location>
<code_context>
+    );
+}
+
+#[then("environment markers are cleaned up")]
+fn then_environment_cleaned(world: &RefCell<TestWorld>) {
+    world.borrow_mut().restore_env();
+    assert_ne!(
</code_context>

<issue_to_address>
**suggestion (testing):** The behaviour tests don't currently exercise the MultiThreaded or ThreadCountUnavailable error paths.

Since `Sandbox::spawn` explicitly guards against multi-threaded contexts and `/proc` parsing failures, it would be useful to add tests that cover those preflight paths. You could introduce a small abstraction (e.g., a test-only hook or trait) around `thread_count()` so tests can:

- Force a value > 1 and assert `SandboxError::MultiThreaded`.
- Force an `io::Error` and assert `SandboxError::ThreadCountUnavailable`.

These can be unit tests rather than BDD scenarios and would exercise the most security-sensitive checks.

Suggested implementation:

```rust
#[then("environment markers are cleaned up")]
fn then_environment_cleaned(world: &RefCell<TestWorld>) {
    world.borrow_mut().restore_env();
    assert_ne!(
        std::env::var_os("KEEP_ME"),
        Some(std::ffi::OsString::from("present")),
        "KEEP_ME still holds the scenario value after restoration"
    );
    assert_ne!(
        std::env::var_os("DROP_ME"),
        Some(std::ffi::OsString::from("remove-me")),
        "DROP_ME still holds the scenario value after restoration"
    );
}

#[cfg(test)]
mod thread_count_error_tests {
    use super::*;
    use crate::{Sandbox, SandboxError};

    /// Helper to construct a sandbox instance suitable for preflight testing.
    ///
    /// This assumes `Sandbox::builder()` exists; adjust to match the actual
    /// constructor API if needed.
    fn make_sandbox_for_thread_test() -> Sandbox {
        Sandbox::builder()
            .build()
            .expect("failed to build sandbox for thread-count tests")
    }

    /// When the injected thread counter reports more than one thread,
    /// we should fail fast with `SandboxError::MultiThreaded`.
    #[test]
    fn spawn_fails_when_thread_count_greater_than_one() {
        let sandbox = make_sandbox_for_thread_test();

        // `spawn_with_thread_counter` is a test-only hook that bypasses the
        // real `thread_count()` implementation.
        let result = sandbox.spawn_with_thread_counter(|| Ok(2));

        match result {
            Err(SandboxError::MultiThreaded(_)) => {}
            other => panic!("expected SandboxError::MultiThreaded, got {:?}", other),
        }
    }

    /// When the injected thread counter fails with an `io::Error`,
    /// we should fail fast with `SandboxError::ThreadCountUnavailable`.
    #[test]
    fn spawn_fails_when_thread_count_unavailable() {
        let sandbox = make_sandbox_for_thread_test();
        let io_err = std::io::Error::new(std::io::ErrorKind::Other, "synthetic /proc failure");

        let result = sandbox.spawn_with_thread_counter(|| Err(io_err));

        match result {
            Err(SandboxError::ThreadCountUnavailable(_)) => {}
            other => panic!(
                "expected SandboxError::ThreadCountUnavailable, got {:?}",
                other
            ),
        }
    }
}

```

To make these tests compile and exercise the intended preflight paths, the core sandbox code will need a small abstraction around `thread_count()`:

1. In the module where `Sandbox::spawn` is implemented:
   - Extract the existing thread-count logic into a function or trait, e.g.:
     ```rust
     pub trait ThreadCounter {
         fn thread_count(&self) -> std::io::Result<usize>;
     }

     pub struct ProcThreadCounter;

     impl ThreadCounter for ProcThreadCounter {
         fn thread_count(&self) -> std::io::Result<usize> {
             thread_count() // existing implementation
         }
     }
     ```
   - Update the internal implementation of `spawn` to use a generic thread counter, defaulting to the real implementation:
     ```rust
     impl Sandbox {
         fn spawn_internal<C, F>(&self, counter: C /* ... other args ... */) -> Result<..., SandboxError>
         where
             C: FnOnce() -> std::io::Result<usize>,
         {
             let threads = counter().map_err(SandboxError::ThreadCountUnavailable)?;
             if threads > 1 {
                 return Err(SandboxError::MultiThreaded(threads));
             }

             // existing spawn logic continues here...
         }

         pub fn spawn(/* existing signature */) -> Result<..., SandboxError> {
             self.spawn_internal(|| thread_count(), /* forward other args */)
         }

         /// Test-only hook used by `thread_count_error_tests`.
         #[cfg(test)]
         pub fn spawn_with_thread_counter<C>(&self, counter: C /* mirror spawn args as needed */)
             -> Result<..., SandboxError>
         where
             C: FnOnce() -> std::io::Result<usize>,
         {
             self.spawn_internal(counter /* forward other args */)
         }
     }
     ```
   - Ensure `SandboxError::MultiThreaded` and `SandboxError::ThreadCountUnavailable` variants are already defined and used here; if their payloads differ from `usize` or `io::Error`, adjust the tests’ pattern matches accordingly.

2. Align the tests with the real `Sandbox::builder` / `spawn` signatures:
   - If `Sandbox` is constructed differently, update `make_sandbox_for_thread_test` to use the correct constructor.
   - If `spawn` takes arguments (command, env, etc.), mirror those in `spawn_with_thread_counter` and pass simple dummy values from the tests that do not affect the preflight logic.

3. If the crate uses a different module path for `Sandbox`/`SandboxError`, adjust the `use crate::{Sandbox, SandboxError};` line to the correct paths.
</issue_to_address>

### Comment 4
<location> `crates/weaver-sandbox/src/sandbox.rs:138` </location>
<code_context>
+    }
+}
+
+fn rebuild_from_existing_ancestor(path: &Path) -> Result<PathBuf, SandboxError> {
+    let mut missing_components: Vec<PathBuf> = Vec::new();
+    let mut cursor = Some(path);
</code_context>

<issue_to_address>
**issue (complexity):** Consider simplifying `rebuild_from_existing_ancestor` using `ancestors`/`strip_prefix` and replacing the `BTreeSet`-based path handling with straightforward `Vec<PathBuf>` operations to make the sandbox code easier to follow.

You can keep the current behavior and noticeably reduce complexity in two focused spots:

### 1. Simplify `rebuild_from_existing_ancestor`

You can avoid the manual parent-walking loop, the explicit `Vec` of components, and the `rev()`-fold by using `Path::ancestors()` plus `strip_prefix`:

```rust
fn rebuild_from_existing_ancestor(path: &Path) -> Result<PathBuf, SandboxError> {
    // Find the closest existing ancestor
    let existing = path
        .ancestors()
        .find(|p| p.exists())
        .ok_or_else(|| SandboxError::MissingPath {
            path: path.to_path_buf(),
        })?;

    // Canonicalize that ancestor
    let base = fs::canonicalize(existing).map_err(|source| SandboxError::CanonicalisationFailed {
        path: existing.to_path_buf(),
        source,
    })?;

    // Re-attach the missing tail (path relative to existing ancestor)
    let tail = path.strip_prefix(existing).unwrap_or_else(|_| Path::new(""));
    Ok(base.join(tail))
}
```

This preserves the semantics of “canonicalize the first existing ancestor and then re-append missing components,” but removes the reverse tracking and explicit loop state.

### 2. Avoid `BTreeSet` unless you specifically need ordering/dedup

If you don’t rely on set semantics (ordering, dedup) for correctness, you can drop the `BTreeSet` abstraction and just canonicalize into a `Vec<PathBuf>`. That makes the logic more obvious and avoids per-spawn set construction:

```rust
fn canonicalised_paths(paths: &[PathBuf]) -> Result<Vec<PathBuf>, SandboxError> {
    paths.iter().map(|p| canonicalise(p)).collect()
}
```

Usage:

```rust
fn ensure_program_whitelisted(&self, program: &Path) -> Result<(), SandboxError> {
    let authorised = canonicalised_paths(self.profile.executable_paths())?;
    if authorised.iter().any(|p| p == program) {
        return Ok(());
    }
    Err(SandboxError::ExecutableNotAuthorised {
        program: program.to_path_buf(),
    })
}

fn collect_exceptions(&self, program: &Path) -> Result<Vec<Exception>, SandboxError> {
    let mut exceptions = Vec::new();
    let read_only = canonicalised_paths(self.profile.read_only_paths())?;
    let read_write = canonicalised_paths(self.profile.read_write_paths())?;
    let executables = canonicalised_paths(self.profile.executable_paths())?;

    for path in read_only {
        exceptions.push(Exception::Read(path));
    }
    for path in read_write {
        exceptions.push(Exception::WriteAndRead(path));
    }
    for path in executables {
        exceptions.push(Exception::ExecuteAndRead(path));
    }

    exceptions.push(Exception::ExecuteAndRead(program.to_path_buf()));
    exceptions.extend(self.profile.environment_policy().to_exceptions());

    if matches!(self.profile.network_policy(), NetworkPolicy::Allow) {
        exceptions.push(Exception::Networking);
    }

    Ok(exceptions)
}
```

Behavior is preserved (duplicates in the profile still just translate to repeated exceptions, which `birdcage` can ignore or handle), but the per-spawn logic is simpler and more explicit. If you *do* need dedup, you can still dedup the `Vec` with a small helper instead of exposing `BTreeSet` in the control flow.
</issue_to_address>

### Comment 5
<location> `crates/weaver-sandbox/src/lib.rs:51` </location>
<code_context>
+pub use birdcage::process;
+pub use error::SandboxError;
+pub use profile::{EnvironmentPolicy, NetworkPolicy, SandboxProfile};
+pub use sandbox::{Sandbox, SandboxChild, SandboxCommand, SandboxOutput};
--- /dev/null
+++ b/crates/weaver-sandbox/src/env_guard.rs
</code_context>

<issue_to_address>
**issue (review_instructions):** Add both behavioural and unit tests for the new `weaver-sandbox` crate; the newly added test files are currently empty and do not exercise the sandbox behaviour or error paths.

Implement unit tests under `crates/weaver-sandbox/src/tests` that cover the key components (`SandboxProfile`, `Sandbox`, `EnvGuard`, `runtime::thread_count`, and `runtime::linux_runtime_roots`). Exercise at least:
- successful spawning of an authorised absolute executable
- rejection of non-absolute paths (`ProgramNotAbsolute`)
- rejection of non-whitelisted executables (`ExecutableNotAuthorised`)
- handling of missing and non-canonicalisable paths (`MissingPath`, `CanonicalisationFailed`)
- behaviour of environment policies (`Isolated`, `AllowList`, `InheritAll`) and `EnvGuard` restoring the parent environment
- `NetworkPolicy` toggling and how it affects collected `Exception`s
- single-threaded vs multi-threaded preflight (`MultiThreaded`, `ThreadCountUnavailable`).

In addition, implement behavioural tests (e.g. BDD-style scenarios using `rstest-bdd` and `sandbox.feature`) that drive the crate end-to-end via `Sandbox::spawn`, asserting observable behaviour (filesystem access, environment isolation, networking defaults, and the multi-threaded error behaviour). Ensure these tests are actually populated and executed rather than leaving the added test files empty.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.

</details>
</issue_to_address>

### Comment 6
<location> `crates/weaver-sandbox/src/env_guard.rs:29` </location>
<code_context>
+        // Remove variables introduced while the guard was active.
+        for key in current.keys() {
+            if !expected_keys.contains(key) {
+                // Safety: project policy requires env mutation to be wrapped in
+                // `unsafe` until the std APIs settle for Rust 2024. We mutate
+                // only after snapshotting to avoid iterator invalidation.
</code_context>

<issue_to_address>
**issue (review_instructions):** `unsafe` is used around safe `std::env` functions without an apparent technical need, which conflicts with the "avoid unsafe unless absolutely necessary" instruction.

The `restore` method wraps `env::remove_var` and `env::set_var` in `unsafe` blocks:

```rust
unsafe { env::remove_var(key) };
...
unsafe { env::set_var(key, value) };
```

These standard library APIs are already safe to call, so the `unsafe` blocks do not appear technically required to uphold memory or thread safety. That makes this a direct use of `unsafe` without a demonstrated necessity, contrary to the review instruction.

Unless there is a concrete UB-related reason that is missing from the comment, please remove the `unsafe` blocks and call the safe functions directly (or otherwise justify and encapsulate the `unsafe` in the narrowest possible way).

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Avoid `unsafe` unless absolutely necessary.

</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai

Hey there - I've reviewed your changes and found some issues that need to be addressed.

The canonicalise/rebuild_from_existing_ancestor logic currently returns Ok for paths whose leaf component does not exist (e.g. /definitely/missing/tool becomes a canonical path even though it’s missing), which means Sandbox::spawn will never produce SandboxError::MissingPath for such cases; consider changing rebuild_from_existing_ancestor (or canonical_program) to error when the full path does not exist so the error variants and behaviour match the intent of the API and tests.
In Sandbox::collect_exceptions, ExecuteAndRead(program) is always pushed even when program is already present in the canonicalised executable set, which is harmless but redundant; you could avoid the duplicate by either not re-adding program when it’s already authorised or by excluding it from the initial executables set.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The `canonicalise`/`rebuild_from_existing_ancestor` logic currently returns `Ok` for paths whose leaf component does not exist (e.g. `/definitely/missing/tool` becomes a canonical path even though it’s missing), which means `Sandbox::spawn` will never produce `SandboxError::MissingPath` for such cases; consider changing `rebuild_from_existing_ancestor` (or `canonical_program`) to error when the full path does not exist so the error variants and behaviour match the intent of the API and tests.
- In `Sandbox::collect_exceptions`, `ExecuteAndRead(program)` is always pushed even when `program` is already present in the canonicalised executable set, which is harmless but redundant; you could avoid the duplicate by either not re-adding `program` when it’s already authorised or by excluding it from the initial `executables` set.

## Individual Comments

### Comment 1
<location> `crates/weaver-sandbox/src/sandbox.rs:92-96` </location>
<code_context>
+        for path in read_write {
+            exceptions.push(Exception::WriteAndRead(path));
+        }
+        for path in executables {
+            exceptions.push(Exception::ExecuteAndRead(path));
+        }
+
+        exceptions.push(Exception::ExecuteAndRead(program.to_path_buf()));
+
+        exceptions.extend(self.profile.environment_policy().to_exceptions());
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Avoid adding a duplicate `ExecuteAndRead` exception for the program path.

You’re pushing `ExecuteAndRead` for all `executables` and then again for `program`, which is guaranteed to be one of them. This can create duplicate exceptions for the same path if `birdcage` doesn’t deduplicate. Consider either skipping the final push when `program` is already in `executables`, or removing the loop over `executables` and relying only on the program-specific exception, if that matches your threat model.

```suggestion
        for path in &executables {
            exceptions.push(Exception::ExecuteAndRead(path.to_path_buf()));
        }

        if !executables.iter().any(|p| p == program) {
            exceptions.push(Exception::ExecuteAndRead(program.to_path_buf()));
        }
```
</issue_to_address>

### Comment 2
<location> `crates/weaver-sandbox/src/tests/sandbox_unit.rs:1` </location>
<code_context>
+//! Unit tests covering sandbox spawn preflight errors.
+
+use std::path::PathBuf;
</code_context>

<issue_to_address>
**suggestion (testing):** Missing coverage for the `MultiThreaded` and `ThreadCountUnavailable` sandbox preflight errors.

These variants are part of the safety contract for `Sandbox::spawn` and should be explicitly exercised. Please add tests that:

- Assert `SandboxError::MultiThreaded` when `thread_count()` reports more than one thread (e.g. by spawning a parked helper thread; you can gate this behind `#[cfg(target_os = "linux")]` if needed).
- Assert `SandboxError::ThreadCountUnavailable` when `thread_count()` fails, possibly by making `thread_count` injectable or adding a small test seam.

This helps ensure the sandbox continues to fail closed if `/proc` or threading behaviour changes.

Suggested implementation:

```rust
 //! Unit tests covering sandbox spawn preflight errors.

use std::{path::PathBuf, thread, time::Duration};

use crate::sandbox::{Sandbox, SandboxCommand};
use crate::{SandboxError, SandboxProfile};

```

```rust
fn sandbox_with_profile(profile: SandboxProfile) -> Sandbox {
    Sandbox::new(profile)
}

#[cfg(target_os = "linux")]
#[test]
fn rejects_spawn_when_multi_threaded() {
    // Ensure we have at least one extra thread in the process before attempting
    // to spawn a sandboxed command. The sandbox should detect this and fail
    // closed with `SandboxError::MultiThreaded`.
    let helper = thread::spawn(|| {
        // Park the helper thread so it stays alive for the duration of the test.
        thread::park();
    });

    // Give the helper thread a brief moment to start and register in `/proc`.
    // The exact timing is not critical; we just want to avoid flakiness.
    thread::sleep(Duration::from_millis(50));

    let profile = SandboxProfile::default();
    let sandbox = sandbox_with_profile(profile);

    // Use a benign program path; the error we care about is the preflight
    // multi-threaded check, not command resolution.
    let command = SandboxCommand::new(PathBuf::from("/bin/true"));

    let result = sandbox.spawn(&command);
    assert!(
        matches!(result, Err(SandboxError::MultiThreaded)),
        "expected MultiThreaded error when spawning sandbox in a multi-threaded process, got: {result:?}"
    );

    // Cleanly shut down the helper thread.
    helper.thread().unpark();
    helper.join().expect("helper thread should join cleanly");
}

#[test]
fn rejects_spawn_when_thread_count_unavailable() {
    // This test verifies that a failure to determine the thread count causes
    // `Sandbox::spawn` to fail closed with `SandboxError::ThreadCountUnavailable`.
    //
    // It relies on a test-only seam in the sandbox implementation that allows
    // overriding the thread-counting logic.
    use crate::sandbox::testing::with_thread_count_override;

    let profile = SandboxProfile::default();
    let sandbox = sandbox_with_profile(profile);
    let command = SandboxCommand::new(PathBuf::from("/bin/true"));

    let result = with_thread_count_override(|| {
        // Simulate a failure to determine the thread count.
        Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            "simulated thread count failure",
        ))
        .into()
    }, || sandbox.spawn(&command));

    assert!(
        matches!(result, Err(SandboxError::ThreadCountUnavailable)),
        "expected ThreadCountUnavailable error when thread counting fails, got: {result:?}"
    );
}

#[test]
fn rejects_relative_program_paths() {

```

To make the `rejects_spawn_when_thread_count_unavailable` test compile and behave as intended, you will need to introduce a small test seam in the sandbox implementation:

1. In the sandbox module (likely `crates/weaver-sandbox/src/sandbox.rs` or similar), extract the logic that determines the current process thread count into a helper function, e.g.:
   ```rust
   fn thread_count() -> std::io::Result<usize> { /* existing logic */ }
   ```

2. Under `#[cfg(test)]`, add a mechanism to override this function for the duration of a closure, for example:
   ```rust
   pub mod testing {
       use std::cell::RefCell;
       use std::io;

       thread_local! {
           static OVERRIDE: RefCell<Option<Box<dyn Fn() -> io::Result<usize>>>> = RefCell::new(None);
       }

       pub fn with_thread_count_override<F, R>(
           override_impl: impl Fn() -> io::Result<usize> + 'static,
           f: F,
       ) -> R
       where
           F: FnOnce() -> R,
       {
           OVERRIDE.with(|slot| {
               *slot.borrow_mut() = Some(Box::new(override_impl));
               let result = f();
               *slot.borrow_mut() = None;
               result
           })
       }

       pub fn thread_count() -> io::Result<usize> {
           OVERRIDE.with(|slot| {
               if let Some(ref f) = *slot.borrow() {
                   f()
               } else {
                   super::thread_count()
               }
           })
       }
   }
   ```

3. Update the production `Sandbox::spawn` implementation to call `thread_count()` (or `testing::thread_count()` under `cfg(test)`) so that during tests, `with_thread_count_override` can force `thread_count()` to fail and surface `SandboxError::ThreadCountUnavailable`.

4. Ensure `SandboxCommand::new` and `SandboxProfile::default()` exist and are used consistently with the rest of the test file; if the existing tests construct commands/profiles differently, mirror that pattern in the two new tests instead of the placeholders above.
</issue_to_address>

### Comment 3
<location> `crates/weaver-sandbox/src/tests/unit.rs:7-16` </location>
<code_context>
+        .expect("env mutex poisoned")
+}
+
+#[test]
+fn restores_modified_and_removed_environment_variables() {
+    const EXISTING: &str = "WEAVER_ENV_GUARD_EXISTING";
</code_context>

<issue_to_address>
**suggestion (testing):** The `canonicalises_nonexistent_child_when_parent_exists` test name suggests canonicalisation behaviour, but it only checks path recording on the profile.

The current body only checks that `SandboxProfile` records a path ending with `file.txt`, and doesn’t touch the `rebuild_from_existing_ancestor` / canonicalisation logic in `sandbox.rs`, so the name is misleading.

Either rename the test to match what it actually verifies, or adjust it to call the canonicalisation helper (e.g. via a small test-only wrapper) and assert on the rebuilt path so the name and behaviour align.
</issue_to_address>

### Comment 4
<location> `crates/weaver-sandbox/src/tests/env_guard.rs:17-18` </location>
<code_context>
+        .expect("env mutex poisoned")
+}
+
+#[test]
+fn restores_modified_and_removed_environment_variables() {
+    const EXISTING: &str = "WEAVER_ENV_GUARD_EXISTING";
+    const ANOTHER: &str = "WEAVER_ENV_GUARD_ANOTHER";
</code_context>

<issue_to_address>
**suggestion (testing):** EnvGuard tests don't verify that restoration also happens via Drop when `restore` is not called explicitly.

Right now both tests only cover the explicit `snapshot.restore()` path. Please add a test that relies on `Drop` instead (e.g., create an `EnvGuard` in a scope, mutate env vars as in the existing tests, then assert after the scope ends that the env is restored). This will ensure the `Drop`-based restoration path—how EnvGuard is typically used—remains correct as the implementation evolves.

Suggested implementation:

```rust
#[test]
fn restores_modified_and_removed_environment_variables() {
    const EXISTING: &str = "WEAVER_ENV_GUARD_EXISTING";
    const ANOTHER: &str = "WEAVER_ENV_GUARD_ANOTHER";
    const EPHEMERAL: &str = "WEAVER_ENV_GUARD_EPHEMERAL";

    let _guard = lock_env();

    unsafe { env::set_var(EXISTING, "original") };
    unsafe { env::set_var(ANOTHER, "keep") };

    let snapshot = EnvGuard::capture();

    unsafe { env::set_var(EXISTING, "changed") };
    unsafe { env::remove_var(ANOTHER) };
    unsafe { env::set_var(EPHEMERAL, "ephemeral") };

    snapshot.restore();

    assert_eq!(unsafe { env::var(EXISTING).unwrap() }, "original");
    assert_eq!(unsafe { env::var(ANOTHER).unwrap() }, "keep");
    assert!(unsafe { env::var_os(EPHEMERAL) }.is_none());
}

#[test]
fn restores_environment_variables_on_drop() {
    const EXISTING: &str = "WEAVER_ENV_GUARD_EXISTING";
    const ANOTHER: &str = "WEAVER_ENV_GUARD_ANOTHER";
    const EPHEMERAL: &str = "WEAVER_ENV_GUARD_EPHEMERAL";

    let _guard = lock_env();

    // Establish a known baseline
    unsafe { env::set_var(EXISTING, "original") };
    unsafe { env::set_var(ANOTHER, "keep") };
    unsafe { env::remove_var(EPHEMERAL) };

    {
        let _snapshot = EnvGuard::capture();

        // Mutate the environment while the guard is in scope
        unsafe { env::set_var(EXISTING, "changed") };
        unsafe { env::remove_var(ANOTHER) };
        unsafe { env::set_var(EPHEMERAL, "ephemeral") };

        // No explicit restore() call here; rely on Drop.
    }

    // After the guard is dropped, the environment should be restored
    assert_eq!(unsafe { env::var(EXISTING).unwrap() }, "original");
    assert_eq!(unsafe { env::var(ANOTHER).unwrap() }, "keep");
    assert!(unsafe { env::var_os(EPHEMERAL) }.is_none());
}

```

If the rest of the file uses a different pattern for assertions or unsafe env access (e.g., helper functions or a custom macro), you may want to align the new test with those conventions. Also ensure `use std::env;` is present at the top of this file if it isn’t already.
</issue_to_address>

### Comment 5
<location> `docs/users-guide.md:145-148` </location>
<code_context>
+
+External tools launched by the daemon now run inside the `weaver-sandbox`
+wrapper around `birdcage` 0.8.1. Linux namespaces and `seccomp-bpf` filters are
+applied automatically, networking is disabled by default, and only a small set
+of standard library directories are readable to keep dynamically linked
+executables functioning. Commands must be provided as absolute paths and added
</code_context>

<issue_to_address>
**suggestion (typo):** Consider adjusting punctuation/structure to avoid a comma splice in this sentence.

The phrase "applied automatically, networking is disabled by default, and only a small set" links three clauses with commas and reads like a comma splice. Consider semicolons or splitting into two sentences, e.g. "applied automatically; networking is disabled by default; and only a small set of standard library directories are readable" or similar.

```suggestion
wrapper around `birdcage` 0.8.1. Linux namespaces and `seccomp-bpf` filters are
applied automatically; networking is disabled by default; and only a small set
of standard library directories are readable to keep dynamically linked
executables functioning. Commands must be provided as absolute paths and added
```
</issue_to_address>

### Comment 6
<location> `crates/weaver-sandbox/src/lib.rs:51` </location>
<code_context>
+pub use birdcage::process;
+pub use error::SandboxError;
+pub use profile::{EnvironmentPolicy, NetworkPolicy, SandboxProfile};
+pub use sandbox::{Sandbox, SandboxChild, SandboxCommand, SandboxOutput};
--- /dev/null
+++ b/crates/weaver-sandbox/src/env_guard.rs
</code_context>

<issue_to_address>
**issue (review_instructions):** Add both unit and behavioural tests for the new `weaver-sandbox` crate; the crate currently exposes non-trivial sandbox behaviour without any exercising tests.

Implement unit tests for the core building blocks (`SandboxProfile`, `EnvironmentPolicy`, `NetworkPolicy`, `EnvGuard`, `runtime::thread_count`, error conversions, and the path canonicalisation helpers) and behavioural tests that exercise the end-to-end sandbox flow.

At a minimum:
- Add unit tests under `crates/weaver-sandbox/src/tests/` that cover:
  - Successful spawning of an authorised absolute executable.
  - Rejection of non-absolute program paths (`ProgramNotAbsolute`).
  - Rejection of executables not authorised in the profile (`ExecutableNotAuthorised`).
  - Correct derivation of `Exception`s from read-only/read-write/executable paths and environment/network policies.
  - `EnvGuard` restoring the environment after `birdcage` manipulation.
  - `thread_count` parsing, including malformed `/proc/self/status` data (using test doubles or helper functions where direct `/proc` manipulation is not feasible).
- Add behavioural/acceptance tests (for example in `crates/weaver-sandbox/tests/features/sandbox.feature` and the corresponding Rust glue) that validate the observable behaviour of the sandbox in a real or simulated run:
  - A happy-path command runs successfully inside the sandbox with expected stdout/stderr.
  - Networking is denied by default and allowed only when requested.
  - Environment is stripped by default and inherited correctly when whitelisted or when `InheritAll` is used.
  - Multi-threaded invocation returns `SandboxError::MultiThreaded` instead of panicking.

Ensure these tests run in CI and demonstrably cover both the behavioural and unit-level expectations for this new feature, in line with the project testing rules.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
For any new feature or change to an existing feature, both behavioural and unit tests are required.

</details>
</issue_to_address>

### Comment 7
<location> `crates/weaver-sandbox/src/env_guard.rs:32` </location>
<code_context>
+                // Safety: project policy requires env mutation to be wrapped in
+                // `unsafe` until the std APIs settle for Rust 2024. We mutate
+                // only after snapshotting to avoid iterator invalidation.
+                unsafe { env::remove_var(key) };
+            }
+        }
</code_context>

<issue_to_address>
**issue (review_instructions):** `unsafe` is used around `env::remove_var`/`env::set_var` even though these APIs are safe; this violates the guideline to avoid `unsafe` unless it is strictly necessary.

`std::env::remove_var` and `std::env::set_var` are safe functions and do not require an `unsafe` block. The comment suggests a project policy, but per the review instructions `unsafe` should be avoided unless absolutely necessary.

Please remove the `unsafe` blocks here (and around `env::set_var` below) or, if there is a concrete unsafety that cannot be expressed with safe code, encapsulate it in the narrowest possible helper with a clear justification tied to actual undefined-behaviour constraints.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Avoid `unsafe` unless absolutely necessary.

</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai

Hey there - I've reviewed your changes and found some issues that need to be addressed.

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `crates/weaver-sandbox/src/sandbox.rs:92-96` </location>
<code_context>
+        for path in read_write {
+            exceptions.push(Exception::WriteAndRead(path));
+        }
+        for path in executables {
+            exceptions.push(Exception::ExecuteAndRead(path));
+        }
+
+        exceptions.push(Exception::ExecuteAndRead(program.to_path_buf()));
+
+        exceptions.extend(self.profile.environment_policy().to_exceptions());
</code_context>

<issue_to_address>
**suggestion:** Avoid adding a duplicate ExecuteAndRead exception for the program path.

Since `ensure_program_whitelisted` uses the same canonicalisation as `collect_exceptions`, an authorised `program` will already be in `executables`. In that case, `Exception::ExecuteAndRead(program)` is added twice. Consider either skipping the explicit push when `program` is already in `executables`, or not inserting `program` into `executables` when building exceptions.
</issue_to_address>

### Comment 2
<location> `crates/weaver-sandbox/src/sandbox.rs:82-84` </location>
<code_context>
+
+    fn collect_exceptions(&self, program: &Path) -> Result<Vec<Exception>, SandboxError> {
+        let mut exceptions = Vec::new();
+        let read_only = canonicalised_set(self.profile.read_only_paths())?;
+        let read_write = canonicalised_set(self.profile.read_write_paths())?;
+        let executables = canonicalised_set(self.profile.executable_paths())?;
+
+        for path in read_only {
</code_context>

<issue_to_address>
**suggestion (performance):** Canonicalising all profile paths on every spawn may be unnecessarily expensive.

Each `spawn` call re-canonicalises all read-only, read-write, and executable paths, adding repeated filesystem work that could be costly when profiles are reused. Consider canonicalising once when constructing `SandboxProfile` (or lazily caching the canonicalised sets on first use) and then reusing those cached paths in `spawn`.

Suggested implementation:

```rust
    fn collect_exceptions(&self, program: &Path) -> Result<Vec<Exception>, SandboxError> {
        let mut exceptions = Vec::new();
        let read_only = self.profile.read_only_paths_canonicalised()?;
        let read_write = self.profile.read_write_paths_canonicalised()?;
        let executables = self.profile.executable_paths_canonicalised()?;

```

To fully implement the caching and avoid repeated canonicalisation per `spawn`, you will also need to:

1. **Extend `SandboxProfile` to hold cached canonical paths**, for example (names can be adjusted to match your conventions):
   ```rust
   use once_cell::sync::OnceCell;
   use std::collections::HashSet;
   use std::path::PathBuf;

   pub struct SandboxProfile {
       // existing fields...
       read_only_paths: Vec<PathBuf>,
       read_write_paths: Vec<PathBuf>,
       executable_paths: Vec<PathBuf>,

       read_only_paths_canon: OnceCell<HashSet<PathBuf>>,
       read_write_paths_canon: OnceCell<HashSet<PathBuf>>,
       executable_paths_canon: OnceCell<HashSet<PathBuf>>,
   }
   ```

2. **Initialize the new `OnceCell` fields** in the `SandboxProfile` constructor / builder:
   ```rust
   impl SandboxProfile {
       pub fn new(/* existing params */) -> Self {
           Self {
               // existing initialisation...
               read_only_paths,
               read_write_paths,
               executable_paths,
               read_only_paths_canon: OnceCell::new(),
               read_write_paths_canon: OnceCell::new(),
               executable_paths_canon: OnceCell::new(),
           }
       }
   }
   ```

3. **Add the three new methods used in `collect_exceptions`** to lazily canonicalise and cache the sets:
   ```rust
   impl SandboxProfile {
       pub fn read_only_paths_canonicalised(&self) -> Result<&HashSet<PathBuf>, SandboxError> {
           self.read_only_paths_canon.get_or_try_init(|| {
               canonicalised_set(&self.read_only_paths)
           })
       }

       pub fn read_write_paths_canonicalised(&self) -> Result<&HashSet<PathBuf>, SandboxError> {
           self.read_write_paths_canon.get_or_try_init(|| {
               canonicalised_set(&self.read_write_paths)
           })
       }

       pub fn executable_paths_canonicalised(&self) -> Result<&HashSet<PathBuf>, SandboxError> {
           self.executable_paths_canon.get_or_try_init(|| {
               canonicalised_set(&self.executable_paths)
           })
       }
   }
   ```

4. **Ensure `canonicalised_set` is updated (if needed)** to accept the underlying path collection type used in `SandboxProfile` (e.g. `&[PathBuf]` or `&HashSet<PathBuf>`), so that it can be reused by these new methods without per-call allocation.

With these additional changes, canonicalisation will happen at most once per profile per path set, and `collect_exceptions` will reuse the cached canonicalised sets on every `spawn`.
</issue_to_address>

### Comment 3
<location> `crates/weaver-sandbox/src/profile.rs:163` </location>
<code_context>
+                let _ = keys.insert(key);
+                Self::AllowList(keys)
+            }
+            Self::InheritAll => Self::InheritAll,
+        }
+    }
</code_context>

<issue_to_address>
**question:** Allowing a variable while in `InheritAll` mode is a no-op, which may be surprising to callers.

In `EnvironmentPolicy::InheritAll`, calling `allow_environment_variable`/`with_allowed` effectively does nothing. While this matches the existing type, it’s not obvious from the builder-style API and may mislead callers into thinking they are constructing an allow-list. Consider either documenting this explicitly on `allow_environment_variable` or having `with_allowed` convert `InheritAll` into an `AllowList` seeded with the requested key, depending on the intended semantics.
</issue_to_address>

### Comment 4
<location> `crates/weaver-sandbox/src/env_guard.rs:27-32` </location>
<code_context>
+        let expected_keys: HashSet<&OsString> = self.original.keys().collect();
+
+        // Remove variables introduced while the guard was active.
+        for key in current.keys() {
+            if !expected_keys.contains(key) {
+                // Safety: project policy requires env mutation to be wrapped in
+                // `unsafe` until the std APIs settle for Rust 2024. We mutate
+                // only after snapshotting to avoid iterator invalidation.
+                unsafe { env::remove_var(key) };
+            }
+        }
</code_context>

<issue_to_address>
**suggestion:** The use of `unsafe` around `env::remove_var` appears unnecessary and could be avoided or better isolated.

Since `env::remove_var` / `env::set_var` are safe APIs, the `unsafe` here is a policy choice rather than required by the type system. If you want to keep that convention, consider wrapping env mutation in small helpers (e.g. `unsafe fn unset_env(...)`) and using those here. That keeps the `unsafe` surface minimal and the justification centralized and easier to audit.

Suggested implementation:

```rust
unsafe fn unset_env_var<K: AsRef<std::ffi::OsStr>>(key: K) {
    // Safety: by project policy, environment mutation is considered unsafe
    // until the std APIs settle for Rust 2024. Centralize that policy here
    // so that all env mutation is easy to audit.
    std::env::remove_var(key);
}

impl EnvGuard {

```

```rust
    pub(crate) fn restore(&self) {
        let current: HashMap<OsString, OsString> = env::vars_os().collect();
        let expected_keys: HashSet<&OsString> = self.original.keys().collect();

        // Remove variables introduced while the guard was active.
        for key in current.keys() {
            if !expected_keys.contains(key) {
                // Safety: env mutation is funneled through `unset_env_var` to
                // keep the unsafe surface minimal and well-documented.
                unsafe { unset_env_var(key) };
            }
        }

```

If there are other call sites in this crate that directly use `unsafe { env::remove_var(...) }` or `unsafe { env::set_var(...) }` to follow the same policy, they should be updated to call small helpers like `unset_env_var` (and a corresponding `set_env_var`) instead. This will further centralize and minimize the `unsafe` surface around environment mutation.
</issue_to_address>

### Comment 5
<location> `crates/weaver-sandbox/src/tests/unit.rs:7-16` </location>
<code_context>
+        .expect("env mutex poisoned")
+}
+
+#[test]
+fn restores_modified_and_removed_environment_variables() {
+    const EXISTING: &str = "WEAVER_ENV_GUARD_EXISTING";
</code_context>

<issue_to_address>
**issue (testing):** Test `canonicalises_nonexistent_child_when_parent_exists` name and intent don’t match the behaviour being exercised

This test only verifies that `SandboxProfile::allow_read_write_path` records a non-existent path; it never exercises the `canonicalise` / `rebuild_from_existing_ancestor` logic implied by its name. Since `SandboxProfile` doesn’t canonicalise paths, please either rename the test to match its actual behaviour, or refactor it to drive the canonicalisation path (for example, via `Sandbox::spawn` with a non-existent child under an existing parent and asserting the expected `SandboxError`).
</issue_to_address>

### Comment 6
<location> `crates/weaver-sandbox/src/tests/sandbox_unit.rs:1-10` </location>
<code_context>
+//! Unit tests covering sandbox spawn preflight errors.
+
+use std::path::PathBuf;
+
+use crate::sandbox::{Sandbox, SandboxCommand};
+use crate::{SandboxError, SandboxProfile};
+
+fn sandbox_with_profile(profile: SandboxProfile) -> Sandbox {
+    Sandbox::new(profile)
+}
+
+#[test]
+fn rejects_relative_program_paths() {
+    let sandbox = sandbox_with_profile(SandboxProfile::new());
+    let command = SandboxCommand::new("relative/bin");
+
+    let err = sandbox.spawn(command).expect_err("spawn should fail");
+    match err {
+        SandboxError::ProgramNotAbsolute(path) => {
+            assert_eq!(path, PathBuf::from("relative/bin"));
</code_context>

<issue_to_address>
**suggestion (testing):** Add unit tests for multi-threaded and thread-count-unavailable sandbox preflight failures

The tests don’t currently cover the preflight branches in `Sandbox::spawn` for:

- `SandboxError::MultiThreaded { .. }` when `thread_count() > 1`
- `SandboxError::ThreadCountUnavailable { .. }` when `thread_count()` returns `Err`

Consider adding tests that force these conditions (e.g., by making `thread_count` overridable in tests via a small abstraction or `cfg(test)` hook) and asserting that `spawn` produces the expected `SandboxError` variants.

Suggested implementation:

```rust
fn sandbox_with_profile(profile: SandboxProfile) -> Sandbox {
    Sandbox::new(profile)
}

/// Test-only helper to construct a `Sandbox` with a controlled thread-count
/// implementation so we can exercise preflight error paths.
///
/// This assumes a test-only constructor on `Sandbox` that allows injecting a
/// custom thread-count function. See `<additional_changes>` for the required
/// implementation on `Sandbox`.
#[cfg(test)]
fn sandbox_with_forced_thread_count<F>(profile: SandboxProfile, thread_count_fn: F) -> Sandbox
where
    F: Fn() -> Result<usize, Box<dyn std::error::Error + Send + Sync>> + Send + Sync + 'static,
{
    Sandbox::with_thread_counter_for_tests(profile, Box::new(thread_count_fn))
}

#[test]
fn rejects_multithreaded_processes() {
    // Force `thread_count()` to report more than one thread so that
    // `Sandbox::spawn` must fail with `SandboxError::MultiThreaded { .. }`.
    let sandbox = sandbox_with_forced_thread_count(SandboxProfile::new(), || Ok(4));
    let command = SandboxCommand::new("/usr/bin/true");

    let err = sandbox
        .spawn(command)
        .expect_err("spawn should fail for multi-threaded processes");
    match err {
        SandboxError::MultiThreaded { thread_count } => {
            assert_eq!(thread_count, 4);
        }
        other => panic!("expected MultiThreaded error, got: {other:?}"),
    }
}

#[test]
fn rejects_when_thread_count_unavailable() {
    // Force `thread_count()` to fail so that `Sandbox::spawn` must surface a
    // `SandboxError::ThreadCountUnavailable { .. }`.
    let sandbox = sandbox_with_forced_thread_count(SandboxProfile::new(), || {
        Err("thread count failed".into())
    });
    let command = SandboxCommand::new("/usr/bin/true");

    let err = sandbox
        .spawn(command)
        .expect_err("spawn should fail when thread count is unavailable");
    match err {
        SandboxError::ThreadCountUnavailable { .. } => {
            // Variant match is sufficient; the exact error type/message is an implementation detail.
        }
        other => panic!("expected ThreadCountUnavailable error, got: {other:?}"),
    }
}

```

` for the required
/// implementation on `Sandbox`.
#[cfg(test)]
fn sandbox_with_forced_thread_count<F>(profile: SandboxProfile, thread_count_fn: F) -> Sandbox
where
    F: Fn() -> Result<usize, Box<dyn std::error::Error + Send + Sync>> + Send + Sync + 'static,
{
    Sandbox::with_thread_counter_for_tests(profile, Box::new(thread_count_fn))
}

#[test]
fn rejects_multithreaded_processes() {
    // Force `thread_count()` to report more than one thread so that
    // `Sandbox::spawn` must fail with `SandboxError::MultiThreaded { .. }`.
    let sandbox = sandbox_with_forced_thread_count(SandboxProfile::new(), || Ok(4));
    let command = SandboxCommand::new("/usr/bin/true");

    let err = sandbox
        .spawn(command)
        .expect_err("spawn should fail for multi-threaded processes");
    match err {
        SandboxError::MultiThreaded { thread_count } => {
            assert_eq!(thread_count, 4);
        }
        other => panic!("expected MultiThreaded error, got: {other:?}"),
    }
}

#[test]
fn rejects_when_thread_count_unavailable() {
    // Force `thread_count()` to fail so that `Sandbox::spawn` must surface a
    // `SandboxError::ThreadCountUnavailable { .. }`.
    let sandbox = sandbox_with_forced_thread_count(SandboxProfile::new(), || {
        Err("thread count failed".into())
    });
    let command = SandboxCommand::new("/usr/bin/true");

    let err = sandbox
        .spawn(command)
        .expect_err("spawn should fail when thread count is unavailable");
    match err {
        SandboxError::ThreadCountUnavailable { .. } => {
            // Variant match is sufficient; the exact error type/message is an implementation detail.
        }
        other => panic!("expected ThreadCountUnavailable error, got: {other:?}"),
    }
}
>>>>>>> REPLACE
</file_operation>
</file_operations>

<additional_changes>
To make these tests compile and correctly exercise the preflight branches in `Sandbox::spawn`, you will need to extend the `Sandbox` implementation (likely in `crates/weaver-sandbox/src/sandbox.rs` or similar):

1. **Add a test-only constructor that accepts a thread-count function:**
   - Behind `#[cfg(test)]`, add something like:
     ```rust
     impl Sandbox {
         #[cfg(test)]
         pub fn with_thread_counter_for_tests(
             profile: SandboxProfile,
             thread_counter: Box<dyn Fn() -> Result<usize, Box<dyn std::error::Error + Send + Sync>> + Send + Sync>,
         ) -> Self {
             Self {
                 profile,
                 thread_counter, // store this in the struct instead of (or in addition to) the production path
                 // ...copy any other fields you need from `Sandbox::new`
             }
         }
     }
     ```
   - Introduce a `thread_counter` field (or similar) on `Sandbox` that is used by `Sandbox::spawn` instead of calling a free `thread_count()` function directly. In non-test builds, `Sandbox::new` should set this field to the real implementation.

2. **Wire `Sandbox::spawn` to use the injected thread-count function:**
   - In `spawn`, replace direct calls to `thread_count()` (or equivalent) with something like:
     ```rust
     let thread_count = (self.thread_counter)().map_err(|source| SandboxError::ThreadCountUnavailable { source })?;
     if thread_count > 1 {
         return Err(SandboxError::MultiThreaded { thread_count });
     }
     ```
   - Ensure the `ThreadCountUnavailable` and `MultiThreaded` variants of `SandboxError` match the patterns used in the tests (names and field types).

3. **Adjust types if necessary:**
   - If your existing `SandboxError::ThreadCountUnavailable` wraps a specific error type (e.g. `std::io::Error` or `anyhow::Error`), update the signature of `with_thread_counter_for_tests` and the closure types in this test file to match.
   - If `thread_count` is not `usize`, update the tests and the helper to use the correct numeric type while still forcing `> 1` for the multithreaded case.

If your codebase already has a different mechanism for overriding the thread-count implementation in tests (e.g., a `cfg(test)` global setter or a `sandbox::testing` module), you can replace the `Sandbox::with_thread_counter_for_tests` helper and the `sandbox_with_forced_thread_count` wrapper with the existing abstraction and keep the test bodies largely the same.
</issue_to_address>

### Comment 7
<location> `crates/weaver-sandbox/src/tests/behaviour.rs:1-8` </location>
<code_context>
+use rstest::fixture;
</code_context>

<issue_to_address>
**suggestion (testing):** Behavioural tests assume a Linux host but aren’t gated, which may cause failures on non-Linux platforms

These behavioural tests call `resolve_binary` from `tests::support`, which panics on non-Linux targets. Since `behaviour.rs` runs on all targets, scenarios will panic on non-Linux hosts even though the crate’s non-Linux defaults are valid.

Consider either:
- Gating the module or `sandbox_behaviour` with `#[cfg(target_os = "linux")]`, or
- Adding a guard like `if !cfg!(target_os = "linux") { return; }` in the tests

so the tests are skipped rather than panicking on non-Linux, matching the intended Linux-centric assumptions.

```suggestion
#![cfg(target_os = "linux")]
//! Behavioural tests for sandbox spawning using `rstest-bdd`.

use std::cell::RefCell;

use rstest::fixture;
use rstest_bdd_macros::{given, scenario, then, when};

use crate::tests::support::TestWorld;
```
</issue_to_address>

### Comment 8
<location> `crates/weaver-sandbox/src/tests/env_guard.rs:44-66` </location>
<code_context>
+    unsafe { env::remove_var(ANOTHER) };
+}
+
+#[test]
+fn removes_variables_created_during_guard_lifetime() {
+    const PRE_EXISTING: &str = "WEAVER_ENV_GUARD_PRE_EXISTING";
+    const CREATED: &str = "WEAVER_ENV_GUARD_CREATED";
+
+    let _guard = lock_env();
+
+    unsafe { env::set_var(PRE_EXISTING, "value") };
+
+    let snapshot = EnvGuard::capture();
+
+    unsafe { env::set_var(CREATED, "temporary") };
+
+    assert_eq!(env::var(PRE_EXISTING).as_deref(), Ok("value"));
+    assert_eq!(env::var(CREATED).as_deref(), Ok("temporary"));
+
+    snapshot.restore();
+
+    assert_eq!(env::var(PRE_EXISTING).as_deref(), Ok("value"));
+    assert!(env::var(CREATED).is_err());
+
+    unsafe { env::remove_var(PRE_EXISTING) };
+}
</code_context>

<issue_to_address>
**suggestion (testing):** Add a test to explicitly verify `EnvGuard` restoration via `Drop` without calling `restore`

Current tests only validate restoration via `snapshot.restore()`, not via `Drop`, so the RAII behaviour isn’t exercised.

Please add a test that uses `EnvGuard::capture()`, mutates the environment, then lets the guard go out of scope without calling `restore`, and finally asserts that the environment matches the pre-snapshot state. This will explicitly cover the `Drop` path and confirm the documented behaviour.

```suggestion
#[test]
fn removes_variables_created_during_guard_lifetime() {
    const PRE_EXISTING: &str = "WEAVER_ENV_GUARD_PRE_EXISTING";
    const CREATED: &str = "WEAVER_ENV_GUARD_CREATED";

    let _guard = lock_env();

    unsafe { env::set_var(PRE_EXISTING, "value") };

    let snapshot = EnvGuard::capture();

    unsafe { env::set_var(CREATED, "temporary") };

    assert_eq!(env::var(PRE_EXISTING).as_deref(), Ok("value"));
    assert_eq!(env::var(CREATED).as_deref(), Ok("temporary"));

    snapshot.restore();

    assert_eq!(env::var(PRE_EXISTING).as_deref(), Ok("value"));
    assert!(env::var(CREATED).is_err());

    unsafe { env::remove_var(PRE_EXISTING) };
}

#[test]
fn restores_environment_on_drop() {
    const PRE_EXISTING: &str = "WEAVER_ENV_GUARD_DROP_PRE_EXISTING";
    const CREATED: &str = "WEAVER_ENV_GUARD_DROP_CREATED";

    let _guard = lock_env();

    // Set up initial environment
    unsafe { env::set_var(PRE_EXISTING, "original") };

    {
        // Capture the environment; rely on Drop for restoration
        let _snapshot = EnvGuard::capture();

        // Mutate both an existing and a new variable
        unsafe {
            env::set_var(PRE_EXISTING, "modified");
            env::set_var(CREATED, "temporary");
        }

        // Verify mutations are visible while the guard is alive
        assert_eq!(env::var(PRE_EXISTING).as_deref(), Ok("modified"));
        assert_eq!(env::var(CREATED).as_deref(), Ok("temporary"));
    }

    // After _snapshot is dropped, environment should be restored
    assert_eq!(env::var(PRE_EXISTING).as_deref(), Ok("original"));
    assert!(env::var(CREATED).is_err());

    // Cleanup
    unsafe { env::remove_var(PRE_EXISTING) };
}
```
</issue_to_address>

### Comment 9
<location> `crates/weaver-sandbox/src/env_guard.rs:22` </location>
<code_context>
+        }
+    }
+
+    pub(crate) fn restore(&self) {
+        let current: HashMap<OsString, OsString> = env::vars_os().collect();
+        let expected_keys: HashSet<&OsString> = self.original.keys().collect();
</code_context>

<issue_to_address>
**issue (complexity):** Consider simplifying `restore` by clearing and then repopulating the environment from the captured snapshot, optionally storing the snapshot as a linear collection instead of a `HashMap`.

You can simplify `restore` without changing semantics by clearing the current environment and then re-populating it from the captured snapshot. This removes the extra `HashMap` and `HashSet` allocations and makes the unsafe section easier to reason about.

```rust
impl EnvGuard {
    pub(crate) fn restore(&self) {
        // Clear current environment
        for (key, _) in env::vars_os() {
            // Safety: project policy requires env mutation to be wrapped in unsafe.
            unsafe { env::remove_var(&key) };
        }

        // Restore original snapshot
        for (key, value) in &self.original {
            // Safety: see note above regarding env mutation policy.
            unsafe { env::set_var(key, value) };
        }
    }
}
```

If you want to reduce storage complexity as well (optional), you can store a linear collection instead of a `HashMap`, since environment keys are already unique:

```rust
#[derive(Debug)]
pub struct EnvGuard {
    original: Vec<(OsString, OsString)>,
}

impl EnvGuard {
    #[must_use]
    pub fn capture() -> Self {
        Self {
            original: env::vars_os().collect(),
        }
    }

    pub(crate) fn restore(&self) {
        for (key, _) in env::vars_os() {
            unsafe { env::remove_var(&key) };
        }

        for (key, value) in &self.original {
            unsafe { env::set_var(key, value) };
        }
    }
}
```

Both variants preserve the behavior of “restore env to exactly the captured snapshot” with less indirection and state.
</issue_to_address>

### Comment 10
<location> `crates/weaver-sandbox/src/lib.rs:51` </location>
<code_context>
+pub use birdcage::process;
+pub use error::SandboxError;
+pub use profile::{EnvironmentPolicy, NetworkPolicy, SandboxProfile};
+pub use sandbox::{Sandbox, SandboxChild, SandboxCommand, SandboxOutput};
--- /dev/null
+++ b/crates/weaver-sandbox/src/env_guard.rs
</code_context>

<issue_to_address>
**issue (review_instructions):** Add both unit and behavioural tests covering the new `weaver-sandbox` crate behaviour; the newly added test modules and feature file are empty and do not exercise the implemented functionality.

Add comprehensive unit tests (e.g. in `src/tests/unit.rs` / `sandbox_unit.rs`) for the core behaviours of `SandboxProfile`, `Sandbox`, `EnvGuard`, and `runtime` helpers: absolute-path enforcement, executable allowlisting, canonicalisation and `MissingPath` handling, environment policy conversion to `birdcage::Exception`, `NetworkPolicy` handling, and single-threaded preflight checks (including `MultiThreaded` and `ThreadCountUnavailable`).

Add behavioural tests (e.g. BDD scenarios in `tests/features/sandbox.feature` with step definitions) that launch real subprocesses via `Sandbox::spawn` to verify the documented end-to-end behaviour: default network denial, environment isolation and allowlisting, requirement for absolute whitelisted executables, and successful execution paths when correctly configured. Ensure you actually populate the currently empty test files so that both unit and behavioural coverage enforce this new sandboxing feature’s contract.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.

</details>
</issue_to_address>

### Comment 11
<location> `crates/weaver-sandbox/src/env_guard.rs:32` </location>
<code_context>
+                // Safety: project policy requires env mutation to be wrapped in
+                // `unsafe` until the std APIs settle for Rust 2024. We mutate
+                // only after snapshotting to avoid iterator invalidation.
+                unsafe { env::remove_var(key) };
+            }
+        }
</code_context>

<issue_to_address>
**issue (review_instructions):** `unsafe` is used around `env::remove_var` even though the standard library API is safe, which violates the guideline to avoid `unsafe` unless absolutely necessary.

`std::env::remove_var` (and `set_var` below) are safe APIs and do not require `unsafe` blocks. The project guideline is to avoid `unsafe` unless it is strictly necessary, so these `unsafe` wrappers should be removed and the calls made directly. If there is a concrete unsafety concern that justifies this, it should be documented explicitly next to the code and in project guidelines.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Avoid `unsafe` unless absolutely necessary.

</details>
</issue_to_address>

### Comment 12
<location> `docs/weaver-design.md:987` </location>
<code_context>
+
+The first cut of the dedicated `weaver-sandbox` crate now wraps `birdcage`
+v0.8.1 with Weaver defaults. Executables must be supplied as absolute paths and
+whitelisted explicitly; the wrapper canonicalises the whitelist before launch
+to prevent symlink escapes. Standard Linux library roots (`/lib`, `/lib64`,
+`/usr/lib`, and their architecture-specific variants) are whitelisted for
</code_context>

<issue_to_address>
**issue (review_instructions):** The word “canonicalises” should use -ize spelling (“canonicalizes”) to follow the specified en-GB-oxendic convention.

The style guide here calls for en-GB-oxendic spelling, which prefers -ize rather than -ise. Suggest changing “canonicalises” to “canonicalizes” to match.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.md`

**Instructions:**
Use en-GB-oxendic (-ize / -yse / -our) spelling and grammar.

</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 970c172 and 474eb9d.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (13)

Makefile (1 hunks)
crates/weaver-sandbox/Cargo.toml (1 hunks)
crates/weaver-sandbox/src/env_guard.rs (1 hunks)
crates/weaver-sandbox/src/lib.rs (1 hunks)
crates/weaver-sandbox/src/profile.rs (1 hunks)
crates/weaver-sandbox/src/runtime.rs (1 hunks)
crates/weaver-sandbox/src/sandbox.rs (1 hunks)
crates/weaver-sandbox/src/tests/behaviour.rs (1 hunks)
crates/weaver-sandbox/src/tests/env_guard.rs (1 hunks)
crates/weaver-sandbox/src/tests/sandbox_unit.rs (1 hunks)
crates/weaver-sandbox/src/tests/support/mod.rs (1 hunks)
crates/weaver-sandbox/src/tests/unit.rs (1 hunks)
crates/weaver-sandbox/tests/features/sandbox.feature (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run make check-fmt, make lint, and make test before committing, which execute cargo fmt --workspace -- --check, cargo clippy --workspace --all-targets --all-features -- -D warnings, and cargo test --workspace respectively.
Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and behavioural tests for new functionality; run both before and after making any change.
Every module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings.
Handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared setup in Rust tests.
Replace duplicated tests with #[rstest(...)] parameterised cases in Rust.
Prefer mockall for mocks/stubs in Rust.
Prefer .expect() over .unwrap() in Rust.
Use concat!() to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line function definitions in Rust where appropriate (e.g., `pub fn new(id: u64) -> Self { ...

Files:

crates/weaver-sandbox/src/tests/sandbox_unit.rs
crates/weaver-sandbox/src/tests/support/mod.rs
crates/weaver-sandbox/src/tests/unit.rs
crates/weaver-sandbox/src/runtime.rs
crates/weaver-sandbox/src/env_guard.rs
crates/weaver-sandbox/src/lib.rs
crates/weaver-sandbox/src/tests/env_guard.rs
crates/weaver-sandbox/src/tests/behaviour.rs
crates/weaver-sandbox/src/profile.rs
crates/weaver-sandbox/src/sandbox.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

Adhere to single responsibility and CQRS

Place function attributes after doc comments.

Do not use return in single-line functions.

Move conditionals with >2 branches into a predicate function.

Avoid unsafe unless absolutely necessary.

Every module must begin with a //! doc comment that explains the module's purpose and utility.

Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar

Lints must not be silenced except as a last resort.

#[allow] is forbidden.

Only narrowly scoped #[expect(lint, reason = "...")] is allowed.

No lint groups, no blanket or file-wide suppression.

Include FIXME: with link if a fix is expected.

Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.

Use rstest fixtures for shared setup and to avoid repetition between tests.

Replace duplicated tests with #[rstest(...)] parameterised cases.

Prefer mockall for mocks/stubs.

Prefer .expect() over .unwrap() in tests.

.expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.

Ensure that any API or behavioural changes are reflected in the documentation in docs/

Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/

Files must not exceed 400 lines in length

Large modules must be decomposed

Long match statements or dispatch tables should be decomposed by domain and collocated with targets

Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.

Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such

For testing of functionality depending upon environment variables, dependency injection and...

Files:

crates/weaver-sandbox/src/tests/sandbox_unit.rs
crates/weaver-sandbox/src/tests/support/mod.rs
crates/weaver-sandbox/src/tests/unit.rs
crates/weaver-sandbox/src/runtime.rs
crates/weaver-sandbox/src/env_guard.rs
crates/weaver-sandbox/src/lib.rs
crates/weaver-sandbox/src/tests/env_guard.rs
crates/weaver-sandbox/src/tests/behaviour.rs
crates/weaver-sandbox/src/profile.rs
crates/weaver-sandbox/src/sandbox.rs

🧠 Learnings (1)

📓 Common learnings

Learnt from: CR
Repo: leynos/weaver PR: 0
File: docs/roadmap.md:0-0
Timestamp: 2025-12-03T19:40:08.813Z
Learning: Implement `weaver-sandbox` crate using `birdcage` for focused scope, prioritising robust Linux support via namespaces and seccomp-bpf

🧬 Code graph analysis (9)

crates/weaver-sandbox/src/tests/sandbox_unit.rs (3)

crates/weaver-sandbox/src/sandbox.rs (1)

new (31-33)

crates/weaver-sandbox/src/tests/support/mod.rs (1)

new (63-81)

crates/weaver-sandbox/src/profile.rs (1)

new (58-66)

crates/weaver-sandbox/src/tests/support/mod.rs (3)

crates/weaver-sandbox/src/sandbox.rs (1)

new (31-33)

crates/weaver-sandbox/src/profile.rs (1)

new (58-66)

crates/weaver-sandbox/src/env_guard.rs (1)

capture (22-26)

crates/weaver-sandbox/src/tests/unit.rs (2)

crates/weaver-sandbox/src/tests/support/mod.rs (1)

new (63-81)

crates/weaver-sandbox/src/profile.rs (6)

new (58-66)

read_only_paths (111-113)

network_policy (132-134)

is_denied (146-148)

environment_policy (126-128)

read_write_paths (116-118)

crates/weaver-sandbox/src/runtime.rs (2)

crates/weaver-sandbox/src/sandbox.rs (1)

new (31-33)

crates/weaver-sandbox/src/profile.rs (1)

new (58-66)

crates/weaver-sandbox/src/env_guard.rs (1)

crates/weaver-sandbox/src/tests/support/mod.rs (3)

set_var (36-38)

drop (42-47)

drop (142-144)

crates/weaver-sandbox/src/tests/env_guard.rs (1)

crates/weaver-sandbox/src/env_guard.rs (1)

capture (22-26)

crates/weaver-sandbox/src/tests/behaviour.rs (3)

crates/weaver-sandbox/src/sandbox.rs (1)

new (31-33)

crates/weaver-sandbox/src/tests/support/mod.rs (1)

new (63-81)

crates/weaver-sandbox/src/profile.rs (1)

new (58-66)

crates/weaver-sandbox/src/profile.rs (4)

crates/weaver-sandbox/src/runtime.rs (1)

linux_runtime_roots (9-39)

crates/weaver-sandbox/src/sandbox.rs (1)

new (31-33)

crates/weaver-sandbox/src/tests/support/mod.rs (1)

new (63-81)

crates/weaver-lsp-host/src/capability.rs (1)

key (25-31)

crates/weaver-sandbox/src/sandbox.rs (3)

crates/weaver-sandbox/src/runtime.rs (1)

thread_count (42-62)

crates/weaver-sandbox/src/profile.rs (2)

new (58-66)

network_policy (132-134)

crates/weaver-sandbox/src/env_guard.rs (2)

capture (22-26)

drop (50-52)

🔍 Remote MCP Deepwiki, Ref

Summary of additional facts found (relevant to reviewing PR #33)

Workspace / dependency:
- birdcage added as workspace dependency at version 0.8.1 (crate wrapper target).,
Platform focus / runtime helpers:
- Implementation is Linux‑centric: linux_runtime_roots() returns canonicalised runtime roots on Linux and empty Vec on other OSes; thread_count() parses /proc/self/status on Linux and returns Ok(1) on non‑Linux. (preflight checks will behave differently off Linux).
Single‑thread enforcement and Makefile change:
- Tests now run RUST_TEST_THREADS=1 only for the weaver-sandbox crate (Makefile runs two cargo test invocations: one with RUST_TEST_THREADS=1 -p weaver-sandbox, then workspace tests --exclude weaver-sandbox), avoiding forcing the whole workspace single‑threaded.
Path canonicalisation / whitelisting:
- canonicalise() will rebuild non‑existent target paths from the nearest existing ancestor (rebuild_from_existing_ancestor) so MissingPath is only emitted when no ancestor exists — addresses prior concern about allowing non‑existent output paths.
EnvGuard and environment handling:
- New EnvGuard captures parent env and restores it on Drop (removes variables created during scope, restores originals). Tests serialize env mutations with a global mutex; env mutation uses unsafe blocks with explicit safety notes.
Sandbox behavior and errors:
- Sandbox::spawn enforces single‑threaded preflight (returns SandboxError::MultiThreaded if >1), requires absolute canonicalised program paths, validates against profile whitelists (ExecutableNotAuthorised), and aggregates birdcage Exceptions (read/write/exec, Environment/FullEnvironment, Networking) from SandboxProfile.
Tests and test harness:
- Adds unit tests, rstest‑bdd behavioural tests, feature scenarios (fixtures, whitelist access, env allowlist/inheritance, multi‑thread rejection). Test support includes TestWorld harness with TempDir fixtures and resolve_binary helper (Linux‑specific behavior in tests).
Documentation & roadmap:
- docs/weaver-design.md, docs/users-guide.md updated to document weaver‑sandbox defaults (birdcage v0.8.1, absolute path requirement, env isolation default, network denied, single‑thread note); roadmap entry marked complete.

Tools used:

Deepwiki_read_wiki_contents (repo: leynos/weaver) — primary source for crate, tests, Makefile, docs, runtime behavior.
Ref_ref_search_documentation — search used to confirm dependency/birdcage mention.

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Sourcery review

🔇 Additional comments (36)

Makefile (1)

19-21: LGTM!

The test target correctly isolates the single-threaded requirement to weaver-sandbox whilst allowing the rest of the workspace to run with full parallelism. This addresses the prior concern about global RUST_TEST_THREADS=1 forcing the entire workspace single-threaded.

crates/weaver-sandbox/src/tests/sandbox_unit.rs (1)

12-51: LGTM!

Tests thoroughly cover the three preflight rejection scenarios (relative paths, missing paths, unwhitelisted executables) with clear assertions and descriptive panic messages.

crates/weaver-sandbox/src/tests/unit.rs (3)

1-6: LGTM!

Module documentation is present. Imports are appropriate for the tests.

35-67: Past review concerns addressed.

Tests now cover allow_networking(), allow_full_environment(), and the behaviour that allowlist mutations have no effect after allow_full_environment() is called. This resolves the outstanding feedback from sourcery-ai.

69-97: LGTM!

Path recording and canonicalisation tests exercise the expected behaviour for read-only, read-write, and non-existent child paths. The use of tempfile::tempdir() with .expect() is appropriate for test setup.

crates/weaver-sandbox/src/tests/env_guard.rs (2)

17-66: LGTM!

Tests correctly use unsafe blocks for environment mutations as required by Rust 2024. The mutex guard serialises tests appropriately, and cleanup at the end of each test ensures isolation.

8-15: The two ENV_MUTEX definitions serve independent test contexts and should not be consolidated. env_guard.rs is a standalone unit test file that tests the EnvGuard type itself and requires its own environment lock for isolation. support/mod.rs provides shared fixtures for behaviour tests with its own separate lock. These are not duplicates but legitimate independent uses—unit tests cannot depend on behaviour test support modules.

Likely an incorrect or invalid review comment.

crates/weaver-sandbox/src/env_guard.rs (4)

1-7: LGTM!

Module documentation clearly explains the Rust 2024 policy requiring unsafe blocks for environment mutations. This addresses the outdated past review comments from sourcery-ai—those suggestions are no longer applicable given that "Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such" per coding guidelines.

13-26: LGTM!

The EnvGuard struct and capture() implementation are clean. Using HashMap for storage is appropriate, and the #[must_use] attribute correctly signals that discarding the guard defeats its purpose.

28-46: Past review concerns are now resolved.

The previous sourcery-ai comments requesting removal of unsafe blocks are obsolete. Per coding guidelines, Rust 2024 mandates unsafe for env::set_var and env::remove_var. The implementation correctly:

Snapshots current environment before mutation (avoiding iterator invalidation)

Removes variables introduced during guard lifetime

Restores original values

The safety comments adequately document the rationale.

49-53: LGTM!

The Drop implementation delegates to restore(), ensuring automatic cleanup when the guard goes out of scope.

crates/weaver-sandbox/Cargo.toml (1)

1-14: LGTM: Clean workspace-based manifest.

The manifest correctly uses workspace-based dependencies and includes appropriate testing infrastructure (rstest, rstest-bdd, tempfile) that aligns with the test scaffolding in the crate.

crates/weaver-sandbox/tests/features/sandbox.feature (1)

1-45: LGTM: Comprehensive BDD scenarios.

The feature file provides clear coverage of sandbox isolation behaviours including file access restrictions and environment policy modes (allowlist, isolated, full inheritance). Scenarios are well-structured and align with the implemented step definitions.

crates/weaver-sandbox/src/runtime.rs (2)

9-39: LGTM: Platform-aware runtime roots.

The function correctly includes both x86_64 and aarch64 library paths, filters by existence, and canonicalises. The silent error dropping via .ok() is acceptable for whitelist filtering, where unresolvable paths should be excluded.

41-62: LGTM: Robust thread counting.

The Linux implementation provides explicit error handling with descriptive messages, while the non-Linux fallback safely returns 1. The parsing logic is clear and handles malformed data gracefully.

crates/weaver-sandbox/src/lib.rs (1)

1-51: LGTM: Well-documented API surface.

The module documentation clearly explains the zero-trust design, restrictive defaults, and single-threaded requirement. The example demonstrates typical usage and includes proper attributes (no_run). The public API surface is cleanly defined through selective re-exports.

crates/weaver-sandbox/src/tests/support/mod.rs (3)

50-145: LGTM: Well-structured test harness.

The TestWorld fixture provides clean isolation with TempDir, lazy EnvHandle acquisition, and guaranteed cleanup via Drop. The configuration methods follow a clear pattern and align with the BDD scenarios.

147-161: LGTM: Platform-appropriate binary resolution.

The function is correctly cfg-gated with a Linux implementation that searches candidates and a non-Linux variant that provides a clear diagnostic. This aligns with the Linux-centric design documented in the PR.

163-168: LGTM: Simple fixture writer.

The helper correctly creates and writes fixture files with clear error messages for test debugging.

crates/weaver-sandbox/src/tests/behaviour.rs (5)

10-68: LGTM: Clean step definitions.

The given steps correctly configure TestWorld with single borrows and cover all environment policy modes (isolated, allowlist, full inheritance). Implementation aligns well with the feature file scenarios.

70-73: LGTM: Straightforward launch step.

The when step correctly triggers sandbox launch via TestWorld.

75-124: LGTM: Clear assertion steps.

The then steps provide specific assertions with descriptive error messages. The stdout checks correctly handle Gherkin-quoted strings with trim_matches('"').

126-139: LGTM: Robust environment cleanup verification.

The step correctly verifies that test-specific values have been restored without assuming the host environment lacks these variables. This addresses the concern from past reviews.

141-144: LGTM: Correct scenario binding.

The scenario function correctly binds step definitions to the feature file.

crates/weaver-sandbox/src/profile.rs (5)

8-18: LGTM: Well-designed environment policy enum.

The three variants (Isolated, AllowList, InheritAll) clearly express the policy modes, with Isolated as the secure default. Using BTreeSet for the allowlist ensures deterministic ordering.

20-28: LGTM: Clear network policy enum.

The binary choice (Deny/Allow) is well-documented, with Deny as the secure default.

30-141: LGTM: Clean profile builder.

The SandboxProfile provides a fluent builder API with appropriate defaults (Linux runtime roots, restricted environment and network). The accessors are correctly scoped, and the doc example demonstrates typical usage.

143-149: LGTM: Clear predicate method.

The is_denied method provides a readable way to check network policy state.

151-178: LGTM: Clean policy helpers.

The with_allowed method correctly transitions between policy states, and to_exceptions provides a clear mapping to birdcage exceptions. Both helpers are appropriately scoped as crate-private.

crates/weaver-sandbox/src/sandbox.rs (7)

15-33: LGTM: Clean type aliases and constructor.

The type aliases provide a cleaner public API whilst maintaining access to the underlying birdcage types. The Sandbox struct is minimal and focused.

35-57: LGTM: Well-orchestrated spawn flow.

The spawn method enforces necessary preflights (single-thread, absolute path, whitelist), captures environment state with EnvGuard, collects birdcage exceptions, and spawns the sandboxed process. The implementation aligns with the documented behaviour.

59-78: LGTM: Clear preflight checks.

Both methods provide clear validation with specific error variants. The single-thread check prevents the birdcage assertion panic, and the whitelist check enforces explicit executable authorisation.

80-105: LGTM: Comprehensive exception collection.

The method correctly aggregates birdcage exceptions from the profile's policies. The delegation to environment_policy().to_exceptions() cleanly handles the environment policy logic, and the automatic inclusion of the program's ExecuteAndRead exception is appropriate.

107-114: LGTM: Clear program path validation.

The method correctly enforces the absolute path requirement before canonicalising.

116-123: LGTM: Correct path set canonicalisation.

The function correctly canonicalises paths and collects them into a BTreeSet for uniqueness and ordering. Ignoring the insert result is appropriate when duplicates after canonicalisation are expected.

125-167: LGTM: Robust path canonicalisation with ancestor fallback.

The canonicalise function correctly avoids TOCTOU by relying solely on fs::canonicalize and mapping NotFound to rebuild_from_existing_ancestor. The rebuild helper correctly walks up to find an existing ancestor, canonicalises it, and reconstructs the full path—addressing the PR objective of allowing non-existent output paths to be whitelisted.

Based on learnings, this approach provides focused sandboxing with robust Linux support.

leynos · 2025-12-04T17:52:43Z

@coderabbitai Have the following now been resolved?


### Comment 1
<location> `crates/weaver-sandbox/src/env_guard.rs:32-35` </location>
<code_context>
+        // Remove variables introduced while the guard was active.
+        for (key, _) in env::vars_os() {
+            if !expected_keys.contains(&key) {
+                // SAFETY: keys originate from the host OS and were previously
+                // present in the environment, so removal cannot violate
+                // invariants expected by `std::env`.
+                unsafe { env::remove_var(&key) };
+            }
+        }
</code_context>

<issue_to_address>
**suggestion:** The `unsafe` blocks around `env::remove_var`/`env::set_var` are unnecessary and could be removed to simplify the code.

These functions are explicitly safe and support `OsString` keys/values, so the `unsafe` blocks and `SAFETY` comments don’t add protection and instead increase audit noise. Removing them simplifies the implementation without changing semantics.

Suggested implementation:

```rust
        // Remove variables introduced while the guard was active.
        for (key, _) in env::vars_os() {
            if !expected_keys.contains(&key) {
                env::remove_var(&key);
            }
        }

```

1. Search the rest of `crates/weaver-sandbox/src/env_guard.rs` for any other `unsafe` blocks surrounding `env::set_var` or `env::remove_var`.
2. Apply the same pattern: remove the `unsafe` block and any corresponding `SAFETY` comment, and call `env::set_var` / `env::remove_var` directly as safe functions.
</issue_to_address>

### Comment 2
<location> `crates/weaver-sandbox/src/sandbox.rs:133-140` </location>
<code_context>
+    Ok(set)
+}
+
+fn canonicalise(path: &Path) -> Result<PathBuf, SandboxError> {
+    if !path.exists() {
+        return Err(SandboxError::MissingPath {
+            path: path.to_path_buf(),
+        });
+    }
+
+    fs::canonicalize(path).map_err(|source| SandboxError::CanonicalisationFailed {
+        path: path.to_path_buf(),
+        source,
</code_context>

<issue_to_address>
**suggestion (bug_risk):** The explicit `exists()` check before `canonicalize` introduces a TOCTTOU window and might be better folded into the error handling.

Because `path.exists()` and `fs::canonicalize(path)` are separate operations, the filesystem can change in between, so you may see different behavior depending on timing. Instead, you can rely on `fs::canonicalize` alone and map `io::ErrorKind::NotFound` to `SandboxError::MissingPath`, which preserves behavior while avoiding both the TOCTTOU window and the extra syscall.
</issue_to_address>

### Comment 3
<location> `crates/weaver-sandbox/src/tests/unit.rs:36` </location>
<code_context>
+}
+
+#[test]
+fn network_is_denied_by_default() {
+    let profile = SandboxProfile::new();
+    assert_eq!(profile.network_policy(), NetworkPolicy::Deny);
</code_context>

<issue_to_address>
**suggestion (testing):** Extend unit tests to cover `allow_networking` and full-environment inheritance behaviour

Current tests cover the default network policy and environment allowlist deduplication. To complete coverage of `SandboxProfile`’s public API, please also add tests that:

- Use `SandboxProfile::new().allow_networking()` and assert `network_policy() == NetworkPolicy::Allow` and `NetworkPolicy::Allow.is_denied()` is `false`.
- Call `allow_full_environment()` and assert `environment_policy() == EnvironmentPolicy::InheritAll`.
- Optionally confirm the transition `Isolated` → `AllowList` → `InheritAll` (e.g. adding allowlist entries after `allow_full_environment` has no effect).

This will exercise the positive/opt-in paths for networking and environment inheritance.

Suggested implementation:

```rust
#[test]
fn network_is_denied_by_default() {
    let profile = SandboxProfile::new();
    assert_eq!(profile.network_policy(), NetworkPolicy::Deny);
}

#[test]
fn network_can_be_allowed() {
    let profile = SandboxProfile::new().allow_networking();

    assert_eq!(profile.network_policy(), NetworkPolicy::Allow);
    assert!(!NetworkPolicy::Allow.is_denied());
}

#[test]
fn full_environment_can_be_inherited() {
    let profile = SandboxProfile::new().allow_full_environment();

    assert!(matches!(
        profile.environment_policy(),
        EnvironmentPolicy::InheritAll
    ));
}

#[test]
fn read_write_paths_are_recorded() {

```

If your `SandboxProfile` API exposes methods for adding to the environment allowlist (e.g. `allow_env_var`, `allow_environment_key`, etc.), consider adding an additional test to assert that calling that method *after* `allow_full_environment()` does not change the policy away from `EnvironmentPolicy::InheritAll`. The skeleton would be:

```rust
#[test]
fn environment_allowlist_has_no_effect_after_full_inherit() {
    let profile = SandboxProfile::new()
        .allow_full_environment()
        .allow_environment_key("SHOULD_BE_IGNORED");

    assert!(matches!(
        profile.environment_policy(),
        EnvironmentPolicy::InheritAll
    ));
}
```

You will need to replace `allow_environment_key` with the actual allowlist API used elsewhere in this test module to ensure the transition `Isolated → AllowList → InheritAll` behaves as expected and that post-`allow_full_environment` allowlist mutations have no effect.
</issue_to_address>

### Comment 4
<location> `crates/weaver-sandbox/src/tests/behaviour.rs:47-48` </location>
<code_context>
+    env::set_var("DROP_ME", "remove-me");
+}
+
+#[given("the sandbox allows only KEEP_ME to be inherited")]
+fn given_environment_allowlist(world: &RefCell<TestWorld>) {
+    let mut world = world.borrow_mut();
+    world.configure_env_reader();
</code_context>

<issue_to_address>
**suggestion (testing):** Behavioural tests only cover the allowlist environment policy; add scenarios for default isolation and full inheritance

Currently only the allowlist mode is exercised via the `KEEP_ME` / `DROP_ME` scenario. Please also add BDD scenarios for:

- **Default isolation**: no environment policy configured; the env-reading command should see neither `KEEP_ME` nor `DROP_ME`.
- **Full inheritance**: using `allow_full_environment()`; both `KEEP_ME` and `DROP_ME` should be visible inside the sandbox.

You can reuse the same step definitions with different Given/When combinations so that all environment policies are behaviourally covered.

Suggested implementation:

```rust
        .allow_executable(world.command.as_ref().unwrap().get_program())
        .allow_read_path(&world.allowed_file);
}

#[given("environment variables KEEP_ME and DROP_ME are set")]
fn given_environment_variables(_world: &RefCell<TestWorld>) {
    env::set_var("KEEP_ME", "present");
    env::set_var("DROP_ME", "remove-me");
}

#[given("the sandbox allows only KEEP_ME to be inherited")]
fn given_environment_allowlist(world: &RefCell<TestWorld>) {
    let mut world = world.borrow_mut();
    world.configure_env_reader();
    world.profile = world
        .profile
        .clone()
        .allow_environment_variable("KEEP_ME");
}

#[given("the sandbox uses the default environment isolation")]
fn given_environment_default_isolation(world: &RefCell<TestWorld>) {
    // Configure the env-reading command but do not enable any environment
    // inheritance policy; this exercises the default isolation behaviour.
    let mut world = world.borrow_mut();
    world.configure_env_reader();
    // Intentionally do not modify `world.profile` here.
}

#[given("the sandbox inherits the full environment")]
fn given_environment_full_inheritance(world: &RefCell<TestWorld>) {
    // Configure the env-reading command and enable full environment
    // inheritance so that all variables (KEEP_ME and DROP_ME) are visible.
    let mut world = world.borrow_mut();
    world.configure_env_reader();
    world.profile = world.profile.clone().allow_full_environment();
}

#[when("the sandbox launches the command")]
fn when_launch(world: &RefCell<TestWorld>) {
    world.borrow_mut().launch();
}

```

To fully exercise these new step definitions, add corresponding scenarios in the relevant `.feature` file, for example:
1. A scenario using:
   - `Given environment variables KEEP_ME and DROP_ME are set`
   - `And the sandbox uses the default environment isolation`
   - `When the sandbox launches the command`
   - `Then the environment inside the sandbox does not contain KEEP_ME or DROP_ME` (reusing your existing Then-steps for negative checks).
2. A scenario using:
   - `Given environment variables KEEP_ME and DROP_ME are set`
   - `And the sandbox inherits the full environment`
   - `When the sandbox launches the command`
   - `Then the environment inside the sandbox contains KEEP_ME and DROP_ME`.

If your existing Then-steps assert only the allowlist behaviour, you will need to add/adjust them so that they can assert both presence and absence of individual variables for these new scenarios.
</issue_to_address>

### Comment 5
<location> `crates/weaver-sandbox/src/tests/behaviour.rs:113-114` </location>
<code_context>
+    );
+}
+
+#[then("environment markers are cleaned up")]
+fn then_environment_cleaned(_world: &RefCell<TestWorld>) {
+    env::remove_var("KEEP_ME");
+    env::remove_var("DROP_ME");
</code_context>

<issue_to_address>
**suggestion (testing):** Consider making environment cleanup more robust in behavioural tests

This step only runs if the scenario reaches it; if a step fails earlier, `KEEP_ME` and `DROP_ME` can leak into later tests. To avoid cross-test contamination, consider wrapping each scenario in an env-guard style helper/fixture that snapshots and restores `std::env` (or an `rstest` fixture with `Drop`-based cleanup) so cleanup is guaranteed regardless of step ordering or panics.

Suggested implementation:

```rust
#[then("environment markers are cleaned up")]
fn then_environment_cleaned(_world: &RefCell<TestWorld>) {
    // This step asserts that the environment has been cleaned up by the
    // scenario-level env guard / fixture, rather than performing cleanup itself.
    assert!(
        env::var("KEEP_ME").is_err(),
        "KEEP_ME was expected to be unset after scenario cleanup"
    );
    assert!(
        env::var("DROP_ME").is_err(),
        "DROP_ME was expected to be unset after scenario cleanup"
    );
}

```

To fully implement robust environment cleanup (so it happens even if steps fail or panic), you should also:

1. Introduce an environment guard that snapshots and restores the process environment:
   - Define a helper like:
     ```rust
     struct EnvGuard {
         snapshot: std::collections::HashMap<std::ffi::OsString, std::ffi::OsString>,
     }

     impl EnvGuard {
         fn new() -> Self {
             let snapshot = std::env::vars_os().collect();
             Self { snapshot }
         }
     }

     impl Drop for EnvGuard {
         fn drop(&mut self) {
             use std::collections::HashSet;
             use std::env;

             let before = &self.snapshot;
             let after: std::collections::HashMap<_, _> = env::vars_os().collect();

             // Remove any variables that were added after the snapshot.
             let before_keys: HashSet<_> = before.keys().cloned().collect();
             for (k, _) in &after {
                 if !before_keys.contains(k) {
                     env::remove_var(k);
                 }
             }

             // Restore original values.
             for (k, v) in before {
                 env::set_var(k, v);
             }
         }
     }
     ```
   - Place this near your test helpers (either in this file or a shared test module), and add the necessary imports (`HashMap`, `HashSet`, `OsString`, etc.) at the top of the file/module.

2. Tie the guard to the scenario/world lifecycle so it is always dropped:
   - Option A: Add an `EnvGuard` field to `TestWorld` and construct it in `TestWorld`'s constructor/`Default` implementation. That way, when the world is dropped at the end of a scenario, the guard’s `Drop` runs and restores the environment.
   - Option B: If your cucumber setup supports per-scenario fixtures, create the `EnvGuard` at the start of each scenario and keep it alive for the scenario’s duration (e.g. in a wrapper around `TestWorld` or a dedicated per-scenario struct).

3. Ensure that any steps that mutate `std::env` rely on this guard rather than local ad‑hoc cleanup, so environment isolation is guaranteed even when a step panics or the scenario aborts early.
</issue_to_address>

### Comment 6
<location> `crates/weaver-sandbox/src/tests/support/mod.rs:15` </location>
<code_context>
+use crate::process::Stdio;
+
+/// Shared state for behavioural sandbox tests.
+pub struct TestWorld {
+    pub profile: SandboxProfile,
+    pub command: Option<SandboxCommand>,
</code_context>

<issue_to_address>
**suggestion (testing):** Add tests for `EnvGuard` behaviour to ensure environment restoration is correct and future changes remain safe

Current behavioural tests only exercise `EnvGuard` indirectly via environment mutation. Because `EnvGuard::restore` uses `unsafe`, it would be safer to add focused tests that: (1) snapshot the environment, perform add/change/remove operations, drop the guard, and assert the environment is exactly restored; and (2) confirm that variables created during the guard’s lifetime are removed while pre-existing ones retain their original values. Placing these in a dedicated `env_guard_tests` module alongside the other unit tests would help ensure the restoration logic and `unsafe` usage remain correct over time.

Suggested implementation:

```rust
// Sandbox test support.
use crate::sandbox::{Sandbox, SandboxChild, SandboxCommand, SandboxOutput};
use crate::process::Stdio;

#[cfg(test)]
mod env_guard_tests {
    use std::env;

    use crate::sandbox::EnvGuard;

    /// Ensures that the environment is fully restored after a mix of
    /// additions, changes, and removals while an EnvGuard is active.
    #[test]
    fn restores_modified_and_removed_environment_variables() {
        const EXISTING: &str = "WEAVER_ENV_GUARD_EXISTING";
        const ANOTHER: &str = "WEAVER_ENV_GUARD_ANOTHER";
        const EPHEMERAL: &str = "WEAVER_ENV_GUARD_EPHEMERAL";

        // Establish a known baseline outside the guard.
        env::set_var(EXISTING, "original");
        env::set_var(ANOTHER, "keep");

        // Snapshot the environment.
        let guard = EnvGuard::new().expect("failed to create EnvGuard");

        // Mutate environment in various ways while the guard is active.
        env::set_var(EXISTING, "changed");      // modify existing
        env::remove_var(ANOTHER);               // remove existing
        env::set_var(EPHEMERAL, "ephemeral");   // add new

        // Dropping the guard must restore the environment to its original state.
        drop(guard);

        // Pre-existing variables must be restored to their original values.
        assert_eq!(env::var(EXISTING).as_deref(), Ok("original"));
        assert_eq!(env::var(ANOTHER).as_deref(), Ok("keep"));

        // Variables created during the guard's lifetime must not remain.
        assert!(env::var(EPHEMERAL).is_err());

        // Cleanup baseline variables for other tests.
        env::remove_var(EXISTING);
        env::remove_var(ANOTHER);
    }

    /// Ensures that variables created only during the guard's lifetime
    /// are removed, while pre-existing variables retain their value.
    #[test]
    fn removes_variables_created_during_guard_lifetime() {
        const PRE_EXISTING: &str = "WEAVER_ENV_GUARD_PRE_EXISTING";
        const CREATED: &str = "WEAVER_ENV_GUARD_CREATED";

        // Set up an initial variable before we create the guard.
        env::set_var(PRE_EXISTING, "value");

        let guard = EnvGuard::new().expect("failed to create EnvGuard");

        // Create a variable that should not survive after the guard is dropped.
        env::set_var(CREATED, "temporary");

        // Sanity-check that both variables are visible while the guard is active.
        assert_eq!(env::var(PRE_EXISTING).as_deref(), Ok("value"));
        assert_eq!(env::var(CREATED).as_deref(), Ok("temporary"));

        // Dropping the guard should restore the environment snapshot.
        drop(guard);

        // The pre-existing variable must still have its original value.
        assert_eq!(env::var(PRE_EXISTING).as_deref(), Ok("value"));

        // The variable created during the guard's lifetime must be removed.
        assert!(env::var(CREATED).is_err());

        // Cleanup baseline variable.
        env::remove_var(PRE_EXISTING);
    }
}

```

These tests assume that:

1. `EnvGuard` is exported from `crate::sandbox` and is constructible via `EnvGuard::new() -> Result<EnvGuard, _>`.
2. Dropping `EnvGuard` restores the environment snapshot (i.e. `Drop` calls the `unsafe` `restore` implementation).

If the actual API differs, you should adjust the tests accordingly, for example:
- If the constructor is named differently (e.g. `EnvGuard::capture()`, `EnvGuard::snapshot()`, or a plain `EnvGuard::new()` without `Result`), update the creation call.
- If `EnvGuard` lives in a different module (e.g. `crate::process::EnvGuard`), fix the import path.
- If restoration is triggered by an explicit `restore()` rather than `Drop`, keep the tests’ logic the same but call that method instead of `drop(guard)`.
</issue_to_address>

### Comment 7
<location> `crates/weaver-sandbox/src/env_guard.rs:9` </location>
<code_context>
+
+/// Restores the parent process environment after `birdcage` strips variables.
+#[derive(Debug)]
+pub struct EnvGuard {
+    original: Vec<(OsString, OsString)>,
+}
</code_context>

<issue_to_address>
**issue (complexity):** Consider simplifying `EnvGuard::restore` to work on in-memory snapshots using a temporary `HashSet` so you can avoid rebuilding a `BTreeSet` and remove the unnecessary `unsafe` blocks around environment mutation.

You can simplify `EnvGuard` and remove both the `BTreeSet` and the `unsafe` blocks by doing all diffs in memory on snapshots.

Two focused changes:

1. **Avoid rebuilding a `BTreeSet` every restore**

You don’t need a sorted set, and you don’t need to recompute it on every `restore`. Either compute a `HashSet` once in `capture`, or just build one locally from `self.original` inside `restore`.

For example, keep the struct as-is and simplify `restore`:

```rust
use std::collections::HashSet;
// ...

fn restore(&self) {
    // Snapshot current environment before mutating it.
    let current: Vec<(OsString, OsString)> = env::vars_os().collect();

    let expected_keys: HashSet<&OsString> =
        self.original.iter().map(|(k, _)| k).collect();

    // Remove vars that were introduced while the guard was active.
    for (key, _) in &current {
        if !expected_keys.contains(key) {
            env::remove_var(key);
        }
    }

    // Restore the original variables.
    for (key, value) in &self.original {
        env::set_var(key, value);
    }
}
```

That lets you:

- Drop `original_keys` entirely.
- Replace `BTreeSet` with a simpler `HashSet<&OsString>` created on the fly.
- Avoid extra allocations per key (no cloning of `OsString` just to test membership).

2. **Remove unnecessary `unsafe`**

Wrapping `env::remove_var`/`env::set_var` in `unsafe` doesn’t buy anything; they are safe functions and the real issue is mutating while iterating the environment. By snapshotting first (`current: Vec<_> = env::vars_os().collect()`), you never mutate during iteration, so all `unsafe` blocks can be removed without changing behavior.

With these adjustments, `EnvGuard` stays functionally identical but is simpler, purely safe, and avoids the extra data structure/extra passes.
</issue_to_address>

### Comment 8
<location> `crates/weaver-sandbox/src/lib.rs:1` </location>
<code_context>
+//! Sandboxing utilities for Weaver processes.
+//!
+//! The `weaver-sandbox` crate wraps the [`birdcage`] library with policy
</code_context>

<issue_to_address>
**issue (review_instructions):** Add both behavioural and unit tests that exercise the new weaver-sandbox crate’s functionality.

Add unit tests (e.g. in `src/tests/unit.rs`) that validate core behaviours such as profile defaults, environment policies, path canonicalisation and authorisation logic, and error cases (e.g. `ProgramNotAbsolute`, `MissingPath`, `MultiThreaded`). Add behavioural tests (e.g. in `src/tests/behaviour.rs` and the BDD feature you stubbed in `tests/features/sandbox.feature`) that launch real processes under the sandbox and verify the documented guarantees: absolute-path enforcement, whitelist enforcement, environment isolation, default network denial, and successful execution of whitelisted binaries.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
For any new feature or change to an existing feature, both behavioural and unit tests are required.

</details>
</issue_to_address>

### Comment 9
<location> `crates/weaver-sandbox/src/env_guard.rs:35` </location>
<code_context>
+                // SAFETY: keys originate from the host OS and were previously
+                // present in the environment, so removal cannot violate
+                // invariants expected by `std::env`.
+                unsafe { env::remove_var(&key) };
+            }
+        }
</code_context>

<issue_to_address>
**issue (review_instructions):** `unsafe` is used here even though `std::env::remove_var` is a safe API, which violates the guideline to avoid `unsafe` unless absolutely necessary.

The call to `env::remove_var` is already safe and does not require an `unsafe` block. Please remove the `unsafe` here (and below for `set_var`) unless there is a specific unsafe API being invoked that cannot be expressed safely.

If there is a subtle invariants-related reason this must be unsafe, it should be documented explicitly and ideally wrapped in a narrowly scoped helper so that other code can remain safe.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Avoid `unsafe` unless absolutely necessary.

</details>
</issue_to_address>

### Comment 10
<location> `crates/weaver-sandbox/src/env_guard.rs:43` </location>
<code_context>
+            // SAFETY: keys and values were captured from the process
+            // environment before sandboxing mutated it, so restoring them
+            // preserves the prior state without introducing invalid data.
+            unsafe { env::set_var(key, value) };
+        }
+    }
</code_context>

<issue_to_address>
**issue (review_instructions):** `unsafe` is also used around `std::env::set_var`, which is a safe function and does not justify an unsafe block under the stated guidelines.

Similar to the `remove_var` call above, `env::set_var` is safe to call and does not require an `unsafe` block. Please drop the `unsafe` here unless you are actually calling an unsafe API, in which case the reason should be clearly justified and isolated.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Avoid `unsafe` unless absolutely necessary.

</details>
</issue_to_address>

### Comment 11
<location> `crates/weaver-sandbox/src/profile.rs:92` </location>
<code_context>
+    /// Whitelists an environment variable for inheritance.
+    #[must_use]
+    pub fn allow_environment_variable(mut self, key: impl Into<String>) -> Self {
+        match &mut self.environment {
+            EnvironmentPolicy::Isolated => {
+                let mut allow = BTreeSet::new();
</code_context>

<issue_to_address>
**issue (review_instructions):** This `match` over `EnvironmentPolicy` has three branches but is implemented inline instead of being moved into a dedicated predicate/helper as required.

The guideline asks for conditionals with more than two branches to be moved into a predicate (or helper) function. Here you could, for example, add a method on `EnvironmentPolicy` to return an updated policy when allowing a new variable, and call that from `allow_environment_variable` instead of inlining the three-way `match`.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Move conditionals with >2 branches into a predicate function.

</details>
</issue_to_address>

### Comment 12
<location> `crates/weaver-sandbox/src/sandbox.rs:98` </location>
<code_context>
+
+        exceptions.push(Exception::ExecuteAndRead(program.to_path_buf()));
+
+        match self.profile.environment_policy() {
+            EnvironmentPolicy::Isolated => {}
+            EnvironmentPolicy::AllowList(keys) => {
</code_context>

<issue_to_address>
**issue (review_instructions):** This three-branch `match` over `EnvironmentPolicy` is another multi-branch conditional that is not factored into a predicate/helper as requested.

To align with the instruction to move conditionals with more than two branches into a predicate function, consider extracting this `EnvironmentPolicy` handling into a helper (e.g. a method that returns a collection of environment-related `Exception`s) and calling that from here. That keeps the branching logic encapsulated and re-usable.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Move conditionals with >2 branches into a predicate function.

</details>
</issue_to_address>

…execution - Introduce caching for canonicalised path sets in SandboxProfile for read-only, read-write, and executable paths. - Update Sandbox to use cached canonicalised paths for validation and exceptions. - Add thread counter injection in Sandbox for testing and production use. - Enforce single-threaded process constraint when spawning sandboxed commands; return errors if multi-threaded or thread count unavailable. - Refactor EnvGuard to safely clear and restore environment variables using helper functions. - Add tests for multi-threaded rejection and environment guard restore on drop. - Improve error handling around path canonicalisation including support for nonexistent future paths. - Minor doc improvements and style cleanup related to sandbox usage and design. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

leynos · 2025-12-04T19:07:09Z

@coderabbitai Have the following now been resolved?

Please address the comments from this code review:

## Overall Comments
- The `canonicalise`/`rebuild_from_existing_ancestor` logic currently returns `Ok` for paths whose leaf component does not exist (e.g. `/definitely/missing/tool` becomes a canonical path even though it’s missing), which means `Sandbox::spawn` will never produce `SandboxError::MissingPath` for such cases; consider changing `rebuild_from_existing_ancestor` (or `canonical_program`) to error when the full path does not exist so the error variants and behaviour match the intent of the API and tests.
- In `Sandbox::collect_exceptions`, `ExecuteAndRead(program)` is always pushed even when `program` is already present in the canonicalised executable set, which is harmless but redundant; you could avoid the duplicate by either not re-adding `program` when it’s already authorised or by excluding it from the initial `executables` set.

## Individual Comments

### Comment 2
<location> `crates/weaver-sandbox/src/tests/sandbox_unit.rs:1` </location>
<code_context>
+//! Unit tests covering sandbox spawn preflight errors.
+
+use std::path::PathBuf;
</code_context>

<issue_to_address>
**suggestion (testing):** Missing coverage for the `MultiThreaded` and `ThreadCountUnavailable` sandbox preflight errors.

These variants are part of the safety contract for `Sandbox::spawn` and should be explicitly exercised. Please add tests that:

- Assert `SandboxError::MultiThreaded` when `thread_count()` reports more than one thread (e.g. by spawning a parked helper thread; you can gate this behind `#[cfg(target_os = "linux")]` if needed).
- Assert `SandboxError::ThreadCountUnavailable` when `thread_count()` fails, possibly by making `thread_count` injectable or adding a small test seam.

This helps ensure the sandbox continues to fail closed if `/proc` or threading behaviour changes.

Suggested implementation:

```rust
 //! Unit tests covering sandbox spawn preflight errors.

use std::{path::PathBuf, thread, time::Duration};

use crate::sandbox::{Sandbox, SandboxCommand};
use crate::{SandboxError, SandboxProfile};

```

```rust
fn sandbox_with_profile(profile: SandboxProfile) -> Sandbox {
    Sandbox::new(profile)
}

#[cfg(target_os = "linux")]
#[test]
fn rejects_spawn_when_multi_threaded() {
    // Ensure we have at least one extra thread in the process before attempting
    // to spawn a sandboxed command. The sandbox should detect this and fail
    // closed with `SandboxError::MultiThreaded`.
    let helper = thread::spawn(|| {
        // Park the helper thread so it stays alive for the duration of the test.
        thread::park();
    });

    // Give the helper thread a brief moment to start and register in `/proc`.
    // The exact timing is not critical; we just want to avoid flakiness.
    thread::sleep(Duration::from_millis(50));

    let profile = SandboxProfile::default();
    let sandbox = sandbox_with_profile(profile);

    // Use a benign program path; the error we care about is the preflight
    // multi-threaded check, not command resolution.
    let command = SandboxCommand::new(PathBuf::from("/bin/true"));

    let result = sandbox.spawn(&command);
    assert!(
        matches!(result, Err(SandboxError::MultiThreaded)),
        "expected MultiThreaded error when spawning sandbox in a multi-threaded process, got: {result:?}"
    );

    // Cleanly shut down the helper thread.
    helper.thread().unpark();
    helper.join().expect("helper thread should join cleanly");
}

#[test]
fn rejects_spawn_when_thread_count_unavailable() {
    // This test verifies that a failure to determine the thread count causes
    // `Sandbox::spawn` to fail closed with `SandboxError::ThreadCountUnavailable`.
    //
    // It relies on a test-only seam in the sandbox implementation that allows
    // overriding the thread-counting logic.
    use crate::sandbox::testing::with_thread_count_override;

    let profile = SandboxProfile::default();
    let sandbox = sandbox_with_profile(profile);
    let command = SandboxCommand::new(PathBuf::from("/bin/true"));

    let result = with_thread_count_override(|| {
        // Simulate a failure to determine the thread count.
        Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            "simulated thread count failure",
        ))
        .into()
    }, || sandbox.spawn(&command));

    assert!(
        matches!(result, Err(SandboxError::ThreadCountUnavailable)),
        "expected ThreadCountUnavailable error when thread counting fails, got: {result:?}"
    );
}

#[test]
fn rejects_relative_program_paths() {

```

To make the `rejects_spawn_when_thread_count_unavailable` test compile and behave as intended, you will need to introduce a small test seam in the sandbox implementation:

1. In the sandbox module (likely `crates/weaver-sandbox/src/sandbox.rs` or similar), extract the logic that determines the current process thread count into a helper function, e.g.:
   ```rust
   fn thread_count() -> std::io::Result<usize> { /* existing logic */ }
   ```

2. Under `#[cfg(test)]`, add a mechanism to override this function for the duration of a closure, for example:
   ```rust
   pub mod testing {
       use std::cell::RefCell;
       use std::io;

       thread_local! {
           static OVERRIDE: RefCell<Option<Box<dyn Fn() -> io::Result<usize>>>> = RefCell::new(None);
       }

       pub fn with_thread_count_override<F, R>(
           override_impl: impl Fn() -> io::Result<usize> + 'static,
           f: F,
       ) -> R
       where
           F: FnOnce() -> R,
       {
           OVERRIDE.with(|slot| {
               *slot.borrow_mut() = Some(Box::new(override_impl));
               let result = f();
               *slot.borrow_mut() = None;
               result
           })
       }

       pub fn thread_count() -> io::Result<usize> {
           OVERRIDE.with(|slot| {
               if let Some(ref f) = *slot.borrow() {
                   f()
               } else {
                   super::thread_count()
               }
           })
       }
   }
   ```

3. Update the production `Sandbox::spawn` implementation to call `thread_count()` (or `testing::thread_count()` under `cfg(test)`) so that during tests, `with_thread_count_override` can force `thread_count()` to fail and surface `SandboxError::ThreadCountUnavailable`.

4. Ensure `SandboxCommand::new` and `SandboxProfile::default()` exist and are used consistently with the rest of the test file; if the existing tests construct commands/profiles differently, mirror that pattern in the two new tests instead of the placeholders above.
</issue_to_address>

### Comment 3
<location> `crates/weaver-sandbox/src/tests/unit.rs:7-16` </location>
<code_context>
+        .expect("env mutex poisoned")
+}
+
+#[test]
+fn restores_modified_and_removed_environment_variables() {
+    const EXISTING: &str = "WEAVER_ENV_GUARD_EXISTING";
</code_context>

<issue_to_address>
**suggestion (testing):** The `canonicalises_nonexistent_child_when_parent_exists` test name suggests canonicalisation behaviour, but it only checks path recording on the profile.

The current body only checks that `SandboxProfile` records a path ending with `file.txt`, and doesn’t touch the `rebuild_from_existing_ancestor` / canonicalisation logic in `sandbox.rs`, so the name is misleading.

Either rename the test to match what it actually verifies, or adjust it to call the canonicalisation helper (e.g. via a small test-only wrapper) and assert on the rebuilt path so the name and behaviour align.
</issue_to_address>

### Comment 4
<location> `crates/weaver-sandbox/src/tests/env_guard.rs:17-18` </location>
<code_context>
+        .expect("env mutex poisoned")
+}
+
+#[test]
+fn restores_modified_and_removed_environment_variables() {
+    const EXISTING: &str = "WEAVER_ENV_GUARD_EXISTING";
+    const ANOTHER: &str = "WEAVER_ENV_GUARD_ANOTHER";
</code_context>

<issue_to_address>
**suggestion (testing):** EnvGuard tests don't verify that restoration also happens via Drop when `restore` is not called explicitly.

Right now both tests only cover the explicit `snapshot.restore()` path. Please add a test that relies on `Drop` instead (e.g., create an `EnvGuard` in a scope, mutate env vars as in the existing tests, then assert after the scope ends that the env is restored). This will ensure the `Drop`-based restoration path—how EnvGuard is typically used—remains correct as the implementation evolves.

Suggested implementation:

```rust
#[test]
fn restores_modified_and_removed_environment_variables() {
    const EXISTING: &str = "WEAVER_ENV_GUARD_EXISTING";
    const ANOTHER: &str = "WEAVER_ENV_GUARD_ANOTHER";
    const EPHEMERAL: &str = "WEAVER_ENV_GUARD_EPHEMERAL";

    let _guard = lock_env();

    unsafe { env::set_var(EXISTING, "original") };
    unsafe { env::set_var(ANOTHER, "keep") };

    let snapshot = EnvGuard::capture();

    unsafe { env::set_var(EXISTING, "changed") };
    unsafe { env::remove_var(ANOTHER) };
    unsafe { env::set_var(EPHEMERAL, "ephemeral") };

    snapshot.restore();

    assert_eq!(unsafe { env::var(EXISTING).unwrap() }, "original");
    assert_eq!(unsafe { env::var(ANOTHER).unwrap() }, "keep");
    assert!(unsafe { env::var_os(EPHEMERAL) }.is_none());
}

#[test]
fn restores_environment_variables_on_drop() {
    const EXISTING: &str = "WEAVER_ENV_GUARD_EXISTING";
    const ANOTHER: &str = "WEAVER_ENV_GUARD_ANOTHER";
    const EPHEMERAL: &str = "WEAVER_ENV_GUARD_EPHEMERAL";

    let _guard = lock_env();

    // Establish a known baseline
    unsafe { env::set_var(EXISTING, "original") };
    unsafe { env::set_var(ANOTHER, "keep") };
    unsafe { env::remove_var(EPHEMERAL) };

    {
        let _snapshot = EnvGuard::capture();

        // Mutate the environment while the guard is in scope
        unsafe { env::set_var(EXISTING, "changed") };
        unsafe { env::remove_var(ANOTHER) };
        unsafe { env::set_var(EPHEMERAL, "ephemeral") };

        // No explicit restore() call here; rely on Drop.
    }

    // After the guard is dropped, the environment should be restored
    assert_eq!(unsafe { env::var(EXISTING).unwrap() }, "original");
    assert_eq!(unsafe { env::var(ANOTHER).unwrap() }, "keep");
    assert!(unsafe { env::var_os(EPHEMERAL) }.is_none());
}

```

If the rest of the file uses a different pattern for assertions or unsafe env access (e.g., helper functions or a custom macro), you may want to align the new test with those conventions. Also ensure `use std::env;` is present at the top of this file if it isn’t already.
</issue_to_address>

### Comment 5
<location> `docs/users-guide.md:145-148` </location>
<code_context>
+
+External tools launched by the daemon now run inside the `weaver-sandbox`
+wrapper around `birdcage` 0.8.1. Linux namespaces and `seccomp-bpf` filters are
+applied automatically, networking is disabled by default, and only a small set
+of standard library directories are readable to keep dynamically linked
+executables functioning. Commands must be provided as absolute paths and added
</code_context>

<issue_to_address>
**suggestion (typo):** Consider adjusting punctuation/structure to avoid a comma splice in this sentence.

The phrase "applied automatically, networking is disabled by default, and only a small set" links three clauses with commas and reads like a comma splice. Consider semicolons or splitting into two sentences, e.g. "applied automatically; networking is disabled by default; and only a small set of standard library directories are readable" or similar.

```suggestion
wrapper around `birdcage` 0.8.1. Linux namespaces and `seccomp-bpf` filters are
applied automatically; networking is disabled by default; and only a small set
of standard library directories are readable to keep dynamically linked
executables functioning. Commands must be provided as absolute paths and added
```
</issue_to_address>

### Comment 6
<location> `crates/weaver-sandbox/src/lib.rs:51` </location>
<code_context>
+pub use birdcage::process;
+pub use error::SandboxError;
+pub use profile::{EnvironmentPolicy, NetworkPolicy, SandboxProfile};
+pub use sandbox::{Sandbox, SandboxChild, SandboxCommand, SandboxOutput};
--- /dev/null
+++ b/crates/weaver-sandbox/src/env_guard.rs
</code_context>

<issue_to_address>
**issue (review_instructions):** Add both unit and behavioural tests for the new `weaver-sandbox` crate; the crate currently exposes non-trivial sandbox behaviour without any exercising tests.

Implement unit tests for the core building blocks (`SandboxProfile`, `EnvironmentPolicy`, `NetworkPolicy`, `EnvGuard`, `runtime::thread_count`, error conversions, and the path canonicalisation helpers) and behavioural tests that exercise the end-to-end sandbox flow.

At a minimum:
- Add unit tests under `crates/weaver-sandbox/src/tests/` that cover:
  - Successful spawning of an authorised absolute executable.
  - Rejection of non-absolute program paths (`ProgramNotAbsolute`).
  - Rejection of executables not authorised in the profile (`ExecutableNotAuthorised`).
  - Correct derivation of `Exception`s from read-only/read-write/executable paths and environment/network policies.
  - `EnvGuard` restoring the environment after `birdcage` manipulation.
  - `thread_count` parsing, including malformed `/proc/self/status` data (using test doubles or helper functions where direct `/proc` manipulation is not feasible).
- Add behavioural/acceptance tests (for example in `crates/weaver-sandbox/tests/features/sandbox.feature` and the corresponding Rust glue) that validate the observable behaviour of the sandbox in a real or simulated run:
  - A happy-path command runs successfully inside the sandbox with expected stdout/stderr.
  - Networking is denied by default and allowed only when requested.
  - Environment is stripped by default and inherited correctly when whitelisted or when `InheritAll` is used.
  - Multi-threaded invocation returns `SandboxError::MultiThreaded` instead of panicking.

Ensure these tests run in CI and demonstrably cover both the behavioural and unit-level expectations for this new feature, in line with the project testing rules.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
For any new feature or change to an existing feature, both behavioural and unit tests are required.

</details>
</issue_to_address>

### Comment 7
<location> `crates/weaver-sandbox/src/env_guard.rs:32` </location>
<code_context>
+                // Safety: project policy requires env mutation to be wrapped in
+                // `unsafe` until the std APIs settle for Rust 2024. We mutate
+                // only after snapshotting to avoid iterator invalidation.
+                unsafe { env::remove_var(key) };
+            }
+        }
</code_context>

<issue_to_address>
**issue (review_instructions):** `unsafe` is used around `env::remove_var`/`env::set_var` even though these APIs are safe; this violates the guideline to avoid `unsafe` unless it is strictly necessary.

`std::env::remove_var` and `std::env::set_var` are safe functions and do not require an `unsafe` block. The comment suggests a project policy, but per the review instructions `unsafe` should be avoided unless absolutely necessary.

Please remove the `unsafe` blocks here (and around `env::set_var` below) or, if there is a concrete unsafety that cannot be expressed with safe code, encapsulate it in the narrowest possible helper with a clear justification tied to actual undefined-behaviour constraints.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Avoid `unsafe` unless absolutely necessary.

</details>
</issue_to_address>

The fallback logic in the `canonicalise` function to rebuild a missing path from existing ancestors was removed. Now, if a path is missing, an error is always returned immediately without attempting to reconstruct the path from ancestors. This simplifies error handling and code complexity in sandbox path resolution. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

leynos · 2025-12-04T19:08:10Z

@coderabbitai Have the following now been resolved?

Please address the comments from this code review:

## Overall Comments
- In `Sandbox::collect_exceptions` you always push an `ExecuteAndRead` exception for `program` even if it is already present in the canonicalised executable set, which can be trivially de-duplicated by checking membership before pushing to avoid redundant exceptions.
- Both the behavioural test support (`tests/support/mod.rs`) and `env_guard` tests define their own `OnceLock<Mutex<()>>` for environment serialisation; consider centralising this environment lock in a single helper to avoid duplication and ensure all env-mutating tests share the same synchronisation primitive.

## Individual Comments

### Comment 1
<location> `crates/weaver-sandbox/src/sandbox.rs:80-105` </location>
<code_context>
+        let mut exceptions = Vec::new();
+        let read_only = canonicalised_set(self.profile.read_only_paths())?;
+        let read_write = canonicalised_set(self.profile.read_write_paths())?;
+        let executables = canonicalised_set(self.profile.executable_paths())?;
+
+        for path in read_only {
+            exceptions.push(Exception::Read(path));
+        }
+        for path in read_write {
+            exceptions.push(Exception::WriteAndRead(path));
+        }
+        for path in executables {
+            exceptions.push(Exception::ExecuteAndRead(path));
+        }
+
+        exceptions.push(Exception::ExecuteAndRead(program.to_path_buf()));
+
+        exceptions.extend(self.profile.environment_policy().to_exceptions());
</code_context>

<issue_to_address>
**suggestion:** Avoid adding a duplicate `ExecuteAndRead` exception when the program is already whitelisted as executable.

In `collect_exceptions`, you convert all profile `executable_paths` into `ExecuteAndRead` exceptions, then always add another `ExecuteAndRead(program)`. If the canonicalised `program` is already in the profile, this duplicates the exception. To avoid this, only add the extra entry when it’s not already in the `executables` set:

```rust
let executables = canonicalised_set(self.profile.executable_paths())?;

for path in &executables {
    exceptions.push(Exception::ExecuteAndRead(path.clone()));
}

if !executables.contains(program) {
    exceptions.push(Exception::ExecuteAndRead(program.to_path_buf()));
}
```

This preserves behavior while avoiding duplicate entries.

```suggestion
    fn collect_exceptions(&self, program: &Path) -> Result<Vec<Exception>, SandboxError> {
        let mut exceptions = Vec::new();
        let read_only = canonicalised_set(self.profile.read_only_paths())?;
        let read_write = canonicalised_set(self.profile.read_write_paths())?;
        let executables = canonicalised_set(self.profile.executable_paths())?;

        for path in read_only {
            exceptions.push(Exception::Read(path));
        }
        for path in read_write {
            exceptions.push(Exception::WriteAndRead(path));
        }
        for path in &executables {
            exceptions.push(Exception::ExecuteAndRead(path.clone()));
        }

        if !executables.contains(program) {
            exceptions.push(Exception::ExecuteAndRead(program.to_path_buf()));
        }

        exceptions.extend(self.profile.environment_policy().to_exceptions());

        if matches!(self.profile.network_policy(), NetworkPolicy::Allow) {
            exceptions.push(Exception::Networking);
        }

        Ok(exceptions)
    }
```
</issue_to_address>

### Comment 2
<location> `crates/weaver-sandbox/src/tests/unit.rs:85-86` </location>
<code_context>
+}
+
+#[test]
+fn canonicalises_nonexistent_child_when_parent_exists() {
+    let base = tempfile::tempdir().expect("tempdir");
+    let target = base.path().join("future_dir").join("file.txt");
+
+    let profile = SandboxProfile::new().allow_read_write_path(&target);
+
+    let set = profile.read_write_paths();
+    assert!(
+        set.iter().any(|p| p.ends_with("file.txt")),
</code_context>

<issue_to_address>
**suggestion (testing):** This test name suggests canonicalisation behaviour, but it currently only asserts that the path is recorded.

Here the test exercises only that `SandboxProfile` records a path ending in `file.txt`; the actual non-existent-child canonicalisation happens in `sandbox.rs` during spawn/exception collection, not when adding paths to the profile.

To make this clearer and improve coverage, either:
- Rename the test to something like `records_future_child_paths_when_parent_exists`, or
- Add/extend a test that goes through the full sandbox flow: create a temp dir with an existing parent but no leaf file, add the path via `allow_read_write_path`, trigger `collect_exceptions` (e.g. by spawning a whitelisted no-op program), and assert that the resulting `Exception::WriteAndRead` path matches the expected canonicalised form.

```suggestion
#[test]
fn records_future_child_paths_when_parent_exists() {
```
</issue_to_address>

### Comment 3
<location> `crates/weaver-sandbox/src/tests/behaviour.rs:126-127` </location>
<code_context>
+    );
+}
+
+#[then("environment markers are cleaned up")]
+fn then_environment_cleaned(world: &RefCell<TestWorld>) {
+    world.borrow_mut().restore_env();
+    assert_ne!(
</code_context>

<issue_to_address>
**suggestion (testing):** The behaviour tests don't currently exercise the MultiThreaded or ThreadCountUnavailable error paths.

Since `Sandbox::spawn` explicitly guards against multi-threaded contexts and `/proc` parsing failures, it would be useful to add tests that cover those preflight paths. You could introduce a small abstraction (e.g., a test-only hook or trait) around `thread_count()` so tests can:

- Force a value > 1 and assert `SandboxError::MultiThreaded`.
- Force an `io::Error` and assert `SandboxError::ThreadCountUnavailable`.

These can be unit tests rather than BDD scenarios and would exercise the most security-sensitive checks.

Suggested implementation:

```rust
#[then("environment markers are cleaned up")]
fn then_environment_cleaned(world: &RefCell<TestWorld>) {
    world.borrow_mut().restore_env();
    assert_ne!(
        std::env::var_os("KEEP_ME"),
        Some(std::ffi::OsString::from("present")),
        "KEEP_ME still holds the scenario value after restoration"
    );
    assert_ne!(
        std::env::var_os("DROP_ME"),
        Some(std::ffi::OsString::from("remove-me")),
        "DROP_ME still holds the scenario value after restoration"
    );
}

#[cfg(test)]
mod thread_count_error_tests {
    use super::*;
    use crate::{Sandbox, SandboxError};

    /// Helper to construct a sandbox instance suitable for preflight testing.
    ///
    /// This assumes `Sandbox::builder()` exists; adjust to match the actual
    /// constructor API if needed.
    fn make_sandbox_for_thread_test() -> Sandbox {
        Sandbox::builder()
            .build()
            .expect("failed to build sandbox for thread-count tests")
    }

    /// When the injected thread counter reports more than one thread,
    /// we should fail fast with `SandboxError::MultiThreaded`.
    #[test]
    fn spawn_fails_when_thread_count_greater_than_one() {
        let sandbox = make_sandbox_for_thread_test();

        // `spawn_with_thread_counter` is a test-only hook that bypasses the
        // real `thread_count()` implementation.
        let result = sandbox.spawn_with_thread_counter(|| Ok(2));

        match result {
            Err(SandboxError::MultiThreaded(_)) => {}
            other => panic!("expected SandboxError::MultiThreaded, got {:?}", other),
        }
    }

    /// When the injected thread counter fails with an `io::Error`,
    /// we should fail fast with `SandboxError::ThreadCountUnavailable`.
    #[test]
    fn spawn_fails_when_thread_count_unavailable() {
        let sandbox = make_sandbox_for_thread_test();
        let io_err = std::io::Error::new(std::io::ErrorKind::Other, "synthetic /proc failure");

        let result = sandbox.spawn_with_thread_counter(|| Err(io_err));

        match result {
            Err(SandboxError::ThreadCountUnavailable(_)) => {}
            other => panic!(
                "expected SandboxError::ThreadCountUnavailable, got {:?}",
                other
            ),
        }
    }
}

```

To make these tests compile and exercise the intended preflight paths, the core sandbox code will need a small abstraction around `thread_count()`:

1. In the module where `Sandbox::spawn` is implemented:
   - Extract the existing thread-count logic into a function or trait, e.g.:
     ```rust
     pub trait ThreadCounter {
         fn thread_count(&self) -> std::io::Result<usize>;
     }

     pub struct ProcThreadCounter;

     impl ThreadCounter for ProcThreadCounter {
         fn thread_count(&self) -> std::io::Result<usize> {
             thread_count() // existing implementation
         }
     }
     ```
   - Update the internal implementation of `spawn` to use a generic thread counter, defaulting to the real implementation:
     ```rust
     impl Sandbox {
         fn spawn_internal<C, F>(&self, counter: C /* ... other args ... */) -> Result<..., SandboxError>
         where
             C: FnOnce() -> std::io::Result<usize>,
         {
             let threads = counter().map_err(SandboxError::ThreadCountUnavailable)?;
             if threads > 1 {
                 return Err(SandboxError::MultiThreaded(threads));
             }

             // existing spawn logic continues here...
         }

         pub fn spawn(/* existing signature */) -> Result<..., SandboxError> {
             self.spawn_internal(|| thread_count(), /* forward other args */)
         }

         /// Test-only hook used by `thread_count_error_tests`.
         #[cfg(test)]
         pub fn spawn_with_thread_counter<C>(&self, counter: C /* mirror spawn args as needed */)
             -> Result<..., SandboxError>
         where
             C: FnOnce() -> std::io::Result<usize>,
         {
             self.spawn_internal(counter /* forward other args */)
         }
     }
     ```
   - Ensure `SandboxError::MultiThreaded` and `SandboxError::ThreadCountUnavailable` variants are already defined and used here; if their payloads differ from `usize` or `io::Error`, adjust the tests’ pattern matches accordingly.

2. Align the tests with the real `Sandbox::builder` / `spawn` signatures:
   - If `Sandbox` is constructed differently, update `make_sandbox_for_thread_test` to use the correct constructor.
   - If `spawn` takes arguments (command, env, etc.), mirror those in `spawn_with_thread_counter` and pass simple dummy values from the tests that do not affect the preflight logic.

3. If the crate uses a different module path for `Sandbox`/`SandboxError`, adjust the `use crate::{Sandbox, SandboxError};` line to the correct paths.
</issue_to_address>

### Comment 4
<location> `crates/weaver-sandbox/src/sandbox.rs:138` </location>
<code_context>
+    }
+}
+
+fn rebuild_from_existing_ancestor(path: &Path) -> Result<PathBuf, SandboxError> {
+    let mut missing_components: Vec<PathBuf> = Vec::new();
+    let mut cursor = Some(path);
</code_context>

<issue_to_address>
**issue (complexity):** Consider simplifying `rebuild_from_existing_ancestor` using `ancestors`/`strip_prefix` and replacing the `BTreeSet`-based path handling with straightforward `Vec<PathBuf>` operations to make the sandbox code easier to follow.

You can keep the current behavior and noticeably reduce complexity in two focused spots:

### 1. Simplify `rebuild_from_existing_ancestor`

You can avoid the manual parent-walking loop, the explicit `Vec` of components, and the `rev()`-fold by using `Path::ancestors()` plus `strip_prefix`:

```rust
fn rebuild_from_existing_ancestor(path: &Path) -> Result<PathBuf, SandboxError> {
    // Find the closest existing ancestor
    let existing = path
        .ancestors()
        .find(|p| p.exists())
        .ok_or_else(|| SandboxError::MissingPath {
            path: path.to_path_buf(),
        })?;

    // Canonicalize that ancestor
    let base = fs::canonicalize(existing).map_err(|source| SandboxError::CanonicalisationFailed {
        path: existing.to_path_buf(),
        source,
    })?;

    // Re-attach the missing tail (path relative to existing ancestor)
    let tail = path.strip_prefix(existing).unwrap_or_else(|_| Path::new(""));
    Ok(base.join(tail))
}
```

This preserves the semantics of “canonicalize the first existing ancestor and then re-append missing components,” but removes the reverse tracking and explicit loop state.

### 2. Avoid `BTreeSet` unless you specifically need ordering/dedup

If you don’t rely on set semantics (ordering, dedup) for correctness, you can drop the `BTreeSet` abstraction and just canonicalize into a `Vec<PathBuf>`. That makes the logic more obvious and avoids per-spawn set construction:

```rust
fn canonicalised_paths(paths: &[PathBuf]) -> Result<Vec<PathBuf>, SandboxError> {
    paths.iter().map(|p| canonicalise(p)).collect()
}
```

Usage:

```rust
fn ensure_program_whitelisted(&self, program: &Path) -> Result<(), SandboxError> {
    let authorised = canonicalised_paths(self.profile.executable_paths())?;
    if authorised.iter().any(|p| p == program) {
        return Ok(());
    }
    Err(SandboxError::ExecutableNotAuthorised {
        program: program.to_path_buf(),
    })
}

fn collect_exceptions(&self, program: &Path) -> Result<Vec<Exception>, SandboxError> {
    let mut exceptions = Vec::new();
    let read_only = canonicalised_paths(self.profile.read_only_paths())?;
    let read_write = canonicalised_paths(self.profile.read_write_paths())?;
    let executables = canonicalised_paths(self.profile.executable_paths())?;

    for path in read_only {
        exceptions.push(Exception::Read(path));
    }
    for path in read_write {
        exceptions.push(Exception::WriteAndRead(path));
    }
    for path in executables {
        exceptions.push(Exception::ExecuteAndRead(path));
    }

    exceptions.push(Exception::ExecuteAndRead(program.to_path_buf()));
    exceptions.extend(self.profile.environment_policy().to_exceptions());

    if matches!(self.profile.network_policy(), NetworkPolicy::Allow) {
        exceptions.push(Exception::Networking);
    }

    Ok(exceptions)
}
```

Behavior is preserved (duplicates in the profile still just translate to repeated exceptions, which `birdcage` can ignore or handle), but the per-spawn logic is simpler and more explicit. If you *do* need dedup, you can still dedup the `Vec` with a small helper instead of exposing `BTreeSet` in the control flow.
</issue_to_address>

### Comment 5
<location> `crates/weaver-sandbox/src/lib.rs:51` </location>
<code_context>
+pub use birdcage::process;
+pub use error::SandboxError;
+pub use profile::{EnvironmentPolicy, NetworkPolicy, SandboxProfile};
+pub use sandbox::{Sandbox, SandboxChild, SandboxCommand, SandboxOutput};
--- /dev/null
+++ b/crates/weaver-sandbox/src/env_guard.rs
</code_context>

<issue_to_address>
**issue (review_instructions):** Add both behavioural and unit tests for the new `weaver-sandbox` crate; the newly added test files are currently empty and do not exercise the sandbox behaviour or error paths.

Implement unit tests under `crates/weaver-sandbox/src/tests` that cover the key components (`SandboxProfile`, `Sandbox`, `EnvGuard`, `runtime::thread_count`, and `runtime::linux_runtime_roots`). Exercise at least:
- successful spawning of an authorised absolute executable
- rejection of non-absolute paths (`ProgramNotAbsolute`)
- rejection of non-whitelisted executables (`ExecutableNotAuthorised`)
- handling of missing and non-canonicalisable paths (`MissingPath`, `CanonicalisationFailed`)
- behaviour of environment policies (`Isolated`, `AllowList`, `InheritAll`) and `EnvGuard` restoring the parent environment
- `NetworkPolicy` toggling and how it affects collected `Exception`s
- single-threaded vs multi-threaded preflight (`MultiThreaded`, `ThreadCountUnavailable`).

In addition, implement behavioural tests (e.g. BDD-style scenarios using `rstest-bdd` and `sandbox.feature`) that drive the crate end-to-end via `Sandbox::spawn`, asserting observable behaviour (filesystem access, environment isolation, networking defaults, and the multi-threaded error behaviour). Ensure these tests are actually populated and executed rather than leaving the added test files empty.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.

</details>
</issue_to_address>

### Comment 6
<location> `crates/weaver-sandbox/src/env_guard.rs:29` </location>
<code_context>
+        // Remove variables introduced while the guard was active.
+        for key in current.keys() {
+            if !expected_keys.contains(key) {
+                // Safety: project policy requires env mutation to be wrapped in
+                // `unsafe` until the std APIs settle for Rust 2024. We mutate
+                // only after snapshotting to avoid iterator invalidation.
</code_context>

<issue_to_address>
**issue (review_instructions):** `unsafe` is used around safe `std::env` functions without an apparent technical need, which conflicts with the "avoid unsafe unless absolutely necessary" instruction.

The `restore` method wraps `env::remove_var` and `env::set_var` in `unsafe` blocks:

```rust
unsafe { env::remove_var(key) };
...
unsafe { env::set_var(key, value) };
```

These standard library APIs are already safe to call, so the `unsafe` blocks do not appear technically required to uphold memory or thread safety. That makes this a direct use of `unsafe` without a demonstrated necessity, contrary to the review instruction.

Unless there is a concrete UB-related reason that is missing from the comment, please remove the `unsafe` blocks and call the safe functions directly (or otherwise justify and encapsulate the `unsafe` in the narrowest possible way).

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Avoid `unsafe` unless absolutely necessary.

</details>
</issue_to_address>

leynos · 2025-12-04T20:10:42Z

@sourcery-ai resolve

…aths Replaced the use of BTreeSet with Vec<PathBuf> for the canonicalised executable, read-only, and read-write path sets in SandboxProfile. This change involved updating related methods to use Vec and changing containment checks to iter().any(). Additionally, moved environment mutex locking logic from env_guard tests into a shared test support module to consolidate and reuse the environment locking mechanism across tests. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 9

♻️ Duplicate comments (3)

docs/weaver-design.md (1)
989-990: Add comma before "so" in result clause.

Lines 989–990 omit a comma before "so", which introduces a result clause. Add a comma for grammatical clarity and to separate the independent clause properly.

Apply this diff:
 `/usr/lib`, and their architecture-specific variants) are whitelisted for
-read-only access by default so dynamically linked binaries remain functional
+read-only access by default, so dynamically linked binaries remain functional
crates/weaver-sandbox/src/tests/env_guard.rs (1)

8-84: Keep EnvGuard tests; they now exercise both explicit and Drop-based restoration.

These tests lock the shared env mutex, mutate env under EnvGuard::capture(), and verify restoration via both restore() and Drop, including handling of ephemeral variables. This gives good confidence in EnvGuard’s RAII semantics.
crates/weaver-sandbox/src/tests/support/mod.rs (1)
32-34: Add SAFETY documentation for unsafe environment mutation.

The unsafe block lacks a safety comment explaining why this usage is correct. Per coding guidelines, all unsafe code must be documented clearly.

Apply this diff:
     fn set_var(&mut self, key: &'static str, value: &str) {
+        // SAFETY: Environment mutation is unsafe in Rust 2024. This is safe
+        // because the ENV_MUTEX (via `lock_env()`) serialises all env access
+        // across test threads, and `EnvGuard::restore` reverts changes on Drop.
         unsafe { std::env::set_var(key, value) };
     }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 474eb9d and 68bc7b4.

📒 Files selected for processing (12)

crates/weaver-sandbox/src/env_guard.rs (1 hunks)
crates/weaver-sandbox/src/profile.rs (1 hunks)
crates/weaver-sandbox/src/sandbox.rs (1 hunks)
crates/weaver-sandbox/src/tests/behaviour.rs (1 hunks)
crates/weaver-sandbox/src/tests/env_guard.rs (1 hunks)
crates/weaver-sandbox/src/tests/mod.rs (1 hunks)
crates/weaver-sandbox/src/tests/sandbox_unit.rs (1 hunks)
crates/weaver-sandbox/src/tests/support/env.rs (1 hunks)
crates/weaver-sandbox/src/tests/support/mod.rs (1 hunks)
crates/weaver-sandbox/src/tests/unit.rs (1 hunks)
docs/users-guide.md (1 hunks)
docs/weaver-design.md (1 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run make check-fmt, make lint, and make test before committing, which execute cargo fmt --workspace -- --check, cargo clippy --workspace --all-targets --all-features -- -D warnings, and cargo test --workspace respectively.
Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and behavioural tests for new functionality; run both before and after making any change.
Every module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings.
Handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared setup in Rust tests.
Replace duplicated tests with #[rstest(...)] parameterised cases in Rust.
Prefer mockall for mocks/stubs in Rust.
Prefer .expect() over .unwrap() in Rust.
Use concat!() to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line function definitions in Rust where appropriate (e.g., `pub fn new(id: u64) -> Self { ...

Files:

crates/weaver-sandbox/src/tests/support/env.rs
crates/weaver-sandbox/src/tests/env_guard.rs
crates/weaver-sandbox/src/tests/unit.rs
crates/weaver-sandbox/src/tests/support/mod.rs
crates/weaver-sandbox/src/tests/sandbox_unit.rs
crates/weaver-sandbox/src/tests/behaviour.rs
crates/weaver-sandbox/src/tests/mod.rs
crates/weaver-sandbox/src/env_guard.rs
crates/weaver-sandbox/src/sandbox.rs
crates/weaver-sandbox/src/profile.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

Adhere to single responsibility and CQRS

Place function attributes after doc comments.

Do not use return in single-line functions.

Move conditionals with >2 branches into a predicate function.

Avoid unsafe unless absolutely necessary.

Every module must begin with a //! doc comment that explains the module's purpose and utility.

Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar

Lints must not be silenced except as a last resort.

#[allow] is forbidden.

Only narrowly scoped #[expect(lint, reason = "...")] is allowed.

No lint groups, no blanket or file-wide suppression.

Include FIXME: with link if a fix is expected.

Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.

Use rstest fixtures for shared setup and to avoid repetition between tests.

Replace duplicated tests with #[rstest(...)] parameterised cases.

Prefer mockall for mocks/stubs.

Prefer .expect() over .unwrap() in tests.

.expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.

Ensure that any API or behavioural changes are reflected in the documentation in docs/

Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/

Files must not exceed 400 lines in length

Large modules must be decomposed

Long match statements or dispatch tables should be decomposed by domain and collocated with targets

Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.

Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such

For testing of functionality depending upon environment variables, dependency injection and...

Files:

crates/weaver-sandbox/src/tests/support/env.rs
crates/weaver-sandbox/src/tests/env_guard.rs
crates/weaver-sandbox/src/tests/unit.rs
crates/weaver-sandbox/src/tests/support/mod.rs
crates/weaver-sandbox/src/tests/sandbox_unit.rs
crates/weaver-sandbox/src/tests/behaviour.rs
crates/weaver-sandbox/src/tests/mod.rs
crates/weaver-sandbox/src/env_guard.rs
crates/weaver-sandbox/src/sandbox.rs
crates/weaver-sandbox/src/profile.rs

docs/**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Documentation must use en-GB-oxendict ("-ize" / "-yse" / "-our") spelling and grammar, except for the "LICENSE" file which is left unchanged for community consistency.

docs/**/*.md: Use British English based on Oxford English Dictionary (en-GB-oxendict), including: -ize suffix (realize, organization), -lyse suffix (analyse, paralyse), -our suffix (colour, behaviour), -re suffix (calibre, centre), double 'l' (cancelled, counsellor), maintain 'e' (likeable, liveable), -ogue suffix (analogue, catalogue)
Use US spelling when used in an API, for example color
Use the Oxford comma in documentation: 'ships, planes, and hovercraft' where it aids comprehension
Treat company names as collective nouns in documentation, for example: 'Lille Industries are expanding'
Write headings in sentence case using Markdown headings (#, ##, ###) in order without skipping levels
Follow markdownlint recommendations for Markdown consistency
Provide code blocks and lists using standard Markdown syntax, always providing a language identifier for fenced code blocks (use plaintext for non-code text)
Use - as the first level bullet and renumber lists when items change in documentation
Prefer inline links using [text](url) or angle brackets around the URL in documentation
Ensure blank lines before and after bulleted lists and fenced blocks in documentation
Ensure tables have a delimiter line below the header row in documentation
Expand any uncommon acronym on first use in documentation, for example, Continuous Integration (CI)
Wrap paragraphs at 80 columns in documentation
Wrap code at 120 columns in documentation
Do not wrap tables in documentation
Use footnotes referenced with [^label] in documentation
Include Mermaid diagrams where they add clarity to documentation
When embedding figures in documentation, use ![alt text](path/to/image) and provide brief alt text describing the content
Add a short description before each Mermaid diagram so screen readers can understand it
Define roa...

Files:

docs/users-guide.md
docs/weaver-design.md

**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

**/*.md: Validate Markdown files using make markdownlint.
Run make fmt after any documentation changes to format all Markdown files and fix table markup.
Validate Mermaid diagrams in Markdown files by running make nixie.
Markdown paragraphs and bullet points must be wrapped at 80 columns.
Code blocks in Markdown must be wrapped at 120 columns.
Tables and headings in Markdown must not be wrapped.
Use dashes (-) for list bullets in Markdown.
Use GitHub-flavoured Markdown footnotes ([^1]) for references and footnotes.

Files:

docs/users-guide.md
docs/weaver-design.md

⚙️ CodeRabbit configuration file

**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")

Use en-GB-oxendict (-ize / -yse / -our) spelling and grammar

Headings must not be wrapped.

Documents must start with a level 1 heading

Headings must correctly increase or decrease by no more than one level at a time

Use GitHub-flavoured Markdown style for footnotes and endnotes.

Numbered footnotes must be numbered by order of appearance in the document.

Files:

docs/users-guide.md
docs/weaver-design.md

🧠 Learnings (1)

📓 Common learnings

Learnt from: CR
Repo: leynos/weaver PR: 0
File: docs/roadmap.md:0-0
Timestamp: 2025-12-03T19:40:08.813Z
Learning: Implement `weaver-sandbox` crate using `birdcage` for focused scope, prioritising robust Linux support via namespaces and seccomp-bpf

🧬 Code graph analysis (6)

crates/weaver-sandbox/src/tests/env_guard.rs (3)

crates/weaver-sandbox/src/tests/support/env.rs (1)

lock_env (5-10)

crates/weaver-sandbox/src/tests/support/mod.rs (1)

set_var (32-34)

crates/weaver-sandbox/src/env_guard.rs (1)

capture (31-35)

crates/weaver-sandbox/src/tests/sandbox_unit.rs (3)

crates/weaver-sandbox/src/profile.rs (1)

new (61-72)

crates/weaver-sandbox/src/sandbox.rs (2)

new (39-44)

with_thread_counter_for_tests (47-55)

crates/weaver-sandbox/src/runtime.rs (1)

thread_count (42-62)

crates/weaver-sandbox/src/tests/behaviour.rs (3)

crates/weaver-sandbox/src/profile.rs (1)

new (61-72)

crates/weaver-sandbox/src/sandbox.rs (1)

new (39-44)

crates/weaver-sandbox/src/tests/support/mod.rs (1)

new (59-77)

crates/weaver-sandbox/src/env_guard.rs (1)

crates/weaver-sandbox/src/tests/support/mod.rs (4)

set_env_var (102-110)

set_var (32-34)

drop (38-43)

drop (138-140)

crates/weaver-sandbox/src/sandbox.rs (4)

crates/weaver-sandbox/src/runtime.rs (1)

thread_count (42-62)

crates/weaver-sandbox/src/profile.rs (2)

new (61-72)

network_policy (169-171)

crates/weaver-sandbox/src/env_guard.rs (1)

capture (31-35)

crates/weaverd/src/process/guard.rs (1)

paths (85-87)

crates/weaver-sandbox/src/profile.rs (2)

crates/weaver-sandbox/src/runtime.rs (1)

linux_runtime_roots (9-39)

crates/weaver-sandbox/src/sandbox.rs (2)

new (39-44)

canonicalised_set (140-147)

🪛 LanguageTool

docs/users-guide.md

[misspelling] ~149-~149: This word is normally spelled as one.
Context: ...wlist before launch; requests made from multi-threaded contexts return a MultiThreaded error...

(EN_COMPOUNDS_MULTI_THREADED)

🔍 Remote MCP Deepwiki

Summary of additional concrete facts relevant to reviewing PR #33

Workspace/dependency
- birdcage added as a workspace dependency (v0.8.1) and new workspace member crates/weaver-sandbox introduced.
Test execution / Makefile
- Makefile now runs two cargo test invocations: first -p weaver-sandbox with RUST_TEST_THREADS=1 (single-threaded), then workspace tests with --exclude weaver-sandbox (so only sandbox tests are single-threaded).
Platform focus / runtime helpers
- Implementation is Linux‑centric:
  - linux_runtime_roots() returns canonicalised runtime roots only on Linux (empty Vec on other OSes).
  - thread_count() parses /proc/self/status on Linux; on non-Linux it returns Ok(1). Preflight behavior differs off Linux.
Path canonicalisation / whitelisting behavior
- canonicalise() rebuilds non-existent child paths from the nearest existing ancestor (rebuild_from_existing_ancestor) so MissingPath is only emitted when no ancestor exists — enables whitelisting output paths.
EnvGuard / environment handling & tests
- EnvGuard captures parent environment and restores it on Drop; tests serialize env mutations with a global mutex pattern. Current repo state: multiple test modules define their own OnceLock<Mutex<()>> (reviewer-requested centralisation remains outstanding).
Public API surface added
- New crate exports and types: Sandbox, SandboxCommand/Child/Output aliases, SandboxProfile, EnvironmentPolicy, NetworkPolicy, SandboxError and related public methods. spawn() enforces single-thread preflight and canonicalises/validates executables.
Tests added
- Unit tests, rstest-bdd behavioural tests, feature scenarios and test support harness (TestWorld, EnvHandle, env lock) were added to cover canonicalisation, env_guard Drop behavior, multi-thread preflight errors, and behavioural scenarios.
Outstanding reviewer action items (concrete)
- Centralise the environment mutex used by tests (avoid duplicate OnceLock<Mutex<()>> across modules).
- Consider simplifying canonicalisation logic (replace BTreeSet use and complex rebuild logic with ancestors()/strip_prefix() unless set semantics justified).
- Remove or justify remaining unsafe usage around std::env::set_var/remove_var (these APIs are safe; unsafe must be justified).

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Sourcery review

🔇 Additional comments (7)

crates/weaver-sandbox/src/tests/mod.rs (1)

1-6: Keep centralised test module wiring as-is.

The tests module is succinct, documented, and cleanly organises behaviour, env_guard, support, and unit tests. No changes are required.

crates/weaver-sandbox/src/tests/unit.rs (1)

7-97: Retain these profile and policy unit tests; coverage is solid.

These tests exercise the key SandboxProfile behaviours (runtime roots, env allow-list semantics including “ignore after InheritAll”, network defaults/overrides, and path recording, including non-existent future paths). This aligns well with the public API and previous review feedback.

crates/weaver-sandbox/src/tests/support/env.rs (1)

1-10: Keep lock_env as the single shared environment mutex for tests.

This module centralises the OnceLock<Mutex<()>> used to serialise env mutations in tests, which matches the project’s testing policy and removes the earlier duplication across test modules.

crates/weaver-sandbox/src/tests/sandbox_unit.rs (1)

21-92: Retain these spawn preflight tests; they cover the important error variants.

The tests for relative paths, missing programmes, unwhitelisted executables, MultiThreaded, and ThreadCountUnavailable exercise the key safety guarantees of Sandbox::spawn and the injected thread-count seam. This is an appropriate level of coverage for the sandbox preflight behaviour.
crates/weaver-sandbox/src/env_guard.rs (1)
1-54: Keep EnvGuard’s snapshot-and-restore implementation; the unsafe surface is narrow and well-documented.

EnvGuard captures the environment as a simple Vec<(OsString, OsString)>, clears all current variables, then restores the snapshot, with all env mutations funnelled through small unsafe helpers as required by the project’s env policy. This matches the documented intent and the accompanying tests.
#!/bin/bash
# Run only the EnvGuard tests to double-check behaviour after any future edits.
cargo test -p weaver-sandbox env_guard
crates/weaver-sandbox/src/tests/support/mod.rs (1)

1-164: Test support scaffolding is well-structured.

The TestWorld fixture provides clean orchestration for behavioural tests with temporary directory isolation. The EnvHandle pattern correctly serialises environment mutations via mutex and restores on drop. The #[cfg(target_os = "linux")] guard on resolve_binary addresses platform concerns.

crates/weaver-sandbox/src/tests/behaviour.rs (1)

1-145: Behavioural test structure is solid.

The #![cfg(target_os = "linux")] module gate correctly restricts these tests to supported platforms. Step definitions are well-organised with clear assertions. The then_environment_cleaned step correctly uses assert_ne to verify scenario-specific values are removed without assuming the host environment lacks these variables.

…ucture Added Figure 5.1, a detailed Mermaid class diagram, to the weaver-design documentation. The diagram illustrates the sandbox crate structure and its integration points, including classes such as Sandbox, SandboxProfile, EnvironmentPolicy, NetworkPolicy, and related externals. This enhances understanding of the sandbox component design and relationships. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

… error handling - Introduce a helper method `canonicalised_paths` in `SandboxProfile` to reduce duplicated caching logic for path canonicalisation. - Change `allow_environment_variable` to avoid cloning large environment policies by moving instead. - Enhance `canonicalise` function to handle non-existent paths by rebuilding from nearest existing ancestor, allowing whitelisting of future outputs. - Simplify exception collection by removing redundant executable check. - Improve environment mutation safety in tests. - Refactor test code for clarity and reduce duplication. - Fix minor typo in documentation from "multi-threaded" to "multithreaded". Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

leynos · 2025-12-05T00:06:30Z

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix:

crates/weaver-sandbox/src/sandbox.rs

Comment on lines +145 to +192

fn canonicalise(path: &Path, require_exists: bool) -> Result<PathBuf, SandboxError> {
    match fs::canonicalize(path) {
        Ok(resolved) => Ok(resolved),
        Err(error) if error.kind() == std::io::ErrorKind::NotFound => {
            if require_exists {
                return Err(SandboxError::MissingPath {
                    path: path.to_path_buf(),
                });
            }

            // Rebuild the path from the closest existing ancestor so callers can
            // whitelist future outputs (e.g. files that will be created by the
            // sandboxed command) under an allowed directory.
            let mut components_to_append = Vec::new();
            let mut ancestor_opt = path;
            while let Some(parent) = ancestor_opt.parent() {
                components_to_append.push(ancestor_opt.file_name().map(PathBuf::from));
                ancestor_opt = parent;
                if ancestor_opt.exists() {
                    break;
                }
            }

            if !ancestor_opt.exists() {
                return Err(SandboxError::MissingPath {
                    path: path.to_path_buf(),
                });
            }

            let mut rebuilt = fs::canonicalize(ancestor_opt).map_err(|source| {
                SandboxError::CanonicalisationFailed {
                    path: ancestor_opt.to_path_buf(),
                    source,
                }
            })?;

            for component in components_to_append.into_iter().rev().flatten() {
                rebuilt.push(component);
            }

            Ok(rebuilt)
        }
        Err(source) => Err(SandboxError::CanonicalisationFailed {
            path: path.to_path_buf(),
            source,
        }),
    }
}

❌ New issue: Bumpy Road Ahead
canonicalise has 2 blocks with nested conditional logic. Any nesting of 2 or deeper is considered. Threshold is 2 blocks per function

Refactored the path canonicalization process to move the logic for rebuilding a path from its closest existing ancestor into a new helper function `rebuild_from_existing_ancestor`. This improves code clarity and consolidates error handling related to path existence and canonicalization. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e8ff79 and d2f0f3c.

📒 Files selected for processing (6)

crates/weaver-sandbox/src/profile.rs (1 hunks)
crates/weaver-sandbox/src/sandbox.rs (1 hunks)
crates/weaver-sandbox/src/tests/behaviour.rs (1 hunks)
crates/weaver-sandbox/src/tests/sandbox_unit.rs (1 hunks)
crates/weaver-sandbox/src/tests/support/mod.rs (1 hunks)
docs/users-guide.md (1 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run make check-fmt, make lint, and make test before committing, which execute cargo fmt --workspace -- --check, cargo clippy --workspace --all-targets --all-features -- -D warnings, and cargo test --workspace respectively.
Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and behavioural tests for new functionality; run both before and after making any change.
Every module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings.
Handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared setup in Rust tests.
Replace duplicated tests with #[rstest(...)] parameterised cases in Rust.
Prefer mockall for mocks/stubs in Rust.
Prefer .expect() over .unwrap() in Rust.
Use concat!() to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line function definitions in Rust where appropriate (e.g., `pub fn new(id: u64) -> Self { ...

Files:

crates/weaver-sandbox/src/tests/sandbox_unit.rs
crates/weaver-sandbox/src/tests/behaviour.rs
crates/weaver-sandbox/src/sandbox.rs
crates/weaver-sandbox/src/tests/support/mod.rs
crates/weaver-sandbox/src/profile.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

Adhere to single responsibility and CQRS

Place function attributes after doc comments.

Do not use return in single-line functions.

Move conditionals with >2 branches into a predicate function.

Avoid unsafe unless absolutely necessary.

Every module must begin with a //! doc comment that explains the module's purpose and utility.

Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar

Lints must not be silenced except as a last resort.

#[allow] is forbidden.

Only narrowly scoped #[expect(lint, reason = "...")] is allowed.

No lint groups, no blanket or file-wide suppression.

Include FIXME: with link if a fix is expected.

Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.

Use rstest fixtures for shared setup and to avoid repetition between tests.

Replace duplicated tests with #[rstest(...)] parameterised cases.

Prefer mockall for mocks/stubs.

Prefer .expect() over .unwrap() in tests.

.expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.

Ensure that any API or behavioural changes are reflected in the documentation in docs/

Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/

Files must not exceed 400 lines in length

Large modules must be decomposed

Long match statements or dispatch tables should be decomposed by domain and collocated with targets

Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.

Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such

For testing of functionality depending upon environment variables, dependency injection and...

Files:

crates/weaver-sandbox/src/tests/sandbox_unit.rs
crates/weaver-sandbox/src/tests/behaviour.rs
crates/weaver-sandbox/src/sandbox.rs
crates/weaver-sandbox/src/tests/support/mod.rs
crates/weaver-sandbox/src/profile.rs

docs/**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Documentation must use en-GB-oxendict ("-ize" / "-yse" / "-our") spelling and grammar, except for the "LICENSE" file which is left unchanged for community consistency.

docs/**/*.md: Use British English based on Oxford English Dictionary (en-GB-oxendict), including: -ize suffix (realize, organization), -lyse suffix (analyse, paralyse), -our suffix (colour, behaviour), -re suffix (calibre, centre), double 'l' (cancelled, counsellor), maintain 'e' (likeable, liveable), -ogue suffix (analogue, catalogue)
Use US spelling when used in an API, for example color
Use the Oxford comma in documentation: 'ships, planes, and hovercraft' where it aids comprehension
Treat company names as collective nouns in documentation, for example: 'Lille Industries are expanding'
Write headings in sentence case using Markdown headings (#, ##, ###) in order without skipping levels
Follow markdownlint recommendations for Markdown consistency
Provide code blocks and lists using standard Markdown syntax, always providing a language identifier for fenced code blocks (use plaintext for non-code text)
Use - as the first level bullet and renumber lists when items change in documentation
Prefer inline links using [text](url) or angle brackets around the URL in documentation
Ensure blank lines before and after bulleted lists and fenced blocks in documentation
Ensure tables have a delimiter line below the header row in documentation
Expand any uncommon acronym on first use in documentation, for example, Continuous Integration (CI)
Wrap paragraphs at 80 columns in documentation
Wrap code at 120 columns in documentation
Do not wrap tables in documentation
Use footnotes referenced with [^label] in documentation
Include Mermaid diagrams where they add clarity to documentation
When embedding figures in documentation, use ![alt text](path/to/image) and provide brief alt text describing the content
Add a short description before each Mermaid diagram so screen readers can understand it
Define roa...

Files:

docs/users-guide.md

**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

**/*.md: Validate Markdown files using make markdownlint.
Run make fmt after any documentation changes to format all Markdown files and fix table markup.
Validate Mermaid diagrams in Markdown files by running make nixie.
Markdown paragraphs and bullet points must be wrapped at 80 columns.
Code blocks in Markdown must be wrapped at 120 columns.
Tables and headings in Markdown must not be wrapped.
Use dashes (-) for list bullets in Markdown.
Use GitHub-flavoured Markdown footnotes ([^1]) for references and footnotes.

Files:

docs/users-guide.md

⚙️ CodeRabbit configuration file

**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")

Use en-GB-oxendict (-ize / -yse / -our) spelling and grammar

Headings must not be wrapped.

Documents must start with a level 1 heading

Headings must correctly increase or decrease by no more than one level at a time

Use GitHub-flavoured Markdown style for footnotes and endnotes.

Numbered footnotes must be numbered by order of appearance in the document.

Files:

docs/users-guide.md

🧠 Learnings (1)

📓 Common learnings

Learnt from: CR
Repo: leynos/weaver PR: 0
File: docs/roadmap.md:0-0
Timestamp: 2025-12-03T19:40:08.813Z
Learning: Implement `weaver-sandbox` crate using `birdcage` for focused scope, prioritising robust Linux support via namespaces and seccomp-bpf

🧬 Code graph analysis (3)

crates/weaver-sandbox/src/tests/sandbox_unit.rs (2)

crates/weaver-sandbox/src/sandbox.rs (2)

with_thread_counter_for_tests (47-55)

new (39-44)

crates/weaver-sandbox/src/runtime.rs (1)

thread_count (42-62)

crates/weaver-sandbox/src/tests/behaviour.rs (3)

crates/weaver-sandbox/src/sandbox.rs (1)

new (39-44)

crates/weaver-sandbox/src/profile.rs (1)

new (61-72)

crates/weaver-sandbox/src/tests/support/mod.rs (1)

new (61-79)

crates/weaver-sandbox/src/profile.rs (2)

crates/weaver-sandbox/src/runtime.rs (1)

linux_runtime_roots (9-39)

crates/weaver-sandbox/src/sandbox.rs (2)

new (39-44)

canonicalised_set (136-143)

🔍 Remote MCP Ref

Summary of additional facts relevant to code review

Workspace + dependency: PR adds crates/weaver-sandbox as a workspace member and adds birdcage = "0.8.1" to workspace dependencies (weaver-sandbox wraps birdcage).
Test/CI changes: Makefile now runs tests in two steps — first RUST_TEST_THREADS=1 cargo test -p weaver-sandbox (scoped single-threaded), then cargo test --exclude weaver-sandbox for the rest of workspace.
Tests added: unit suites, sandbox_unit, env_guard unit tests, and rstest-bdd behavioural tests plus features/sandbox.feature (covers whitelist, env inheritance/isolation, network defaults).
Env mutex / test hygiene: PR replaced once_cell Lazy with std::sync::OnceLock for static test mutexes; reviewers flagged duplicated env-lock definitions and requested centralizing a single shared env mutex across tests.
Env mutability safety / unsafe usage: EnvGuard wraps std::env set/remove in narrow helpers and PR includes docs asserting environment mutations must be inside unsafe blocks on Rust 2024 toolchains; reviewers requested removing unnecessary unsafe or providing concrete justification.
Path canonicalisation behavior: canonicalise() and rebuild-from-ancestor logic have been iterated — earlier commits added rebuilding from an existing ancestor to allow whitelisting non-existent child paths; later commits removed that fallback (always error on missing path); a refactor extracted rebuild_from_existing_ancestor into a helper. Reviewers requested simplification (ancestors/strip_prefix) and justification for using BTreeSet semantics.
Thread-count handling: Sandbox enforces single-threaded preflight via thread_count() (Linux: reads /proc/self/status); tests inject a thread-counter seam to deterministically exercise MultiThreaded and ThreadCountUnavailable errors.
Platform scope: Implementation and many tests are Linux-centric (linux_runtime_roots, /proc parsing, resolve_binary); docs note Linux defaults.

Sources

Pull request page and commit/comments for PR #33 (weaver-sandbox): #33

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Sourcery review
GitHub Check: build-test

🔇 Additional comments (7)

docs/users-guide.md (1)

142-153: Documentation accurately describes sandbox defaults.

The new section correctly documents the birdcage 0.8.1 wrapper, default policies (networking disabled, environment stripped), absolute path requirement, and multithreaded context handling. Punctuation issues flagged in past reviews have been addressed.

crates/weaver-sandbox/src/tests/sandbox_unit.rs (1)

1-89: Unit tests comprehensively cover sandbox preflight errors.

The test suite validates all error paths: relative programme paths, missing paths, unwhitelisted executables, multithreaded contexts, and thread count unavailability. The test-only thread counter injection via sandbox_with_forced_thread_count enables deterministic testing of runtime conditions.

crates/weaver-sandbox/src/tests/support/mod.rs (1)

1-166: Test support infrastructure is well-structured and properly addresses past concerns.

EnvHandle correctly coordinates the environment mutex and snapshot, with restoration occurring before the guard releases. The TestWorld fixture provides comprehensive orchestration for behavioural tests, including temporary directory isolation and helper methods aligned with BDD scenarios. resolve_binary is properly gated to Linux.

crates/weaver-sandbox/src/sandbox.rs (1)

1-184: Sandbox orchestration correctly implements preflight checks and path canonicalisation.

The implementation properly enforces single-threaded execution, validates programme whitelisting, and handles both existing and future paths through the require_exists parameter. The rebuild_from_existing_ancestor helper uses ancestors()/strip_prefix as suggested in past reviews, avoiding manual parent-walking loops.

crates/weaver-sandbox/src/tests/behaviour.rs (1)

1-154: Behavioural tests are properly structured and address all past review concerns.

The module is correctly gated to Linux, preventing runtime panics on non-Linux hosts. Borrow checker issues have been resolved by extracting field values before mutable borrows. The then_environment_cleaned step now verifies test-specific values were removed rather than assuming the host environment lacks those variables, making tests more robust.

crates/weaver-sandbox/src/profile.rs (2)

95-105: Environment variable whitelisting correctly avoids cloning allow-lists.

Using std::mem::take to move the current policy before calling with_allowed eliminates the unnecessary clone that would occur when growing allow-lists. This addresses the performance concern raised in past reviews.

180-207: Environment policy helpers properly encapsulate multi-branch conditionals.

EnvironmentPolicy::with_allowed encapsulates the three-branch logic for accumulating allowed environment variables, and to_exceptions maps policy variants to birdcage::Exception types. This follows the coding guideline to extract multi-branch conditionals into helper functions.

…:sync::OnceCell Replaced usage of std::sync::OnceLock with once_cell::sync::OnceCell for caching canonicalized paths in sandbox profile. Updated Cargo.toml and Cargo.lock to add the once_cell dependency. This change simplifies the caching logic by using get_or_try_init directly, improving code clarity and error handling. Additionally added error mapping in sandbox.rs to signal invariant break more precisely. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2f0f3c and 485ba1c.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (3)

crates/weaver-sandbox/Cargo.toml (1 hunks)
crates/weaver-sandbox/src/profile.rs (1 hunks)
crates/weaver-sandbox/src/sandbox.rs (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Run make check-fmt, make lint, and make test before committing, which execute cargo fmt --workspace -- --check, cargo clippy --workspace --all-targets --all-features -- -D warnings, and cargo test --workspace respectively.
Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and behavioural tests for new functionality; run both before and after making any change.
Every module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings.
Handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared setup in Rust tests.
Replace duplicated tests with #[rstest(...)] parameterised cases in Rust.
Prefer mockall for mocks/stubs in Rust.
Prefer .expect() over .unwrap() in Rust.
Use concat!() to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line function definitions in Rust where appropriate (e.g., `pub fn new(id: u64) -> Self { ...

Files:

crates/weaver-sandbox/src/sandbox.rs
crates/weaver-sandbox/src/profile.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

Adhere to single responsibility and CQRS

Place function attributes after doc comments.

Do not use return in single-line functions.

Move conditionals with >2 branches into a predicate function.

Avoid unsafe unless absolutely necessary.

Every module must begin with a //! doc comment that explains the module's purpose and utility.

Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar

Lints must not be silenced except as a last resort.

#[allow] is forbidden.

Only narrowly scoped #[expect(lint, reason = "...")] is allowed.

No lint groups, no blanket or file-wide suppression.

Include FIXME: with link if a fix is expected.

Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.

Use rstest fixtures for shared setup and to avoid repetition between tests.

Replace duplicated tests with #[rstest(...)] parameterised cases.

Prefer mockall for mocks/stubs.

Prefer .expect() over .unwrap() in tests.

.expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.

Ensure that any API or behavioural changes are reflected in the documentation in docs/

Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/

Files must not exceed 400 lines in length

Large modules must be decomposed

Long match statements or dispatch tables should be decomposed by domain and collocated with targets

Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.

Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such

For testing of functionality depending upon environment variables, dependency injection and...

Files:

crates/weaver-sandbox/src/sandbox.rs
crates/weaver-sandbox/src/profile.rs

🧠 Learnings (1)

📓 Common learnings

Learnt from: CR
Repo: leynos/weaver PR: 0
File: docs/roadmap.md:0-0
Timestamp: 2025-12-03T19:40:08.813Z
Learning: Implement `weaver-sandbox` crate using `birdcage` for focused scope, prioritising robust Linux support via namespaces and seccomp-bpf

🧬 Code graph analysis (1)

crates/weaver-sandbox/src/sandbox.rs (3)

crates/weaver-sandbox/src/runtime.rs (1)

thread_count (42-62)

crates/weaver-sandbox/src/profile.rs (2)

new (63-74)

network_policy (148-150)

crates/weaver-sandbox/src/env_guard.rs (2)

capture (31-35)

drop (51-53)

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build-test
GitHub Check: Sourcery review

🔇 Additional comments (22)

crates/weaver-sandbox/Cargo.toml (1)

8-8: Do not remove once_cell dependency—it is still actively used in the implementation.

The crate uses once_cell::sync::OnceCell in crates/weaver-sandbox/src/profile.rs, so the dependency cannot be removed. While std::sync::OnceLock is used in tests, the implementation relies on once_cell. Keep the dependency declaration as-is.

Likely an incorrect or invalid review comment.

crates/weaver-sandbox/src/sandbox.rs (11)

1-14: LGTM!

Module documentation present, imports are well-organised and necessary. The std::fmt import at line 13 could be grouped with the other std imports at lines 3-4 for consistency, but this is minor.

15-26: LGTM!

Type aliases provide useful abstraction over birdcage types, and the thread_counter field enables dependency injection for testing—a sound approach that avoids environment mutation in tests.

28-34: LGTM!

Manual Debug implementation correctly handles the non-Debug closure field with finish_non_exhaustive().

36-55: LGTM!

The test seam via with_thread_counter_for_tests enables deterministic testing without environment mutation. The #[must_use] annotation is appropriate.

57-79: LGTM!

The spawn method correctly sequences preflight checks, environment capture, exception collection, and sandbox activation. The explicit drop(env_guard) at line 77 documents the intent to restore the environment immediately after spawning, which aids readability.

81-90: LGTM!

Error handling is correct and the logic properly enforces birdcage's single-threaded requirement.

92-100: LGTM!

Whitelist check is correct. The iter().any() pattern is appropriate for Vec<PathBuf> membership testing.

102-125: LGTM!

The _program parameter is correctly unused since ensure_program_whitelisted already guarantees the programme is in the executable whitelist, and the loop over executables will include it. Exception collection is well-structured and delegates policy mapping to helper methods.

127-134: LGTM!

Correct validation: programmes must be absolute and must exist.

145-162: LGTM!

The canonicalise function correctly implements the dual semantics: programmes must exist (require_exists=true), whilst whitelist paths may reference future outputs under existing ancestors (require_exists=false). The TOCTTOU concern from earlier reviews has been addressed.

164-187: LGTM!

The refactor using Path::ancestors() and strip_prefix() is clean and correct. The comment at lines 177-179 appropriately documents the defensive error mapping for an invariant that should always hold.

crates/weaver-sandbox/src/profile.rs (10)

1-9: LGTM!

Module documentation is present and imports are appropriate.

10-20: LGTM!

Secure-by-default with Isolated as the default variant. BTreeSet provides ordered and deduplicated environment variable names.

22-30: LGTM!

Deny-by-default network policy is the secure choice. Copy derive is appropriate for this small enum.

32-48: LGTM!

The struct design with OnceCell caching for canonicalised paths avoids repeated filesystem operations. Documentation clearly explains the restrictive defaults.

50-121: LGTM!

Builder pattern is well-implemented with #[must_use] annotations. The std::mem::take at line 104 correctly avoids cloning the environment policy on each allow_environment_variable call.

123-150: LGTM!

The accessor methods correctly delegate to canonicalised_paths, which uses get_or_try_init as requested in prior reviews. The network_policy accessor correctly returns a copy since NetworkPolicy is Copy.

153-161: LGTM!

The get_or_try_init pattern correctly handles fallible initialization and caching in a single atomic operation, addressing prior review feedback.

163-175: LGTM!

The Default implementation correctly delegates to new(), and is_denied() is a concise predicate.

177-203: LGTM!

The with_allowed method correctly handles all three policy variants, and to_exceptions properly maps policies to birdcage::Exception types. The factoring of these methods addresses prior review feedback about extracting multi-branch conditionals.

6-6: The recommendation to replace once_cell::sync::OnceCell with std::sync::OnceLock is not feasible on stable Rust. The code uses get_or_try_init() (line 159), which is only available as an unstable nightly feature (once_cell_try) on std::sync::OnceLock. While OnceLock itself was stabilized in Rust 1.70, get_or_try_init remains nightly-only. The once_cell dependency is required to support stable Rust builds.

Likely an incorrect or invalid review comment.

- Replace explicit loop in `canonicalised_set` with iterator and collect for cleaner code. - Change error mapping in `rebuild_from_existing_ancestor` from `MissingPath` to `CanonicalisationFailed` to better reflect an internal invariant violation rather than a caller error. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

codescene-delta-analysis Bot approved these changes Dec 3, 2025

View reviewed changes

leynos marked this pull request as ready for review December 4, 2025 02:21

sourcery-ai Bot reviewed Dec 4, 2025

View reviewed changes

coderabbitai Bot requested changes Dec 4, 2025

View reviewed changes

Comment thread crates/weaver-sandbox/src/runtime.rs

Comment thread crates/weaver-sandbox/src/tests/support/mod.rs

codescene-delta-analysis Bot approved these changes Dec 4, 2025

View reviewed changes

coderabbitai Bot requested changes Dec 4, 2025

View reviewed changes

codescene-delta-analysis Bot approved these changes Dec 4, 2025

View reviewed changes

sourcery-ai Bot reviewed Dec 4, 2025

View reviewed changes

Comment thread crates/weaver-sandbox/src/env_guard.rs Outdated

sourcery-ai Bot reviewed Dec 4, 2025

View reviewed changes

codescene-delta-analysis Bot approved these changes Dec 4, 2025

View reviewed changes

sourcery-ai Bot reviewed Dec 4, 2025

View reviewed changes

coderabbitai Bot requested changes Dec 4, 2025

View reviewed changes

Comment thread crates/weaver-sandbox/src/tests/sandbox_unit.rs Outdated

Comment thread crates/weaver-sandbox/src/tests/support/mod.rs

This comment was marked as outdated.

Sign in to view

This comment was marked as resolved.

Sign in to view

This comment was marked as outdated.

Sign in to view

This comment was marked as resolved.

Sign in to view

This comment was marked as outdated.

Sign in to view

coderabbitai Bot requested changes Dec 4, 2025

View reviewed changes

This comment was marked as outdated.

Sign in to view

This comment was marked as resolved.

Sign in to view

codescene-delta-analysis Bot approved these changes Dec 5, 2025

View reviewed changes

coderabbitai Bot requested changes Dec 5, 2025

View reviewed changes

Comment thread crates/weaver-sandbox/src/profile.rs

Comment thread crates/weaver-sandbox/src/sandbox.rs

codescene-delta-analysis Bot approved these changes Dec 5, 2025

View reviewed changes

coderabbitai Bot requested changes Dec 5, 2025

View reviewed changes

Comment thread crates/weaver-sandbox/src/sandbox.rs

codescene-delta-analysis Bot approved these changes Dec 5, 2025

View reviewed changes

coderabbitai Bot approved these changes Dec 5, 2025

View reviewed changes

leynos merged commit 2f359eb into main Dec 5, 2025
4 checks passed

leynos deleted the terragon/feature/cli-daemon-foundation-5gg587 branch December 5, 2025 18:10

Conversation

leynos commented Dec 3, 2025 • edited by sourcery-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Why

How to test

Migration / Notes

Risk & rollback

Summary by Sourcery

Uh oh!

coderabbitai Bot commented Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Other AI code review bot(s) detected

Summary by CodeRabbit

Release Notes

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

sourcery-ai Bot commented Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for Sandbox::spawn sandboxed command execution

Class diagram for the new weaver-sandbox crate

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

leynos commented Dec 4, 2025

Uh oh!

coderabbitai Bot commented Dec 4, 2025

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Context Summary for PR Review

Project Context

Birdcage Library Context

Security Approach

Key Implementation Details

Testing Strategy

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Additional context relevant to reviewing PR #33

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

leynos commented Dec 4, 2025

Uh oh!

coderabbitai Bot commented Dec 4, 2025

Issue 1: Non-existent path whitelisting ✅ RESOLVED

Issue 2: Global RUST_TEST_THREADS=1 ✅ RESOLVED

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

leynos commented Dec 3, 2025 •

edited by sourcery-ai Bot

Loading

coderabbitai Bot commented Dec 3, 2025 •

edited

Loading

sourcery-ai Bot commented Dec 3, 2025 •

edited

Loading

leynos commented Dec 4, 2025 •

edited

Loading