Skip to content

fix: Harden sandbox isolation and update allowed status handling#398

Merged
yacosta738 merged 5 commits into
mainfrom
runtime-sandbox
Apr 3, 2026
Merged

fix: Harden sandbox isolation and update allowed status handling#398
yacosta738 merged 5 commits into
mainfrom
runtime-sandbox

Conversation

@yacosta738
Copy link
Copy Markdown
Contributor

This pull request enhances sandbox security configuration and auditing in the agent runtime. It introduces a new require field to the sandbox config, allowing deployments to enforce the presence of a real OS-level sandbox and fail fast if unavailable. The PR also improves audit logging for shell command execution, including which sandbox backend was used, and strengthens tests for these changes.

Sandbox configuration and enforcement:

  • Added a require boolean field to SandboxConfig (default: false). When set to true, the agent will refuse to start if no real OS-level sandbox (e.g., Firejail, Landlock, Docker) is available, instead of silently falling back to the NoopSandbox. This is enforced in create_sandbox and detect_best_sandbox. [1] [2] [3] [4] [5]

  • Updated all code paths and tests to handle the new error-returning behavior of create_sandbox, ensuring proper propagation and test coverage for both required and non-required modes. [1] [2] [3] [4] [5] [6] [7]

Audit logging improvements:

  • Extended CommandExecutionLog and audit logging to include the sandbox_backend used for each shell command execution. This information is now recorded in audit logs and verified by new tests. [1] [2] [3] [4] [5]

  • Added detailed audit event logging for both shell and browser tool executions, including risk level, approval status, and environment details. [1] [2] [3]

These changes make sandboxing requirements explicit and auditable, improving the agent's security posture and compliance capabilities.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Enforced OS sandboxing for shell commands, including environment sanitization and sandbox wrapping
    • New fail-closed sandbox option (require) to reject startup/actions when no real backend is available
    • Optional preflight health-check for computer-use sidecars that can block actions when required
    • Audit events now record the sandbox backend used for better traceability
  • Documentation

    • Added Runtime Sandbox Isolation guides (EN/ES) with configuration and operational guidance

Walkthrough

This PR wires an injected OS sandbox into tool construction and shell execution, makes sandbox detection failable via a new security.sandbox.require flag, propagates sandbox metadata into audits, and adds lazy browser sidecar health verification that can gate computer-use actions.

Changes

Cohort / File(s) Summary
Config
clients/agent-runtime/src/config/schema.rs
Add SandboxConfig.require: bool with #[serde(default)] and tests for default/roundtrip behavior.
Sandbox detection & exports
clients/agent-runtime/src/security/detect.rs, clients/agent-runtime/src/security/mod.rs
Make create_sandbox(...) -> anyhow::Result<Arc<dyn Sandbox>>; add require-aware fail-closed semantics; re-export CommandExecutionLog.
Bootstrap / wiring
clients/agent-runtime/src/bootstrap/mod.rs
Create sandbox at startup (create_sandbox(&config.security)?) and pass sandbox into tool registry construction.
Tool factories
clients/agent-runtime/src/tools/mod.rs
Thread sandbox: Arc<dyn Sandbox> through default_tools* and all_tools* signatures and propagate into tool constructors; wire browser sidecar_verification_required from config.
Shell tool
clients/agent-runtime/src/tools/shell.rs
Inject Arc<dyn Sandbox> into ShellTool; sanitize env, call sandbox.wrap_command() before spawn, handle wrap failures (error ToolResult), populate ToolResult.structured with risk_level and sandbox_backend, warn on mutating commands when using NoopSandbox.
Browser tool
clients/agent-runtime/src/tools/browser.rs
Add sidecar_verification_required flag, cached sidecar_health, lazy /v1/health preflight (auth, timeout, JSON checks), include computer_use.sidecar_health in ToolResult.structured, and gate failures by require.
Audit & Agent
clients/agent-runtime/src/security/audit.rs, clients/agent-runtime/src/agent/agent.rs
Add CommandExecutionLog.sandbox_backend: Option<String>; new Agent helpers log_shell_audit_event and log_browser_security_event; shell audit logging can cause ApprovalRequired when audit logging fails under strict mode.
Tests
clients/agent-runtime/tests/*, unit tests in tools/security modules
Update tests to construct/pass NoopSandbox, add tests for wrap_command invocation/failure, sandbox metadata in audits, and browser sidecar health-check behaviors.
Docs & specs
clients/web/apps/docs/.../runtime-sandbox-isolation.md, openspec/.../runtime-sandbox-hardening/*, openspec/specs/sandbox-hardening/spec.md
Add operator guide, spec, design, tasks, exploration, and verification docs describing sandbox fail-closed semantics, shell wrapping contract, audit propagation, and browser sidecar health-checking.
Misc tests updates
clients/agent-runtime/tests/mcp_*_*.rs
Pass explicit NoopSandbox into tools::all_tools(...) in e2e/regression tests.

Sequence Diagram(s)

sequenceDiagram
    participant Agent as Agent
    participant ShellTool as ShellTool
    participant Sandbox as Sandbox
    participant Audit as AuditLogger
    participant Proc as Process

    Agent->>ShellTool: execute(command)
    ShellTool->>ShellTool: validate via SecurityPolicy
    ShellTool->>ShellTool: sanitize environment
    ShellTool->>Sandbox: wrap_command(&mut cmd)
    alt wrap_command succeeds
        Sandbox-->>ShellTool: wrapped command
        ShellTool->>Proc: spawn & await (timeout)
        Proc-->>ShellTool: exit status, stdout/stderr
        ShellTool->>Audit: log_shell_audit_event(command, approved, risk_level, sandbox_backend)
        Audit-->>ShellTool: recorded
        ShellTool-->>Agent: ToolResult { success, structured }
    else wrap_command fails
        Sandbox-->>ShellTool: error
        ShellTool->>Audit: log_shell_audit_event(..., sandbox_backend)
        ShellTool-->>Agent: ToolResult { success: false, error (sandbox) }
    end
Loading
sequenceDiagram
    participant Agent as Agent
    participant BrowserTool as BrowserTool
    participant Sidecar as Sidecar
    participant Audit as AuditLogger

    Agent->>BrowserTool: execute_computer_use_action(action)
    alt first action & verification required
        BrowserTool->>BrowserTool: derive health URL
        BrowserTool->>Sidecar: GET /v1/health (bearer auth, timeout)
        alt sidecar responds healthy
            Sidecar-->>BrowserTool: {status:"healthy", isolation}
            BrowserTool->>BrowserTool: cache SidecarHealthStatus
            BrowserTool->>Audit: log_browser_security_event(action, sidecar_health)
        else sidecar fails
            alt require=true
                BrowserTool-->>Agent: error (reject action)
            else require=false
                BrowserTool->>BrowserTool: warn, cache failure
                BrowserTool->>Audit: log_browser_security_event(action, sidecar_health=None)
            end
        end
    else use cached or skip check
    end
    BrowserTool->>Sidecar: POST action
    Sidecar-->>BrowserTool: response
    BrowserTool-->>Agent: ToolResult { structured: { computer_use: { action, sidecar_health } } }
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • yuniel-acosta
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed Title is concise and uses Conventional Commit prefix (fix) with a clear, specific description of the sandbox hardening and audit changes.
Description check ✅ Passed Description comprehensively covers objectives, technical changes, audit improvements, and sandbox configuration enhancements with proper context.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch runtime-sandbox

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

✅ Contributor Report

User: @yacosta738
Status: Passed (12/13 metrics passed)

Metric Description Value Threshold Status
PR Merge Rate PRs merged vs closed 89% >= 30%
Repo Quality Repos with ≥100 stars 0 >= 0
Positive Reactions Positive reactions received 10 >= 1
Negative Reactions Negative reactions received 0 <= 5
Account Age GitHub account age 3079 days >= 30 days
Activity Consistency Regular activity over time 108% >= 0%
Issue Engagement Issues with community engagement 0 >= 0
Code Reviews Code reviews given to others 503 >= 0
Merger Diversity Unique maintainers who merged PRs 2 >= 0
Repo History Merge Rate Merge rate in this repo 91% >= 0%
Repo History Min PRs Previous PRs in this repo 193 >= 0
Profile Completeness Profile richness (bio, followers) 90 >= 0
Suspicious Patterns Spam-like activity detection 1 N/A

Contributor Report evaluates based on public GitHub activity. Analysis period: 2025-04-03 to 2026-04-03

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented Apr 3, 2026

Deploying corvus with  Cloudflare Pages  Cloudflare Pages

Latest commit: 47a2835
Status: ✅  Deploy successful!
Preview URL: https://fac2b55f.corvus-42x.pages.dev
Branch Preview URL: https://runtime-sandbox.corvus-42x.pages.dev

View logs

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
clients/agent-runtime/src/security/detect.rs (1)

117-145: ⚠️ Potential issue | 🟠 Major

Probe Bubblewrap in Linux auto-detection too.

create_sandbox() accepts an explicit Bubblewrap backend on Linux above, but detect_best_sandbox() never probes it in the Linux path. A Linux host where Bubblewrap is the only available backend will incorrectly fall back to NoopSandbox or fail startup in require mode.

🛠️ Minimal fix
 #[cfg(target_os = "linux")]
 {
     // Try Landlock first (native, no dependencies)
     #[cfg(feature = "sandbox-landlock")]
     {
         if let Ok(sandbox) = super::landlock::LandlockSandbox::probe() {
             tracing::info!("Landlock sandbox enabled (Linux kernel 5.13+)");
             return Ok(Arc::new(sandbox));
         }
     }

     // Try Firejail second (user-space tool)
     if let Ok(sandbox) = super::firejail::FirejailSandbox::probe() {
         tracing::info!("Firejail sandbox enabled");
         return Ok(Arc::new(sandbox));
     }
+
+    #[cfg(feature = "sandbox-bubblewrap")]
+    {
+        if let Ok(sandbox) = super::bubblewrap::BubblewrapSandbox::probe() {
+            tracing::info!("Bubblewrap sandbox enabled");
+            return Ok(Arc::new(sandbox));
+        }
+    }
 }

Based on learnings "Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/security/detect.rs` around lines 117 - 145, The
Linux detection path in detect_best_sandbox() doesn't probe Bubblewrap, so add
probing for super::bubblewrap::BubblewrapSandbox::probe() inside the
#[cfg(target_os = "linux")] block (similar to how Landlock and Firejail are
probed) so that detect_best_sandbox() can return an Arc::new(bubblewrap) when
available; ensure the probe block is gated by #[cfg(feature =
"sandbox-bubblewrap")] to match create_sandbox() and keep behavior consistent
with require-mode/security expectations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clients/agent-runtime/src/agent/agent.rs`:
- Around line 608-623: The audit write currently swallows failures in strict
mode by only changing log level inside the execute_tool_call path where
logger.log_command_event(CommandExecutionLog { .. }) is called; change the
logging helper(s) to return Result<(), AuditError> (or appropriate error type)
instead of unit, update the call sites in execute_tool_call to propagate errors
(use ? or map_err) so that when self.audit_strict is true the function returns
early on audit write failure, and mirror the same change for the other audit
call sites (the similar block around record_code_session_result) so strict mode
consistently fails the overall operation instead of only lowering the log level.
- Around line 586-617: The command string is recorded verbatim into
CommandExecutionLog via logger.log_command_event, which can leak secrets; before
building the CommandExecutionLog, sanitize/redact the `command` value (the local
variable `command`) using the same redaction logic used elsewhere in this module
(or add a helper like `redact_command`/`sanitize_shell_command`) to strip inline
secrets (passwords, tokens, auth headers, environment substitutions, or long
hex/ssh-like tokens) and then pass the sanitized value into CommandExecutionLog
so logger.log_command_event never receives raw sensitive payloads.

In `@clients/agent-runtime/src/bootstrap/mod.rs`:
- Around line 227-233: The sandbox is created too late; move the call to
crate::security::create_sandbox(&config.security) so it runs before
init_memory_and_observer_with_overrides(...) and create_runtime(...) to allow
security.sandbox.require = true to fail fast. Update the sequence in
bootstrap/mod.rs so sandbox is created first, then initialize memory/observer
via init_memory_and_observer_with_overrides, then call create_runtime, and
finally pass the created sandbox into tools::all_tools_with_runtime(...) (ensure
the variable names remain sandbox and are Arc/cloned as needed).

In `@clients/agent-runtime/src/security/audit.rs`:
- Around line 251-252: The call currently clones entry.sandbox_backend before
passing it to with_security; since entry is owned by log_command_event(), change
the code to move the backend into with_security instead of cloning it (i.e.,
pass entry.sandbox_backend directly to with_security), ensuring you no longer
call clone() on sandbox_backend in the chain that also uses with_result; update
any subsequent uses of entry after the move accordingly or reorder operations so
the move is valid.

In `@clients/agent-runtime/src/tools/browser.rs`:
- Around line 353-389: verify_computer_use_sidecar has a TOCTOU race: it
releases sidecar_health before calling fetch_computer_use_sidecar_health so
concurrent callers can all trigger the fetch; replace this ad-hoc caching with a
true once-init primitive (tokio::sync::OnceCell) for the sidecar health, or use
OnceCell/get_or_init semantics so the first caller runs
fetch_computer_use_sidecar_health and others await that result; update the
sidecar_health field and adjust verify_computer_use_sidecar to call
OnceCell::get_or_try_init (or equivalent) and reference
fetch_computer_use_sidecar_health and SidecarHealthStatus accordingly to ensure
only one health request is performed under concurrency.

In `@clients/agent-runtime/src/tools/shell.rs`:
- Around line 703-729: Replace the manual unsafe env manipulation and manual
restoration for CORVUS_TEST_SECRET with the module's EnvGuard RAII pattern and
mark the test with #[tokio::test(flavor = "current_thread")]; specifically,
remove the unsafe std::env::set_var / remove_var and the previous-value restore
logic around CORVUS_TEST_SECRET, create an EnvGuard instance at the start of the
test and use it to set CORVUS_TEST_SECRET to "super-secret" (so it will be
restored on drop), keep the rest of the test (EnvInspectSandbox, ShellTool::new,
execute call, and assertions) unchanged, and ensure the test function has the
current_thread tokio flavor to serialize env-dependent tests.

In `@clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md`:
- Around line 1-3: This file is missing the required docs frontmatter before the
H1 "Runtime Sandbox Isolation"; add a YAML frontmatter block (--- ... ---)
immediately at the top of the file containing the minimal required fields
(title, description, and any required metadata per the docs pipeline) so the
page builds and is indexed, then keep the existing "Runtime Sandbox Isolation"
heading and content unchanged.

In `@openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md`:
- Around line 95-97: The rollback description incorrectly states that setting
sandbox.require = false and sandbox.backend = None restores pre-change behavior;
in reality backend = None still routes through NoopSandbox and leaves the new
mutating-command warning path active (so only reverting the PR removes
warnings). Update the prose in this section to state that a config-only change
(sandbox.require = false and sandbox.backend = None) disables enforcement but
does not remove the new warning behavior, and clarify that full removal of the
warnings requires reverting the PR (undoing the ShellTool field and its call) or
a code rollback; reference sandbox.require, sandbox.backend, NoopSandbox,
ShellTool, and the mutating-command warning path in the text so operators know
exactly what each option does.

In
`@openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md`:
- Line 65: The scenario title "**R2-S3: Require mode with auto finds nothing
fails**" is awkward; update that header to a clearer phrasing such as "Require
mode with auto-detection finds no backend" (or similar) so the intent is
immediately readable; locate and replace the exact header string in the spec
file (the title line currently: R2-S3: Require mode with auto finds nothing
fails) with the new wording.

In
`@openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md`:
- Line 1: Replace the second-level heading "## Verification Report" with a
top-level heading "# Verification Report" and ensure you add a blank line both
before and after every heading and table in this document to resolve
MD041/MD022/MD058; specifically update the "## Verification Report" heading and
any other headings/tables mentioned (occurrences around lines referenced: 8-9,
92-93, 104-105, 135) so each heading is preceded and followed by a single blank
line and tables have the required blank-line padding to make the markdownlint
checks pass.
- Around line 76-78: Update the compliance status for R4-S2 and R4-S3 in
verify-report.md to "Partial" until runtime propagation is validated: change the
entries corresponding to tests audit_log_command_event_writes_structured_entry
and audit_log_command_event_records_real_sandbox_backend to reflect partial
compliance because the current tests in
clients/agent-runtime/src/security/audit.rs only inject values into
CommandExecutionLog (proving serialization) rather than asserting that
NoopSandbox or the real sandbox backend name is emitted via the actual
shell/browser execution path; add a short note referencing CommandExecutionLog
and the need for an end-to-end runtime propagation test to flip these back to
compliant.

---

Outside diff comments:
In `@clients/agent-runtime/src/security/detect.rs`:
- Around line 117-145: The Linux detection path in detect_best_sandbox() doesn't
probe Bubblewrap, so add probing for
super::bubblewrap::BubblewrapSandbox::probe() inside the #[cfg(target_os =
"linux")] block (similar to how Landlock and Firejail are probed) so that
detect_best_sandbox() can return an Arc::new(bubblewrap) when available; ensure
the probe block is gated by #[cfg(feature = "sandbox-bubblewrap")] to match
create_sandbox() and keep behavior consistent with require-mode/security
expectations.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: aae5ee38-5750-4b09-ad34-d325e5d653bc

📥 Commits

Reviewing files that changed from the base of the PR and between cc54da3 and aa7c7dc.

📒 Files selected for processing (20)
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/bootstrap/mod.rs
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/security/audit.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/agent-runtime/src/security/mod.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/mod.rs
  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/tests/mcp_native_regression.rs
  • clients/agent-runtime/tests/mcp_runtime_e2e.rs
  • clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/design.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/exploration.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/state.yaml
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/tasks.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/specs/sandbox-hardening/spec.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: sonar
  • GitHub Check: pr-checks
  • GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (8)
**/*

⚙️ CodeRabbit configuration file

**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.

Files:

  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/state.yaml
  • clients/agent-runtime/tests/mcp_native_regression.rs
  • clients/agent-runtime/tests/mcp_runtime_e2e.rs
  • clients/agent-runtime/src/security/mod.rs
  • clients/agent-runtime/src/bootstrap/mod.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/agent-runtime/src/security/audit.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/tasks.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/exploration.md
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md
  • clients/agent-runtime/src/tools/shell.rs
  • openspec/specs/sandbox-hardening/spec.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
  • clients/agent-runtime/src/tools/browser.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/design.md
  • clients/agent-runtime/src/tools/mod.rs
clients/agent-runtime/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Run cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings, and cargo test for code validation, or document which checks were skipped and why

Files:

  • clients/agent-runtime/tests/mcp_native_regression.rs
  • clients/agent-runtime/tests/mcp_runtime_e2e.rs
  • clients/agent-runtime/src/security/mod.rs
  • clients/agent-runtime/src/bootstrap/mod.rs
  • clients/agent-runtime/src/security/audit.rs
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/mod.rs
**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.

Files:

  • clients/agent-runtime/tests/mcp_native_regression.rs
  • clients/agent-runtime/tests/mcp_runtime_e2e.rs
  • clients/agent-runtime/src/security/mod.rs
  • clients/agent-runtime/src/bootstrap/mod.rs
  • clients/agent-runtime/src/security/audit.rs
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/mod.rs
clients/agent-runtime/src/{security,gateway,tools}/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Treat src/security/, src/gateway/, src/tools/ as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks

Files:

  • clients/agent-runtime/src/security/mod.rs
  • clients/agent-runtime/src/security/audit.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/mod.rs
clients/agent-runtime/src/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Files:

  • clients/agent-runtime/src/security/mod.rs
  • clients/agent-runtime/src/bootstrap/mod.rs
  • clients/agent-runtime/src/security/audit.rs
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/mod.rs
clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable

Files:

  • clients/agent-runtime/src/security/mod.rs
  • clients/agent-runtime/src/security/audit.rs
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/mod.rs
**/*.{md,mdx}

⚙️ CodeRabbit configuration file

**/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes.
For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.

Files:

  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/tasks.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/exploration.md
  • clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md
  • openspec/specs/sandbox-hardening/spec.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/design.md
clients/agent-runtime/src/tools/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Implement Tool trait in src/tools/ with strict parameter schema, validate and sanitize all inputs, and return structured ToolResult without panics in runtime path

Files:

  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/mod.rs
🧠 Learnings (12)
📓 Common learnings
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools}/**/*.rs : Treat `src/security/`, `src/gateway/`, `src/tools/` as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/tools/**/*.rs : Implement `Tool` trait in `src/tools/` with strict parameter schema, validate and sanitize all inputs, and return structured `ToolResult` without panics in runtime path
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Keep startup path lean and avoid heavy initialization in command parsing flow
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable

Applied to files:

  • clients/agent-runtime/tests/mcp_native_regression.rs
  • clients/agent-runtime/tests/mcp_runtime_e2e.rs
  • clients/agent-runtime/src/security/mod.rs
  • clients/agent-runtime/src/bootstrap/mod.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/agent-runtime/src/security/audit.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/tasks.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/exploration.md
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md
  • clients/agent-runtime/src/tools/shell.rs
  • openspec/specs/sandbox-hardening/spec.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
  • clients/agent-runtime/src/tools/browser.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/design.md
  • clients/agent-runtime/src/tools/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools}/**/*.rs : Treat `src/security/`, `src/gateway/`, `src/tools/` as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks

Applied to files:

  • clients/agent-runtime/tests/mcp_native_regression.rs
  • clients/agent-runtime/tests/mcp_runtime_e2e.rs
  • clients/agent-runtime/src/security/mod.rs
  • clients/agent-runtime/src/bootstrap/mod.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/agent-runtime/src/security/audit.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/tasks.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/exploration.md
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md
  • clients/agent-runtime/src/tools/shell.rs
  • openspec/specs/sandbox-hardening/spec.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
  • clients/agent-runtime/src/tools/browser.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/design.md
  • clients/agent-runtime/src/tools/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/tools/**/*.rs : Implement `Tool` trait in `src/tools/` with strict parameter schema, validate and sanitize all inputs, and return structured `ToolResult` without panics in runtime path

Applied to files:

  • clients/agent-runtime/tests/mcp_native_regression.rs
  • clients/agent-runtime/tests/mcp_runtime_e2e.rs
  • clients/agent-runtime/src/bootstrap/mod.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/tasks.md
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/agent-runtime/src/tools/shell.rs
  • openspec/specs/sandbox-hardening/spec.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
  • clients/agent-runtime/src/tools/browser.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/design.md
  • clients/agent-runtime/src/tools/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why

Applied to files:

  • clients/agent-runtime/tests/mcp_native_regression.rs
  • clients/agent-runtime/tests/mcp_runtime_e2e.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/tasks.md
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/shell.rs
  • openspec/specs/sandbox-hardening/spec.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths

Applied to files:

  • clients/agent-runtime/tests/mcp_native_regression.rs
  • clients/agent-runtime/tests/mcp_runtime_e2e.rs
  • clients/agent-runtime/src/security/mod.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/agent-runtime/src/security/audit.rs
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/agent-runtime/src/tools/shell.rs
  • openspec/specs/sandbox-hardening/spec.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
  • clients/agent-runtime/src/tools/browser.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/design.md
  • clients/agent-runtime/src/tools/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/Cargo.toml : Do not add heavy dependencies for minor convenience; justify new crate additions

Applied to files:

  • clients/agent-runtime/tests/mcp_native_regression.rs
  • clients/agent-runtime/tests/mcp_runtime_e2e.rs
  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/Cargo.toml : Preserve release-size profile assumptions in `Cargo.toml` and avoid adding heavy dependencies unless clearly justified

Applied to files:

  • clients/agent-runtime/tests/mcp_native_regression.rs
  • clients/agent-runtime/tests/mcp_runtime_e2e.rs
  • clients/agent-runtime/src/tools/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/**/*.rs : Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Applied to files:

  • clients/agent-runtime/tests/mcp_native_regression.rs
  • clients/agent-runtime/tests/mcp_runtime_e2e.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests

Applied to files:

  • clients/agent-runtime/tests/mcp_runtime_e2e.rs
  • clients/agent-runtime/src/tools/browser.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Keep startup path lean and avoid heavy initialization in command parsing flow

Applied to files:

  • clients/agent-runtime/src/security/mod.rs
  • clients/agent-runtime/src/security/audit.rs
  • clients/agent-runtime/src/tools/shell.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Include threat/risk notes and rollback strategy for security, runtime, and gateway changes; add or update tests for boundary checks and failure modes

Applied to files:

  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/tasks.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/exploration.md
  • openspec/specs/sandbox-hardening/spec.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
🪛 GitHub Actions: Docs Quality
clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md

[error] 1-1: Documentation metadata validation failed: missing frontmatter block.

🪛 LanguageTool
openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/tasks.md

[style] ~34-~34: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...= false, return Err(contradiction). Whenrequire = false`, preserve existing No...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

openspec/specs/sandbox-hardening/spec.md

[grammar] ~65-~65: Use a hyphen to join words.
Context: ...backend R2-S3: Require mode with auto finds nothing fails - Given `sandbox.r...

(QB_NEW_EN_HYPHEN)

openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md

[grammar] ~65-~65: Use a hyphen to join words.
Context: ...backend R2-S3: Require mode with auto finds nothing fails - Given `sandbox.r...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.22.0)
openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md

[warning] 1-1: First line in a file should be a top-level heading

(MD041, first-line-heading, first-line-h1)


[warning] 8-8: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 9-9: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


[warning] 92-92: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 93-93: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


[warning] 104-104: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 105-105: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


[warning] 135-135: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/design.md

[warning] 90-90: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (12)
clients/agent-runtime/src/tools/browser.rs (5)

57-67: LGTM!

Clean state modeling for sidecar health caching. Using tokio::sync::Mutex is appropriate for the async context.


391-396: LGTM!

Sensible timeout capping at 5s for health checks. Creating a new reqwest::Client per call is fine for this infrequent operation.


850-856: Good: Fail-closed enforcement wired correctly.

When sidecar_verification_required=true, failed health checks return Err here, blocking the action. This aligns with the spec's R5-S3 scenario.


2327-2355: LGTM!

Clean test infrastructure using ephemeral ports. The fire-and-forget tokio::spawn is fine for short-lived tests.


2624-2724: Good test coverage for sidecar verification scenarios.

Tests cover R5-S1 (healthy with isolation info), R5-S2 (graceful failure), R5-S3 (required mode rejection), and R5-S4 (missing endpoint). The low timeouts (50-200ms) keep tests fast.

openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md (1)

1-200: Well-structured security specification.

Requirements R1-R6 map cleanly to the implementation. Scenarios are testable and the acceptance criteria provide clear validation gates.

clients/agent-runtime/src/tools/mod.rs (4)

92-110: LGTM!

Clean threading of sandbox through the factory. Only ShellTool receives it—file tools don't execute shell commands and correctly skip sandbox dependency.


112-147: LGTM!

Explicit field mapping for ComputerUseConfig aids maintainability. Builder pattern for sidecar_verification_required is clean.


368-373: Good: Single sandbox.require governs both shell sandbox and sidecar verification.

This matches spec R5-S3—operators who mandate sandbox isolation also expect sidecar verification. The semantic coupling is intentional.


401-407: LGTM!

Clean test helper. NoopSandbox is appropriate for unit tests—real sandboxing would add unnecessary complexity.

openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/design.md (2)

9-40: Well-documented design decisions.

Each choice includes alternatives considered and rationale. The trade-off documentation (e.g., wrap_command for ALL executions vs. risk-based filtering) will help future maintainers.


209-243: LGTM!

Testing strategy covers both unit and integration layers with concrete examples. The MockSandbox pattern aligns with the actual test_sandbox() implementation.

Comment thread clients/agent-runtime/src/agent/agent.rs Outdated
Comment on lines +608 to +623
if let Err(error) = logger.log_command_event(CommandExecutionLog {
channel: "agent",
command,
risk_level,
approved,
allowed: result.structured.is_some(),
success: result.success,
duration_ms,
sandbox_backend,
}) {
if self.audit_strict {
tracing::error!("Failed to write shell command audit event: {error}");
} else {
tracing::warn!("Failed to write shell command audit event: {error}");
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

audit_strict is only changing log level here.

When these writes fail, the helpers still return normally in strict mode. That lets shell/browser actions complete without the required audit trail, which is inconsistent with record_code_session_result() further down this file. Please make the helpers return a Result and propagate strict failures out of execute_tool_call().

Also applies to: 667-673

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/agent/agent.rs` around lines 608 - 623, The audit
write currently swallows failures in strict mode by only changing log level
inside the execute_tool_call path where
logger.log_command_event(CommandExecutionLog { .. }) is called; change the
logging helper(s) to return Result<(), AuditError> (or appropriate error type)
instead of unit, update the call sites in execute_tool_call to propagate errors
(use ? or map_err) so that when self.audit_strict is true the function returns
early on audit write failure, and mirror the same change for the other audit
call sites (the similar block around record_code_session_result) so strict mode
consistently fails the overall operation instead of only lowering the log level.

Comment on lines +227 to +233
let sandbox = crate::security::create_sandbox(&config.security)?;

let tools = tools::all_tools_with_runtime(
Arc::new(config.clone()),
&security,
Arc::clone(&runtime),
sandbox,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fail before initializing runtime state.

create_sandbox() still runs after init_memory_and_observer_with_overrides() and create_runtime(), so security.sandbox.require = true can do avoidable setup work before startup aborts. Move sandbox creation ahead of the memory/runtime initialization so the new require mode is actually fail-fast.

Suggested reordering
     ) -> anyhow::Result<Self> {
         let profile = AgentProfile::from_config(config)?;
+        let sandbox = crate::security::create_sandbox(&config.security)?;
         let (memory, observer) = init_memory_and_observer_with_overrides(
             config,
             profile,
             memory_override,
             observer_override,
@@
-        let sandbox = crate::security::create_sandbox(&config.security)?;
-
         let tools = tools::all_tools_with_runtime(
             Arc::new(config.clone()),
             &security,
             Arc::clone(&runtime),
             sandbox,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/bootstrap/mod.rs` around lines 227 - 233, The
sandbox is created too late; move the call to
crate::security::create_sandbox(&config.security) so it runs before
init_memory_and_observer_with_overrides(...) and create_runtime(...) to allow
security.sandbox.require = true to fail fast. Update the sequence in
bootstrap/mod.rs so sandbox is created first, then initialize memory/observer
via init_memory_and_observer_with_overrides, then call create_runtime, and
finally pass the created sandbox into tools::all_tools_with_runtime(...) (ensure
the variable names remain sandbox and are Arc/cloned as needed).

Comment thread clients/agent-runtime/src/security/audit.rs Outdated
Comment thread clients/agent-runtime/src/tools/browser.rs
Comment thread openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md Outdated
Comment thread openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md Outdated
Comment thread openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md Outdated
- Propagate audit write errors in strict mode instead of swallowing them
- Redact shell commands before writing to audit log
- Move sandbox creation before memory/observer init for fail-fast
- Remove unnecessary clone of sandbox_backend in audit path
- Replace TOCTOU sidecar health cache with tokio::sync::OnceCell
- Use EnvGuard RAII pattern in shell env sanitization test
- Add Bubblewrap probe to Linux auto-detection path
- Correct rollback description in proposal
- Clarify spec scenario title and verify-report compliance status
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

♻️ Duplicate comments (2)
openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md (1)

138-139: ⚠️ Potential issue | 🟡 Minor

Add a blank line after the “### Verdict” heading.

Line 138 should be followed by one blank line before Line 139 to satisfy markdownlint heading spacing.

Suggested patch
 ### Verdict
+
 PASS WITH WARNINGS

As per coding guidelines, “Verify technical accuracy and that docs stay aligned with code changes.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md`
around lines 138 - 139, Add a single blank line immediately after the "###
Verdict" heading in the verify-report.md content so the heading is followed by
one empty line (i.e., change the block that currently reads "### Verdict"
followed immediately by "PASS WITH WARNINGS" to have one blank line between
them).
clients/agent-runtime/src/agent/agent.rs (1)

535-544: ⚠️ Potential issue | 🔴 Critical

Don't turn a completed shell command into a retriable tool failure.

log_shell_audit_event() runs after tool.execute() returns, so in strict mode this branch can report success: false even though the shell command already ran. That gives the model/user a false retry signal for non-idempotent shell operations. Please surface this as a terminal audit failure path, not a normal tool-execution failure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/agent/agent.rs` around lines 535 - 544, The code
currently converts a post-execution audit logging error from
log_shell_audit_event into a retriable tool failure by returning
ToolExecutionResult with success: false and DispatchAction::Execute; instead,
treat audit logging failures as terminal audit failures so the already-run shell
command is not marked retriable. Update the branch handling
log_shell_audit_event errors inside the shell call path (where call.name ==
"shell") to return a terminal/audit-failure action (e.g.,
DispatchAction::TerminalFailure or the project’s terminal failure variant) and
preserve the tool's successful execution state (success: true and appropriate
output/message indicating audit logging failed), so the dispatcher will stop
retrying the tool but surface the audit error as terminal. Ensure you reference
the same symbols: log_shell_audit_event, ToolExecutionResult, call.name,
call.tool_call_id, and crate::agent::dispatcher::DispatchAction when making the
change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clients/agent-runtime/src/agent/agent.rs`:
- Around line 605-625: The code is using result.structured.is_some() as a proxy
for "allowed", which is incorrect; update the ShellTool (the tool that builds
runtime command results) to carry explicit fields like allowed: bool and
risk_level: Option<String> (or risk_level: &str) in the result type, populate
those when approval/validation succeeds or fails, and then change the logger
call (logger.log_command_event / CommandExecutionLog construction) to use
result.allowed and result.risk_level.unwrap_or("unknown") instead of
result.structured.is_some() and the current risk_level extraction; ensure any
callers that create CommandExecutionLog use the new explicit fields so
approved-but-failed commands log allowed=true with the correct risk level.

In `@clients/agent-runtime/src/tools/browser.rs`:
- Around line 404-407: The error path currently injects the raw endpoint (which
may include user:pass) into logs and ToolResult. Parse the endpoint (e.g., using
url::Url) before any formatting and either strip userinfo (set username/password
to empty) or reject endpoints containing credentials during validation; update
the with_context formatting around request.send() in the health-call block and
any assignment to sidecar_health.endpoint (the code that
formats/returns/serializes sidecar_health around lines using
sidecar_health.endpoint) to use the sanitized URL (or return an error on
credentials) so no raw credentials are ever logged or returned.
- Around line 421-428: The health-response handling currently normalizes missing
status/isolation into permissive defaults; change it to fail closed by treating
any non-"healthy" status or missing/empty isolation as an error: in the function
that parses the health payload (the code that reads payload.get("status") and
payload.get("isolation")), replace the unwrap_or defaults with explicit
validation — if status is absent or status != "healthy" return an Err (or mark
the response unverified) and if isolation is absent/empty (e.g., equals {} after
cloning) likewise return an Err; update the caller to propagate this failure so
actions requiring security.sandbox.require=true are blocked, and add regression
tests exercising 200 responses with {} and {"status":"unhealthy"} to ensure they
are rejected.
- Around line 345-350: The computer_use_health_url function currently calls
computer_use_endpoint_url() and then uses url.set_path("/v1/health"), which
discards any configured base path; change the logic to preserve the endpoint's
base path by inspecting the path returned from computer_use_endpoint_url(): if
the path ends_with "/v1/actions" replace only that suffix with "/v1/health"
(keeping existing leading path segments), otherwise either accept known
alternative shapes or return an error validating unsupported path shapes; ensure
you still clear query/fragment and return the modified reqwest::Url from
computer_use_health_url, referencing computer_use_health_url and
computer_use_endpoint_url in your changes.

In `@clients/agent-runtime/src/tools/shell.rs`:
- Around line 191-195: The warning currently logs the raw shell command
(variable command) in the tracing::warn call inside the
should_warn_for_noop_sandbox branch; remove or redact that sensitive value so
secrets/tokens aren't emitted. Update the tracing::warn invocation used with
should_warn_for_noop_sandbox(self.security.as_ref(), self.sandbox.as_ref(),
command) to omit the command field (or replace it with a non-sensitive
placeholder such as "<redacted>" or command length/summary), ensuring only
non-secret context is logged and leaving the rest of the message unchanged.

In `@clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md`:
- Around line 42-47: The Linux backend autodetection order in the guide is out
of sync with the runtime: update the list for backend = "auto" so Linux reads
"landlock → bubblewrap → firejail → docker" (preserve macOS as "bubblewrap →
docker" and the fallback behavior when require = false -> none); mirror this
exact change in the corresponding translated/ES runtime-sandbox-isolation guide
so both docs match the detection code.
- Around line 122-127: Update the docs to clarify that the audited shell command
field is redacted/sanitized: change the "- command" bullet to something like "-
command (redacted/sanitized)" and ensure the Elasticsearch translation/schema
for the audit sink (including the security.sandbox_backend field) reflects that
only a sanitized command is stored; update any references to "command" in the
runtime-sandbox-isolation guide and the ES translation to explicitly state
redaction to keep docs aligned with the runtime change that strips sensitive
command text before the audit sink.

In `@openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md`:
- Around line 71-81: Update the affected-areas table to mark
src/security/audit.rs as Modified and describe the change: that a
sandbox_backend field was added to the audit payload and that audit wiring now
includes sandbox_backend in command-event logging and tests (e.g., references to
sandbox_backend in AuditPayload and command-event logging/test helpers). Ensure
the Description briefly notes schema change (added sandbox_backend) and
logging/test blast radius so readers are aware this file's audit schema and
logging behavior were modified.

In
`@openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md`:
- Line 8: The Markdown heading level jumps from H1 to H3 for the section titled
"Completeness"; update that heading ("### Completeness") to a proper H2 ("##
Completeness") so the document maintains a correct heading hierarchy and passes
lint checks, then verify related headings in verify-report.md to ensure
consistent levels throughout.

---

Duplicate comments:
In `@clients/agent-runtime/src/agent/agent.rs`:
- Around line 535-544: The code currently converts a post-execution audit
logging error from log_shell_audit_event into a retriable tool failure by
returning ToolExecutionResult with success: false and DispatchAction::Execute;
instead, treat audit logging failures as terminal audit failures so the
already-run shell command is not marked retriable. Update the branch handling
log_shell_audit_event errors inside the shell call path (where call.name ==
"shell") to return a terminal/audit-failure action (e.g.,
DispatchAction::TerminalFailure or the project’s terminal failure variant) and
preserve the tool's successful execution state (success: true and appropriate
output/message indicating audit logging failed), so the dispatcher will stop
retrying the tool but surface the audit error as terminal. Ensure you reference
the same symbols: log_shell_audit_event, ToolExecutionResult, call.name,
call.tool_call_id, and crate::agent::dispatcher::DispatchAction when making the
change.

In
`@openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md`:
- Around line 138-139: Add a single blank line immediately after the "###
Verdict" heading in the verify-report.md content so the heading is followed by
one empty line (i.e., change the block that currently reads "### Verdict"
followed immediately by "PASS WITH WARNINGS" to have one blank line between
them).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: af4fc779-729e-4c6a-8b4a-c32288c1ba94

📥 Commits

Reviewing files that changed from the base of the PR and between aa7c7dc and 2f290bd.

📒 Files selected for processing (12)
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/bootstrap/mod.rs
  • clients/agent-runtime/src/security/audit.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/shell.rs
  • clients/web/apps/docs/astro.config.mjs
  • clients/web/apps/docs/src/content/docs/es/guides/runtime-sandbox-isolation.md
  • clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pr-checks
  • GitHub Check: sonar
  • GitHub Check: submit-gradle
  • GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (8)
**/*

⚙️ CodeRabbit configuration file

**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.

Files:

  • clients/web/apps/docs/astro.config.mjs
  • clients/agent-runtime/src/bootstrap/mod.rs
  • clients/agent-runtime/src/security/audit.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/agent-runtime/src/agent/agent.rs
  • clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
  • clients/agent-runtime/src/tools/shell.rs
  • clients/web/apps/docs/src/content/docs/es/guides/runtime-sandbox-isolation.md
  • clients/agent-runtime/src/security/detect.rs
  • clients/agent-runtime/src/tools/browser.rs
clients/agent-runtime/src/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Files:

  • clients/agent-runtime/src/bootstrap/mod.rs
  • clients/agent-runtime/src/security/audit.rs
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/agent-runtime/src/tools/browser.rs
clients/agent-runtime/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Run cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings, and cargo test for code validation, or document which checks were skipped and why

Files:

  • clients/agent-runtime/src/bootstrap/mod.rs
  • clients/agent-runtime/src/security/audit.rs
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/agent-runtime/src/tools/browser.rs
**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.

Files:

  • clients/agent-runtime/src/bootstrap/mod.rs
  • clients/agent-runtime/src/security/audit.rs
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/agent-runtime/src/tools/browser.rs
clients/agent-runtime/src/{security,gateway,tools}/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Treat src/security/, src/gateway/, src/tools/ as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks

Files:

  • clients/agent-runtime/src/security/audit.rs
  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/agent-runtime/src/tools/browser.rs
clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable

Files:

  • clients/agent-runtime/src/security/audit.rs
  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/agent-runtime/src/tools/browser.rs
**/*.{md,mdx}

⚙️ CodeRabbit configuration file

**/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes.
For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.

Files:

  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
  • clients/web/apps/docs/src/content/docs/es/guides/runtime-sandbox-isolation.md
clients/agent-runtime/src/tools/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Implement Tool trait in src/tools/ with strict parameter schema, validate and sanitize all inputs, and return structured ToolResult without panics in runtime path

Files:

  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/src/tools/browser.rs
🧠 Learnings (11)
📓 Common learnings
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools}/**/*.rs : Treat `src/security/`, `src/gateway/`, `src/tools/` as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/tools/**/*.rs : Implement `Tool` trait in `src/tools/` with strict parameter schema, validate and sanitize all inputs, and return structured `ToolResult` without panics in runtime path
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable

Applied to files:

  • clients/agent-runtime/src/bootstrap/mod.rs
  • clients/agent-runtime/src/security/audit.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/agent-runtime/src/agent/agent.rs
  • clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
  • clients/agent-runtime/src/tools/shell.rs
  • clients/web/apps/docs/src/content/docs/es/guides/runtime-sandbox-isolation.md
  • clients/agent-runtime/src/security/detect.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/tools/**/*.rs : Implement `Tool` trait in `src/tools/` with strict parameter schema, validate and sanitize all inputs, and return structured `ToolResult` without panics in runtime path

Applied to files:

  • clients/agent-runtime/src/bootstrap/mod.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/agent-runtime/src/agent/agent.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/src/security/detect.rs
  • clients/agent-runtime/src/tools/browser.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools}/**/*.rs : Treat `src/security/`, `src/gateway/`, `src/tools/` as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks

Applied to files:

  • clients/agent-runtime/src/bootstrap/mod.rs
  • clients/agent-runtime/src/security/audit.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/agent-runtime/src/agent/agent.rs
  • clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
  • clients/agent-runtime/src/tools/shell.rs
  • clients/web/apps/docs/src/content/docs/es/guides/runtime-sandbox-isolation.md
  • clients/agent-runtime/src/security/detect.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Keep startup path lean and avoid heavy initialization in command parsing flow

Applied to files:

  • clients/agent-runtime/src/bootstrap/mod.rs
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/shell.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths

Applied to files:

  • clients/agent-runtime/src/security/audit.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/agent-runtime/src/agent/agent.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/src/security/detect.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why

Applied to files:

  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/agent-runtime/src/agent/agent.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
  • clients/agent-runtime/src/tools/shell.rs
  • clients/agent-runtime/src/tools/browser.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Include threat/risk notes and rollback strategy for security, runtime, and gateway changes; add or update tests for boundary checks and failure modes

Applied to files:

  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/specs/sandbox-hardening/spec.md
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/**/*.rs : Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements

Applied to files:

  • clients/agent-runtime/src/agent/agent.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/**/*.rs : Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Applied to files:

  • clients/agent-runtime/src/tools/shell.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests

Applied to files:

  • clients/agent-runtime/src/tools/browser.rs
🪛 LanguageTool
clients/web/apps/docs/src/content/docs/es/guides/runtime-sandbox-isolation.md

[grammar] ~11-~11: Corrige la mayúscula.
Context: ...n docType: guide --- # Aislamiento del Sandbox del Runtime Corvus usa **dos capas de ...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_ORTHOGRAPHY_UPPERCASE)


[grammar] ~11-~11: Corrige la mayúscula.
Context: ...uide --- # Aislamiento del Sandbox del Runtime Corvus usa dos capas de seguridad pa...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_ORTHOGRAPHY_UPPERCASE)


[style] ~15-~15: En sentido figurado es preferible usar otras expresiones.
Context: ... disparada por usuarios: 1. Política a nivel de aplicación mediante SecurityPolicy ...

(A_NIVEL_DE)


[grammar] ~16-~16: Agrega un signo de puntuación.
Context: ...nte SecurityPolicy - allowlists de comandos - restricciones de rutas - límite...

(QB_NEW_ES_OTHER_ERROR_IDS_MISSING_PUNCTUATION)


[grammar] ~17-~17: Agrega un signo de puntuación.
Context: ...lists de comandos - restricciones de rutas - límites de tasa - clasificación...

(QB_NEW_ES_OTHER_ERROR_IDS_MISSING_PUNCTUATION)


[grammar] ~18-~18: Agrega un signo de puntuación.
Context: ... restricciones de rutas - límites de tasa - clasificación de riesgo y compuert...

(QB_NEW_ES_OTHER_ERROR_IDS_MISSING_PUNCTUATION)


[grammar] ~19-~19: Agrega un signo de puntuación.
Context: ...clasificación de riesgo y compuertas de aprobación 2. **Sandboxing a nivel de sistema oper...

(QB_NEW_ES_OTHER_ERROR_IDS_MISSING_PUNCTUATION)


[style] ~20-~20: En sentido figurado es preferible usar otras expresiones.
Context: ...ompuertas de aprobación 2. Sandboxing a nivel de sistema operativo mediante los backen...

(A_NIVEL_DE)


[grammar] ~20-~20: Aquí puede haber un error.
Context: ...iante los backends Sandbox del runtime - landlock - firejail - bubblewrap - docker Las dos capas importan. La capa de polít...

(QB_NEW_ES)


[grammar] ~55-~55: Cambia la palabra o signo.
Context: ... → error al iniciar - backend = "auto" sin backend encontrado → error al inicia...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_SPACE)


[grammar] ~56-~56: Cambia la palabra o signo.
Context: ... → error al iniciar - backend = "none" → error al iniciar - enabled = false →...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_SPACE)


[grammar] ~57-~57: Cambia la palabra o signo.
Context: ...→ error al iniciar -enabled = false` → error al iniciar Úsalo en despliegues...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_SPACE)


[grammar] ~67-~67: Corrige la minúscula.
Context: ...Corvus aplica ahora esta secuencia: 1. valida el comando contra SecurityPolicy 2. s...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_ORTHOGRAPHY_LOWERCASE)


[grammar] ~68-~68: Oración con errores
Context: ...a el comando contra SecurityPolicy 2. sanitiza variables de entorno 3. envuelve el com...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)


[grammar] ~68-~68: Oración con errores
Context: ...Policy` 2. sanitiza variables de entorno 3. envuelve el comando con el backend de sandbox se...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)


[grammar] ~69-~69: Oración con errores
Context: ...o con el backend de sandbox seleccionado 4. ejecuta con timeout 5. audita el resultado con ...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)


[grammar] ~70-~70: Oración con errores
Context: ...dbox seleccionado 4. ejecuta con timeout 5. audita el resultado con metadata del sandbox ...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)


[grammar] ~71-~71: Aquí puede haber un error.
Context: ...ta el resultado con metadata del sandbox Eso significa que cada comando shell per...

(QB_NEW_ES)


[style] ~77-~77: En sentido figurado es preferible usar otras expresiones.
Context: ...nto del sandbox noop Si no hay backend a nivel de sistema operativo y require = false, ...

(A_NIVEL_DE)


[grammar] ~80-~80: Corrige la minúscula.
Context: ...none (NoopSandbox). En ese modo: - la política a nivel de aplicación sigue ap...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_ORTHOGRAPHY_LOWERCASE)


[style] ~80-~80: En sentido figurado es preferible usar otras expresiones.
Context: ...pSandbox`). En ese modo: - la política a nivel de aplicación sigue aplicando - los comand...

(A_NIVEL_DE)


[grammar] ~80-~80: Agrega un signo de puntuación.
Context: ...la política a nivel de aplicación sigue aplicando - los comandos mutables emiten una advertenci...

(QB_NEW_ES_OTHER_ERROR_IDS_MISSING_PUNCTUATION)


[grammar] ~81-~81: Agrega un signo de puntuación.
Context: ...l sandbox del sistema operativo no está activo - los logs de auditoría registran `sandbox_ba...

(QB_NEW_ES_OTHER_ERROR_IDS_MISSING_PUNCTUATION)


[grammar] ~82-~82: Aquí puede haber un error.
Context: ...activo - los logs de auditoría registran sandbox_backend = "none" Esto se permite para desarrollo local, p...

(QB_NEW_ES)


[grammar] ~91-~91: Oración con errores
Context: ...idecar. Valores seguros por defecto: - el endpoint por defecto usa loopback: `htt...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)


[grammar] ~91-~91: Elimina la palabra o signo.
Context: ...- el endpoint por defecto usa loopback: http://127.0.0.1:8787/v1/actions - los endpoints remotos/públicos se bloque...

(QB_NEW_ES_OTHER_ERROR_IDS_UNNECESSARY_SPACE)


[grammar] ~92-~92: Elimina la palabra o signo.
Context: ... remotos/públicos se bloquean salvo que allow_remote_endpoint = true - los endpoints públicos remotos deben usa...

(QB_NEW_ES_OTHER_ERROR_IDS_UNNECESSARY_SPACE)


[grammar] ~93-~93: Oración con errores
Context: ...points públicos remotos deben usar HTTPS - dominios permitidos, allowlists de ventanas y lí...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)


[grammar] ~94-~94: Aquí puede haber un error.
Context: ...das se reenvían como política al sidecar Corvus hace un health-check perezoso...

(QB_NEW_ES)


[grammar] ~118-~118: Aquí puede haber un error.
Context: ...o verificarse el aislamiento del sidecar ## Expectativas de auditoría Los eventos d...

(QB_NEW_ES)


[grammar] ~126-~126: Aquí puede haber un error.
Context: ... - bandera de aprobación - éxito/fracaso - security.sandbox_backend La verificación del sidecar de computer-...

(QB_NEW_ES)


[grammar] ~129-~129: Aquí puede haber un error.
Context: ...-use genera una entrada SecurityEvent describiendo: - estado de salud del sidecar - tipo de...

(QB_NEW_ES)


[grammar] ~130-~130: Agrega un signo de puntuación.
Context: ...nt` describiendo: - estado de salud del sidecar - tipo de aislamiento reportado - runti...

(QB_NEW_ES_OTHER_ERROR_IDS_MISSING_PUNCTUATION)


[grammar] ~131-~131: Agrega un signo de puntuación.
Context: ...salud del sidecar - tipo de aislamiento reportado - runtime reportado ## Valores recomen...

(QB_NEW_ES_OTHER_ERROR_IDS_MISSING_PUNCTUATION)


[grammar] ~132-~132: Aquí puede haber un error.
Context: ...islamiento reportado - runtime reportado ## Valores recomendados para operadores ##...

(QB_NEW_ES)


[grammar] ~171-~171: Elimina la puntuación
Context: ...te de despliegue del sidecar de computer-use.

(QB_NEW_ES_OTHER_ERROR_IDS_UNNECESSARY_PUNCTUATION)

🪛 markdownlint-cli2 (0.22.0)
openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md

[warning] 8-8: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)


[warning] 138-138: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🔇 Additional comments (1)
clients/agent-runtime/src/bootstrap/mod.rs (1)

207-209: Good fail-fast placement.

Creating the sandbox before memory/runtime setup keeps sandbox.require = true genuinely fail-closed and avoids doing startup work that will be discarded anyway.

Based on learnings, Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable.

Comment on lines +605 to +625
let structured = result.structured.as_ref();
let risk_level = structured
.and_then(|v| v.get("risk_level"))
.and_then(|v| v.as_str())
.unwrap_or("unknown");
let sandbox_backend = structured
.and_then(|v| v.get("sandbox_backend"))
.and_then(|v| v.as_str())
.map(str::to_string);

let duration_ms = u64::try_from(duration.as_millis()).unwrap_or(u64::MAX);

if let Err(error) = logger.log_command_event(CommandExecutionLog {
channel: "agent",
command: &redacted_command,
risk_level,
approved,
allowed: result.structured.is_some(),
success: result.success,
duration_ms,
sandbox_backend,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

allowed is being inferred from the wrong signal.

result.structured.is_some() is not equivalent to “policy allowed this command.” clients/agent-runtime/src/tools/shell.rs:147-153 still returns structured: None when runtime command construction fails after validation, so this path logs allowed=false / risk_level=unknown for a command that was actually approved and only failed later. Carry an explicit allowed bit (and risk level) out of ShellTool instead of using metadata presence as the proxy.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/agent/agent.rs` around lines 605 - 625, The code is
using result.structured.is_some() as a proxy for "allowed", which is incorrect;
update the ShellTool (the tool that builds runtime command results) to carry
explicit fields like allowed: bool and risk_level: Option<String> (or
risk_level: &str) in the result type, populate those when approval/validation
succeeds or fails, and then change the logger call (logger.log_command_event /
CommandExecutionLog construction) to use result.allowed and
result.risk_level.unwrap_or("unknown") instead of result.structured.is_some()
and the current risk_level extraction; ensure any callers that create
CommandExecutionLog use the new explicit fields so approved-but-failed commands
log allowed=true with the correct risk level.

Comment on lines +345 to +350
fn computer_use_health_url(&self) -> anyhow::Result<reqwest::Url> {
let mut url = self.computer_use_endpoint_url()?;
url.set_path("/v1/health");
url.set_query(None);
url.set_fragment(None);
Ok(url)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Preserve the configured base path when deriving the health URL.

set_path("/v1/health") always rewrites from the host root. A configured endpoint like https://sidecar.example.com/proxy/v1/actions would probe https://sidecar.example.com/v1/health instead of the sibling route and fail verification even though the action endpoint is valid. Replace only the trailing /v1/actions, or validate the configured path and reject unsupported shapes up front.

Suggested fix
 fn computer_use_health_url(&self) -> anyhow::Result<reqwest::Url> {
     let mut url = self.computer_use_endpoint_url()?;
-    url.set_path("/v1/health");
+    let health_path = url
+        .path()
+        .strip_suffix("/v1/actions")
+        .map(|prefix| format!("{prefix}/v1/health"))
+        .ok_or_else(|| {
+            anyhow::anyhow!(
+                "browser.computer_use.endpoint must end with /v1/actions for sidecar health verification"
+            )
+        })?;
+    url.set_path(&health_path);
     url.set_query(None);
     url.set_fragment(None);
     Ok(url)
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn computer_use_health_url(&self) -> anyhow::Result<reqwest::Url> {
let mut url = self.computer_use_endpoint_url()?;
url.set_path("/v1/health");
url.set_query(None);
url.set_fragment(None);
Ok(url)
fn computer_use_health_url(&self) -> anyhow::Result<reqwest::Url> {
let mut url = self.computer_use_endpoint_url()?;
let health_path = url
.path()
.strip_suffix("/v1/actions")
.map(|prefix| format!("{prefix}/v1/health"))
.ok_or_else(|| {
anyhow::anyhow!(
"browser.computer_use.endpoint must end with /v1/actions for sidecar health verification"
)
})?;
url.set_path(&health_path);
url.set_query(None);
url.set_fragment(None);
Ok(url)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/tools/browser.rs` around lines 345 - 350, The
computer_use_health_url function currently calls computer_use_endpoint_url() and
then uses url.set_path("/v1/health"), which discards any configured base path;
change the logic to preserve the endpoint's base path by inspecting the path
returned from computer_use_endpoint_url(): if the path ends_with "/v1/actions"
replace only that suffix with "/v1/health" (keeping existing leading path
segments), otherwise either accept known alternative shapes or return an error
validating unsupported path shapes; ensure you still clear query/fragment and
return the modified reqwest::Url from computer_use_health_url, referencing
computer_use_health_url and computer_use_endpoint_url in your changes.

Comment thread clients/agent-runtime/src/tools/browser.rs Outdated
Comment thread clients/agent-runtime/src/tools/browser.rs Outdated
Comment thread clients/agent-runtime/src/tools/shell.rs
Comment on lines +42 to +47
Corvus tries supported backends in platform order and uses the first available one.

- Linux: `landlock` → `firejail` → `docker`
- macOS: `bubblewrap` → `docker`
- other platforms: `docker`
- if none are available and `require = false`, Corvus falls back to `none`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

The auto-detection matrix is out of sync with the runtime.

This PR adds Bubblewrap probing on Linux, but the guide omits it from Linux and lists it under macOS. That gives operators the wrong expectation about what backend = "auto" can actually select. Please copy the exact platform order from the detection code here, and mirror the same fix in clients/web/apps/docs/src/content/docs/es/guides/runtime-sandbox-isolation.md.

As per coding guidelines, verify technical accuracy and that docs stay aligned with code changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md`
around lines 42 - 47, The Linux backend autodetection order in the guide is out
of sync with the runtime: update the list for backend = "auto" so Linux reads
"landlock → bubblewrap → firejail → docker" (preserve macOS as "bubblewrap →
docker" and the fallback behavior when require = false -> none); mirror this
exact change in the corresponding translated/ES runtime-sandbox-isolation guide
so both docs match the detection code.

Comment thread openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md Outdated
…ning

- Extract explicit allowed from structured payload instead of is_some() proxy
- Make audit failure terminal instead of retriable (DispatchAction)
- Strip credentials from sidecar endpoint URL before logging
- Fail closed on missing/bad sidecar health status or isolation
- Preserve base path in health URL construction
- Redact command in noop sandbox tracing warning
- Update docs: Linux autodetect order includes bubblewrap
- Note command redaction in audit expectations (EN + ES)
- Fix proposal audit.rs status and verify-report heading hierarchy
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
clients/agent-runtime/src/tools/browser.rs (1)

867-873: ⚠️ Potential issue | 🟠 Major

Required sidecar verification failures bypass browser audit logging.

Line 872 uses ?, so a required health-check failure returns Err before any ToolResult.structured payload is built. clients/agent-runtime/src/agent/agent.rs only calls log_browser_security_event() on Ok(ToolResult) paths, so the blocked computer-use action is never audited. Return a failed ToolResult with the verification details, or emit the audit event before propagating the error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/tools/browser.rs` around lines 867 - 873, The
verify_computer_use_sidecar() call in execute_computer_use_action currently uses
the ? operator which returns Err early and bypasses browser audit logging;
update execute_computer_use_action so that when verify_computer_use_sidecar()
(and similarly computer_use_endpoint_url()) fails you either (a) construct and
return an Err(…) that wraps an Ok(ToolResult::structured(...)) representing the
failed verification (including the verification/error details) so callers still
receive an Ok(ToolResult) and can call log_browser_security_event(), or (b)
explicitly call log_browser_security_event() with the verification failure
details before propagating the error; locate execute_computer_use_action,
verify_computer_use_sidecar, computer_use_endpoint_url, and ToolResult to
implement one of these two approaches and ensure the audit payload contains the
verification result and error message.
♻️ Duplicate comments (4)
clients/agent-runtime/src/tools/browser.rs (2)

345-353: ⚠️ Potential issue | 🟠 Major

Unsupported computer-use endpoint paths should fail closed.

The fallback branch rewrites any non-/v1/actions endpoint to host-root /v1/health. That can verify a different service than the one receiving actions, so require=true may succeed against the wrong health endpoint on proxied or misconfigured deployments. Replace only the trailing /v1/actions, or reject unsupported shapes.

Possible fix
 fn computer_use_health_url(&self) -> anyhow::Result<reqwest::Url> {
     let mut url = self.computer_use_endpoint_url()?;
-    let path = url.path().to_string();
-    let new_path = if path.ends_with("/v1/actions") {
-        path.replace("/v1/actions", "/v1/health")
-    } else {
-        "/v1/health".to_string()
-    };
+    let new_path = url
+        .path()
+        .strip_suffix("/v1/actions")
+        .map(|prefix| format!("{prefix}/v1/health"))
+        .ok_or_else(|| {
+            anyhow::anyhow!(
+                "browser.computer_use.endpoint must end with /v1/actions for sidecar health verification"
+            )
+        })?;
     url.set_path(&new_path);
     url.set_query(None);
     url.set_fragment(None);
     Ok(url)
 }

Based on learnings: "Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/tools/browser.rs` around lines 345 - 353, In
computer_use_health_url(), avoid rewriting arbitrary endpoints to "/v1/health":
only accept and rewrite when the path ends exactly with "/v1/actions" (use
strip_suffix or check the trailing segment) and replace that suffix with
"/v1/health"; otherwise return an Err (e.g., anyhow!("unsupported
computer_use_endpoint path: {}", url.path())) so unsupported shapes fail closed;
update the logic around computer_use_endpoint_url(), path manipulation, and
url.set_path accordingly.

912-917: ⚠️ Potential issue | 🟠 Major

Do not format the raw sidecar URL into this error.

self.computer_use.endpoint can contain user:pass@.... On request failure that raw value is surfaced through the tool error path. Reuse the sanitized endpoint URL instead.

Possible fix
+        let mut sanitized_endpoint = endpoint.clone();
+        let _ = sanitized_endpoint.set_username("");
+        let _ = sanitized_endpoint.set_password(None);
         let response = request.send().await.with_context(|| {
-            format!(
-                "Failed to call computer-use sidecar at {}",
-                self.computer_use.endpoint
-            )
+            format!("Failed to call computer-use sidecar at {sanitized_endpoint}")
         })?;

As per coding guidelines, "Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/tools/browser.rs` around lines 912 - 917, The error
message currently interpolates the raw URL via self.computer_use.endpoint which
may include credentials; change the with_context closure to use the
already-sanitized endpoint value instead (e.g., reuse a previously computed
sanitized `endpoint` variable or a sanitized accessor like
self.computer_use.sanitized_endpoint) so the formatted string no longer exposes
secrets when calling request.send().await.
clients/agent-runtime/src/agent/agent.rs (2)

643-690: ⚠️ Potential issue | 🟠 Major

audit_strict is still fail-open for browser audit writes.

If logger.log(&event) fails, this helper only changes the log level. In strict mode the browser action still returns normally, unlike the shell and code-session audit paths. Make this return Result<()> and propagate it from execute_tool_call().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/agent/agent.rs` around lines 643 - 690, The helper
log_browser_security_event currently swallows logger.log failures and only
alters log level; change fn log_browser_security_event(&self, result:
&crate::tools::ToolResult) to return Result<(), crate::Error> (or the crate's
common Result type), convert the internal early returns to Ok(()) where
appropriate, and propagate logger.log errors as Err(...) when self.audit_strict
is true (or always return Err on failure so the caller can decide); update
callers—most importantly execute_tool_call()—to handle the Result from
log_browser_security_event and propagate the error out of execute_tool_call() so
browser audit write failures do not fail-open. Ensure references to
audit_strict, logger.log(&event), AuditEvent/AuditEventType, and
execute_tool_call are adjusted to the new fallible flow.

607-620: ⚠️ Potential issue | 🟠 Major

allowed is still derived from approved.

approved only means a supervised approval was supplied; it is not equivalent to “policy allowed this command.” Successful low-risk shell executions will now be audited as allowed=false, and post-validation ShellTool failures that return structured: None still collapse to allowed=false / risk_level="unknown". Carry an explicit allowed bit out of clients/agent-runtime/src/tools/shell.rs and read that here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/agent/agent.rs` around lines 607 - 620, The code is
deriving `allowed` from the `approved` structured field, which is incorrect;
update the `agent` parsing in agent.rs to read the explicit allowed boolean
propagated from the tool (e.g., the explicit allowed bit added to the tool
result in clients/agent-runtime/src/tools/shell.rs) instead of using `approved`.
Locate the parsing block that sets `risk_level`, `sandbox_backend`, and
`allowed` (around the `let structured = result.structured.as_ref();` section)
and change `allowed` to prefer the explicit field carried on the result (for
example `result.allowed` or `structured.and_then(|v|
v.get("allowed")).and_then(|v| v.as_bool())`) falling back only if that explicit
flag is absent; remove reliance on `approved` for policy allowance.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md`:
- Around line 86-88: The docs list of computer-use sidecar actions is missing
"key_press"; update the English page's action list
(`runtime-sandbox-isolation.md` under the Computer-use sidecar section) to
include `key_press` alongside `mouse_move`, `mouse_click`, `mouse_drag`,
`key_type`, and `screen_capture`, and make the identical edit in the Spanish
file
(`clients/web/apps/docs/src/content/docs/es/guides/runtime-sandbox-isolation.md`)
so the docs match the implementation in
clients/agent-runtime/src/tools/browser.rs which treats `key_press` as a
computer-use-only action.

In `@openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md`:
- Around line 57-59: The proposal text must be updated to reflect that sidecar
verification is performed lazily on first use rather than at startup: change all
references in Phase 5, the startup-latency risk, and the success criteria from
"startup" to "first computer-use" (BrowserTool's lazy verification flow), state
that the SecurityEvent audit and health-check call occur on first computer-use,
and update the docs section describing the isolation contract (docs/) to match
the first-use contract; apply the same wording changes to the other affected
sections referenced (sections 90 and 111).

In
`@openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md`:
- Around line 62-89: The compliance total is inconsistent: the table shows 13 "✅
COMPLIANT" rows but the footer reads "Compliance summary: 14/24 scenarios
compliant"; update the footer to match the table by changing the string
"Compliance summary: 14/24 scenarios compliant" to "Compliance summary: 13/24
scenarios compliant" (search for that exact footer line in verify-report.md) or
alternatively flip one of the rows to ✅ if you intended 14 compliant—ensure the
table rows and the final summary string are numerically consistent.

---

Outside diff comments:
In `@clients/agent-runtime/src/tools/browser.rs`:
- Around line 867-873: The verify_computer_use_sidecar() call in
execute_computer_use_action currently uses the ? operator which returns Err
early and bypasses browser audit logging; update execute_computer_use_action so
that when verify_computer_use_sidecar() (and similarly
computer_use_endpoint_url()) fails you either (a) construct and return an Err(…)
that wraps an Ok(ToolResult::structured(...)) representing the failed
verification (including the verification/error details) so callers still receive
an Ok(ToolResult) and can call log_browser_security_event(), or (b) explicitly
call log_browser_security_event() with the verification failure details before
propagating the error; locate execute_computer_use_action,
verify_computer_use_sidecar, computer_use_endpoint_url, and ToolResult to
implement one of these two approaches and ensure the audit payload contains the
verification result and error message.

---

Duplicate comments:
In `@clients/agent-runtime/src/agent/agent.rs`:
- Around line 643-690: The helper log_browser_security_event currently swallows
logger.log failures and only alters log level; change fn
log_browser_security_event(&self, result: &crate::tools::ToolResult) to return
Result<(), crate::Error> (or the crate's common Result type), convert the
internal early returns to Ok(()) where appropriate, and propagate logger.log
errors as Err(...) when self.audit_strict is true (or always return Err on
failure so the caller can decide); update callers—most importantly
execute_tool_call()—to handle the Result from log_browser_security_event and
propagate the error out of execute_tool_call() so browser audit write failures
do not fail-open. Ensure references to audit_strict, logger.log(&event),
AuditEvent/AuditEventType, and execute_tool_call are adjusted to the new
fallible flow.
- Around line 607-620: The code is deriving `allowed` from the `approved`
structured field, which is incorrect; update the `agent` parsing in agent.rs to
read the explicit allowed boolean propagated from the tool (e.g., the explicit
allowed bit added to the tool result in
clients/agent-runtime/src/tools/shell.rs) instead of using `approved`. Locate
the parsing block that sets `risk_level`, `sandbox_backend`, and `allowed`
(around the `let structured = result.structured.as_ref();` section) and change
`allowed` to prefer the explicit field carried on the result (for example
`result.allowed` or `structured.and_then(|v| v.get("allowed")).and_then(|v|
v.as_bool())`) falling back only if that explicit flag is absent; remove
reliance on `approved` for policy allowance.

In `@clients/agent-runtime/src/tools/browser.rs`:
- Around line 345-353: In computer_use_health_url(), avoid rewriting arbitrary
endpoints to "/v1/health": only accept and rewrite when the path ends exactly
with "/v1/actions" (use strip_suffix or check the trailing segment) and replace
that suffix with "/v1/health"; otherwise return an Err (e.g.,
anyhow!("unsupported computer_use_endpoint path: {}", url.path())) so
unsupported shapes fail closed; update the logic around
computer_use_endpoint_url(), path manipulation, and url.set_path accordingly.
- Around line 912-917: The error message currently interpolates the raw URL via
self.computer_use.endpoint which may include credentials; change the
with_context closure to use the already-sanitized endpoint value instead (e.g.,
reuse a previously computed sanitized `endpoint` variable or a sanitized
accessor like self.computer_use.sanitized_endpoint) so the formatted string no
longer exposes secrets when calling request.send().await.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ca9d1ee2-0d89-4128-88c9-e3e2087acc6c

📥 Commits

Reviewing files that changed from the base of the PR and between 2f290bd and 47a2835.

📒 Files selected for processing (7)
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/shell.rs
  • clients/web/apps/docs/src/content/docs/es/guides/runtime-sandbox-isolation.md
  • clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pr-checks
  • GitHub Check: sonar
  • GitHub Check: submit-gradle
  • GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{md,mdx}

⚙️ CodeRabbit configuration file

**/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes.
For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.

Files:

  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md
  • clients/web/apps/docs/src/content/docs/es/guides/runtime-sandbox-isolation.md
**/*

⚙️ CodeRabbit configuration file

**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.

Files:

  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/shell.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md
  • clients/web/apps/docs/src/content/docs/es/guides/runtime-sandbox-isolation.md
clients/agent-runtime/src/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Files:

  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/shell.rs
clients/agent-runtime/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Run cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings, and cargo test for code validation, or document which checks were skipped and why

Files:

  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/shell.rs
**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.

Files:

  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/shell.rs
clients/agent-runtime/src/tools/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Implement Tool trait in src/tools/ with strict parameter schema, validate and sanitize all inputs, and return structured ToolResult without panics in runtime path

Files:

  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/shell.rs
clients/agent-runtime/src/{security,gateway,tools}/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Treat src/security/, src/gateway/, src/tools/ as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks

Files:

  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/shell.rs
clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable

Files:

  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/shell.rs
🧠 Learnings (11)
📓 Common learnings
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools}/**/*.rs : Treat `src/security/`, `src/gateway/`, `src/tools/` as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable

Applied to files:

  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/shell.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md
  • clients/web/apps/docs/src/content/docs/es/guides/runtime-sandbox-isolation.md
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools}/**/*.rs : Treat `src/security/`, `src/gateway/`, `src/tools/` as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks

Applied to files:

  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/shell.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
  • clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md
  • clients/web/apps/docs/src/content/docs/es/guides/runtime-sandbox-isolation.md
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Include threat/risk notes and rollback strategy for security, runtime, and gateway changes; add or update tests for boundary checks and failure modes

Applied to files:

  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/tools/**/*.rs : Implement `Tool` trait in `src/tools/` with strict parameter schema, validate and sanitize all inputs, and return structured `ToolResult` without panics in runtime path

Applied to files:

  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/shell.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths

Applied to files:

  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/shell.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why

Applied to files:

  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/shell.rs
  • openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/**/*.rs : Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements

Applied to files:

  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/shell.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Keep startup path lean and avoid heavy initialization in command parsing flow

Applied to files:

  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/tools/shell.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/**/*.rs : Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Applied to files:

  • clients/agent-runtime/src/tools/browser.rs
  • clients/agent-runtime/src/tools/shell.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests

Applied to files:

  • clients/agent-runtime/src/tools/browser.rs
🪛 LanguageTool
clients/web/apps/docs/src/content/docs/es/guides/runtime-sandbox-isolation.md

[grammar] ~11-~11: Corrige la mayúscula.
Context: ...n docType: guide --- # Aislamiento del Sandbox del Runtime Corvus usa **dos capas de ...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_ORTHOGRAPHY_UPPERCASE)


[grammar] ~11-~11: Corrige la mayúscula.
Context: ...uide --- # Aislamiento del Sandbox del Runtime Corvus usa dos capas de seguridad pa...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_ORTHOGRAPHY_UPPERCASE)


[style] ~15-~15: En sentido figurado es preferible usar otras expresiones.
Context: ... disparada por usuarios: 1. Política a nivel de aplicación mediante SecurityPolicy ...

(A_NIVEL_DE)


[grammar] ~16-~16: Agrega un signo de puntuación.
Context: ...nte SecurityPolicy - allowlists de comandos - restricciones de rutas - límite...

(QB_NEW_ES_OTHER_ERROR_IDS_MISSING_PUNCTUATION)


[grammar] ~17-~17: Agrega un signo de puntuación.
Context: ...lists de comandos - restricciones de rutas - límites de tasa - clasificación...

(QB_NEW_ES_OTHER_ERROR_IDS_MISSING_PUNCTUATION)


[grammar] ~18-~18: Agrega un signo de puntuación.
Context: ... restricciones de rutas - límites de tasa - clasificación de riesgo y compuert...

(QB_NEW_ES_OTHER_ERROR_IDS_MISSING_PUNCTUATION)


[grammar] ~19-~19: Agrega un signo de puntuación.
Context: ...clasificación de riesgo y compuertas de aprobación 2. **Sandboxing a nivel de sistema oper...

(QB_NEW_ES_OTHER_ERROR_IDS_MISSING_PUNCTUATION)


[style] ~20-~20: En sentido figurado es preferible usar otras expresiones.
Context: ...ompuertas de aprobación 2. Sandboxing a nivel de sistema operativo mediante los backen...

(A_NIVEL_DE)


[grammar] ~20-~20: Aquí puede haber un error.
Context: ...iante los backends Sandbox del runtime - landlock - firejail - bubblewrap - docker Las dos capas importan. La capa de polít...

(QB_NEW_ES)


[grammar] ~55-~55: Cambia la palabra o signo.
Context: ... → error al iniciar - backend = "auto" sin backend encontrado → error al inicia...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_SPACE)


[grammar] ~56-~56: Cambia la palabra o signo.
Context: ... → error al iniciar - backend = "none" → error al iniciar - enabled = false →...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_SPACE)


[grammar] ~57-~57: Cambia la palabra o signo.
Context: ...→ error al iniciar -enabled = false` → error al iniciar Úsalo en despliegues...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_SPACE)


[grammar] ~67-~67: Corrige la minúscula.
Context: ...Corvus aplica ahora esta secuencia: 1. valida el comando contra SecurityPolicy 2. s...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_ORTHOGRAPHY_LOWERCASE)


[grammar] ~68-~68: Oración con errores
Context: ...a el comando contra SecurityPolicy 2. sanitiza variables de entorno 3. envuelve el com...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)


[grammar] ~68-~68: Oración con errores
Context: ...Policy` 2. sanitiza variables de entorno 3. envuelve el comando con el backend de sandbox se...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)


[grammar] ~69-~69: Oración con errores
Context: ...o con el backend de sandbox seleccionado 4. ejecuta con timeout 5. audita el resultado con ...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)


[grammar] ~70-~70: Oración con errores
Context: ...dbox seleccionado 4. ejecuta con timeout 5. audita el resultado con metadata del sandbox ...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)


[grammar] ~71-~71: Aquí puede haber un error.
Context: ...ta el resultado con metadata del sandbox Eso significa que cada comando shell per...

(QB_NEW_ES)


[style] ~77-~77: En sentido figurado es preferible usar otras expresiones.
Context: ...nto del sandbox noop Si no hay backend a nivel de sistema operativo y require = false, ...

(A_NIVEL_DE)


[grammar] ~80-~80: Corrige la minúscula.
Context: ...none (NoopSandbox). En ese modo: - la política a nivel de aplicación sigue ap...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_ORTHOGRAPHY_LOWERCASE)


[style] ~80-~80: En sentido figurado es preferible usar otras expresiones.
Context: ...pSandbox`). En ese modo: - la política a nivel de aplicación sigue aplicando - los comand...

(A_NIVEL_DE)


[grammar] ~80-~80: Agrega un signo de puntuación.
Context: ...la política a nivel de aplicación sigue aplicando - los comandos mutables emiten una advertenci...

(QB_NEW_ES_OTHER_ERROR_IDS_MISSING_PUNCTUATION)


[grammar] ~81-~81: Agrega un signo de puntuación.
Context: ...l sandbox del sistema operativo no está activo - los logs de auditoría registran `sandbox_ba...

(QB_NEW_ES_OTHER_ERROR_IDS_MISSING_PUNCTUATION)


[grammar] ~82-~82: Aquí puede haber un error.
Context: ...activo - los logs de auditoría registran sandbox_backend = "none" Esto se permite para desarrollo local, p...

(QB_NEW_ES)


[grammar] ~91-~91: Oración con errores
Context: ...idecar. Valores seguros por defecto: - el endpoint por defecto usa loopback: `htt...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)


[grammar] ~91-~91: Elimina la palabra o signo.
Context: ...- el endpoint por defecto usa loopback: http://127.0.0.1:8787/v1/actions - los endpoints remotos/públicos se bloque...

(QB_NEW_ES_OTHER_ERROR_IDS_UNNECESSARY_SPACE)


[grammar] ~92-~92: Elimina la palabra o signo.
Context: ... remotos/públicos se bloquean salvo que allow_remote_endpoint = true - los endpoints públicos remotos deben usa...

(QB_NEW_ES_OTHER_ERROR_IDS_UNNECESSARY_SPACE)


[grammar] ~93-~93: Oración con errores
Context: ...points públicos remotos deben usar HTTPS - dominios permitidos, allowlists de ventanas y lí...

(QB_NEW_ES_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)


[grammar] ~94-~94: Aquí puede haber un error.
Context: ...das se reenvían como política al sidecar Corvus hace un health-check perezoso...

(QB_NEW_ES)


[grammar] ~118-~118: Aquí puede haber un error.
Context: ...o verificarse el aislamiento del sidecar ## Expectativas de auditoría Los eventos d...

(QB_NEW_ES)


[grammar] ~126-~126: Aquí puede haber un error.
Context: ... - bandera de aprobación - éxito/fracaso - security.sandbox_backend La verificación del sidecar de computer-...

(QB_NEW_ES)


[grammar] ~129-~129: Aquí puede haber un error.
Context: ...-use genera una entrada SecurityEvent describiendo: - estado de salud del sidecar - tipo de...

(QB_NEW_ES)


[grammar] ~130-~130: Agrega un signo de puntuación.
Context: ...nt` describiendo: - estado de salud del sidecar - tipo de aislamiento reportado - runti...

(QB_NEW_ES_OTHER_ERROR_IDS_MISSING_PUNCTUATION)


[grammar] ~131-~131: Agrega un signo de puntuación.
Context: ...salud del sidecar - tipo de aislamiento reportado - runtime reportado ## Valores recomen...

(QB_NEW_ES_OTHER_ERROR_IDS_MISSING_PUNCTUATION)


[grammar] ~132-~132: Aquí puede haber un error.
Context: ...islamiento reportado - runtime reportado ## Valores recomendados para operadores ##...

(QB_NEW_ES)


[grammar] ~171-~171: Elimina la puntuación
Context: ...te de despliegue del sidecar de computer-use.

(QB_NEW_ES_OTHER_ERROR_IDS_UNNECESSARY_PUNCTUATION)

Comment on lines +86 to +88
## Computer-use sidecar isolation

Computer-use actions (`mouse_move`, `mouse_click`, `mouse_drag`, `key_type`, `screen_capture`) use a sidecar.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

The computer-use action list is missing key_press.

clients/agent-runtime/src/tools/browser.rs treats key_press as a computer-use-only action, but this guide omits it from the supported sidecar actions. Add it here and mirror the same change in clients/web/apps/docs/src/content/docs/es/guides/runtime-sandbox-isolation.md to keep EN/ES aligned.

As per coding guidelines, "Verify technical accuracy and that docs stay aligned with code changes."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/web/apps/docs/src/content/docs/guides/runtime-sandbox-isolation.md`
around lines 86 - 88, The docs list of computer-use sidecar actions is missing
"key_press"; update the English page's action list
(`runtime-sandbox-isolation.md` under the Computer-use sidecar section) to
include `key_press` alongside `mouse_move`, `mouse_click`, `mouse_drag`,
`key_type`, and `screen_capture`, and make the identical edit in the Spanish
file
(`clients/web/apps/docs/src/content/docs/es/guides/runtime-sandbox-isolation.md`)
so the docs match the implementation in
clients/agent-runtime/src/tools/browser.rs which treats `key_press` as a
computer-use-only action.

Comment on lines +57 to +59
- Add an optional health-check call to the sidecar endpoint at `BrowserTool` initialization that queries isolation capabilities.
- Log the sidecar's reported isolation level as an audit event (`SecurityEvent` type).
- Document the expected isolation contract in `docs/` — what operators SHOULD ensure about sidecar deployment.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

The proposal still treats sidecar verification as startup-time.

clients/agent-runtime/src/tools/browser.rs now verifies the sidecar lazily on the first computer-use action, not during initialization. Phase 5, the startup-latency risk, and the success criteria still say "startup", which changes when operators expect failures and audit events to surface. Please update those sections to the first-use contract.

As per coding guidelines, "Verify technical accuracy and that docs stay aligned with code changes."

Also applies to: 90-90, 111-111

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/proposal.md`
around lines 57 - 59, The proposal text must be updated to reflect that sidecar
verification is performed lazily on first use rather than at startup: change all
references in Phase 5, the startup-latency risk, and the success criteria from
"startup" to "first computer-use" (BrowserTool's lazy verification flow), state
that the SecurityEvent audit and health-check call occur on first computer-use,
and update the docs section describing the isolation contract (docs/) to match
the first-use contract; apply the same wording changes to the other affected
sections referenced (sections 90 and 111).

Comment on lines +62 to +89
| Requirement | Scenario | Test | Result |
|-------------|----------|------|--------|
| R1 | R1-S1 Sandbox wraps every shell command | `tools::shell::tests::shell_calls_wrap_command_on_injected_sandbox` | ✅ COMPLIANT |
| R1 | R1-S2 Sandbox error blocks execution | `tools::shell::tests::shell_returns_error_when_sandbox_wrap_fails` | ✅ COMPLIANT |
| R1 | R1-S3 NoopSandbox passes through unchanged | `tools::shell::tests::shell_executes_allowed_command` | ⚠️ PARTIAL |
| R1 | R1-S4 Wrapping happens after env sanitization | `tools::shell::tests::shell_wraps_after_env_sanitization` | ✅ COMPLIANT |
| R2 | R2-S1 Require mode with available backend succeeds | `security::detect::tests::docker_backend_falls_back_gracefully` | ⚠️ PARTIAL |
| R2 | R2-S2 Require mode with unavailable explicit backend fails | `security::detect::tests::require_landlock_on_non_linux_returns_error` | ✅ COMPLIANT |
| R2 | R2-S3 Require mode with auto finds nothing fails | `security::detect::tests::require_auto_no_backend_returns_error_or_ok` | ⚠️ PARTIAL |
| R2 | R2-S4 Non-require mode falls back to NoopSandbox | `security::detect::tests::explicit_none_returns_noop`, `security::detect::tests::default_security_config_produces_working_sandbox` | ⚠️ PARTIAL |
| R2 | R2-S5 Explicit none backend with require is an error | `security::detect::tests::require_none_backend_returns_error` | ✅ COMPLIANT |
| R2 | R2-S6 Disabled sandbox with require is an error | `security::detect::tests::require_disabled_returns_error` | ✅ COMPLIANT |
| R3 | R3-S1 Warning for mutating command with NoopSandbox | `tools::shell::tests::noop_sandbox_warning_helper_triggers_for_mutating_commands` | ⚠️ PARTIAL |
| R3 | R3-S2 No warning for read-only command with NoopSandbox | `tools::shell::tests::noop_sandbox_warning_helper_skips_read_only_commands` | ⚠️ PARTIAL |
| R3 | R3-S3 No warning when real sandbox is active | `tools::shell::tests::noop_sandbox_warning_helper_skips_real_sandbox` | ⚠️ PARTIAL |
| R4 | R4-S1 Audit event includes sandbox backend name | `security::audit::tests::audit_log_command_event_writes_structured_entry` | ✅ COMPLIANT |
| R4 | R4-S2 NoopSandbox is recorded as `none` | `security::audit::tests::audit_log_command_event_writes_structured_entry` | ⚠️ PARTIAL — test injects values into `CommandExecutionLog` proving serialization, but does not assert `NoopSandbox` name is emitted via the actual shell execution path; an end-to-end runtime propagation test is needed |
| R4 | R4-S3 Real backend name is recorded | `security::audit::tests::audit_log_command_event_records_real_sandbox_backend` | ⚠️ PARTIAL — test injects a backend name into `CommandExecutionLog` proving serialization, but does not assert the real sandbox backend name flows through the actual shell/browser execution path; an end-to-end runtime propagation test is needed |
| R5 | R5-S1 Healthy sidecar with isolation info | `tools::browser::tests::computer_use_sidecar_health_check_reports_isolation_info` | ✅ COMPLIANT |
| R5 | R5-S2 Sidecar health-check fails gracefully | `tools::browser::tests::computer_use_sidecar_health_check_fails_gracefully_when_optional` | ✅ COMPLIANT |
| R5 | R5-S3 Sidecar health-check with require mode | `tools::browser::tests::computer_use_sidecar_health_check_rejects_when_required` | ✅ COMPLIANT |
| R5 | R5-S4 Sidecar without health endpoint | `tools::browser::tests::computer_use_sidecar_missing_health_endpoint_is_treated_as_failure` | ✅ COMPLIANT |
| R6 | R6-S1 Default config produces identical behavior | `config::schema::tests::sandbox_config_default_require_is_false`, `bootstrap::tests::bootstrap_context_builds_core_components` | ⚠️ PARTIAL |
| R6 | R6-S2 Existing SecurityPolicy tests pass unchanged | full `cargo test` (`security::policy::tests::*`) | ✅ COMPLIANT |
| R6 | R6-S3 Config without require field deserializes correctly | `config::schema::tests::sandbox_config_missing_require_defaults_to_false` | ✅ COMPLIANT |
| R6 | R6-S4 CLI contract unchanged | full `cargo test`, `mcp_native_regression`, `mcp_runtime_e2e` | ⚠️ PARTIAL |

**Compliance summary**: 14/24 scenarios compliant
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Compliance summary is off by one.

The matrix currently contains 13 ✅ COMPLIANT rows, but Line 89 says 14/24. Please fix the summary or the row statuses so the archived report stays internally consistent.

As per coding guidelines, "Verify technical accuracy and that docs stay aligned with code changes."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@openspec/changes/archive/2026-04-02-runtime-sandbox-hardening/verify-report.md`
around lines 62 - 89, The compliance total is inconsistent: the table shows 13
"✅ COMPLIANT" rows but the footer reads "Compliance summary: 14/24 scenarios
compliant"; update the footer to match the table by changing the string
"Compliance summary: 14/24 scenarios compliant" to "Compliance summary: 13/24
scenarios compliant" (search for that exact footer line in verify-report.md) or
alternatively flip one of the rows to ✅ if you intended 14 compliant—ensure the
table rows and the final summary string are numerically consistent.

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Apr 3, 2026

Quality Gate Failed Quality Gate failed

Failed conditions
73.9% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@yacosta738 yacosta738 merged commit b8d994e into main Apr 3, 2026
12 of 14 checks passed
@yacosta738 yacosta738 deleted the runtime-sandbox branch April 3, 2026 09:09
@yacosta738 yacosta738 mentioned this pull request Apr 3, 2026
@coderabbitai coderabbitai Bot mentioned this pull request Apr 4, 2026
@dallay-bot dallay-bot Bot mentioned this pull request Apr 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Harden runtime sandboxing for user-triggered execution and sidecar-assisted actions

1 participant