Skip to content

fix(sandbox): two-phase Landlock to fix privilege ordering and add enforcement tests#810

Open
johntmyers wants to merge 7 commits intomainfrom
fix/landlock-availability-logging
Open

fix(sandbox): two-phase Landlock to fix privilege ordering and add enforcement tests#810
johntmyers wants to merge 7 commits intomainfrom
fix/landlock-availability-logging

Conversation

@johntmyers
Copy link
Copy Markdown
Collaborator

Summary

Fix the Landlock privilege ordering bug reported in #803 and add comprehensive e2e tests for Landlock filesystem enforcement. The root cause was that drop_privileges() ran before sandbox::apply(), causing PathFd::new() to execute as uid 998 where root-only paths silently failed. In best_effort mode this could drop all Landlock restrictions entirely.

Related Issue

Fixes #803

Changes

  • Two-phase Landlock (landlock.rs, mod.rs): Split apply() into prepare() (opens PathFds as root) and enforce() (calls restrict_self() after privilege drop). Root-only paths (mode 700) now succeed instead of silently failing.
  • Parent-side availability probe (landlock.rs, mod.rs): Added probe_availability() using raw landlock_create_ruleset syscall to check kernel support, with OCSF logging from the parent process where tracing works. Previously, Landlock status was only logged from the pre_exec child where the tracing pipeline is non-functional after fork.
  • Updated spawn paths (process.rs, ssh.rs): All three spawn paths (entrypoint, SSH PTY, SSH pipe) now use prepare-before-drop-enforce-after ordering.
  • E2e enforcement tests (test_sandbox_landlock.py): 5 tests verifying read-only blocks writes, read-write allows both, paths outside policy are denied, and user-owned paths outside policy are still blocked (proves Landlock enforces independently of Unix DAC).

Testing

  • cargo check -p openshell-sandbox passes (no new warnings)
  • cargo test -p openshell-sandbox — 416 tests pass
  • mise run e2e:python -- -k test_sandbox_landlock -v — 5 passed
  • Manual verification: openshell logs shows CONFIG:PROBED [INFO] Landlock filesystem sandbox available [abi:v6]
  • Manual verification: Landlock enforcement confirmed via write tests on running sandbox

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

Landlock status was only logged from inside the pre_exec child process
where the tracing/OCSF pipeline is non-functional after fork. This made
Landlock failures completely invisible in sandbox logs.

Add a probe_availability() function that issues the raw
landlock_create_ruleset syscall to check kernel support, and call it
from the parent process before fork in all three spawn paths (entrypoint,
SSH PTY, SSH pipe). Uses std::sync::Once to emit exactly once per
sandbox lifetime.

WIP - addresses logging gap from #803.
Verify Landlock availability logging and enforcement in e2e:
- OCSF probe event appears in sandbox logs
- Read-only paths block writes, allow reads
- Read-write paths allow both
- Paths outside policy are denied entirely
- User-owned paths outside policy are still blocked (proves
  Landlock enforces independently of Unix DAC permissions)

Requires Linux host with Landlock support (GitHub Actions runners,
Docker Desktop linuxkit). Related to #803.
Two strict xfail tests that demonstrate the root cause of #803:
- PathFd::new() runs as uid 998 after drop_privileges, so root-only
  paths (mode 700) silently fail and Landlock degrades
- When ALL paths fail, best_effort silently drops Landlock entirely

These tests will pass after the two-phase Landlock fix (open PathFds
as root before drop_privileges, restrict_self after).
- Remove log-reading tests: the Landlock probe logs to the
  supervisor's container stdout, not the in-sandbox file appender
  at /var/log/openshell*.log*
- Replace broken xfail test (checked for child-process log messages
  that never reach the file appender) with a stat()-based test that
  verifies /root is actually in the Landlock allowlist
- Keep enforcement tests (all passing) and the root-only policy
  xfail test (correctly proves #803 bug)
Landlock doesn't restrict stat(), so the test passed unexpectedly.
In the mixed policy case where /root is silently skipped, Landlock
still applies (using other paths) and blocks /root even harder
(not in allowlist = denied). The observable security degradation
only occurs when ALL paths fail, which the existing xfail test
already covers.
@johntmyers johntmyers requested a review from a team as a code owner April 10, 2026 22:38
@johntmyers johntmyers self-assigned this Apr 10, 2026
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Apr 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Split Landlock apply into prepare() and enforce():
- prepare() runs as root before drop_privileges: opens PathFds,
  creates ruleset, adds rules. Root-only paths (mode 700) now
  succeed instead of silently failing as uid 998.
- enforce() runs after drop_privileges: calls restrict_self()
  which does not require root.

This fixes the root cause of #803 where drop_privileges() ran
before sandbox::apply(), causing PathFd::new() to fail on
root-only paths. In best_effort mode this silently dropped all
Landlock restrictions.

The fix applies to all three spawn paths: entrypoint (process.rs),
SSH PTY (ssh.rs), and SSH pipe exec (ssh.rs).

Removes xfail marker from e2e test that now passes.
@johntmyers johntmyers force-pushed the fix/landlock-availability-logging branch from df272a6 to 2f8185f Compare April 10, 2026 22:40
- enforce() now respects best_effort: if restrict_self() fails and
  policy is best_effort, log and degrade instead of aborting startup
- log_sandbox_readiness distinguishes best_effort (degraded) from
  hard_requirement (will fail) in OCSF messages
@mjamiv
Copy link
Copy Markdown

mjamiv commented Apr 11, 2026

Local validation for PR #810:

  • cargo check -p openshell-sandbox passes on pr-810
  • cargo test -p openshell-sandbox sandbox::linux::landlock:: -- --nocapture passes cleanly here
  • A broader cargo test -p openshell-sandbox -- --nocapture run still fails in three process::tests::drop_privileges_* cases, but those same three failures reproduce unchanged on origin/main in this environment, so I do not read them as regressions from this PR

I have not done a live supervisor swap yet, so this is code-path validation rather than deployment validation.

@mjamiv
Copy link
Copy Markdown

mjamiv commented Apr 11, 2026

Follow-up with live deployment validation:

  • the live sandbox policy still reports landlock.compatibility: best_effort
  • baseline before swap: openshell sandbox exec --name <sandbox> -- sh -lc 'id; cat /proc/self/attr/current' returned uid=998(sandbox) and label=unconfined
  • I then copied the locally built pr-810 openshell-sandbox binary into the running cluster container, restarted the supervisor, and waited for the sandbox to return to Ready
  • post-swap, the same live probe was unchanged: uid=998(sandbox), label=unconfined

So in this real deployment, I still do not see Landlock enforcement taking effect with the PR binary, even though the branch builds cleanly and the Landlock-focused Rust tests pass locally here.

I rolled the cluster back to the stock supervisor immediately after the test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

test:e2e Requires end-to-end coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(sandbox): Landlock not enforced on Landlock-capable kernels — best_effort silently returns Ok()

2 participants