_ipc: fix AF_UNIX bind/chmod TOCTOU + symlink-pre-plant race (#298) #299

Open
web-dev0521 wants to merge 2 commits into browser-use:main from web-dev0521:fix/ipc-socket-bind-perms-toctou
@web-dev0521 web-dev0521 commented May 5, 2026

Summary

Closes #298 — TOCTOU + symlink-pre-plant races on the daemon's AF_UNIX socket.

  • Bind perms TOCTOU (primary). asyncio.start_unix_server used to bind with the inherited umask, leaving the socket world-accessible (mode 0o777 if umask=0) until the follow-up os.chmod(0o600) ran. Wrapping the bind in os.umask(0o077) (try/finally) guarantees the socket is created mode 0o700 from the kernel side; the existing chmod stays as defense-in-depth.
  • Symlink-pre-plant race (secondary). os.path.exists() followed symlinks, so a pre-planted dangling symlink at the socket path skipped the unlink and let bind() follow it kernel-side. The fix moves the socket into a 0o700 private parent directory (<tmp>/bu-<NAME>.d/sock) created/verified by _ensure_private_dir (lstat-based, refuses symlinks / non-dirs / wrong-uid), and replaces os.path.exists() + unlink with an unconditional os.unlink (which never follows symlinks).

Net effect: once _ensure_private_dir returns, only our uid can place anything inside the parent dir, so the subsequent unlink+bind on the socket is race-free; and even before that, umask ensures no on-disk permission window.
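The umask-around-bind idea can be sketched in isolation. This is a minimal illustration, not the PR's code: it uses a plain blocking `socket` instead of `asyncio.start_unix_server`, and the helper name `bind_private` is hypothetical.

```python
import os
import socket
import stat


def bind_private(path: str) -> socket.socket:
    """Bind an AF_UNIX socket that never carries group/other permission bits.

    Hypothetical helper for illustration; the PR wraps
    asyncio.start_unix_server rather than a raw socket.
    """
    # os.umask is process-wide, so keep the window as short as possible.
    old_umask = os.umask(0o077)
    try:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.bind(path)  # created as 0o777 & ~0o077 == 0o700
        except BaseException:
            sock.close()
            raise
    finally:
        os.umask(old_umask)  # always restore the caller's umask
    # Defense in depth: drop the execute bit as well.
    os.chmod(path, 0o600)
    return sock
```

Because `os.umask` affects the whole process, a multi-threaded program that creates files concurrently would see the tightened mask for the duration of the bind; that trade-off is assumed to be acceptable here.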

Changes

  • src/browser_harness/_ipc.py

    • Added stat import.
    • New _sock_dir(name) → <tmp>/bu-<NAME>.d/. _sock_path now returns <sock_dir>/sock (was <tmp>/bu-<NAME>.sock).
    • New _ensure_private_dir(p): mkdir(0o700), else lstat-verify it's a real dir owned by us and tighten loose perms; refuse on symlink / non-dir / foreign uid.
    • serve() POSIX branch rewritten in 3 explicit steps: ensure private parent → unconditional os.unlink (catch FileNotFoundError) → os.umask(0o077) around start_unix_server → chmod(0o600).
  • tests/unit/test_ipc.py — 6 new regression tests (POSIX-only via skipif):

    • _ensure_private_dir: creates 0o700, tightens loose perms, refuses symlinks, refuses non-directory.
    • serve socket bound under umask=0 ends up mode 0o600 with no looser intermediate state.
    • serve unlinks a stale dangling symlink at the socket path (the case os.path.exists used to silently pass).
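The directory check and the unconditional unlink described above might look roughly like the following. This is a sketch under assumptions: the function names `ensure_private_dir` and `remove_stale` are illustrative, and the real `_ensure_private_dir` may differ in error types and details.

```python
import os
import stat


def ensure_private_dir(path: str) -> None:
    """Create `path` as a 0o700 directory, or verify an existing one."""
    try:
        os.mkdir(path, 0o700)
    except FileExistsError:
        pass  # something is already there; inspect it below
    # lstat does not follow symlinks, so a planted link is seen as a link.
    st = os.lstat(path)
    if not stat.S_ISDIR(st.st_mode):
        raise PermissionError(f"{path} is not a real directory")
    if st.st_uid != os.getuid():
        raise PermissionError(f"{path} is owned by another user")
    if stat.S_IMODE(st.st_mode) != 0o700:
        os.chmod(path, 0o700)  # tighten (or repair) loose permissions


def remove_stale(path: str) -> None:
    """Remove whatever sits at `path`; unlink never follows symlinks."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
```

Unlike an `os.path.exists()` guard, `remove_stale` also removes a dangling symlink, because `os.unlink` acts on the link itself rather than its target.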

Compatibility notes

  • The on-disk path of the socket changes from /tmp/bu-<NAME>.sock to /tmp/bu-<NAME>.d/sock. All readers of the socket path go through _ipc._sock_path() (verified: connect, cleanup_endpoint, sock_addr — daemon, admin, helpers all route through these), so the change is internal. Stale .sock files left over from pre-upgrade daemons are harmless leftover bytes.
  • AF_UNIX sun_path length budget is unaffected: the new layout adds 2 chars (e.g. /tmp/bu-default.d/sock = 22 chars vs. 20), well under the 104/108 macOS/Linux limit. The BH_TMP_DIR-isolated case (bare bu stem) becomes <BH_TMP_DIR>/bu.d/sock, still safe.
  • Windows TCP-loopback path is untouched; token-based auth there is unchanged.
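The sun_path arithmetic above can be checked with a few lines. The limits (104 bytes on macOS, 108 on Linux) are taken from the note above; the helper names `sock_path` and `fits_sun_path` are hypothetical stand-ins for `_ipc._sock_path()`, not the module's API.

```python
import os
import sys

# Limits as quoted in the compatibility note: 104 on macOS, 108 on Linux.
SUN_PATH_MAX = 104 if sys.platform == "darwin" else 108


def sock_path(tmp_dir: str, name: str) -> str:
    """New layout: <tmp>/bu-<NAME>.d/sock (illustrative helper)."""
    return os.path.join(tmp_dir, f"bu-{name}.d", "sock")


def fits_sun_path(path: str) -> bool:
    """True if the encoded path leaves room for the trailing NUL byte."""
    return len(os.fsencode(path)) < SUN_PATH_MAX


new = sock_path("/tmp", "default")   # "/tmp/bu-default.d/sock", 22 bytes
old = "/tmp/bu-default.sock"         # previous layout, 20 bytes
assert len(new) - len(old) == 2
assert fits_sun_path(new)
```

A deep temporary directory (for example one under /private/var/folders on macOS) consumes most of the budget on its own, which is why the check is worth running against the real `<tmp>` prefix rather than /tmp.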

Test plan

  • uv run --with pytest pytest tests/unit/test_ipc.py -v — 16/16 passing (10 existing + 6 new).
  • uv run --with pytest pytest tests/unit/ -v — 72/72 passing (no regressions in admin/daemon/helpers/run suites that share the _ipc paths).
  • Manual smoke: start daemon under umask 0, confirm /tmp/bu-<NAME>.d is drwx------ and /tmp/bu-<NAME>.d/sock has no group/other permission bits from the moment it appears (srwx------ right after the bind, srw------- once the chmod has run).
  • Manual smoke: pre-plant a dangling symlink at /tmp/bu-<NAME>.d/sock after the dir exists; confirm the daemon starts cleanly, the symlink is removed, and the bound path is a real socket.
  • macOS smoke (sun_path budget + /var/folders default): confirm no OSError: AF_UNIX path too long.
  • Optional: strace -e trace=bind,chmod,umask to confirm the bind syscall happens with the tightened umask in effect (no observable wider-mode window).
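For the manual smoke checks, the `ls -l` style mode strings can also be read from Python. A small sketch, assuming only the standard library; `mode_string` is a hypothetical helper and the path in the usage comment is an example.

```python
import os
import stat


def mode_string(path: str) -> str:
    """Return the `ls -l` style type and permission string for `path`.

    Uses lstat so a symlink is reported as a link, not as its target.
    """
    return stat.filemode(os.lstat(path).st_mode)


# Example usage against a running daemon (paths are illustrative):
#   mode_string("/tmp/bu-default.d")       expected to read drwx------
#   mode_string("/tmp/bu-default.d/sock")  expected to read srw-------
```

This only samples the mode at one instant, so it cannot prove the absence of a wider intermediate state; the strace check above is the stronger evidence for that.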

Summary by cubic

Secure AF_UNIX daemon socket binding by creating it in a private 0o700 dir and tightening permissions during bind to remove TOCTOU and symlink-pre-plant races. Also fixes daemon discovery to match the new socket layout; Windows TCP loopback is unchanged.

  • Bug Fixes

    • Wrap asyncio.start_unix_server with umask 0o077, then chmod 0o600, closing the bind→chmod TOCTOU.
    • Use a 0o700 private parent dir (verified via lstat); refuse symlinks/non-dirs/foreign uid; always unlink before bind to defeat pre-planted symlinks.
    • Fix admin._daemon_endpoint_names() to discover POSIX endpoints at the new layout (require inner sock; ignore empty .d/); Windows .port discovery unchanged.
    • _ipc.cleanup_endpoint() now also rmdirs the .d/ wrapper (best-effort). Added POSIX-only regression tests, a macOS-safe short-/tmp fixture, and a discovery test tied to _ipc._sock_path().
  • Migration

    • Socket path is now <tmp>/bu-<NAME>.d/sock. All call sites use _ipc._sock_path(), so no action needed; old .sock files are harmless. Windows unchanged.

Written for commit aba1efc. Summary will update on new commits.

@web-dev0521
Author

Hi, @sauravpanda ,
This is my first contribution.
Could you please review my PR?
Thanks.

@web-dev0521 web-dev0521 changed the title serve: bind AF_UNIX in 0o700 private dir; umask during bind closes ch… _ipc: fix AF_UNIX bind/chmod TOCTOU + symlink-pre-plant race (#298) May 5, 2026
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 2 files

@web-dev0521
Author

Hello @sauravpanda ,
Could you please review my PR?
I would appreciate your feedback.
Thanks.

The PR moves the daemon socket from <_TMP>/bu-NAME.sock to
<_TMP>/bu-NAME.d/sock to close a symlink-pre-plant race, but
admin._daemon_endpoint_names() still globbed bu-*.sock — so post-merge,
daemon discovery silently returned []. Existing test_admin.py only
exercised the glob with hand-crafted .sock files, so the unit suite
stayed green while the lifecycle was broken.

- admin._daemon_endpoint_names: discover via 'bu-*.d/sock' on POSIX
  (require the inner sock so an empty .d/ from a half-cleanup doesn't
  count as a live daemon). Windows .port path unchanged.
- _ipc.cleanup_endpoint: also rmdir the .d/ wrapper so /tmp doesn't
  fill with stale dirs. ENOTEMPTY/race both leave nothing to do.
- test_admin.py: update fixtures to the new layout, plus a regression
  guard that anchors on _ipc._sock_path() to catch future drift.
- test_ipc.py: add a short_tmp fixture (mkdtemp under /tmp). pytest's
  tmp_path on macOS goes under /private/var/folders/... and busts the
  104-byte sun_path budget once we append bu-NAME.d/sock; the two new
  serve()/symlink tests fail without this. Linux's 108-byte limit was
  forgiving enough that the original tests happened to pass there.
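
The discovery and cleanup changes in this commit message can be sketched as follows. This is an illustration only: the function names mirror the ones mentioned above but take an explicit `tmp_dir` argument, which is an assumption, and only the named `bu-*.d/sock` pattern is covered.

```python
import glob
import os


def daemon_endpoint_names(tmp_dir: str) -> list[str]:
    """List NAMEs that have a socket at <tmp_dir>/bu-NAME.d/sock."""
    pattern = os.path.join(glob.escape(tmp_dir), "bu-*.d", "sock")
    names = []
    for sock in glob.glob(pattern):
        wrapper = os.path.basename(os.path.dirname(sock))  # "bu-NAME.d"
        names.append(wrapper[len("bu-"):-len(".d")])
    return sorted(names)


def cleanup_endpoint(tmp_dir: str, name: str) -> None:
    """Best-effort removal of the socket and then its .d/ wrapper."""
    wrapper = os.path.join(tmp_dir, f"bu-{name}.d")
    for remove, target in (
        (os.unlink, os.path.join(wrapper, "sock")),
        (os.rmdir, wrapper),
    ):
        try:
            remove(target)
        except OSError:
            pass  # already gone, or ENOTEMPTY: nothing more to do
```

Requiring the inner `sock` entry means an empty `bu-NAME.d/` left behind by a half-finished cleanup is not reported as a live daemon, and legacy `bu-NAME.sock` files are ignored.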
@sauravpanda
Collaborator

btw are you a bot?

@web-dev0521
Author

Hi, @sauravpanda ,
No, I am not a bot.

@web-dev0521
Author

> btw are you a bot?

How can I contact you?
What do I have to fix now?
Please let me know.

Development

Successfully merging this pull request may close these issues.

Security: TOCTOU window between bind() and chmod(0o600) on daemon AF_UNIX socket
