Fix bridged subnet collision and clean up global NAT#283
Conversation
🔍 Claude ReviewSEVERITY: medium Findings[MEDIUM] Substring IP matching in The collision check uses [MEDIUM] When the last bridged VM exits, [MEDIUM] TOCTOU race in Between checking for [LOW] The same subnet-to-IP mapping logic exists in both SummaryThe collision detection feature has a substring matching bug that can cause false positives, and the NAT cleanup has two safety issues: unconditionally disabling IP forwarding can break other host services, and a TOCTOU race can break concurrently-starting VMs. Review by Claude | Run |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ac08d08f0c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let stdout = String::from_utf8_lossy(&output.stdout); | ||
| for line in stdout.lines() { | ||
| if line.contains(ip) && line.contains("veth0-") { | ||
| return true; |
There was a problem hiding this comment.
Match exact IPs when checking veth collisions
The collision check uses line.contains(ip) on ip -o addr show output, which can produce false positives because one IP string can be a prefix of another (e.g., 172.30.0.1 matches 172.30.0.17/30). That will cause free subnets to be skipped and can make subnet allocation fail after enough attempts even though the exact host IP is unused; it also breaks deterministic reuse for cache restores. Consider parsing the IPv4 address field (e.g., ip -o -4 addr show) and comparing exact tokens instead of substring matching.
Useful? React with 👍 / 👎.
🔧 Auto-Fix CreatedI found issues and created a fix PR: #284 Please review and merge the fix PR first, then this PR. |
c37e122 to
bbd9f99
Compare
🔍 Claude ReviewSEVERITY: medium Previous Findings (still unfixed)The following issues from a previous review remain unfixed (fix PR #284 is still open):
New Findings[MEDIUM] Full snapshot path still has non-atomic remove+rename gap (common.rs:1036-1043) The diff snapshot path (lines 982-995) was correctly fixed to use rename-swap (move old to tokio::fs::remove_dir_all(snapshot_dir).await?; // old gone
tokio::fs::rename(&temp_snapshot_dir, snapshot_dir).await?; // new not yet in placeIf the process crashes between SummaryThree previously-reported MEDIUM issues remain unfixed (pending merge of #284). One new MEDIUM issue: the full snapshot path was not given the same atomic-replace treatment as the diff snapshot path, leaving a window where a crash can lose snapshot data. Review by Claude | Run |
🔧 Auto-Fix CreatedI found issues and created a fix PR: #287 This fixes:
This supersedes #284 which covered items 1 and 2. Please review and merge the fix PR first, then this PR. |
🔍 Claude ReviewSEVERITY: low Previous FindingsMost issues from previous reviews (first review, second review) have been fixed by the merged PR #287:
The [MEDIUM] TOCTOU race in New Finding[LOW] Stale docstring on The doc comment still says SummaryAll previously-reported MEDIUM issues are fixed (or have low practical impact). Only a stale docstring remains as a minor inconsistency. The PR is in good shape for merge. Review by Claude | Run |
🔧 Auto-Fix CreatedI found a minor issue and created a fix PR: #289 This fixes the stale docstring on |
c86543e to
8125ce6
Compare
🔍 Claude ReviewSEVERITY: low Previous FindingsAll MEDIUM issues from previous reviews have been resolved:
The [MEDIUM] TOCTOU race in No New FindingsThe PR is in good shape. All substantive issues have been addressed. No new MEDIUM or CRITICAL issues found. SummaryAll previously-reported issues are fixed or have been assessed as low-impact. The PR is ready to merge. Review by Claude | Run |
- Add collision detection to hash-based subnet allocation: check if the derived host IP is already on a live veth, increment and retry if so (capped at 100 attempts) - Add cleanup_global_nat_if_unused() that removes MASQUERADE rules and disables ip_forward when the last bridged VM exits - Extract derive_host_ip() helper for collision check reuse
Rename old snapshot to .old before renaming new into place, instead of remove_dir_all + rename. Eliminates the window where a crash would lose the snapshot entirely.
…tion - Remove std::env::set_var for writeback cache propagation (#26): pass no_writeback_cache flag through mount_vsock_with_options API instead. set_var is unsound in multi-threaded Rust programs. - Bound exec line reader to 1MB (#27): prevents OOM from malicious or malformed exec requests sent over vsock. - Replace bash -c shell injection with direct Command args (#28): TAP device verification now uses ip link show directly instead of through a shell.
- Use exact "inet <ip>/" pattern in is_ip_in_use_on_veth to prevent substring false positives (e.g., 10.1.1.1 matching 10.1.1.10) - Remove unsafe sysctl ip_forward=0 from cleanup_global_nat_if_unused to avoid breaking Docker/K8s/VPNs on the host - Apply rename-swap technique to full snapshot path (matching diff snapshot path) to prevent data loss on crash between remove and rename 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Bridged networking needs net.ipv4.ip_forward=1, but fcvm shouldn't silently enable a system-wide sysctl. Instead, check and error with instructions if it's disabled. The Makefile setup-btrfs target and CI workflows already enable it as part of host setup. Also fix stale cleanup doc comment (ip_forward=0 removal was in #287).
The doc comment still mentioned "disables ip_forward" but the code intentionally leaves ip_forward enabled after the fix in PR #287. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
8125ce6 to
02c6c4b
Compare
🔍 Claude ReviewSEVERITY: low Previous FindingsAll MEDIUM issues from previous reviews have been resolved:
The TOCTOU race in No New FindingsAll changes look correct. The SummaryAll previously-reported substantive issues are fixed. The PR is in good shape for merge. Review by Claude | Run |
Fix bridged subnet collision and clean up global NAT
Fix bridged subnet collision and clean up global NAT
Stacked on: review-raii
Summary
cleanup_global_nat_if_unused()that removes MASQUERADE rules for172.30.0.0/16and10.0.0.0/8, and disablesip_forwardwhen the last bridged VM exits.Test plan