
Fix bridged subnet collision and clean up global NAT#283

Merged
ejc3 merged 7 commits into main from review-networking on Feb 8, 2026

Conversation

@ejc3
Owner

@ejc3 ejc3 commented Feb 7, 2026

Stacked on: review-raii

Summary

  • Add collision detection to hash-based subnet allocation: after hashing the VM ID, check whether the derived host IP is already assigned to a live veth interface. If a collision is detected, increment subnet_id and retry (capped at 100 attempts).
  • Add cleanup_global_nat_if_unused() that removes MASQUERADE rules for 172.30.0.0/16 and 10.0.0.0/8, and disables ip_forward when the last bridged VM exits.
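
For illustration, the hash-then-probe allocation in the first bullet might look roughly like the sketch below. The hash function, the `SUBNET_SPACE` constant, and both function names are assumptions made for this example and are not taken from fcvm; only the cap of 100 attempts comes from the summary.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Upper bound on retries, matching the cap described in the summary.
const MAX_ATTEMPTS: u16 = 100;
/// Hypothetical size of the subnet ID space; the real value is not stated in the PR.
const SUBNET_SPACE: u16 = 16_384;

/// Hash the VM ID into a starting subnet ID.
/// DefaultHasher is an illustrative choice; a real allocator would likely want a
/// hash whose output is guaranteed stable across toolchain versions.
fn initial_subnet_id(vm_id: &str) -> u16 {
    let mut hasher = DefaultHasher::new();
    vm_id.hash(&mut hasher);
    (hasher.finish() % u64::from(SUBNET_SPACE)) as u16
}

/// Probe forward from the hashed ID until `in_use` reports a free subnet.
/// Returns `None` once all attempts are exhausted.
fn allocate_subnet_id(vm_id: &str, in_use: impl Fn(u16) -> bool) -> Option<u16> {
    let start = initial_subnet_id(vm_id);
    (0..MAX_ATTEMPTS)
        .map(|offset| (start + offset) % SUBNET_SPACE)
        .find(|&candidate| !in_use(candidate))
}
```

Passing the collision check as a closure keeps the probing logic testable without touching real network interfaces.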

Test plan

cargo check -p fcvm
cargo clippy -p fcvm

@claude-claude
Contributor

claude-claude bot commented Feb 7, 2026

🔍 Claude Review

SEVERITY: medium

Findings

[MEDIUM] Substring IP matching in is_ip_in_use_on_veth causes false positives (bridged.rs:41)

The collision check uses line.contains(ip) which matches IPs as substrings. Checking for 10.1.1.1 would match a line containing 10.1.1.10, 10.1.1.100, or 210.1.1.1. This could cause spurious collision retries, and in degenerate cases, exhaust all 100 attempts and fail VM startup. The check should match against the IP with a trailing delimiter (e.g. inet 10.1.1.1/ or word-boundary matching).
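
A minimal sketch of the difference, assuming the usual one-line layout of `ip -o addr show` output. The two function names are hypothetical and exist only to contrast the substring check with the `inet <ip>/` form suggested above.

```rust
/// Substring form described in the finding: also matches longer addresses.
fn line_matches_substring(line: &str, ip: &str) -> bool {
    line.contains(ip) && line.contains("veth0-")
}

/// Stricter form: require the `inet <ip>/` shape, so the address is delimited
/// on both sides and neighbouring addresses no longer match.
fn line_matches_exact(line: &str, ip: &str) -> bool {
    let needle = format!("inet {ip}/");
    line.contains(&needle) && line.contains("veth0-")
}
```

With a line for `10.1.1.10/30` on a `veth0-` interface, the first function reports a match for `10.1.1.1` and the second does not.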

[MEDIUM] cleanup_global_nat_if_unused unconditionally disables ip_forward (portmap.rs:386-390)

When the last bridged VM exits, ip_forward is set to 0. If the host runs other services that depend on IP forwarding (Docker, Kubernetes, VPNs, other container runtimes), this will break their networking. The code cannot know whether ip_forward was already enabled before fcvm started. The safe approach is to either: (a) not touch ip_forward during cleanup, or (b) record whether fcvm was the one that enabled it and only disable if so.
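
Option (b) could be sketched as follows. This is only an illustration: `ForwardingGuard` and both functions are hypothetical names, the sysctl path is passed in so the logic can be exercised against an ordinary file, and the in-memory flag would need to be persisted somewhere if setup and cleanup run in different processes.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Remembers whether this caller was the one that turned forwarding on.
struct ForwardingGuard {
    enabled_by_us: bool,
}

/// Enable forwarding if needed and record who enabled it.
/// `sysctl_path` would normally be /proc/sys/net/ipv4/ip_forward.
fn enable_forwarding(sysctl_path: &Path) -> io::Result<ForwardingGuard> {
    let was_enabled = fs::read_to_string(sysctl_path)?.trim() == "1";
    if !was_enabled {
        fs::write(sysctl_path, "1\n")?;
    }
    Ok(ForwardingGuard { enabled_by_us: !was_enabled })
}

/// Restore the previous value only if we were the ones who changed it.
fn restore_forwarding(sysctl_path: &Path, guard: &ForwardingGuard) -> io::Result<()> {
    if guard.enabled_by_us {
        fs::write(sysctl_path, "0\n")?;
    }
    Ok(())
}
```

Even this variant can misbehave if another service starts relying on forwarding after fcvm enabled it, which is one argument for option (a).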

[MEDIUM] TOCTOU race in cleanup_global_nat_if_unused (portmap.rs:323-393)

Between checking for veth0- interfaces (line 335) and removing MASQUERADE rules (lines 352-383), another VM could start and set up its networking. The cleanup would then remove the rules that the newly-starting VM just created, breaking its outbound connectivity.
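
One way to close such a window is to run the check and the removal inside a single cross-process critical section that VM setup also takes. The sketch below uses exclusive file creation as the lock. `NatLock`, `cleanup_if_unused`, and the closure parameters are hypothetical, and nothing here reflects how fcvm actually serializes these steps.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

/// Cross-process lock backed by exclusive file creation.
struct NatLock {
    path: PathBuf,
}

impl NatLock {
    /// Returns `Ok(None)` if another holder already created the lock file.
    fn try_acquire(path: &Path) -> io::Result<Option<NatLock>> {
        match OpenOptions::new().write(true).create_new(true).open(path) {
            Ok(_) => Ok(Some(NatLock { path: path.to_path_buf() })),
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(None),
            Err(e) => Err(e),
        }
    }
}

impl Drop for NatLock {
    fn drop(&mut self) {
        // Best effort: a crash leaves a stale lock file, which this sketch does not handle.
        let _ = fs::remove_file(&self.path);
    }
}

/// Run the veth check and the rule removal as one critical section.
/// Returns `Ok(true)` only if the rules were actually removed.
fn cleanup_if_unused(
    lock_path: &Path,
    any_veth_left: impl Fn() -> bool,
    remove_rules: impl FnOnce(),
) -> io::Result<bool> {
    let Some(_lock) = NatLock::try_acquire(lock_path)? else {
        return Ok(false); // another process is setting up or cleaning up
    };
    if any_veth_left() {
        return Ok(false);
    }
    remove_rules();
    Ok(true)
}
```

The lock only helps if the VM setup path acquires the same lock before creating its veth and NAT rules.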

[LOW] derive_host_ip duplicates IP derivation logic from setup()

The same subnet-to-IP mapping logic exists in both derive_host_ip() and the setup() method body (lines 174-185, 224-228). If one is updated without the other, they could diverge silently.

Summary

The collision detection feature has a substring matching bug that can cause false positives, and the NAT cleanup has two safety issues: unconditionally disabling IP forwarding can break other host services, and a TOCTOU race can break concurrently-starting VMs.


Review by Claude | Run


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ac08d08f0c


Comment thread src/network/bridged.rs
Comment on lines +39 to +42
let stdout = String::from_utf8_lossy(&output.stdout);
for line in stdout.lines() {
    if line.contains(ip) && line.contains("veth0-") {
        return true;

P2 Badge Match exact IPs when checking veth collisions

The collision check uses line.contains(ip) on ip -o addr show output, which can produce false positives because one IP string can be a prefix of another (e.g., 172.30.0.1 matches 172.30.0.17/30). That will cause free subnets to be skipped and can make subnet allocation fail after enough attempts even though the exact host IP is unused; it also breaks deterministic reuse for cache restores. Consider parsing the IPv4 address field (e.g., ip -o -4 addr show) and comparing exact tokens instead of substring matching.
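
The token-based comparison suggested here might look like the following sketch. It assumes each line of `ip -o -4 addr show` has the shape `<idx>: <ifname> inet <addr>/<prefix> ...`; `parse_addr_line` and `is_ip_on_veth` are hypothetical names, not the functions in bridged.rs.

```rust
use std::net::Ipv4Addr;

/// Parse one output line into (interface name, IPv4 address).
/// Returns `None` for lines that do not follow the assumed layout.
fn parse_addr_line(line: &str) -> Option<(&str, Ipv4Addr)> {
    let mut fields = line.split_whitespace();
    let _index = fields.next()?; // e.g. "4:"
    let ifname = fields.next()?; // e.g. "veth0-aaa"
    if fields.next()? != "inet" {
        return None;
    }
    let cidr = fields.next()?; // e.g. "172.30.0.17/30"
    let addr = cidr.split('/').next()?.parse::<Ipv4Addr>().ok()?;
    Some((ifname, addr))
}

/// True only if some veth0-* interface carries exactly `ip`.
fn is_ip_on_veth(output: &str, ip: Ipv4Addr) -> bool {
    output
        .lines()
        .filter_map(parse_addr_line)
        .any(|(ifname, addr)| ifname.starts_with("veth0-") && addr == ip)
}
```

Comparing parsed `Ipv4Addr` values rather than strings means 172.30.0.1 and 172.30.0.17 can never be confused, and an address on a non-veth interface is ignored.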


@claude-claude
Contributor

claude-claude bot commented Feb 7, 2026

🔧 Auto-Fix Created

I found issues and created a fix PR: #284

Please review and merge the fix PR first, then this PR.

View Claude Run

@claude-claude
Contributor

claude-claude bot commented Feb 7, 2026

🔍 Claude Review

SEVERITY: medium

Previous Findings (still unfixed)

The following issues from a previous review remain unfixed (fix PR #284 is still open):

  • [MEDIUM] Substring IP matching in is_ip_in_use_on_veth: line.contains(ip) matches IPs as substrings (e.g., 10.1.1.1 matches 10.1.1.10).
  • [MEDIUM] cleanup_global_nat_if_unused unconditionally disables ip_forward: breaks Docker/K8s/VPNs on the host.
  • [MEDIUM] TOCTOU race in cleanup_global_nat_if_unused: another VM can start between the veth check and rule deletion.

New Findings

[MEDIUM] Full snapshot path still has non-atomic remove+rename gap (common.rs:1036-1043)

The diff snapshot path (lines 982-995) was correctly fixed to use rename-swap (move old to .old, then rename temp to final). However, the full snapshot path still uses the unsafe pattern:

tokio::fs::remove_dir_all(snapshot_dir).await?;   // old gone
tokio::fs::rename(&temp_snapshot_dir, snapshot_dir).await?;  // new not yet in place

If the process crashes between remove_dir_all and rename, the snapshot is lost entirely. The same rename-swap technique used for the diff path should be applied here.
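
As a rough illustration of the rename-swap idea, the sketch below uses blocking `std::fs` calls instead of `tokio::fs` so that it stays self-contained. The function name and the way the `.old` path is built are assumptions, not the code in common.rs.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Replace `final_dir` with `temp_dir` without ever deleting the only copy.
fn swap_into_place(temp_dir: &Path, final_dir: &Path) -> io::Result<()> {
    // Build "<final_dir>.old" by appending, so existing dots in the name survive.
    let mut old_name: OsString = final_dir.as_os_str().to_owned();
    old_name.push(".old");
    let old_dir = PathBuf::from(old_name);

    // Clear any leftover from a previously interrupted swap.
    if old_dir.exists() {
        fs::remove_dir_all(&old_dir)?;
    }
    // Move the current snapshot aside; it stays on disk under ".old".
    if final_dir.exists() {
        fs::rename(final_dir, &old_dir)?;
    }
    // Move the new snapshot into place.
    fs::rename(temp_dir, final_dir)?;
    // Only now is it safe to drop the previous copy.
    if old_dir.exists() {
        fs::remove_dir_all(&old_dir)?;
    }
    Ok(())
}
```

A crash between the two renames still leaves `final_dir` briefly missing, but the previous snapshot survives under `.old` and the new one under the temp path, so nothing is lost outright.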

Summary

Three previously-reported MEDIUM issues remain unfixed (pending merge of #284). One new MEDIUM issue: the full snapshot path was not given the same atomic-replace treatment as the diff snapshot path, leaving a window where a crash can lose snapshot data.


Review by Claude | Run

@claude-claude
Contributor

claude-claude bot commented Feb 7, 2026

🔧 Auto-Fix Created

I found issues and created a fix PR: #287

This fixes:

  1. Substring IP matching false positives in collision detection
  2. Unsafe ip_forward=0 on NAT cleanup
  3. Non-atomic full snapshot replace (new finding — diff path was fixed but full path was not)

This supersedes #284 which covered items 1 and 2.

Please review and merge the fix PR first, then this PR.

View Claude Run

@claude-claude
Contributor

claude-claude bot commented Feb 7, 2026

🔍 Claude Review

SEVERITY: low

Previous Findings

Most issues from previous reviews (first review, second review) have been fixed by the merged PR #287:

  • ✅ Substring IP matching in collision detection — fixed
  • ✅ Unconditional ip_forward disable — fixed
  • ✅ Non-atomic full snapshot replace — fixed

The [MEDIUM] TOCTOU race in cleanup_global_nat_if_unused noted previously remains, but has low practical impact since ensure_global_nat is called during every VM setup, so any prematurely-deleted MASQUERADE rule would be re-created.

New Finding

[LOW] Stale docstring on cleanup_global_nat_if_unused (portmap.rs:321)

The doc comment still says disables ip_forward but the code (after the PR #287 fix) intentionally does NOT disable ip_forward. The comment should be updated to match the current behavior.

Summary

All previously-reported MEDIUM issues are fixed (or have low practical impact). Only a stale docstring remains as a minor inconsistency. The PR is in good shape for merge.


Review by Claude | Run

@claude-claude
Contributor

claude-claude bot commented Feb 7, 2026

🔧 Auto-Fix Created

I found a minor issue and created a fix PR: #289

This fixes the stale docstring on cleanup_global_nat_if_unused that still mentioned disabling ip_forward.

View Claude Run

@claude-claude
Contributor

claude-claude bot commented Feb 7, 2026

🔍 Claude Review

SEVERITY: low

Previous Findings

All MEDIUM issues from previous reviews have been resolved, with one caveat:

The [MEDIUM] TOCTOU race in cleanup_global_nat_if_unused noted in the first review remains, but as noted in the third review, it has low practical impact since ensure_global_nat is called during every VM setup and would re-create any prematurely-deleted rules.

No New Findings

The PR is in good shape. All substantive issues have been addressed. No new MEDIUM or CRITICAL issues found.

Summary

All previously-reported issues are fixed or have been assessed as low-impact. The PR is ready to merge.


Review by Claude | Run

@ejc3 ejc3 changed the base branch from review-raii to main February 7, 2026 23:49
EJ Campbell and others added 6 commits February 7, 2026 23:50
- Add collision detection to hash-based subnet allocation: check if
  the derived host IP is already on a live veth, increment and retry
  if so (capped at 100 attempts)
- Add cleanup_global_nat_if_unused() that removes MASQUERADE rules
  and disables ip_forward when the last bridged VM exits
- Extract derive_host_ip() helper for collision check reuse

Rename old snapshot to .old before renaming new into place, instead
of remove_dir_all + rename. Eliminates the window where a crash
would lose the snapshot entirely.

…tion

- Remove std::env::set_var for writeback cache propagation (#26): pass
  no_writeback_cache flag through mount_vsock_with_options API instead.
  set_var is unsound in multi-threaded Rust programs.
- Bound exec line reader to 1MB (#27): prevents OOM from malicious or
  malformed exec requests sent over vsock.
- Replace bash -c shell injection with direct Command args (#28): TAP
  device verification now uses ip link show directly instead of through
  a shell.
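
The bounded line read mentioned in the #27 bullet above could be sketched like this. The function name, the error kinds, and the choice to count the newline toward the limit are assumptions for illustration; only the 1MB figure comes from the commit message.

```rust
use std::io::{self, BufRead, Read};

/// 1MB cap, as stated in the commit message.
const MAX_LINE_BYTES: u64 = 1024 * 1024;

/// Read one newline-terminated line, refusing anything longer than `limit` bytes
/// (the trailing newline counts toward the limit). Returns `Ok(None)` at EOF.
fn read_bounded_line<R: BufRead>(reader: &mut R, limit: u64) -> io::Result<Option<String>> {
    let mut buf = Vec::new();
    // `take` caps how many bytes `read_until` may pull, so memory use stays bounded
    // even if the peer never sends a newline.
    let n = reader.by_ref().take(limit + 1).read_until(b'\n', &mut buf)?;
    if n == 0 {
        return Ok(None);
    }
    if buf.len() as u64 > limit {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "line exceeds limit"));
    }
    if buf.last() == Some(&b'\n') {
        buf.pop();
    }
    String::from_utf8(buf)
        .map(Some)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

A caller would pass `MAX_LINE_BYTES` as the limit and treat the error as a reason to drop the connection rather than keep reading.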
- Use exact "inet <ip>/" pattern in is_ip_in_use_on_veth to prevent
  substring false positives (e.g., 10.1.1.1 matching 10.1.1.10)
- Remove unsafe sysctl ip_forward=0 from cleanup_global_nat_if_unused
  to avoid breaking Docker/K8s/VPNs on the host
- Apply rename-swap technique to full snapshot path (matching diff
  snapshot path) to prevent data loss on crash between remove and rename

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Bridged networking needs net.ipv4.ip_forward=1, but fcvm shouldn't
silently enable a system-wide sysctl. Instead, check and error with
instructions if it's disabled. The Makefile setup-btrfs target and
CI workflows already enable it as part of host setup.

Also fix stale cleanup doc comment (ip_forward=0 removal was in #287).
The doc comment still mentioned "disables ip_forward" but the code
intentionally leaves ip_forward enabled after the fix in PR #287.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
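
The check-and-error behavior described in the commit message above might be sketched as follows. The function name, the error type, and the wording of the message are assumptions; the path is a parameter so the check can be exercised against an ordinary file instead of /proc/sys/net/ipv4/ip_forward.

```rust
use std::fs;
use std::path::Path;

/// Verify that IPv4 forwarding is enabled, without changing it.
/// `sysctl_path` would normally be /proc/sys/net/ipv4/ip_forward.
fn check_ip_forward(sysctl_path: &Path) -> Result<(), String> {
    let value = fs::read_to_string(sysctl_path)
        .map_err(|e| format!("cannot read {}: {e}", sysctl_path.display()))?;
    if value.trim() == "1" {
        Ok(())
    } else {
        // Tell the operator how to fix the host instead of silently flipping the sysctl.
        Err("net.ipv4.ip_forward is disabled; bridged networking needs it. \
             Enable it with `sysctl -w net.ipv4.ip_forward=1` and retry."
            .to_string())
    }
}
```

Failing with instructions keeps the system-wide setting under the operator's control, which also removes the question of who should disable it on cleanup.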
@ejc3 ejc3 force-pushed the review-networking branch from 8125ce6 to 02c6c4b Compare February 7, 2026 23:50
@claude-claude
Contributor

claude-claude bot commented Feb 7, 2026

🔍 Claude Review

SEVERITY: low

Previous Findings

All MEDIUM issues from previous reviews have been resolved, with one caveat:

The TOCTOU race in cleanup_global_nat_if_unused noted in previous reviews remains, but as previously assessed, it has low practical impact since ensure_global_nat is idempotent and called during every VM setup, re-creating any prematurely-deleted rules.

No New Findings

All changes look correct. The set_var removal, unbounded read fix, shell injection fix, collision detection, NAT cleanup, and atomic snapshot replacement are all well-implemented. No new MEDIUM or CRITICAL issues found.

Summary

All previously-reported substantive issues are fixed. The PR is in good shape for merge.


Review by Claude | Run

@ejc3 ejc3 merged commit fc21625 into main Feb 8, 2026
16 checks passed
@ejc3 ejc3 deleted the review-networking branch February 8, 2026 01:15
ejc3 added a commit that referenced this pull request Mar 2, 2026
Fix bridged subnet collision and clean up global NAT
ejc3 added a commit that referenced this pull request Mar 2, 2026
Fix bridged subnet collision and clean up global NAT