fc-agent: clock sync, ss -K fix, --no-swap cgroup isolation, --no-dirty-tracking #536
Conversation
- fc-agent/agent.rs: Start chronyd as root after FUSE mounts, add NTP servers dynamically via chronyc (config parser can't handle bare IPv6), strip xleave option from host chrony.conf
- fc-agent/network.rs: Fix ss -K syntax (remove broken [fd00::]/64 filter), fix connection counting to include 10.0.2.* and [fd00: as local
- fc-agent/restore.rs: Add clock sync from MMDS + chronyc makestep on snapshot restore, before network reconfiguration
- src/cli/args.rs: Add --no-dirty-tracking and --mlock CLI flags
- src/commands/common.rs: Add track_dirty_pages and mlock to RestoreParams, add disable_cgroup_swap() via systemctl set-property
- src/commands/snapshot.rs: Wire track_dirty_pages and mlock through RestoreParams for snapshot run
- src/commands/podman/mod.rs: Wire mlock flag through to snapshot args
- DESIGN.md: Add Clone Memory Sharing section documenting file backend, UFFD MISSING+COPY, and proposed UFFD MINOR+CONTINUE architecture
- scripts/: Add memory sharing test scripts
- Replace broken mlockall(MCL_FUTURE) with cgroup-based swap control (MCL_FUTURE doesn't survive exec). --no-swap creates a dedicated cgroup under /sys/fs/cgroup/fcvm.slice/ with memory.swap.max=0 and moves the Firecracker process into it.
- Wire --no-dirty-tracking through to Firecracker's track_dirty_pages API for clones that don't need subsequent diff snapshots.
- Add test_cgroup_swap: verifies process isolation in fcvm.slice, memory.swap.max=0, and separate scopes per VM.
- Add test_clone_restore_fixes: integration tests for clock sync after restore, ss -K gateway preservation, --no-swap cgroup creation, and --no-dirty-tracking clone behavior.
- Document nextest expression filter syntax in CLAUDE.md.

Tested:

```
make test-root FILTER=cgroup_swap STREAM=1                                                            # 2/2 pass
make test-root FILTER="-E 'test(/clock_synced|ss_filter|no_swap_creates|no_dirty_tracking/)'" STREAM=1  # 4/4 pass
make test-root FILTER=sanity STREAM=1                                                                 # 5/5 pass
```
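The cgroup layout described above can be sketched as follows. This is illustrative only: the helper name `cgroup_scope_paths` is hypothetical, and the actual implementation (`disable_cgroup_swap()` via `systemctl set-property`) may manage the scope differently.

```rust
/// Hypothetical helper: paths for a per-VM cgroup-v2 scope under fcvm.slice
/// with swap disabled, matching the layout described in this PR.
fn cgroup_scope_paths(pid: u32) -> (String, String, String) {
    let scope = format!("/sys/fs/cgroup/fcvm.slice/fcvm-{pid}.scope");
    let swap_max = format!("{scope}/memory.swap.max"); // write "0" here (root only)
    let procs = format!("{scope}/cgroup.procs");       // write the Firecracker PID here
    (scope, swap_max, procs)
}
```

Unlike `mlockall(MCL_FUTURE)`, a cgroup attribute follows the process across `exec`, which is why the swap limit survives spawning Firecracker.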
Add SnapshotFixture to tests/common/mod.rs — encapsulates the baseline→snapshot→serve setup that was duplicated across 4 tests in test_clone_restore_fixes.rs. Each test now creates the fixture in one line and focuses on its actual assertions.
🔍 Claude Review

SEVERITY: medium

Findings

[MEDIUM] Hugepage clones regressed: dirty tracking enabled by default breaks 2MB TLB entries

The old code set:

```rust
track_dirty_pages: Some(!restore_config.hugepages),
```

The removed comment explained: "Skip for hugepage VMs because KVM splits 2MB Stage 2 block mappings to 4K for dirty tracking, negating the TLB benefit of hugepages." The new code is:

```rust
let needs_dirty_tracking = if args.startup_snapshot_base_key.is_some() {
    true
} else {
    !args.no_dirty_tracking // always true unless CLI flag passed
};
```

This means hugepage clones now get dirty tracking by default. Fix: add a hugepage guard.

[MEDIUM] The doc comment on RestoreParams.track_dirty_pages says "Default: false" but the actual default is true.

[LOW] The struct doc says "Cleans up all processes on drop" but there's no Drop impl.

Summary

The main issue is a performance regression for hugepage clones: the refactoring of the dirty-tracking decision lost the automatic hugepage guard.

Review by Claude
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 632667c44d
```rust
if line.starts_with("server ") {
    if let Some(addr) = line.split_whitespace().nth(1) {
        server_addrs.push(addr.to_string());
```
Parse pool directives when importing host NTP sources
The chrony source import only accepts lines starting with server , so hosts whose /etc/chrony.conf is pool-only (a common default) leave server_addrs empty; chronyd then starts with no upstream sources and the VM stops doing ongoing clock convergence after boot/restore. Please parse both server and pool entries (or reuse the host config directly) so time sync works across host distros.
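Under this suggestion, the import loop would accept both directives. A minimal sketch (the function name is illustrative, not the fc-agent code):

```rust
/// Collect upstream NTP source addresses from a chrony.conf-style string,
/// accepting both "server" and "pool" directives.
fn parse_ntp_sources(conf: &str) -> Vec<String> {
    conf.lines()
        .map(str::trim)
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            match parts.next() {
                // Address is the first token after the directive keyword.
                Some("server") | Some("pool") => parts.next().map(str::to_string),
                _ => None,
            }
        })
        .collect()
}
```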
```rust
let needs_dirty_tracking = if args.startup_snapshot_base_key.is_some() {
    true // podman path — needs dirty tracking for startup snapshot
} else {
    !args.no_dirty_tracking // CLI default: on. --no-dirty-tracking: off.
```
Keep dirty tracking disabled for hugepage clone restores
This logic now enables dirty tracking for clones by default unless --no-dirty-tracking is set, but it no longer gates on hugepages; for hugepage snapshots, dirty tracking forces 2MB stage-2 mappings to split into 4KB pages, which removes the intended hugepage performance characteristics and can significantly increase restore/runtime overhead. Preserve a hugepage-specific guard so track_dirty_pages stays off when restoring hugepage VMs unless a caller explicitly requires diff snapshots.
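The guard this review asks for can be sketched as a pure decision function (names are illustrative; the real code reads these values from `args` and the restore config):

```rust
/// Hypothetical sketch: dirty tracking stays off for hugepage restores unless
/// the caller explicitly needs diff snapshots (the podman startup-snapshot path).
fn needs_dirty_tracking(
    needs_diff_snapshot: bool,
    no_dirty_tracking_flag: bool,
    hugepages: bool,
) -> bool {
    if needs_diff_snapshot {
        true // caller explicitly requires diff snapshots
    } else if hugepages {
        // KVM splits 2MB Stage 2 block mappings to 4K for dirty tracking,
        // negating the TLB benefit of hugepages — keep it off.
        false
    } else {
        !no_dirty_tracking_flag // CLI default: on
    }
}
```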
- Restore hugepage guard: hugepage VMs always disable dirty tracking (KVM splits 2MB Stage 2 block mappings to 4K, negating TLB benefit). The refactoring to --no-dirty-tracking lost this automatic guard.
- Fix RestoreParams.track_dirty_pages doc: said "Default: false" but actual default was true.
- Fix SnapshotFixture doc: said "on drop" but has no Drop impl.
Hosts using "pool" instead of "server" in chrony.conf (common default on Ubuntu/Debian) left chronyd with no upstream NTP sources.
Two automated reviewers on this repo:
- Claude review: posts PR-level comments (gh pr view --json comments)
- Codex review: posts PR reviews with inline comments on specific lines (gh api repos/$REPO/pulls/<N>/comments, filter by codex author)
🔍 Claude Review

SEVERITY: medium

Previous Review Status

All three issues from the previous review (hugepage dirty tracking guard, RestoreParams doc, SnapshotFixture doc) have been fixed in commit 5264b3d. ✅

New Findings

[MEDIUM] The CLI help text for --no-dirty-tracking claims clone pages stay shared, and the same claim is repeated elsewhere. But DESIGN.md (lines 2213-2218) documents the opposite, based on measured results: the File backend row in the table shows 33.6 GB Private_Clean per clone with 1.8 MB Shared — pages are clearly NOT staying shared. The CLI flag description actively misleads users.

[MEDIUM] Connection counting treats peers matching

```rust
|| peer.starts_with("[fd00:")
```

as local, but the ss -K kill filter only spares

```rust
"(", "!", "dst", "127.0.0.0/8",
"and", "!", "dst", "[::1]",
"and", "!", "dst", "10.0.2.0/24", ")"
```

The comment on line 149 claims "fd00:: traffic goes through the gateway anyway (preserved by 10.0.2.0/24 rule)" — this is incorrect. The practical impact is limited since most TCP connections go through the IPv4 gateway.

[LOW] New CLI flags are missing from the README's "Key …" section.

Summary

The main issue is the misleading --no-dirty-tracking help text.

Review by Claude
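For reference, the grouped kill filter from the PR description, assembled as an argv slice (a sketch of the argument list only, not the surrounding fc-agent/network.rs plumbing):

```rust
/// The grouped `ss -K` filter from this PR: kill every connection except
/// those to loopback (v4 and v6) and the 10.0.2.0/24 gateway network.
fn ss_kill_filter() -> Vec<&'static str> {
    vec![
        "-K",
        "(", "!", "dst", "127.0.0.0/8",
        "and", "!", "dst", "[::1]",
        "and", "!", "dst", "10.0.2.0/24", ")",
    ]
}
```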
🔧 Auto-Fix Created

I found issues and created a fix PR: #537. Please review and merge the fix PR first, then this PR.
The test_no_dirty_tracking_clone assertion looked for track_dirty_pages in the Firecracker API body, which isn't logged. Added track_dirty_pages to the snapshot load info line so the test can verify the flag was set.
🔍 Claude Review

SEVERITY: medium

Previous Review Status

New Findings

No new [MEDIUM] or [CRITICAL] issues found. The code changes are well-structured with good test coverage.

[LOW] SnapshotFixture doc example comment still implies Drop behavior (line 1076).

[LOW] Cgroup directories not cleaned up on VM exit.

Summary

The two [MEDIUM] issues from the previous review remain unfixed pending merge of PR #537. No new significant issues found — the clock sync, cgroup swap isolation, and dirty tracking refactoring are solid, as are the new integration tests.

Review by Claude
The clock sync + chronyc makestep + gateway ping (3s timeout) added to handle_clone_restore extends the restore from ~0.5s to ~3.5s. The 3s wait before reading the log was barely enough; on loaded CI runners the restore messages hadn't flushed to the log yet.
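A deadline-based poll avoids this kind of flakiness. A minimal sketch (the `read_log` closure and the marker string are stand-ins, not the actual test code):

```rust
use std::time::{Duration, Instant};

/// Poll `read_log` until it contains `marker` or `timeout` elapses.
/// Always reads at least once, so a zero timeout still checks the log.
fn wait_for_marker(
    mut read_log: impl FnMut() -> String,
    marker: &str,
    timeout: Duration,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if read_log().contains(marker) {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        std::thread::sleep(Duration::from_millis(100));
    }
}
```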
Force-pushed 1443dae to d69f101.
🔍 Claude Review

SEVERITY: medium

Previous Review Status

New Commits Reviewed

Two new commits since last review:

- d69f101 — Replace blocking ping with non-blocking ARP ✅ Good fix.
- 90b5b43 — Poll for restore-complete instead of fixed sleep ✅ Good fix. Replaces a race-prone fixed sleep with polling.

New Findings

No new [MEDIUM] or [CRITICAL] issues.

[LOW] Stale comment in serial console test references removed "gateway ping" (line 82).

[LOW] After d69f101, the function adds a STALE neighbor entry via the gratuitous ARP.

Summary

The two new commits are well-reasoned fixes: non-blocking ARP eliminates a 3s timeout on every rootless restore, and polling for restore-complete replaces a race-prone fixed sleep. The two [MEDIUM] issues from the second review remain unfixed pending merge of PR #537.

Review by Claude
🔍 Claude Review

SEVERITY: medium

Previous Review Status

No new commits since last review (HEAD is still 17014b7).

New Findings

None. No new commits since the previous review.

Summary

The two [MEDIUM] issues from the second review remain the only outstanding items, with a fix PR (#537) already created but not yet merged.

Review by Claude
send_gratuitous_arp() used `ping -c 1 -W 3` to trigger ARP resolution, but ping requires ICMP raw sockets and always failed in rootless mode (pasta doesn't respond to ICMP), burning 3 seconds on every restore. Replace with `arping -c 1 -U -I eth0 <our_ip>` which sends a gratuitous ARP directly at layer 2 without needing ICMP. This broadcasts our MAC→IP mapping to the network (bridge/pasta) instantly. Add iputils-arping to rootfs system packages.
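The arping invocation can be sketched as command construction (a sketch only; the real `send_gratuitous_arp()` in fc-agent may build it differently):

```rust
use std::process::Command;

/// Build the gratuitous-ARP command from the commit message:
/// -c 1 = one packet, -U = unsolicited (gratuitous) ARP, -I = interface.
fn arping_cmd(iface: &str, ip: &str) -> Command {
    let mut cmd = Command::new("arping");
    cmd.args(["-c", "1", "-U", "-I", iface, ip]);
    cmd
}
```

Because this works at layer 2, it needs no ICMP response from pasta, so it returns immediately instead of burning the 3-second ping timeout.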
🔍 Claude Review

SEVERITY: medium

Previous Review Status

No new commits since last review (HEAD is still 9bd8bc4).

New Findings

None. No new commits since the previous review.

Summary

The two [MEDIUM] issues from the second review remain the only outstanding items, with a fix PR (#537) already created but not yet merged.

Review by Claude
🔧 CI Auto-Fix

Created fix PR: #538

Root cause: …
The clone bench's HTTP request through pasta's loopback port forward can get an empty response on heavily loaded CI runners, even though verify_port_forwarding() confirmed the port works moments earlier. Retry up to 3 times with 500ms backoff instead of panicking on the first empty response.
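The retry loop can be sketched as below; `fetch` stands in for the HTTP request through pasta's port forward, and an empty body is the retryable condition:

```rust
use std::time::Duration;

/// Retry `fetch` up to `attempts` times, sleeping `backoff` between tries;
/// an empty response body counts as a failure. Returns None if all attempts fail.
fn retry_http(
    mut fetch: impl FnMut() -> String,
    attempts: u32,
    backoff: Duration,
) -> Option<String> {
    for attempt in 1..=attempts {
        let body = fetch();
        if !body.is_empty() {
            return Some(body);
        }
        if attempt < attempts {
            std::thread::sleep(backoff);
        }
    }
    None
}
```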
🔍 Claude Review

SEVERITY: medium

Previous Review Status

New Commit Reviewed

e5fa3b0 — retry HTTP in clone bench for pasta forwarding under load ✅ Adds retry logic (3 attempts, 500ms backoff).

New Findings

[LOW] Retry count inconsistency between …

Summary

The new commit is a reasonable fix for the CI failure. The two [MEDIUM] issues from the second review remain the only outstanding items, with a fix PR (#537) already created but not yet merged. PR #538 is now superseded by the author's own fix in e5fa3b0.

Review by Claude
fc-agent: clock sync, ss -K fix, --no-swap cgroup isolation, --no-dirty-tracking
Summary
- chrony: `makestep 1 -1` for snapshot restore clock jumps. On restore, sync clock from MMDS host time + `chronyc makestep` before network reconfiguration.
- ss -K: change `! dst 127.0.0.1` to the grouped filter `(! dst 127.0.0.0/8 and ! dst [::1] and ! dst 10.0.2.0/24)` — preserves gateway and loopback connections after snapshot restore. The old filter killed gateway connections, breaking networking.
- --no-swap: creates `/sys/fs/cgroup/fcvm.slice/fcvm-{pid}.scope` with `memory.swap.max=0` and moves Firecracker into it. Replaces broken `mlockall(MCL_FUTURE)`, which doesn't survive exec. Each VM gets its own cgroup scope (doesn't affect the session).
- --no-dirty-tracking: passes `track_dirty_pages=false` to Firecracker for clones that don't need subsequent diff snapshots, keeping file-backed pages shared through the host page cache.
- SnapshotFixture: shared baseline→snapshot→serve test setup in `tests/common/mod.rs`.

Test Results