
feat: add automatic diff-based snapshot support #188

Merged
ejc3 merged 13 commits into main from diff-snapshot
Jan 26, 2026

Conversation

@ejc3
Owner

@ejc3 ejc3 commented Jan 25, 2026

Summary

Implements automatic diff-based snapshots in fcvm:

  • First snapshot = Full snapshot (baseline)
  • Subsequent snapshots = Diff snapshot (only dirty pages)
  • Automatic merging = Diff merged onto base immediately after creation
  • Transparent to user = Single memory.bin file, no workflow changes

Two-Tier Snapshot System

fcvm uses a two-tier snapshot system for optimal startup performance:

| Snapshot | When Created | Content | Type |
|---|---|---|---|
| Pre-start | After image pull, before container runs | VM with image loaded | Full (~2GB) |
| Startup | After HTTP health check passes | VM with container initialized | Diff (~50MB) |

Key insight: We use reflink copy + diff merge, so each snapshot ends up with a complete memory.bin (equivalent to a full snapshot but created faster):

  1. Copy parent's memory.bin via btrfs reflink (instant CoW copy)
  2. Create Firecracker Diff snapshot to memory.diff (sparse file)
  3. Resume VM immediately (minimizes pause time)
  4. Merge diff onto base using SEEK_DATA/SEEK_HOLE (pure Rust, no external tools)
  5. Clean up diff file, leaving complete memory.bin ready for instant restore

This avoids diff chains - no degradation over multiple snapshots.
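
The merge in step 4 can be sketched roughly as follows. This is not fcvm's `merge_diff_snapshot()`; it is a minimal illustration of walking a sparse file with SEEK_DATA/SEEK_HOLE and copying only the data regions with positional reads and writes. The PR uses the nix crate; this sketch instead declares `lseek` by hand to stay std-only, and assumes 64-bit Linux (the `SEEK_DATA`, `SEEK_HOLE`, and `ENXIO` values are the Linux ones). The function name `merge_sparse_diff` is hypothetical.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;
use std::os::unix::io::AsRawFd;

// Linux values for the lseek(2) whence argument (assumed, not taken from a crate).
const SEEK_DATA: i32 = 3;
const SEEK_HOLE: i32 = 4;
const ENXIO: i32 = 6;

unsafe extern "C" {
    // Hand-declared to avoid external crates; assumes off_t is a 64-bit integer.
    fn lseek(fd: i32, offset: i64, whence: i32) -> i64;
}

/// Copy every data region of the sparse `diff` onto `base` at the same offsets.
/// Holes in `diff` leave `base` untouched. Returns the number of bytes copied.
fn merge_sparse_diff(diff: &File, base: &File) -> io::Result<u64> {
    let len = diff.metadata()?.len() as i64;
    let fd = diff.as_raw_fd();
    let mut buf = vec![0u8; 1 << 20]; // one reusable 1 MiB buffer
    let mut copied = 0u64;
    let mut pos = 0i64;

    while pos < len {
        // Next offset >= pos that holds data; ENXIO means only holes remain.
        let start = unsafe { lseek(fd, pos, SEEK_DATA) };
        if start < 0 {
            let err = io::Error::last_os_error();
            if err.raw_os_error() == Some(ENXIO) {
                break;
            }
            return Err(err);
        }
        // End of that data region (end of file counts as a hole).
        let end = unsafe { lseek(fd, start, SEEK_HOLE) };
        if end < 0 {
            return Err(io::Error::last_os_error());
        }

        // Positional I/O: neither file's cursor is consulted while copying.
        let (mut off, end) = (start as u64, end as u64);
        while off < end {
            let want = ((end - off) as usize).min(buf.len());
            let n = diff.read_at(&mut buf[..want], off)?;
            if n == 0 {
                break;
            }
            base.write_all_at(&buf[..n], off)?;
            off += n as u64;
            copied += n as u64;
        }
        pos = end as i64;
    }
    Ok(copied)
}
```

A filesystem may report data regions at a coarser granularity than the pages actually written, so the byte count returned can exceed the number of dirty bytes; the merged result is still correct as long as holes are reported as holes.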

Changes

New Files

  • src/commands/common.rs: Added create_snapshot_core() and merge_diff_snapshot() functions
  • tests/test_diff_snapshot.rs: New test suite for diff snapshot functionality

Modified Files

  • src/commands/snapshot.rs: Refactored to use shared create_snapshot_core()
  • src/commands/podman.rs: Refactored to use shared create_snapshot_core(), added parent lineage for diff support
  • README.md: Added "Two-Tier Snapshot System" documentation section

Key Implementation Details

  1. Shared create_snapshot_core(): Deduplicates snapshot logic between user snapshots and podman cache
  2. merge_diff_snapshot(): Uses SEEK_DATA/SEEK_HOLE + pread/pwrite for efficient sparse file merging
  3. Parent lineage: Pre-start has no parent (Full), startup uses pre-start as parent (Diff)
  4. Atomic updates: Uses temp directory + rename for consistent snapshot state
  5. Separate diff file: Writes diff to memory.diff, merges onto copy of base memory.bin
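
The "temp directory + rename" pattern from item 4 might look something like the sketch below. It is an illustration under assumptions, not the code in `create_snapshot_core()`: the function name, the `.tmp` suffix, and the `populate` closure (standing in for whatever writes memory.bin, vmstate.bin, disk.raw, and config.json) are all hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Build a snapshot directory under a temporary name, then publish it with one rename.
/// `populate` is a caller-supplied closure that writes the snapshot files into the
/// temporary directory it is given.
fn publish_snapshot_atomically<F>(final_dir: &Path, populate: F) -> io::Result<()>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    let mut tmp_name = final_dir
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "snapshot dir has no name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let temp_dir = final_dir.with_file_name(tmp_name);

    // Start from a clean temp dir in case an earlier attempt was interrupted.
    if temp_dir.exists() {
        fs::remove_dir_all(&temp_dir)?;
    }
    fs::create_dir_all(&temp_dir)?;

    // On failure, remove the partial directory so nothing half-written is left behind.
    if let Err(e) = populate(&temp_dir) {
        let _ = fs::remove_dir_all(&temp_dir);
        return Err(e);
    }

    // Defensive remove-if-exists: rename cannot replace a non-empty directory.
    if final_dir.exists() {
        fs::remove_dir_all(final_dir)?;
    }
    fs::rename(&temp_dir, final_dir)
}
```

Because the files only ever appear at the final path via the rename, a reader either sees no snapshot or a complete one, never a directory with mismatched memory, vmstate, and disk files.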

Test Results

test test_diff_snapshot_cache_hit_fast ... ok
  First boot:  10.9s (creates Full + Diff)
  Second boot: 0.2s (uses merged snapshot)
  Speedup:     49.2x

test test_snapshot_run_direct_rootless ... ok

Test Plan

  • make build - compiles successfully
  • cargo clippy - no warnings
  • Diff snapshot tests pass locally
  • Pre-start creates Full snapshot
  • Startup creates Diff snapshot and merges correctly
  • Second boot hits startup cache (49x faster)
  • Snapshot restore works correctly
  • CI verification

@claude-claude
Contributor

claude-claude bot commented Jan 25, 2026

🔍 Claude Review

SEVERITY: none

Summary

This PR successfully extracts duplicated snapshot creation logic into a shared create_snapshot_core() function. The refactoring is clean and maintains the same behavior.

Code Quality:

  • ✅ Clear documentation with accurate docstrings
  • ✅ Consistent error handling (VM resume on all paths)
  • ✅ Proper cleanup of temp directories on failure
  • ✅ Clippy passes with no warnings
  • ✅ Build succeeds

Changes Verified:

  • Both snapshot.rs and podman.rs now use the shared core function
  • Atomic snapshot creation pattern preserved (temp dir → rename)
  • All paths use final locations in config.json (as documented)

Minor Note: The shared function includes defensive remove-if-exists logic before rename (lines 743-747). This was already in snapshot.rs but is new for podman.rs. Since podman uses a lock to prevent concurrent creation, this defensive check shouldn't trigger but makes the code more robust.

No issues found. Ready to merge! 🚀


Review by Claude | Run

@ejc3 ejc3 changed the title from "refactor: extract shared create_snapshot_core function" to "feat: add automatic diff-based snapshot support" Jan 25, 2026
@claude-claude
Contributor

claude-claude bot commented Jan 25, 2026

🔍 Claude Review

SEVERITY: critical

Findings

[CRITICAL] BufReader file handle mismatch causes data corruption in diff snapshot merge (lines 123-146 in common.rs):

  • lseek() operates on diff_fd (original file descriptor from line 79)
  • BufReader::new(&diff_file) creates a new file descriptor with independent seek position
  • diff_reader.seek() moves the wrong file handle
  • Result: reads happen at wrong offsets → memory corruption in merged snapshots
  • Fix: Use the same file handle for both seeking and reading, or use pread()
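
To illustrate the suggested `pread()` approach: a positional read takes its offset as an argument, so it does not depend on where any seek call left the file cursor. The helper below is a minimal sketch using the standard library's `FileExt`; the name `read_region` is hypothetical and is not taken from common.rs.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

/// Read exactly `len` bytes starting at `offset`, without consulting or moving
/// the file cursor (the standard library implements this with pread on Unix).
fn read_region(file: &File, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; len];
    file.read_exact_at(&mut buf, offset)?;
    Ok(buf)
}
```

With this shape, the offsets found while scanning for data regions are passed straight to the read, so there is no separate seek position that can drift out of sync.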

[CRITICAL] BufReader allocated in loop kills performance (line 123):

  • New BufReader created for each data region
  • Buffering benefits lost, memory allocation overhead on every iteration
  • Should allocate once before the loop

[CRITICAL] Non-atomic diff snapshot updates (lines 869-901):

  • Updates vmstate.bin, disk.raw, config.json directly in final directory
  • No atomicity: if any operation fails, snapshot is corrupted (mismatched memory/vmstate/disk)
  • Original Full snapshot path uses atomic rename (line 950) - diff path breaks this guarantee
  • Fix: Build complete snapshot in temp dir, then atomic rename (like Full path)

[MEDIUM] Tests don't validate diff snapshots:

  • Commit message claims "all 43 tests pass"
  • Tests likely only exercise first snapshot (Full path that works)
  • No test creates a second snapshot to trigger diff code path with bugs above
  • Bugs would only manifest when taking 2+ snapshots of same VM

Summary

The diff snapshot feature has critical data corruption bugs that weren't caught by tests. The file handle mismatch will corrupt merged memory snapshots, and the non-atomic updates can leave snapshots in broken states.


Review by Claude | Run

@claude-claude
Contributor

claude-claude bot commented Jan 25, 2026

🔧 Auto-Fix Created

I found critical issues and created a fix PR: #189

The fix addresses:

  • File handle mismatch causing memory corruption in diff merges
  • Performance issue with BufReader allocation in loop
  • Non-atomic updates that could leave snapshots in corrupted state

Please review and merge the fix PR first, then this PR.

View Claude Run

@claude-claude
Contributor

claude-claude bot commented Jan 25, 2026

🔍 Claude Review

SEVERITY: critical

Findings

The critical issues identified in the previous review remain unfixed. PR #189 was created to address these issues but has not been merged yet.

[CRITICAL] Issues still present in this PR:

  1. File handle mismatch in merge_diff_snapshot() (lines 123-146 in common.rs): BufReader creates a new file descriptor, causing lseek() and read() to operate on different file handles, leading to incorrect offset reads and memory corruption
  2. BufReader allocation in loop (line 123): Performance killer - new BufReader allocated for each data region
  3. Non-atomic diff snapshot updates (lines 905-937): Updates vmstate.bin, disk.raw, and config.json directly in final directory without atomicity - partial failures leave snapshot corrupted

Summary

PR #189 fixes all three critical issues. This PR should not be merged until PR #189 is merged first (stacked PR workflow).

Recommendation: Merge PR #189 into this PR branch before merging to base.


Review by Claude | Run

@ejc3 ejc3 changed the base branch from fuser-rebase-clean to main January 26, 2026 00:02
ejc3 and others added 7 commits January 26, 2026 00:02
Deduplicate snapshot creation logic between user snapshots (fcvm snapshot create)
and system snapshots (podman cache). Both now call create_snapshot_core() which
handles:
- Pause VM via Firecracker API
- Create Firecracker snapshot (Full type)
- Resume VM immediately (regardless of result)
- Copy disk via btrfs reflink
- Write config.json with pre-built SnapshotConfig
- Atomic rename temp dir to final location

Caller-specific logic remains:
- snapshot.rs: VM state lookup, RW disk validation, volume parsing, vsock chain
- podman.rs: Lock handling, SnapshotCreationParams

No functional changes - same behavior, just deduplicated code.
Prepares for PR #2 which will add diff snapshot support.

Implements diff snapshots in fcvm:
- First snapshot = Full snapshot (baseline)
- Subsequent snapshots = Diff snapshot (only dirty pages)
- Diffs automatically merged onto base immediately after creation
- Transparent to callers: single memory.bin file

Changes:
- Add merge_diff_snapshot() using SEEK_DATA/SEEK_HOLE to efficiently
  find and copy only data blocks from sparse diff to base file
- Modify create_snapshot_core() to detect existing base and use Diff
  snapshot type when base exists
- Enable enable_diff_snapshots=true on restore to allow subsequent
  snapshots to be diffs (via KVM dirty page tracking)

The diff merge runs after VM resume to minimize pause time. Pure Rust
implementation using nix crate - no external tools needed.

Tested: make test-root FILTER=snapshot - all 43 tests pass

All snapshot creation now tracks parent lineage:
- Pre-start snapshot: None (first snapshot, creates Full)
- Startup snapshot: Uses pre-start key as parent (creates Diff)
- User snapshot from fresh VM: None (creates Full)
- User snapshot from clone: Uses source snapshot_name as parent (creates Diff)
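
The lineage rule above reduces to a small decision: no parent means Full, a parent means Diff on top of the parent's memory.bin. A minimal sketch of that decision follows; `SnapshotKind`, `SnapshotPlan`, and `plan_snapshot` are hypothetical names for illustration and are not the types used in fcvm.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
enum SnapshotKind {
    Full,
    Diff,
}

/// What to request from Firecracker and which memory file (if any) to use as the base.
#[derive(Debug, PartialEq)]
struct SnapshotPlan {
    kind: SnapshotKind,
    base_memory: Option<PathBuf>,
}

/// No parent: first snapshot in the lineage, so take a Full snapshot.
/// Parent present: take a Diff and merge it onto a copy of the parent's memory.bin.
fn plan_snapshot(parent_snapshot_dir: Option<&Path>) -> SnapshotPlan {
    match parent_snapshot_dir {
        Some(dir) => SnapshotPlan {
            kind: SnapshotKind::Diff,
            base_memory: Some(dir.join("memory.bin")),
        },
        None => SnapshotPlan {
            kind: SnapshotKind::Full,
            base_memory: None,
        },
    }
}
```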

Changes:
- create_snapshot_core() takes parent_snapshot_dir parameter
- If no base but parent provided, copies parent's memory.bin via reflink
- create_snapshot_interruptible() takes parent_snapshot_key parameter
- User snapshot (snapshot.rs) reads vm_state.config.snapshot_name for parent
- Added test_diff_snapshot.rs with tests for diff snapshot behavior

Result: Only ONE full snapshot is ever created per lineage chain.
Subsequent snapshots use diff, merged onto reflink copy of parent.

Added section explaining:
- Pre-start vs startup snapshots (two-tier system)
- Diff snapshot optimization with reflink + merge
- Key insight: each snapshot ends up with complete memory.bin
- Parent lineage for user snapshots from clones

Remove check across all snapshots - was failing because it found
snapshots from other tests. The diff merge is already verified
by the log output in the workflow tests.

…shot merge

This fixes critical bugs in the diff snapshot implementation:

1. **File handle mismatch bug**: The original code used lseek() on diff_fd
   but created a new BufReader that had its own file descriptor with an
   independent seek position. This caused reads from wrong offsets,
   resulting in memory corruption. Fixed by using pread()/pwrite() which
   accept file references directly and don't rely on seek position.

2. **Performance bug**: BufReader was allocated inside the loop for each
   data region, losing all buffering benefits. Fixed by using pread()
   directly without buffering (since we're reading large sequential blocks).

3. **Atomicity bug**: Diff snapshot updates modified files directly in the
   final snapshot directory (vmstate.bin, disk.raw, config.json). If any
   operation failed, the snapshot would be left in an inconsistent state.
   Fixed by building complete snapshot in temp dir, then using atomic
   rename, matching the Full snapshot path.

All changes pass cargo fmt and cargo clippy.

@claude-claude
Contributor

claude-claude bot commented Jan 26, 2026

⛔ Lint Failed

The Lint check is failing on this PR. Please fix the lint errors before review can proceed.

To check locally:

cargo fmt --check
cargo clippy --all-targets -- -D warnings

Review by Claude | Run

@claude-claude
Contributor

claude-claude bot commented Jan 26, 2026

⛔ CI Check Failed

The Host-arm64 test has failed. Please fix the test failures before review can proceed.

Failed Check:

Other checks still pending:

  • Container-arm64
  • Container-x64
  • Host-x64
  • Host-Root-arm64-SnapshotDisabled
  • Host-Root-arm64-SnapshotEnabled
  • Host-Root-x64-SnapshotDisabled
  • Host-Root-x64-SnapshotEnabled

Review by Claude | Run

The enable_diff_snapshots setting is DEPRECATED in Firecracker v1.13.0+.
It was for legacy KVM dirty page tracking. Firecracker now uses mincore(2)
to find dirty pages automatically when creating diff snapshots.

Enabling this on restored VMs causes kernel stack corruption:
"Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: do_idle"

Diff snapshots still work correctly because:
1. We pass snapshot_type: "Diff" to the Firecracker snapshot API
2. Firecracker uses mincore(2) to find dirty pages (no KVM tracking needed)
3. Our merge_diff_snapshot() handles the sparse file correctly

Also fixes snapshot_dir() test helper default path.

@claude-claude
Contributor

claude-claude bot commented Jan 26, 2026

🔍 Claude Review

CI checks have not passed yet - cannot proceed with review.

Failed Checks

The Host-arm64 test suite has 3 failing tests:

  1. ❌ - Clone from snapshot test failed
  2. ❌ - Restored VM health check failed
  3. ❌ - Second boot health check failed

Action Required

Please fix these test failures before review can proceed. The failures appear to be related to the new diff snapshot functionality introduced in this PR.

Logs: https://github.com/ejc3/fcvm/actions/runs/21342226435/job/61423419870


Review by Claude | Run

@claude-claude
Contributor

claude-claude bot commented Jan 26, 2026

CI Failure Analysis

I analyzed CI #21342226435 but the situation is complex.

Diagnosis

CI reports 13 test failures, all related to snapshot cloning or restoring (12 are listed here):

  • test_clone_resilient_client_rootless
  • test_diff_snapshot_cache_hit_fast
  • test_egress_clone_rootless
  • test_clone_port_forward_rootless
  • test_clone_while_baseline_running_rootless
  • test_snapshot_clone_rootless_10
  • test_snapshot_run_direct_rootless
  • test_snapshot_run_exec_rootless
  • test_user_snapshot_from_clone_uses_parent
  • test_clone_internet_rootless
  • test_startup_snapshot_on_restored_vm
  • test_startup_snapshot_priority

All failures show the same pattern: VMs exit during or shortly after restoration from snapshots. For example:

❌ Second boot health check failed: fcvm process (pid 160425) exited. Check logs above.

Current Status

The latest commit (5a66090) already contains a fix that changed:

enable_diff_snapshots: Some(false),

This was intended to fix kernel stack corruption by disabling the deprecated Firecracker parameter.

Why Not Fixable Automatically

  1. Fix already applied: The commit being tested (5a66090) already has the enable_diff_snapshots: Some(false) fix, but tests are still failing
  2. Potential issues:
    • The fix might be incomplete (maybe needs to be None instead of Some(false))
    • There could be another codepath that needs fixing
    • The VM crashes might be due to a different issue with diff snapshot implementation
  3. Need manual investigation: This requires:
    • Looking at actual VM crash logs/kernel messages (not visible in CI output)
    • Potentially testing locally with verbose Firecracker logging
    • Understanding if the issue is with the snapshot creation, merging, or restoration logic

View Claude Run

@claude-claude
Contributor

claude-claude bot commented Jan 26, 2026

🔧 CI Auto-Fix

Created fix PR: #190

The fix changes enable_diff_snapshots: Some(false) to enable_diff_snapshots: None to completely omit the deprecated parameter from the Firecracker API call rather than explicitly sending false.
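
The difference between `Some(false)` and `None` is whether the key appears in the request body at all. The sketch below shows that with a hand-rolled serializer so it stays std-only; the struct is an illustrative subset, and the `snapshot_path` and `resume_vm` fields are included only as examples. In a serde-based client the same effect is usually obtained with `#[serde(skip_serializing_if = "Option::is_none")]`; whether fcvm does it that way is not shown in this thread.

```rust
/// Illustrative subset of a snapshot-load request body.
struct LoadSnapshotRequest {
    snapshot_path: String,
    resume_vm: bool,
    enable_diff_snapshots: Option<bool>,
}

impl LoadSnapshotRequest {
    /// Minimal JSON rendering for demonstration; it does not escape special
    /// characters in `snapshot_path`.
    fn to_json(&self) -> String {
        let mut fields = vec![
            format!("\"snapshot_path\":\"{}\"", self.snapshot_path),
            format!("\"resume_vm\":{}", self.resume_vm),
        ];
        // Only emit the deprecated key when a value was explicitly chosen.
        if let Some(v) = self.enable_diff_snapshots {
            fields.push(format!("\"enable_diff_snapshots\":{}", v));
        }
        format!("{{{}}}", fields.join(","))
    }
}
```

With `Some(false)` the body still carries `"enable_diff_snapshots":false`; with `None` the key is absent, which is what "completely omit" means here.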

View Claude Run

The create_snapshot_core() function had a critical bug where:
1. Firecracker writes the diff to temp_dir/memory.bin
2. Code then copies base to temp_dir/memory.bin (overwriting diff!)
3. Merge tries to read the "diff" which is now just the base
4. Result: merged memory.bin = base only, missing dirty pages

This caused VMs to crash after restoring from diff snapshots because
the vmstate.bin expected post-startup memory but memory.bin only
contained pre-startup state.

Fix: For diff snapshots, write to memory.diff instead of memory.bin,
then merge from memory.diff onto a copy of the base saved as memory.bin.

Tested: test_diff_snapshot_cache_hit_fast passes (49x speedup)
Tested: test_snapshot_run_direct_rootless passes

@claude-claude
Contributor

claude-claude bot commented Jan 26, 2026

🔍 Claude Review

SEVERITY: medium

Findings

The critical issues identified in previous reviews have been successfully fixed via PR #189:

  • ✅ File handle mismatch in merge_diff_snapshot() - now uses pread/pwrite correctly
  • ✅ BufReader allocation in loop - buffer allocated once before loop
  • ✅ Non-atomic diff snapshot updates - now uses atomic temp dir + rename pattern

However, one issue from PR #190 remains unfixed:

[MEDIUM] Deprecated parameter still explicitly set (line 684 in common.rs):

  • Current: enable_diff_snapshots: Some(false)
  • Should be: enable_diff_snapshots: None
  • Issue: The parameter is deprecated in Firecracker v1.13.0+. Setting it to Some(false) still sends the parameter in the API call, which may cause issues. It should be completely omitted by using None.
  • Impact: The code comment (lines 679-683) correctly explains that enabling this causes kernel stack corruption. However, explicitly setting it to false still sends the deprecated parameter to Firecracker. While false may not trigger the corruption, it's cleaner to omit it entirely.
  • Fix: Change line 684 from enable_diff_snapshots: Some(false), to enable_diff_snapshots: None,

Summary

The PR implements diff snapshot functionality correctly and all critical data corruption issues have been resolved. The remaining issue is minor - the deprecated Firecracker parameter should be completely omitted rather than explicitly set to false. PR #190 contains the fix for this issue and should be merged into this PR branch.


Review by Claude | Run

@claude-claude
Contributor

claude-claude bot commented Jan 26, 2026

CI Failure Analysis

I analyzed CI #21342226435 (the failed run before #21343191955).

Diagnosis

The job failed with:

  • 2 failed tests:

    1. test_utimensat_pjdfstest_nested_kernel (240.5s, TRY 3 FAIL)
    2. test_podman_load_over_fuse (321.0s, TRY 3 FAIL)
  • 15 timed out tests: All pjdfstest_vm_* tests timed out at 900 seconds (15 minutes each):

    • test_pjdfstest_vm_chmod
    • test_pjdfstest_vm_chflags
    • test_pjdfstest_vm_chown
    • test_pjdfstest_vm_ftruncate
    • test_pjdfstest_vm_granular
    • test_pjdfstest_vm_mkdir
    • test_pjdfstest_vm_mknod
    • test_pjdfstest_vm_mkfifo
    • test_pjdfstest_vm_posix_fallocate
    • test_pjdfstest_vm_link
    • test_pjdfstest_vm_rename
    • test_pjdfstest_vm_symlink
    • test_pjdfstest_vm_truncate
    • test_pjdfstest_vm_rmdir
    • test_pjdfstest_vm_unlink

Why Not Fixable

This appears to be a test infrastructure/timeout issue rather than a code bug:

  1. Timeouts are environmental - 15 VM-based tests all timing out at exactly 900 seconds suggests runner resource constraints or VM performance issues
  2. Currently in progress - Run #21343191955 is still running and may complete successfully
  3. No code changes needed - The pattern suggests runner/infrastructure issues with the SnapshotDisabled configuration, not logic errors

I recommend waiting for the current CI run #21343191955 to complete to see if this is a transient issue.

View Claude Run

ejc3 added 2 commits January 26, 2026 02:59
The podman0 bridge interface was created with forwarding disabled
(net.ipv4.conf.podman0.forwarding=0), preventing containers from
reaching external networks. This caused apt-get in container builds
to hang indefinitely waiting for network.

Fix by enabling forwarding for all interfaces before podman creates
the bridge:
- net.ipv4.conf.all.forwarding=1
- net.ipv4.conf.default.forwarding=1

Root cause: When a new interface is created, it inherits the default
forwarding setting. If default.forwarding=0, the new podman0 bridge
can't route packets to the external interface.
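
For illustration only, the two settings named above could be applied programmatically as in the sketch below. The commit does not say how the values are set (a CI script or `sysctl` invocation is just as likely), so this is an assumption-laden sketch: `enable_ipv4_forwarding` is a hypothetical helper, and the sysctl root is passed in as a parameter (normally `/proc/sys`) so it can be tried against a scratch directory without root.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write "1" to the `all` and `default` IPv4 forwarding knobs under `proc_sys`.
/// Setting `default` before the bridge exists matters because new interfaces
/// inherit the default value at creation time.
fn enable_ipv4_forwarding(proc_sys: &Path) -> io::Result<()> {
    for scope in ["all", "default"] {
        let knob = proc_sys
            .join("net/ipv4/conf")
            .join(scope)
            .join("forwarding");
        fs::write(&knob, "1\n")?;
    }
    Ok(())
}
```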
@claude-claude
Contributor

claude-claude bot commented Jan 26, 2026

CI Status Update

I was triggered to analyze CI run #21343191955, but upon investigation I found:

Current Status

  • ✅ All jobs except one have completed successfully
  • 🔄 Job Host-Root-arm64-SnapshotEnabled is currently in progress (started at 03:02:38 UTC)
  • No failures detected - nothing to fix

Analysis

The job appears to have started recently and is running normally. There is no CI failure requiring a code fix at this time.

I will monitor the situation, but no action is needed unless the job actually fails.


🤖 Claude CI Fix Run #21344832962

@claude-claude
Contributor

claude-claude bot commented Jan 26, 2026

🔍 Claude Review

SEVERITY: medium

Findings

All critical issues from previous reviews have been successfully resolved via PR #189. The remaining issue from PR #190 persists:

[MEDIUM] Deprecated parameter explicitly set instead of omitted (line 684 in common.rs):

  • Current code: enable_diff_snapshots: Some(false),
  • Should be: enable_diff_snapshots: None,
  • Issue: The enable_diff_snapshots parameter is deprecated in Firecracker v1.13.0+. While setting it to false avoids the kernel stack corruption bug, explicitly sending Some(false) still passes the deprecated parameter in the API call. Using None completely omits the parameter from the API request.
  • Impact: The current code works correctly (no corruption), but sends a deprecated parameter that Firecracker may warn about or remove in future versions. Clean solution is to omit it entirely.
  • Context: The excellent comment (lines 679-683) correctly explains why enabling this causes issues. The implementation should match the intent: don't use this deprecated parameter at all.

Verified Fixes

File handle mismatch (line 131, 145): Now correctly uses pread(&diff_file, ...) and pwrite(&base_file, ...) with consistent file handles

BufReader allocation (line 91): Buffer allocated once before loop, not inside it

Atomic updates for diff snapshots (lines 907-987): All files (memory.bin merged, disk.raw, config.json) created in temp directory, then atomically renamed to final location

Summary

This PR implements automatic diff-based snapshots with proper reflink copy + merge workflow. All critical data corruption and atomicity issues have been resolved. The implementation is sound and CI passes.

The remaining issue is minor cleanup: the deprecated Firecracker parameter should be completely omitted rather than explicitly set to false. This doesn't affect correctness but improves forward compatibility.


Review by Claude | Run

@claude-claude
Contributor

claude-claude bot commented Jan 26, 2026

🔧 Auto-Fix Created

I found a remaining medium-severity issue and created a fix PR: #193

Issue Fixed

Changed enable_diff_snapshots: Some(false) to enable_diff_snapshots: None to completely omit the deprecated Firecracker parameter from API calls rather than explicitly sending it as false.

Why This Matters

While Some(false) works correctly, it still sends the deprecated parameter in the API request. Using None completely omits it, improving forward compatibility as Firecracker may remove support for this parameter in future versions.

Please review and merge the fix PR first, then this PR.

View Claude Run

@ejc3 ejc3 merged commit 3308e49 into main Jan 26, 2026
16 checks passed
ejc3 added a commit that referenced this pull request Mar 2, 2026
feat: add automatic diff-based snapshot support
ejc3 added a commit that referenced this pull request Mar 2, 2026
feat: add automatic diff-based snapshot support