fix: update DESIGN.md snapshot restore flow to reflect post-resume MMDS ordering#413
Closed
claude-claude[bot] wants to merge 2 commits intofix-output-pipeline-restorefrom
Closed
fix: update DESIGN.md snapshot restore flow to reflect post-resume MMDS ordering#413claude-claude[bot] wants to merge 2 commits intofix-output-pipeline-restorefrom
claude-claude[bot] wants to merge 2 commits intofix-output-pipeline-restorefrom
Conversation
Three bugs caused container stdout to not reach the host after snapshot restore: 1. put_mmds before VM resume: Firecracker accepts PUT /mmds while the VM is paused but the guest-visible data isn't updated. fc-agent never sees the new restore-epoch so it never reconnects. Fix: move put_mmds after resume. 2. Unconditional notify_one: cmd_snapshot_run fired output_reconnect.notify_one() for all snapshot types. For pre-start snapshots (container not yet running), the listener has no dead connection to drop. The stored Notify permit poisons the first real connection by immediately triggering the reconnect branch in select!, dropping a valid connection. Fix: only notify for startup snapshots. 3. Listener stuck on stale connection: fc-agent reconnects multiple times during snapshot create/restore cycles. Each reconnect creates a new vsock connection queued in the listener's accept backlog. The listener accepted the FIRST connection and blocked on read_line, while fc-agent wrote to the LATEST connection. Fix: race read_line against listener.accept() in tokio::select!, always switching to the newest connection. Also adds output verification to test_heavy_output_after_snapshot_restore to catch this class of bug — the test previously only checked health and exec, not that container output actually reached the host.
…DS ordering
Move "Update MMDS with new config" from Identity Patching (step 3) to a
new step 6 after Resume VM (step 5), matching the code change in
restore_from_snapshot() where put_mmds was moved after patch_vm_state("Resumed").
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2 tasks
2b6aa60 to
854ff7a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Auto-Fix for PR #412
Issues Fixed
Changes
DESIGN.md: Moved "Update MMDS with new config" from the Identity Patching bullet list to a new step 6 after "Resume VM" (step 5), with an explanatory note that guest-visible MMDS data isn't updated until the VM is running.Generated by Claude | Review Run