fix: wait for network restore before exec in snapshot clones#211
Closed
claude-claude[bot] wants to merge 1 commit intofix-slirp-snapshot-restorefrom
Closed
fix: wait for network restore before exec in snapshot clones#211claude-claude[bot] wants to merge 1 commit intofix-slirp-snapshot-restorefrom
claude-claude[bot] wants to merge 1 commit intofix-slirp-snapshot-restorefrom
Conversation
After snapshot restore with slirp4netns, fc-agent needs time to complete
network initialization before container exec will work:
1. fc-agent polls MMDS for restore-epoch every 100ms
2. When detected, it runs handle_clone_restore() which:
- Flushes ARP cache
- Sends gratuitous NDP NA for IPv6
- This announces the VM's MAC to the new slirp4netns process
Without this delay, `fcvm snapshot run --exec` fails with exit code 125
("no such container") because the container's IPv6 networking isn't ready.
The test `test_snapshot_run_exec_rootless` was failing with:
exec_output_found=true (command ran)
exit_success=false (exit code 125)
This fix adds a 300ms delay after vsock socket is ready:
- 100ms max for restore-epoch detection
- 200ms for network setup (ARP flush + NDP NA)
This ensures IPv6 routing is established before trying to exec into
the container, allowing Podman to communicate properly.
Fixes CI test: test_snapshot_run_exec_rootless
Fixes CI test: test_snapshot_run_exec_bridged
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Owner
|
Superseded by PR #217 (http-proxy) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
CI Fix
Fixes CI #21605344777
Problem
The test
test_snapshot_run_exec_rootless(and bridged variant) was failing with:exec_output_found=true- command output "EXEC_TEST_SUCCESS" was found ✓exit_success=false- but fcvm exited with code 125 ✗The actual error from the logs:
Exit code 125 = container runtime error (Podman)
Root Cause
After snapshot restore with rootless networking (slirp4netns), there's a timing race:
restore-epochchangeshandle_clone_restore():--exectried to exec immediately after vsock socket was ready (17µs!)The NDP NA announcement is critical for IPv6 DNS to work (slirp4netns routes via fd00::3).
Solution
Add a 300ms delay after vsock socket is ready but before executing container commands:
This ensures IPv6 routing is fully established before attempting container exec.
Testing
This fix specifically addresses the failing tests:
test_snapshot_run_exec_rootless- was failing with exit code 125test_snapshot_run_exec_bridged- likely same issueGenerated by Claude | Fix Run