fix: wait for network restore before exec in snapshot clones by claude-claude[bot] · Pull Request #211 · ejc3/fcvm

claude-claude · 2026-02-03T02:14:35Z

CI Fix

Problem

The test test_snapshot_run_exec_rootless (and bridged variant) was failing with:

exec_output_found=true - command output "EXEC_TEST_SUCCESS" was found ✓
exit_success=false - but fcvm exited with code 125 ✗

The actual error from the logs:

Error: no such container
2026-02-02T20:24:01.402562Z  INFO fcvm::commands::snapshot: exec completed with exit code 125

Exit code 125 = container runtime error (Podman)

Root Cause

After snapshot restore with rootless networking (slirp4netns), there's a timing race:

VM resumes immediately after snapshot restore
fc-agent network restore happens asynchronously:
- Polls MMDS every 100ms for restore-epoch changes
- When detected, runs handle_clone_restore():
  - Flushes ARP cache
  - Sends gratuitous NDP Neighbor Advertisement for IPv6
  - This announces the VM's MAC address to the new slirp4netns process
--exec tried to exec immediately after vsock socket was ready (17µs!)
Container networking not ready → Podman can't communicate → "no such container"

The NDP NA announcement is critical for IPv6 DNS to work (slirp4netns routes via fd00::3).

Solution

Add a 300ms delay after vsock socket is ready but before executing container commands:

100ms max for restore-epoch detection (polls every 100ms)
200ms for network setup (ARP flush + NDP NA transmission)

This ensures IPv6 routing is fully established before attempting container exec.

Testing

This fix specifically addresses the failing tests:

test_snapshot_run_exec_rootless - was failing with exit code 125
test_snapshot_run_exec_bridged - likely same issue

Generated by Claude | Fix Run

After snapshot restore with slirp4netns, fc-agent needs time to complete network initialization before container exec will work: 1. fc-agent polls MMDS for restore-epoch every 100ms 2. When detected, it runs handle_clone_restore() which: - Flushes ARP cache - Sends gratuitous NDP NA for IPv6 - This announces the VM's MAC to the new slirp4netns process Without this delay, `fcvm snapshot run --exec` fails with exit code 125 ("no such container") because the container's IPv6 networking isn't ready. The test `test_snapshot_run_exec_rootless` was failing with: exec_output_found=true (command ran) exit_success=false (exit code 125) This fix adds a 300ms delay after vsock socket is ready: - 100ms max for restore-epoch detection - 200ms for network setup (ARP flush + NDP NA) This ensures IPv6 routing is established before trying to exec into the container, allowing Podman to communicate properly. Fixes CI test: test_snapshot_run_exec_rootless Fixes CI test: test_snapshot_run_exec_bridged 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

ejc3 · 2026-02-03T16:46:00Z

Superseded by PR #217 (http-proxy)

claude-claude bot mentioned this pull request Feb 3, 2026

Add IPv6 support for rootless networking #209

Closed

ejc3 closed this Feb 3, 2026

ejc3 deleted the claude/fix-21613960041 branch February 8, 2026 18:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: wait for network restore before exec in snapshot clones#211

fix: wait for network restore before exec in snapshot clones#211
claude-claude[bot] wants to merge 1 commit intofix-slirp-snapshot-restorefrom
claude/fix-21613960041

claude-claude bot commented Feb 3, 2026

Uh oh!

ejc3 commented Feb 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

claude-claude bot commented Feb 3, 2026

CI Fix

Problem

Root Cause

Solution

Testing

Uh oh!

ejc3 commented Feb 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant