Conversation
Add slirp_ipv6 test module with tests for: - libslirp version detection (4.7+ supports native IPv6 DNS) - DNS resolution in VMs - IPv6 connectivity on eth0 Change health check default: don't auto-assign HTTP health check URL from network config. HTTP health checks require an HTTP server in the container. Use container-ready file mechanism by default instead. Users can explicitly set --health-check for HTTP checks. Also add find_available_port() helper to tests/common for port scanning. Tested: make test-root FILTER=slirp_ipv6 - all 3 tests pass make test-root FILTER=sanity - sanity test still passes
- Switch from wget to curl for HTTPS testing (busybox wget doesn't properly support HTTPS through HTTP proxy) - Use dynamic port allocation (find_available_high_port) instead of hardcoded 8080 to avoid conflicts with system services - Add --vm flag to clone tests for nslookup/curl (run in VM, not container) - Update test URLs: google.com → facebook.com, httpbin.org → checkip.amazonaws.com Tests verified: test_egress_fresh_rootless - passed test_exec_rootless - passed test_port_forward_rootless - passed
test_ipv6_egress_to_host starts an IPv6-only server on host and verifies VM cannot reach it. This documents that slirp4netns only provides IPv4 NAT - IPv6 egress requires either: 1. IPv6 NAT (not supported by slirp4netns) 2. Bridged networking with IPv6 on the bridge 3. A different networking solution like pasta
Enable IPv6 connectivity for VMs using slirp4netns: - Add guest_ipv6 and host_ipv6 fields to NetworkConfig - Configure IPv6 addresses on TAP devices (fd00:1::2/64 for guest) - Enable IPv6 forwarding and NAT66 in namespace setup script - Pass --enable-ipv6 and --outbound-addr6 to slirp4netns - Detect host's global IPv6 address for outbound traffic - Add configure_ipv6_from_cmdline() in fc-agent to parse ipv6= boot param - Pass ipv6=<client>|<gateway> in kernel cmdline from podman.rs Tested: make test FILTER=ipv6 - all 3 tests pass - test_ipv6_connectivity_in_vm: ping to fd00::3 works - test_ipv6_egress_to_host: VM reaches host's global IPv6 - test_ipv6_egress_internet: documents slirp4netns limitation
The test was checking for fd00::100 (namespace slirp0 address) but the guest VM is configured with fd00:1::2. Also changed ping target from fd00::3 (slirp DNS, unreachable without NAT66) to fd00:1::1 (gateway). Tested: make test-root FILTER=ipv6 (2 tests pass)
The PR changed health checks from HTTP-based (which verified nginx was responding) to container-ready file (which only verifies container started). This causes a race condition where the container is marked healthy but nginx hasn't finished binding to port 80 yet. Add 30-second retry loops with 500ms intervals to both test_port_forward_bridged and test_port_forward_rootless, giving nginx time to start after the container becomes ready. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add Delegate=yes to fc-agent.service so podman can use cgroup controllers (pids, memory, cpu) when running containers inside the VM. Without this, crun fails with "the requested cgroup controller 'pids' is not available" on fresh VM boots. Snapshot restores worked because cgroups were already configured when the snapshot was taken. Tested: FCVM_NO_SNAPSHOT=1 make test-root FILTER=test_dns_resolution_in_vm
- Add http_proxy/https_proxy fields to Plan struct in fc-agent - Add save_proxy_settings() to persist proxy config to /etc/fcvm-proxy.env - Add read_proxy_settings() to read proxy config in exec handlers - Pass proxy env vars to podman pull and podman exec - Add proxy fields to NetworkConfig (dns_search, http_proxy) - Update MMDS to pass proxy settings from host env vars - Add spawn_fcvm_with_env() test helper for passing env vars - Add proxy tests (WIP - IPv6 proxy requires bridged networking) The proxy settings are: 1. Read from host environment (http_proxy, HTTP_PROXY, etc.) 2. Passed to fc-agent via MMDS container-plan 3. Saved to /etc/fcvm-proxy.env by fc-agent 4. Applied to podman pull for image downloads 5. Applied to podman exec via -e flags for container commands 6. Applied to VM-level exec via environment variables
Replace external service dependencies (httpbin.org, ifconfig.me) with local test servers for reliable, offline-capable testing. Changes: - Add LocalTestServer helper for starting ephemeral HTTP test servers - Add wait_for_tcp() helper for connection-based readiness instead of sleeps - Add spawn_fcvm_with_env() helper to spawn fcvm with custom env vars - Replace test_proxy_passthrough_to_exec and test_image_pull_and_egress with four egress matrix tests covering all networking scenarios: - test_egress_ipv4_local: VM reaches 10.0.2.2 (bound to 127.0.0.1) - test_egress_ipv4_global: VM reaches 10.0.2.2 (bound to 0.0.0.0) - test_egress_ipv6_local: VM reaches fd00::2 (bound to ::1) - test_egress_ipv6_global: VM reaches host's global IPv6 directly - Refactor test_vm_uses_ipv6_proxy to use local target server instead of httpbin.org for egress verification The matrix tests verify slirp4netns gateway translation: - IPv4: 10.0.2.2 → host's 127.0.0.1 - IPv6: fd00::2 → host's ::1 All tests now run without external network access.
|
Change slirp4netns port forwarding from 10.0.2.100 to 10.0.2.15, which is the standard guest IP in slirp4netns's internal network (10.0.2.0/24). The previous configuration caused slirp_add_hostfwd to fail with: "bad request: add_hostfwd: slirp_add_hostfwd failed" This was because 10.0.2.100 is not a valid guest address in slirp4netns's view. The standard guest IP is 10.0.2.15 (with gateway at 10.0.2.2). Changes: - Update slirp0 TAP device IP from 10.0.2.100 to 10.0.2.15 - Update add_hostfwd guest_addr to use 10.0.2.15 - Update DNAT rule to redirect from 10.0.2.15 to actual guest IP The port forwarding flow remains: host -> slirp4netns -> 10.0.2.15 -> DNAT -> guest Fixes test_port_forward_rootless test failure. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Python startup time can vary in CI environments, especially on x64 containers. The previous 1000ms timeout was too aggressive and caused test_proxy tests to fail with "server not ready after 1000ms". Increased to 5000ms to handle slower environments while still failing fast enough for actual issues.
The LocalTestServer now uses tokio TcpListener directly instead of spawning a Python process. Benefits: - Instant startup (no Python interpreter overhead) - No external dependencies - More reliable in CI environments - Simpler code The server binds immediately and accepts connections in a tokio task, responding with "TEST_SUCCESS\n" to any HTTP request.
The previous test tried to pull alpine:latest through an IPv6 proxy, which was flaky because: - Requires global IPv6 on host AND connectivity to registry - Python forward proxy needed to handle registry traffic - 180s timeout caused long CI runs when it failed New test just verifies proxy env vars are passed to the VM, which is the core functionality we need to test. The simpler egress tests (test_egress_*) already verify actual IPv6 connectivity through slirp4netns.
This reverts commit 2b8e5b0.
|
🔧 CI Auto-FixCreated fix PR: #218 Issue DiagnosedThe test was failing because:
Fix AppliedAdded |
- Add LocalProxyServer to test infrastructure (pure Rust HTTP forward proxy) - Add random offset to find_available_high_port() to avoid port collisions - Refactor proxy tests into matrix: test_proxy_ipv4, test_proxy_ipv6 - Remove redundant test_service_via_ipv6_gateway (covered by test_egress_ipv6_local) - Document host service access, IPv6 usage, and proxy configuration in README Tests verify VM can use proxy via slirp gateways: - IPv4: curl -x http://10.0.2.2:PORT → proxy on 127.0.0.1 → target - IPv6: curl -x http://[fd00::2]:PORT → proxy on ::1 → target
🔍 Claude ReviewSEVERITY: medium Findings[MEDIUM] IPv6 tests failing on arm64 CI runners The following IPv6-related tests are failing on the Container-arm64 job:
These tests appear to be failing because the arm64 CI runners don't have IPv6 configured. The test failures are expected in an IPv6-less environment, but they should be skipped gracefully rather than failing. Root cause: The tests try to use IPv6 features (binding to Recommended fix: Add IPv6 availability checks at the beginning of IPv6 tests (similar to how
If IPv6 is not available, the tests should print "SKIP: IPv6 not available on this system" and return SummaryThis PR adds comprehensive IPv6 and HTTP proxy support for fcvm, which is a significant enhancement. The implementation looks solid with good test coverage. However, the IPv6 tests are failing on arm64 CI runners due to missing IPv6 support on those machines. The tests should gracefully skip when IPv6 is unavailable rather than failing. Review by Claude | Run |
|
@claude re-review please |
|
CI Failure AnalysisI analyzed CI #21695884982 and found two separate issues: 1. Container-arm64: TTY stdin forwarding failureTest: (174/180) Root Cause: The TTY reader loop hit EOF prematurely during snapshot creation: The test forks a process to run Why Not Fixable: This is a complex race condition in TTY/snapshot interaction that requires deeper investigation into how the TTY session is managed during snapshot creation. The issue may be related to how the vm is paused/resumed during snapshotting. 2. Host-x64: FUSE mount limit exceededTests: Multiple fuse-pipe integration tests Why Not Fixable: This is an environmental issue with the test runner, not a code issue. The host has reached the maximum number of FUSE filesystems that can be mounted simultaneously. |
CI Failure AnalysisI analyzed CI #21695884982 and found two separate issues: 1. Container-arm64: TTY stdin forwarding failureTest: Root Cause: The TTY reader loop hit EOF prematurely during snapshot creation: The test forks a process to run Why Not Fixable: This is a complex race condition in TTY/snapshot interaction that requires deeper investigation into how the TTY session is managed during snapshot creation. The issue may be related to how the VM is paused/resumed during snapshotting. 2. Host-x64: FUSE mount limit exceededTests: Multiple fuse-pipe integration tests Why Not Fixable: This is an environmental issue with the test runner, not a code issue. The host has reached the maximum number of FUSE filesystems that can be mounted simultaneously. |
|
The CI was failing with 'too many FUSE filesystems mounted' because stale mounts accumulated across test runs. This adds fusermount3 -u -z cleanup for temp dirs.
CI Failure AnalysisI analyzed CI #21696284247 and found that the failure was due to code formatting issues in DiagnosisThe CI Lint job failed because two Resolution✅ The issue has already been fixed in commit 476e8ad ("style: format exec.rs"), which is now part of the http-proxy branch. The latest CI run shows the Lint job passing successfully. No further action is needed. |
🔧 CI Auto-FixCreated fix PR: #235 The test-packaging-e2e failure was caused by health check exec calls blocking for 50+ seconds when the exec server wasn't ready yet, preventing VMs from becoming healthy within the 120s timeout. |
- Check container-exit file before other health checks to detect containers that fail to load (exit 125) while VM is still running - Add validate_docker_archive() to verify manifest.json exists in docker archives after export - Auto-repair corrupted cached archives by re-exporting on same run - Fix health check logic to not return Unhealthy when container doesn't exist yet (check_podman_healthcheck returns Some(false) when inspect fails) Tested: make test-root FILTER=healthcheck (3/3 passed) Tested: make test-root FILTER=sanity (3/3 passed)
|
|
🔍 CI Failure Analysis (In Progress)Analyzing: CI #21697741236 Current StatusCurrently investigating the CI failures. The http-proxy branch adds HTTP proxy support via a new Code Review Findings✅ The implementation appears correct:
Failed Jobs
Next StepsMonitoring current CI run #21698065661 to determine if the failures are reproducible or flaky. Updating shortly... |
CI Failure Analysis - CompleteAnalyzed: CI #21697741236 FindingsThe CI failures are NOT directly caused by the http_proxy feature implementation. The code changes are correct:
Actual Failure Causes1. Container-test failures (ARM64/x64):
2. Test-packaging-e2e failures (ARM64/x64):
AssessmentThese appear to be environmental/flaky test issues rather than bugs in the http_proxy implementation. The failures may be:
Recommendation
Currently monitoring run #21698065661 to see if issues reproduce. |
The health monitor's exec calls to check container status (podman inspect) could hang indefinitely if the exec server was unresponsive, blocking the entire health monitoring loop. This caused VMs to stay in "unknown" status forever in test-packaging-e2e. Added HEALTH_CHECK_EXEC_TIMEOUT (5 seconds) wrapping both: - check_container_running() - checks if container is running - check_podman_healthcheck() - checks podman HEALTHCHECK status When timeout fires, functions return false/retry rather than blocking. Tested: ./scripts/test-packaging-e2e.sh passes (VM becomes healthy)
|
CI Failure AnalysisI analyzed CI #21699199367 and couldn't create an automatic fix. DiagnosisThe CI failures are not related to the HTTP proxy feature changes. The failing tests are:
All failures show health check timeouts and are unrelated to the proxy/IPv6 networking changes in this PR. Why Not FixableThese appear to be flaky stress/integration tests experiencing timing issues in the CI environment, not bugs introduced by code changes. The failures are environmental:
RecommendationRe-run the CI to see if these flaky tests pass on retry. The http-proxy feature code itself appears sound based on the test structure and passing unit tests. |
Container-arm64 CI was failing with: "OS can't spawn worker thread: Resource temporarily unavailable (os error 11)" tokio panicked when it couldn't spawn worker threads due to process/thread limits being exhausted. This was caused by missing nproc ulimit in container configuration - only pids-limit and nofile were set. Added --ulimit nproc=65536:65536 alongside existing limits.
|
🔍 CI Failure AnalysisI analyzed CI #21699963293 which was triggered to fix the previous failure. Previous Failure Diagnosis (run #21699199367)The CI had 4 test failures due to thread/process exhaustion:
Root cause: Tests were hitting resource limits with error ✅ Fix Already AppliedCommit 0e07f7b by @ejc3 has already fixed this issue by adding The current CI run (#21699963293) is running with this fix. No additional code changes are needed. Generated by Claude | Analysis Run |
|
- Capture sudo process PID for cleanup - Kill process tree children before parent - Use VM name pattern to catch orphaned firecracker/slirp4netns - Increase sleep time for graceful shutdown This prevents 20+ stale firecracker/slirp4netns processes from accumulating after packaging e2e tests.
⏸️ Review Paused - CI Check FailedThe Host-arm64 check has failed. Please fix the failing test before I can complete the review. Failed Check: Reviewing code that doesn't pass CI is not productive. I'll wait for the fixes and can review again once all checks pass. Review by Claude | Run |
The retry flag was only in _test-root, causing flaky tests to fail immediately in container/fast tests instead of being retried.
🔍 Claude ReviewSEVERITY: none FindingsNo issues found. ✅ This PR successfully adds IPv6 support and HTTP/HTTPS proxy functionality to fcvm. After extensive debugging and iteration through multiple fixes (as evidenced by the 44 commits and previous review history), all CI checks are now passing. Implementation HighlightsIPv6 Support:
HTTP/HTTPS Proxy Support:
Test Infrastructure:
Bug Fixes Along the Way:
Code Quality
SummaryThis is a substantial and well-executed enhancement to fcvm. The IPv6 and proxy support are production-ready, thoroughly tested, and all CI checks pass. The PR is ready to merge. Review by Claude | Run |
Add HTTP/HTTPS proxy support and IPv6 networking
Add HTTP/HTTPS proxy support and IPv6 networking
Summary
This PR adds IPv6 support for rootless networking and HTTP/HTTPS proxy support for container image pulls, plus comprehensive matrix tests for egress and proxy functionality.
Changes
IPv6 Support for Rootless Networking
fd00:1::2/64for guest,fd00:1::1/64for gateway)--enable-ipv6and--outbound-addr6to slirp4netnsconfigure_ipv6_from_cmdline()in fc-agent to parseipv6=<client>|<gateway>boot param::1viafd00::2(slirp gateway translation)HTTP/HTTPS Proxy Support
http_proxy/https_proxyfields to Plan struct in fc-agent/etc/fcvm-proxy.envfor exec commandsSelf-Contained Test Infrastructure
Matrix Tests
Egress tests - verify VM can reach host services:
test_egress_ipv4_local127.0.0.110.0.2.2test_egress_ipv4_global0.0.0.010.0.2.2test_egress_ipv6_local::1fd00::2test_egress_ipv6_globalProxy tests - verify VM uses proxy for HTTP requests:
test_proxy_ipv4127.0.0.110.0.2.2test_proxy_ipv6::1fd00::2All tests are fully self-contained with no external network dependencies.
Bug Fixes
10.0.2.15) for slirp4netns port forwardingDelegate=yesto fc-agent.service for podman cgroup controllersDocumentation
Test Plan