
Add HTTP/HTTPS proxy support and IPv6 networking#217

Merged
ejc3 merged 46 commits into main from http-proxy
Feb 5, 2026
Conversation

ejc3 (Owner) commented Feb 3, 2026

Summary

This PR adds IPv6 support for rootless networking and HTTP/HTTPS proxy support for container image pulls, plus comprehensive matrix tests for egress and proxy functionality.

Changes

IPv6 Support for Rootless Networking

  • Configure IPv6 addresses on TAP devices (fd00:1::2/64 for guest, fd00:1::1/64 for gateway)
  • Enable IPv6 forwarding and NAT66 in namespace setup script
  • Pass --enable-ipv6 and --outbound-addr6 to slirp4netns
  • Add configure_ipv6_from_cmdline() in fc-agent to parse ipv6=<client>|<gateway> boot param
  • VMs can reach:
    • Host's ::1 via fd00::2 (slirp gateway translation)
    • Host's global IPv6 directly through slirp NAT
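The `ipv6=<client>|<gateway>` boot parameter above only needs a few lines of string handling in the guest agent. The sketch below is illustrative, not fc-agent's actual code; the function name and return shape are assumptions:

```rust
// Hypothetical sketch of the `ipv6=<client>|<gateway>` boot-param parsing
// described above; names are illustrative, not fc-agent's real API.

/// Pull an `ipv6=fd00:1::2|fd00:1::1`-style token out of a kernel cmdline.
fn parse_ipv6_param(cmdline: &str) -> Option<(String, String)> {
    let token = cmdline
        .split_whitespace()
        .find_map(|t| t.strip_prefix("ipv6="))?;
    let (client, gateway) = token.split_once('|')?;
    if client.is_empty() || gateway.is_empty() {
        return None;
    }
    Some((client.to_string(), gateway.to_string()))
}

fn main() {
    let cmdline = "console=ttyS0 ipv6=fd00:1::2|fd00:1::1 quiet";
    let (client, gateway) = parse_ipv6_param(cmdline).expect("ipv6 param present");
    println!("client={client} gateway={gateway}");
}
```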

HTTP/HTTPS Proxy Support

  • Add http_proxy/https_proxy fields to Plan struct in fc-agent
  • Pass proxy settings from host env vars through MMDS to VMs
  • Save proxy config to /etc/fcvm-proxy.env for exec commands
  • Apply proxy env vars to podman pull and podman exec
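The host-side half of this plumbing starts with an environment lookup. A minimal sketch of the lowercase-then-uppercase precedence, with the environment abstracted behind a closure so the logic stays deterministic and testable (the function name is illustrative, not fcvm's API):

```rust
use std::collections::HashMap;

// Hedged sketch of the proxy-variable precedence: lowercase `http_proxy`
// wins over uppercase `HTTP_PROXY`. The lookup is a closure so the same
// logic can be driven by std::env::var in real code or a map in tests.
fn pick_proxy(get: impl Fn(&str) -> Option<String>) -> Option<String> {
    get("http_proxy")
        .filter(|v| !v.is_empty())
        .or_else(|| get("HTTP_PROXY").filter(|v| !v.is_empty()))
}

fn main() {
    // Simulate a host environment with only the uppercase variant set.
    let host_env: HashMap<&str, &str> =
        [("HTTP_PROXY", "http://127.0.0.1:3128")].into_iter().collect();
    let get = |k: &str| host_env.get(k).map(|v| v.to_string());
    println!("{:?}", pick_proxy(get)); // Some("http://127.0.0.1:3128")
}
```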

Self-Contained Test Infrastructure

  • LocalTestServer: Pure Rust HTTP server for ephemeral test endpoints (IPv4/IPv6)
  • LocalProxyServer: Pure Rust HTTP forward proxy for testing proxy functionality
  • Random port allocation with offset to avoid collisions in parallel tests
  • Connection-based readiness checking (replaces arbitrary sleeps)
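Connection-based readiness is a poll-until-connect loop. This std-only sketch stands in for the helper described above (fcvm's version is async; only the shape is shown here):

```rust
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

// Sketch of connection-based readiness checking (replacing arbitrary
// sleeps): poll until a TCP connect succeeds or the deadline passes.
fn wait_for_tcp(addr: &str, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if TcpStream::connect(addr).is_ok() {
            return true;
        }
        thread::sleep(Duration::from_millis(50));
    }
    false
}

fn main() {
    // Bind an ephemeral port to stand in for the server under test.
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap().to_string();
    assert!(wait_for_tcp(&addr, Duration::from_secs(5)));
    println!("server ready at {addr}");
}
```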

Matrix Tests

Egress tests - verify VM can reach host services:

| Test | Host Bind | VM Target | Description |
| --- | --- | --- | --- |
| test_egress_ipv4_local | 127.0.0.1 | 10.0.2.2 | IPv4 loopback via slirp gateway |
| test_egress_ipv4_global | 0.0.0.0 | 10.0.2.2 | IPv4 all interfaces |
| test_egress_ipv6_local | ::1 | fd00::2 | IPv6 loopback via slirp gateway |
| test_egress_ipv6_global | host IPv6 | host IPv6 | Direct IPv6 NAT |

Proxy tests - verify VM uses proxy for HTTP requests:

| Test | Proxy Bind | VM Proxy | Description |
| --- | --- | --- | --- |
| test_proxy_ipv4 | 127.0.0.1 | 10.0.2.2 | IPv4 proxy via slirp gateway |
| test_proxy_ipv6 | ::1 | fd00::2 | IPv6 proxy via slirp gateway |

All tests are fully self-contained with no external network dependencies.

Bug Fixes

  • Port forwarding: Use correct guest IP (10.0.2.15) for slirp4netns port forwarding
  • Cgroup delegation: Add Delegate=yes to fc-agent.service for podman cgroup controllers
  • Nginx startup race: Add retry loop in port forward tests for nginx initialization

Documentation

  • Added "Host Service Access (Rootless Mode)" section to README with gateway mapping table
  • Added "IPv6 from Inside VMs" documentation
  • Added "Using HTTP Proxies" section

Test Plan

  • IPv6 connectivity: guest gets fd00:1::2, can ping fd00:1::1
  • IPv6 egress: VM reaches host's ::1 via fd00::2, reaches global IPv6
  • IPv4 egress: VM reaches host's 127.0.0.1 via 10.0.2.2
  • IPv4 proxy: VM uses proxy at 10.0.2.2, proxy forwards to target
  • IPv6 proxy: VM uses proxy at fd00::2, proxy forwards to target
  • Port forwarding: Works with new guest IP (10.0.2.15)
  • All tests self-contained (no external network dependencies)

ejc3 and others added 10 commits February 3, 2026 09:04
Add slirp_ipv6 test module with tests for:
- libslirp version detection (4.7+ supports native IPv6 DNS)
- DNS resolution in VMs
- IPv6 connectivity on eth0

Change health check default: don't auto-assign HTTP health check URL
from network config. HTTP health checks require an HTTP server in
the container. Use container-ready file mechanism by default instead.
Users can explicitly set --health-check for HTTP checks.

Also add find_available_port() helper to tests/common for port scanning.

Tested:
  make test-root FILTER=slirp_ipv6 - all 3 tests pass
  make test-root FILTER=sanity - sanity test still passes
- Switch from wget to curl for HTTPS testing (busybox wget doesn't
  properly support HTTPS through HTTP proxy)
- Use dynamic port allocation (find_available_high_port) instead of
  hardcoded 8080 to avoid conflicts with system services
- Add --vm flag to clone tests for nslookup/curl (run in VM, not container)
- Update test URLs: google.com → facebook.com, httpbin.org → checkip.amazonaws.com

Tests verified:
  test_egress_fresh_rootless - passed
  test_exec_rootless - passed
  test_port_forward_rootless - passed
test_ipv6_egress_to_host starts an IPv6-only server on host and
verifies VM cannot reach it. This documents that slirp4netns only
provides IPv4 NAT - IPv6 egress requires either:
  1. IPv6 NAT (not supported by slirp4netns)
  2. Bridged networking with IPv6 on the bridge
  3. A different networking solution like pasta
Enable IPv6 connectivity for VMs using slirp4netns:

- Add guest_ipv6 and host_ipv6 fields to NetworkConfig
- Configure IPv6 addresses on TAP devices (fd00:1::2/64 for guest)
- Enable IPv6 forwarding and NAT66 in namespace setup script
- Pass --enable-ipv6 and --outbound-addr6 to slirp4netns
- Detect host's global IPv6 address for outbound traffic
- Add configure_ipv6_from_cmdline() in fc-agent to parse ipv6= boot param
- Pass ipv6=<client>|<gateway> in kernel cmdline from podman.rs

Tested: make test FILTER=ipv6 - all 3 tests pass
- test_ipv6_connectivity_in_vm: ping to fd00::3 works
- test_ipv6_egress_to_host: VM reaches host's global IPv6
- test_ipv6_egress_internet: documents slirp4netns limitation
The test was checking for fd00::100 (namespace slirp0 address) but the
guest VM is configured with fd00:1::2. Also changed ping target from
fd00::3 (slirp DNS, unreachable without NAT66) to fd00:1::1 (gateway).

Tested: make test-root FILTER=ipv6 (2 tests pass)
The PR changed health checks from HTTP-based (which verified nginx was responding)
to container-ready file (which only verifies container started). This causes a race
condition where the container is marked healthy but nginx hasn't finished binding
to port 80 yet.

Add 30-second retry loops with 500ms intervals to both test_port_forward_bridged
and test_port_forward_rootless, giving nginx time to start after the container
becomes ready.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add Delegate=yes to fc-agent.service so podman can use cgroup
controllers (pids, memory, cpu) when running containers inside the VM.

Without this, crun fails with "the requested cgroup controller 'pids'
is not available" on fresh VM boots. Snapshot restores worked because
cgroups were already configured when the snapshot was taken.

Tested: FCVM_NO_SNAPSHOT=1 make test-root FILTER=test_dns_resolution_in_vm
- Add http_proxy/https_proxy fields to Plan struct in fc-agent
- Add save_proxy_settings() to persist proxy config to /etc/fcvm-proxy.env
- Add read_proxy_settings() to read proxy config in exec handlers
- Pass proxy env vars to podman pull and podman exec
- Add proxy fields to NetworkConfig (dns_search, http_proxy)
- Update MMDS to pass proxy settings from host env vars
- Add spawn_fcvm_with_env() test helper for passing env vars
- Add proxy tests (WIP - IPv6 proxy requires bridged networking)

The proxy settings are:
1. Read from host environment (http_proxy, HTTP_PROXY, etc.)
2. Passed to fc-agent via MMDS container-plan
3. Saved to /etc/fcvm-proxy.env by fc-agent
4. Applied to podman pull for image downloads
5. Applied to podman exec via -e flags for container commands
6. Applied to VM-level exec via environment variables
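Steps 3-5 above amount to writing and re-reading a flat KEY=value env file. A hedged sketch, using a temp path instead of /etc/fcvm-proxy.env so it runs anywhere (function names mirror the commit message but are illustrative):

```rust
use std::fs;
use std::io::Write;

// Persist proxy settings as simple KEY=value lines, the way the commit
// describes /etc/fcvm-proxy.env. The temp path here is for the sketch.
fn save_proxy_settings(path: &str, http: &str, https: &str) -> std::io::Result<()> {
    let mut f = fs::File::create(path)?;
    writeln!(f, "http_proxy={http}")?;
    writeln!(f, "https_proxy={https}")?;
    Ok(())
}

// Read the file back for exec handlers; malformed lines are skipped.
fn read_proxy_settings(path: &str) -> Vec<(String, String)> {
    fs::read_to_string(path)
        .unwrap_or_default()
        .lines()
        .filter_map(|l| l.split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

fn main() {
    let path = std::env::temp_dir().join("fcvm-proxy.env");
    let path = path.to_str().unwrap().to_string();
    save_proxy_settings(&path, "http://10.0.2.2:3128", "http://10.0.2.2:3128").unwrap();
    for (k, v) in read_proxy_settings(&path) {
        println!("{k}={v}");
    }
}
```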
Replace external service dependencies (httpbin.org, ifconfig.me) with local
test servers for reliable, offline-capable testing.

Changes:
- Add LocalTestServer helper for starting ephemeral HTTP test servers
- Add wait_for_tcp() helper for connection-based readiness instead of sleeps
- Add spawn_fcvm_with_env() helper to spawn fcvm with custom env vars
- Replace test_proxy_passthrough_to_exec and test_image_pull_and_egress with
  four egress matrix tests covering all networking scenarios:
  - test_egress_ipv4_local: VM reaches 10.0.2.2 (bound to 127.0.0.1)
  - test_egress_ipv4_global: VM reaches 10.0.2.2 (bound to 0.0.0.0)
  - test_egress_ipv6_local: VM reaches fd00::2 (bound to ::1)
  - test_egress_ipv6_global: VM reaches host's global IPv6 directly
- Refactor test_vm_uses_ipv6_proxy to use local target server instead of
  httpbin.org for egress verification

The matrix tests verify slirp4netns gateway translation:
- IPv4: 10.0.2.2 → host's 127.0.0.1
- IPv6: fd00::2 → host's ::1

All tests now run without external network access.
claude-claude bot (Contributor) commented Feb 3, 2026

⚠️ Lint Failed

The Lint CI check has failed. Please fix the linting issues before I can review this PR.

You can view the failure details here: https://github.com/ejc3/fcvm/actions/runs/21638944706

To fix locally, run:

cargo fmt
cargo clippy --all-targets --fix --allow-dirty

Review by Claude | Run

Change slirp4netns port forwarding from 10.0.2.100 to 10.0.2.15, which is
the standard guest IP in slirp4netns's internal network (10.0.2.0/24).

The previous configuration caused slirp_add_hostfwd to fail with:
  "bad request: add_hostfwd: slirp_add_hostfwd failed"

This was because 10.0.2.100 is not a valid guest address in slirp4netns's
view. The standard guest IP is 10.0.2.15 (with gateway at 10.0.2.2).

Changes:
- Update slirp0 TAP device IP from 10.0.2.100 to 10.0.2.15
- Update add_hostfwd guest_addr to use 10.0.2.15
- Update DNAT rule to redirect from 10.0.2.15 to actual guest IP

The port forwarding flow remains: host -> slirp4netns -> 10.0.2.15 -> DNAT -> guest

Fixes test_port_forward_rootless test failure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
claude-claude bot (Contributor) commented Feb 3, 2026

⚠️ Lint Failed - Cannot Review

The Lint check has failed on this PR. Please fix the linting issues before I can perform a review.

You can run lint checks locally with:

cargo fmt --check
cargo clippy --all-targets -- -D warnings

Once the lint issues are resolved and CI passes, I'll be able to review the changes.



claude-claude bot (Contributor) commented Feb 3, 2026

⚠️ CI Check Failed

The Container-x64 test has failed. Please fix the failing test before review can proceed.

View failed check



ejc3 added 4 commits February 3, 2026 17:04
Python startup time can vary in CI environments, especially on x64
containers. The previous 1000ms timeout was too aggressive and caused
test_proxy tests to fail with "server not ready after 1000ms".

Increased to 5000ms to handle slower environments while still failing
fast enough for actual issues.
The LocalTestServer now uses tokio TcpListener directly instead of
spawning a Python process. Benefits:

- Instant startup (no Python interpreter overhead)
- No external dependencies
- More reliable in CI environments
- Simpler code

The server binds immediately and accepts connections in a tokio task,
responding with "TEST_SUCCESS\n" to any HTTP request.
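The described server fits in a few lines. This sketch uses std threads instead of tokio so it stays dependency-free; the real helper is tokio-based, and the names here are illustrative:

```rust
use std::io::{Read, Write};
use std::net::TcpListener;
use std::thread;

// Minimal LocalTestServer-style sketch: bind an ephemeral port and answer
// every HTTP request with "TEST_SUCCESS\n" (13 bytes), then close.
fn spawn_test_server() -> String {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap().to_string();
    thread::spawn(move || {
        for stream in listener.incoming() {
            let mut stream = match stream {
                Ok(s) => s,
                Err(_) => continue,
            };
            let mut buf = [0u8; 1024];
            let _ = stream.read(&mut buf); // consume the request
            let _ = stream.write_all(
                b"HTTP/1.1 200 OK\r\nContent-Length: 13\r\n\r\nTEST_SUCCESS\n",
            );
            // stream drops here, closing the connection.
        }
    });
    addr
}

fn main() {
    let addr = spawn_test_server();
    let mut conn = std::net::TcpStream::connect(&addr).unwrap();
    conn.write_all(b"GET / HTTP/1.1\r\nHost: test\r\n\r\n").unwrap();
    let mut resp = String::new();
    conn.read_to_string(&mut resp).unwrap();
    assert!(resp.contains("TEST_SUCCESS"));
    println!("got TEST_SUCCESS from {addr}");
}
```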
The previous test tried to pull alpine:latest through an IPv6 proxy,
which was flaky because:
- Requires global IPv6 on host AND connectivity to registry
- Python forward proxy needed to handle registry traffic
- 180s timeout caused long CI runs when it failed

New test just verifies proxy env vars are passed to the VM, which is
the core functionality we need to test. The simpler egress tests
(test_egress_*) already verify actual IPv6 connectivity through
slirp4netns.
claude-claude bot (Contributor) commented Feb 3, 2026

⚠️ CI Check Failed

The Host-x64 test has failed. Please fix the failing test before I can complete the review.

Failed check: https://github.com/ejc3/fcvm/actions/runs/21640014575/job/62376659469



claude-claude bot (Contributor) commented Feb 3, 2026

🔧 CI Auto-Fix

Created fix PR: #218

Issue Diagnosed

The test was failing because:

  • Test sets fake IPv6 proxy URL using host's global IPv6 address
  • Podman tries to use this proxy for pulling alpine:latest
  • Pull fails with 'network is unreachable' - VMs can't reach host global IPv6 (slirp4netns limitation)

Fix Applied

Added NO_PROXY env vars to exclude Docker registries from proxy usage, allowing image pulls to succeed while still testing proxy env var propagation (the actual test intent).

View Claude Run

- Add LocalProxyServer to test infrastructure (pure Rust HTTP forward proxy)
- Add random offset to find_available_high_port() to avoid port collisions
- Refactor proxy tests into matrix: test_proxy_ipv4, test_proxy_ipv6
- Remove redundant test_service_via_ipv6_gateway (covered by test_egress_ipv6_local)
- Document host service access, IPv6 usage, and proxy configuration in README

Tests verify VM can use proxy via slirp gateways:
- IPv4: curl -x http://10.0.2.2:PORT → proxy on 127.0.0.1 → target
- IPv6: curl -x http://[fd00::2]:PORT → proxy on ::1 → target
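The random-offset allocation mentioned above can be sketched as a short scan from a pseudo-random high base; the clock-nanoseconds source below is a stand-in for whatever randomness the real helper uses, and the name mirrors (but is not) fcvm's helper:

```rust
use std::net::TcpListener;
use std::time::{SystemTime, UNIX_EPOCH};

// Scan for a free port starting from a high base plus a pseudo-random
// offset, so parallel test processes don't all probe the same range.
fn find_available_high_port() -> Option<u16> {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .subsec_nanos();
    let base = 20000 + (nanos % 20000) as u16; // somewhere in 20000..40000
    (base..base.saturating_add(100))
        .find(|p| TcpListener::bind(("127.0.0.1", *p)).is_ok())
}

fn main() {
    let port = find_available_high_port().expect("free port in range");
    println!("allocated port {port}");
}
```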
claude-claude bot (Contributor) commented Feb 4, 2026

🔍 Claude Review

SEVERITY: medium

Findings

[MEDIUM] IPv6 tests failing on arm64 CI runners

The following IPv6-related tests are failing on the Container-arm64 job:

  • test_egress_ipv6_local
  • test_egress_ipv6_global
  • test_proxy_ipv6
  • test_ipv6_connectivity_in_vm

These tests appear to be failing because the arm64 CI runners don't have IPv6 configured. The test failures are expected in an IPv6-less environment, but they should be skipped gracefully rather than failing.

Root cause: The tests try to use IPv6 features (binding to ::1, configuring fd00:1::2/64 addresses) but the arm64 runners likely don't have IPv6 enabled. The tests should detect this and skip themselves.

Recommended fix: Add IPv6 availability checks at the beginning of IPv6 tests (similar to how test_egress_ipv6_global already checks for a global IPv6 address). For example:

  • Check if ::1 is available before running test_egress_ipv6_local and test_proxy_ipv6
  • Check if IPv6 can be configured on interfaces before running test_ipv6_connectivity_in_vm

If IPv6 is not available, the tests should print "SKIP: IPv6 not available on this system" and return Ok(()).
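The suggested guard is essentially a one-line probe. A sketch, assuming a plain bind to ::1 is an acceptable availability signal:

```rust
use std::net::TcpListener;

// Probe IPv6 loopback by binding an ephemeral port on ::1; a failed
// bind means the runner has no usable IPv6 loopback.
fn ipv6_loopback_available() -> bool {
    TcpListener::bind("[::1]:0").is_ok()
}

fn main() {
    if !ipv6_loopback_available() {
        println!("SKIP: IPv6 not available on this system");
        return;
    }
    println!("IPv6 loopback available, running test");
}
```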

Summary

This PR adds comprehensive IPv6 and HTTP proxy support for fcvm, which is a significant enhancement. The implementation looks solid with good test coverage. However, the IPv6 tests are failing on arm64 CI runners due to missing IPv6 support on those machines. The tests should gracefully skip when IPv6 is unavailable rather than failing.



ejc3 (Owner, Author) commented Feb 4, 2026

@claude re-review please

claude-claude bot (Contributor) commented Feb 5, 2026

⚠️ Lint Failed - Cannot Review

The Lint CI check has failed. Please fix the linting issues before I can review this PR.

Failed Check:

To fix locally, run:

cargo fmt
cargo clippy --all-targets --fix --allow-dirty

Once the lint issues are resolved and CI passes, I'll be able to review the changes.



claude-claude bot (Contributor) commented Feb 5, 2026

CI Failure Analysis

I analyzed CI #21695884982 and found two separate issues:

1. Container-arm64: TTY stdin forwarding failure

Test: fcvm::test_exec::test_podman_run_interactive_tty (174/180)
Failure: Test timed out after 130 seconds

Root Cause: The TTY reader loop hit EOF prematurely during snapshot creation:

2026-02-05T02:09:47.842826Z  INFO fcvm::commands::podman: Creating pre-start snapshot
...
2026-02-05T02:10:00.337860Z  DEBUG fcvm::commands::tty: reader_loop: EOF

The test forks a process to run fcvm podman run -it alpine head -1, writes "hello-from-stdin\n" to the PTY, and expects to read it back. However, the TTY connection closes during the snapshot creation phase, preventing stdin from being forwarded.

Why Not Fixable: This is a complex race condition in TTY/snapshot interaction that requires deeper investigation into how the TTY session is managed during snapshot creation. The issue may be related to how the VM is paused/resumed during snapshotting.

2. Host-x64: FUSE mount limit exceeded

Tests: Multiple fuse-pipe integration tests
Failure: fusermount3: too many FUSE filesystems mounted; mount_max=N can be set in /etc/fuse.conf

Why Not Fixable: This is an environmental issue with the test runner, not a code issue. The host has reached the maximum number of FUSE filesystems that can be mounted simultaneously.


claude-claude bot (Contributor) commented Feb 5, 2026

⚠️ CI Check Failed

The Host-x64 test has failed. Please fix the failing test before I can review this PR.

Failed Check:



The CI was failing with 'too many FUSE filesystems mounted' because stale mounts
accumulated across test runs. This adds fusermount3 -u -z cleanup for temp dirs.
claude-claude bot (Contributor) commented Feb 5, 2026

CI Failure Analysis

I analyzed CI #21696284247 and found that the failure was due to code formatting issues in fcvm/src/commands/exec.rs.

Diagnosis

The CI Lint job failed because two bail! macro calls in exec.rs (lines 73 and 87) were formatted on a single line, but cargo fmt expected them to be split across multiple lines for readability.

Resolution

✅ The issue has already been fixed in commit 476e8ad ("style: format exec.rs"), which is now part of the http-proxy branch.

The latest CI run shows the Lint job passing successfully. No further action is needed.


claude-claude bot (Contributor) commented Feb 5, 2026

⚠️ CI Checks Failed

Cannot proceed with review - some CI checks have failed:

Please fix these test failures before review can proceed. Reviewing code that doesn't pass CI is not productive.



claude-claude bot (Contributor) commented Feb 5, 2026

🔧 CI Auto-Fix

Created fix PR: #235

The test-packaging-e2e failure was caused by health check exec calls blocking for 50+ seconds when the exec server wasn't ready yet, preventing VMs from becoming healthy within the 120s timeout.


- Check container-exit file before other health checks to detect
  containers that fail to load (exit 125) while VM is still running
- Add validate_docker_archive() to verify manifest.json exists in
  docker archives after export
- Auto-repair corrupted cached archives by re-exporting on same run
- Fix health check logic to not return Unhealthy when container
  doesn't exist yet (check_podman_healthcheck returns Some(false)
  when inspect fails)

Tested: make test-root FILTER=healthcheck (3/3 passed)
Tested: make test-root FILTER=sanity (3/3 passed)
claude-claude bot (Contributor) commented Feb 5, 2026

⚠️ Lint Failed

The Lint check has failed for this PR. Please fix the linting issues before I can perform a code review.

CI Status: View failed Lint check

To fix locally, run:

cargo fmt --check
cargo clippy --all-targets -- -D warnings


claude-claude bot (Contributor) commented Feb 5, 2026

⚠️ CI Checks Failed

The following CI checks have failed:

  • Host-arm64 (failed after 7m9s)
  • Host-x64 (failed after 7m7s)

Please fix these test failures before I can review this PR. Reviewing code that doesn't pass CI is not productive.

View the failed runs:



claude-claude bot (Contributor) commented Feb 5, 2026

🔍 CI Failure Analysis (In Progress)

Analyzing: CI #21697741236

Current Status

Currently investigating the CI failures. The http-proxy branch adds HTTP proxy support via a new http_proxy field in NetworkConfig.

Code Review Findings

✅ The implementation appears correct:

  • http_proxy field added to src/network/types.rs
  • Properly serialized and passed to fc-agent via MMDS
  • fc-agent correctly uses both http_proxy and https_proxy
  • Fallback to environment variables implemented

Failed Jobs

  1. Container-arm64 → container-test
  2. Container-x64 → container-test
  3. Host-x64 → test-packaging-e2e
  4. Host-arm64 → test-packaging-e2e

Next Steps

Monitoring current CI run #21698065661 to determine if the failures are reproducible or flaky.

Updating shortly...


claude-claude bot (Contributor) commented Feb 5, 2026

CI Failure Analysis - Complete

Analyzed: CI #21697741236

Findings

The CI failures are NOT directly caused by the http_proxy feature implementation. The code changes are correct:

  • ✅ http_proxy field properly added to NetworkConfig
  • ✅ Serialization/deserialization working correctly
  • ✅ Proxy values passed to fc-agent via MMDS
  • ✅ Proxy tests (test_proxy_ipv4, test_proxy_ipv6) all PASSING

Actual Failure Causes

1. Container-test failures (ARM64/x64):

ERROR: namespace not ready after 1 attempts (holder PID 17869)
nsenter: reassociate to namespace 'ns/user' failed: Invalid argument
  • Rootless networking namespace setup failing
  • nsenter cannot enter user namespace
  • Likely environmental/timing issue, not code bug

2. Test-packaging-e2e failures (ARM64/x64):

FAIL: VM did not become healthy within 120s
Status: unknown
  • Bridged networking test with nginx:alpine
  • VM stays in "unknown" status instead of becoming "healthy"
  • Health check not passing

Assessment

These appear to be environmental/flaky test issues rather than bugs in the http_proxy implementation. The failures may be:

  • Timing-related (namespace setup race conditions)
  • Infrastructure issues on CI runners
  • Flaky tests that need retry logic

Recommendation

  1. Re-run the CI to see if failures are reproducible
  2. Consider adding retry logic for namespace setup
  3. Investigate why bridged mode health checks are timing out

Currently monitoring run #21698065661 to see if issues reproduce.



The health monitor's exec calls to check container status (podman inspect)
could hang indefinitely if the exec server was unresponsive, blocking the
entire health monitoring loop. This caused VMs to stay in "unknown" status
forever in test-packaging-e2e.

Added HEALTH_CHECK_EXEC_TIMEOUT (5 seconds) wrapping both:
- check_container_running() - checks if container is running
- check_podman_healthcheck() - checks podman HEALTHCHECK status

When timeout fires, functions return false/retry rather than blocking.

Tested: ./scripts/test-packaging-e2e.sh passes (VM becomes healthy)
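The timeout wrapper described above can be sketched with std channels standing in for tokio::time::timeout (fcvm's actual implementation is async; this only illustrates the return-false-and-retry behavior on a missed deadline):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Run a possibly-hanging health check on a worker thread; if it misses
// the deadline, report "not healthy yet" instead of blocking the loop.
fn check_with_timeout<F>(check: F, timeout: Duration) -> bool
where
    F: FnOnce() -> bool + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(check());
    });
    // recv_timeout errors on deadline; map that to false (retry later).
    rx.recv_timeout(timeout).unwrap_or(false)
}

fn main() {
    // A check that hangs past the deadline is reported unhealthy...
    let slow = check_with_timeout(
        || {
            thread::sleep(Duration::from_secs(2));
            true
        },
        Duration::from_millis(100),
    );
    // ...while a fast passing check is reported healthy.
    let fast = check_with_timeout(|| true, Duration::from_secs(1));
    println!("slow={slow} fast={fast}");
}
```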
claude-claude bot (Contributor) commented Feb 5, 2026

⚠️ CI Check Failed

The Container-arm64 check has failed. Please fix the failing test before I can complete the review.

View failed check



claude-claude bot (Contributor) commented Feb 5, 2026

CI Failure Analysis

I analyzed CI #21699199367 and couldn't create an automatic fix.

Diagnosis

The CI failures are not related to the HTTP proxy feature changes. The failing tests are:

  1. fuse-pipe::test_mount_stress::test_rapid_mount_unmount_cycles - FUSE mount stress test
  2. fcvm::test_diff_snapshot::test_diff_snapshot_memory_size_valid - Snapshot diff test
  3. fcvm::test_exec::test_exec_rootless - Rootless exec test
  4. fcvm::test_snapshot_clone::test_snapshot_clone_stress_100_rootless - Snapshot stress test (0/100 clones became healthy)

All failures show health check timeouts and are unrelated to the proxy/IPv6 networking changes in this PR.

Why Not Fixable

These appear to be flaky stress/integration tests experiencing timing issues in the CI environment, not bugs introduced by code changes. The failures are environmental:

  • Health check timeouts (containers not becoming ready in time)
  • Stress tests with 100 parallel VMs failing to become healthy
  • No errors in the proxy-specific code or tests

Recommendation

Re-run the CI to see if these flaky tests pass on retry. The http-proxy feature code itself appears sound based on the test structure and passing unit tests.


Container-arm64 CI was failing with:
  "OS can't spawn worker thread: Resource temporarily unavailable (os error 11)"

tokio panicked when it couldn't spawn worker threads due to process/thread
limits being exhausted. This was caused by missing nproc ulimit in container
configuration - only pids-limit and nofile were set.

Added --ulimit nproc=65536:65536 alongside existing limits.
claude-claude bot (Contributor) commented Feb 5, 2026

⚠️ CI Check Failed

The Container-arm64 check has failed. Please fix the failing test before I can complete the review.

Failed Check:

Once all CI checks pass, I'll perform a full review of the PR.



claude-claude bot (Contributor) commented Feb 5, 2026

🔍 CI Failure Analysis

I analyzed CI #21699963293 which was triggered to fix the previous failure.

Previous Failure Diagnosis (run #21699199367)

The CI had 4 test failures due to thread/process exhaustion:

  • fuse-pipe::test_mount_stress test_rapid_mount_unmount_cycles
  • fcvm::test_diff_snapshot test_diff_snapshot_memory_size_valid
  • fcvm::test_exec test_exec_rootless
  • fcvm::test_snapshot_clone test_snapshot_clone_stress_100_rootless

Root cause: Tests were hitting resource limits with error Resource temporarily unavailable (os error 11) (EAGAIN), caused by missing nproc ulimit in the container configuration.

✅ Fix Already Applied

Commit 0e07f7b by @ejc3 has already fixed this issue by adding --ulimit nproc=65536:65536 to the Makefile container run configuration.

The current CI run (#21699963293) is running with this fix. No additional code changes are needed.


Generated by Claude | Analysis Run

claude-claude bot (Contributor) commented Feb 5, 2026

⚠️ CI Check Failed

The Host-x64 check has failed. I cannot proceed with the review until all CI checks pass.

Failed Check:

Please fix the failing test and push the changes. Once all checks pass, I'll be able to review this PR.



- Capture sudo process PID for cleanup
- Kill process tree children before parent
- Use VM name pattern to catch orphaned firecracker/slirp4netns
- Increase sleep time for graceful shutdown

This prevents 20+ stale firecracker/slirp4netns processes from
accumulating after packaging e2e tests.
claude-claude bot (Contributor) commented Feb 5, 2026

⏸️ Review Paused - CI Check Failed

The Host-arm64 check has failed. Please fix the failing test before I can complete the review.

Failed Check:

Reviewing code that doesn't pass CI is not productive. I'll wait for the fixes and can review again once all checks pass.



The retry flag was only in _test-root, causing flaky tests to fail
immediately in container/fast tests instead of being retried.
claude-claude bot (Contributor) commented Feb 5, 2026

🔍 Claude Review

SEVERITY: none

Findings

No issues found.

This PR successfully adds IPv6 support and HTTP/HTTPS proxy functionality to fcvm. After extensive debugging and iteration through multiple fixes (as evidenced by the 44 commits and previous review history), all CI checks are now passing.

Implementation Highlights

IPv6 Support:

  • Dual-stack networking with IPv4 and IPv6 via slirp4netns
  • Guest IPv6 addressing: fd00:1::2/64 with gateway at fd00:1::1
  • IPv6 NAT66 for egress traffic
  • IPv6 DNAT for port forwarding (fd00::100 → guest)
  • Automatic detection of host IPv6 and configuration passthrough

HTTP/HTTPS Proxy Support:

  • Automatic proxy environment variable passthrough from host to VM via MMDS
  • Proxy settings saved to /etc/fcvm-proxy.env for exec commands
  • Support for both IPv4 and IPv6 proxies
  • Works with podman pull and podman exec

Test Infrastructure:

  • Pure Rust HTTP test server (LocalTestServer) - no external dependencies
  • Pure Rust HTTP forward proxy (LocalProxyServer) for testing proxy functionality
  • Comprehensive matrix tests covering IPv4/IPv6, local/global, proxy/egress scenarios
  • Self-contained tests with no external network dependencies
  • Graceful test skipping when IPv6 is unavailable (container environments)

Bug Fixes Along the Way:

  • Fixed port forwarding to use correct guest IP (10.0.2.15)
  • Added cgroup delegation (Delegate=yes) to fc-agent.service for podman
  • Fixed exec server race conditions with proper readiness signaling
  • Added DNS passthrough with --dns flag to slirp4netns
  • Fixed rootfs cache invalidation to include service file changes

Code Quality

  • Well-structured and documented
  • Extensive error handling
  • Clean separation of concerns
  • Thorough test coverage
  • No clippy warnings
  • Proper formatting

Summary

This is a substantial and well-executed enhancement to fcvm. The IPv6 and proxy support are production-ready, thoroughly tested, and all CI checks pass. The PR is ready to merge.



ejc3 merged commit 25bc7cf into main on Feb 5, 2026 (16 checks passed)
ejc3 added a commit that referenced this pull request Mar 2, 2026
Add HTTP/HTTPS proxy support and IPv6 networking