
fix: add timeout to health check exec calls to prevent blocking #235

Closed
claude-claude[bot] wants to merge 1 commit into http-proxy from claude/fix-21696772921

@claude-claude (bot, Contributor) commented Feb 5, 2026

CI Fix

Fixes CI #21696494896

Problem

The test-packaging-e2e test was timing out on ARM64 because the VM never became healthy within the 120-second timeout. The health monitor was getting blocked when it ran fcvm exec to check container status via podman inspect.

Root cause: Commit cb639e9 changed the health check so that it no longer assumes healthy when exec fails, which is the correct behavior. However, the fcvm exec command has built-in retry logic with exponential backoff that can take more than 50 seconds when the exec server isn't ready yet (common during VM startup). Health checks therefore hung for extended periods, which kept the VM from being marked healthy within the test's 120-second timeout.
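
The PR does not give fcvm exec's actual retry parameters. As a rough illustration of how exponential backoff reaches the 50+ seconds described above, the Python sketch below sums the waits under a hypothetical schedule. INITIAL_DELAY_S, MULTIPLIER and MAX_ATTEMPTS are assumptions chosen for illustration, not values taken from fcvm. Under these assumptions, nine waits starting at 100 ms and doubling each time already add up to 51.1 s.

```python
# Hypothetical backoff schedule, for illustration only: the real
# parameters of fcvm exec's retry loop are not shown in this PR.
INITIAL_DELAY_S = 0.1  # assumed first delay (100 ms)
MULTIPLIER = 2         # assumed doubling after each failed attempt
MAX_ATTEMPTS = 10      # assumed attempt cap


def cumulative_backoff(initial: float, multiplier: float, attempts: int) -> list[float]:
    """Return the total time spent waiting after each failed attempt."""
    totals = []
    elapsed = 0.0
    delay = initial
    for _ in range(attempts):
        elapsed += delay       # wait before the next retry
        totals.append(elapsed)
        delay *= multiplier    # exponential growth of the delay
    return totals


totals = cumulative_backoff(INITIAL_DELAY_S, MULTIPLIER, MAX_ATTEMPTS)
for attempt, total in enumerate(totals, start=1):
    print(f"after attempt {attempt}: {total:.1f}s waited")
```

Because the total roughly doubles with each extra attempt, a single slow exec can dominate a health check unless the whole call is bounded.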

Solution

Added a 2-second timeout to exec calls in both check_container_running and check_podman_healthcheck. This allows the health monitor to fail fast and retry on the next iteration (every 100ms during startup), rather than blocking for the full exec retry duration.
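
The fail-fast idea can be sketched in Python with subprocess.run and its timeout argument, which kills the child process once the deadline passes and raises TimeoutExpired. The helper name run_with_timeout is hypothetical. Its argv parameter stands in for the fcvm exec call that runs podman inspect, whose exact arguments are not shown in this PR.

```python
import subprocess
from typing import Optional

EXEC_TIMEOUT_S = 2.0  # the 2-second bound described in the Solution


def run_with_timeout(argv: list[str], timeout_s: float = EXEC_TIMEOUT_S) -> Optional[str]:
    """Run argv and return its stdout, or None if it fails or exceeds timeout_s.

    subprocess.run kills the child when the timeout expires, so a slow
    command (e.g. one stuck in its own retry/backoff loop) cannot block
    the caller for longer than timeout_s.
    """
    try:
        result = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None  # timed out: report "not healthy yet", never "healthy"
    except OSError:
        return None  # command missing or not executable
    if result.returncode != 0:
        return None  # command ran but reported failure
    return result.stdout
```

For example, run_with_timeout(["sleep", "5"], timeout_s=0.5) returns None after about half a second instead of waiting the full five seconds.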

With this fix:

  • Each health-check exec call gives up after 2s, even if the exec server isn't ready
  • The health monitor can poll frequently without blocking
  • VMs are marked healthy soon after the exec server becomes ready
  • The health check logic itself is unchanged; only the exec calls are bounded
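
A simplified polling loop shows why bounded checks matter: each iteration either succeeds or returns quickly, then sleeps 100 ms before retrying. The function wait_until_healthy and its deadline_s parameter are hypothetical (deadline_s stands in for the test's 120-second timeout); the real health monitor's structure is not shown in this PR.

```python
import time
from typing import Callable

POLL_INTERVAL_S = 0.1      # startup poll interval mentioned in the PR
STARTUP_DEADLINE_S = 120.0 # stands in for the test's 120-second timeout


def wait_until_healthy(
    check: Callable[[], bool],
    poll_interval_s: float = POLL_INTERVAL_S,
    deadline_s: float = STARTUP_DEADLINE_S,
) -> bool:
    """Poll check() until it returns True or deadline_s elapses.

    check is expected to be bounded (e.g. by a 2 s exec timeout), so a
    not-ready exec server costs one short iteration rather than a long
    retry loop, and the next poll happens shortly afterwards.
    """
    start = time.monotonic()
    while time.monotonic() - start < deadline_s:
        if check():
            return True  # healthy: stop polling
        time.sleep(poll_interval_s)  # not ready yet: retry next iteration
    return False  # deadline reached without a passing check
```

With an unbounded check, a single call stuck in a 50-second retry loop would consume most of the deadline; with a bounded check, the loop keeps polling and notices readiness within roughly one poll interval plus one check.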

Generated by Claude | Fix Run

The health monitor was getting blocked when trying to run 'fcvm exec' to check
container status via podman inspect. The exec command has built-in retry logic
with exponential backoff that can take 50+ seconds when the exec server isn't
ready yet (e.g., during VM startup).

This caused the health check to hang for extended periods, preventing VMs from
being marked as healthy in a timely manner. The test-packaging-e2e test was
timing out after 120 seconds because the health checks were taking too long.

@ejc3 (Owner) commented Feb 5, 2026

Already fixed in the http-proxy branch (commits cb639e9 and 4963268 include these changes).

@ejc3 closed this Feb 5, 2026
@ejc3 deleted the claude/fix-21696772921 branch February 8, 2026 18:28