Feature: Tunnel Health Check and Fast Recovery #30

### Problem

Currently, the tunnel lacks robust health monitoring and automatic recovery. When the tunnel fails (network issues, resolver unavailability, silent packet loss), detection is slow and recovery requires a manual restart.

This becomes especially critical in highly restricted network environments, where the tunnel may be established through an intermediary server running in non-interactive or headless mode. In such scenarios, recovering from a failure requires restarting the tunnel on the server side, but access to that server may be limited, intermittent, or available only during narrow time windows. Without self-healing, a silent tunnel failure can leave the connection unusable until the next opportunity for manual intervention.

### Current State

- QUIC keep-alive is enabled (400 ms) but never verified
- Connection close/reset is detected via callbacks, but detection triggers a program exit instead of a reconnect
- No automatic reconnection logic
- No per-resolver health tracking
- Recursive mode has no timeout for unresponsive resolvers
- Path quality metrics are collected but not used for failure detection

### Proposed Solution

#### Client-Side (Primary)

1. **Active Health Probing**
   - Periodic lightweight probes independent of data transfer
   - Configurable interval (e.g., `--health-check-interval`)
   - Detect silent failures within seconds

2. **Automatic Reconnection**
   - Exponential backoff on connection failure
   - Configurable retry budget and max delay
   - Preserve TCP listeners during reconnection attempts

3. **Per-Resolver Health Tracking**
   - Track success/failure rate per resolver
   - Automatic failover to healthy resolvers
   - Circuit breaker pattern for repeatedly failing resolvers

4. **Path Quality Thresholds**
   - Use existing RTT/loss metrics for degradation detection
   - Switch paths when quality drops below threshold
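
As a minimal sketch of item 1, assuming the tunnel can send a small probe and learn whether a reply arrived within a timeout, the Python below counts consecutive misses and declares the tunnel dead after `max_misses` of them. `HealthProber`, `send_probe`, `on_failure`, and `max_misses` are hypothetical names; only the idea of a `--health-check-interval` setting comes from this proposal. If probe starts are kept `interval` apart, a silent failure is detected in roughly `max_misses * interval + timeout`, so 1 s probes with a 0.5 s timeout and three allowed misses would detect it in about 3.5 s.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HealthProber:
    """Hypothetical prober: the tunnel is declared dead after `max_misses`
    consecutive probes go unanswered."""
    interval: float            # seconds between probe starts (cf. --health-check-interval)
    timeout: float             # seconds to wait for each probe reply; assumed < interval
    max_misses: int = 3
    misses: int = 0
    last_ok: Optional[float] = None

    def record(self, replied: bool, now: float) -> bool:
        """Record one probe outcome; return True while the tunnel is still considered healthy."""
        if replied:
            self.misses = 0
            self.last_ok = now
        else:
            self.misses += 1
        return self.misses < self.max_misses

    def run(self, send_probe: Callable[[float], bool], on_failure: Callable[[], None],
            sleep: Callable[[float], None] = time.sleep,
            clock: Callable[[], float] = time.monotonic) -> None:
        """Probe until failure is declared, then call `on_failure` (e.g. start reconnecting)."""
        while True:
            started = clock()
            replied = send_probe(self.timeout)  # expected to block for at most `timeout`
            if not self.record(replied, clock()):
                on_failure()
                return
            # keep probe starts `interval` apart regardless of reply latency
            sleep(max(0.0, started + self.interval - clock()))
```

Counting consecutive misses rather than reacting to a single lost probe avoids tearing down a working tunnel because of one dropped packet.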
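
Item 2 could follow the usual capped exponential backoff. The sketch below assumes a hypothetical `connect()` callable that performs one full tunnel handshake and returns whether it succeeded; `base`, `max_delay`, and `retry_budget` stand in for the configurable knobs listed above. Optional "full jitter" (a uniform draw in `[0, delay]`) spreads out reconnect attempts when many clients lose the tunnel at the same moment. The local TCP listeners are deliberately kept outside this loop, so applications can keep connecting while the tunnel is re-established.

```python
import random
import time
from typing import Callable, Iterator, Optional


def backoff_delays(base: float, max_delay: float, retry_budget: int,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield at most `retry_budget` delays: base * 2**attempt, capped at max_delay.
    With an `rng`, each delay is drawn uniformly from [0, capped delay] ("full jitter")."""
    for attempt in range(retry_budget):
        delay = min(max_delay, base * (2 ** attempt))
        yield rng.uniform(0.0, delay) if rng is not None else delay


def reconnect(connect: Callable[[], bool], base: float = 0.5, max_delay: float = 30.0,
              retry_budget: int = 10, sleep: Callable[[float], None] = time.sleep,
              rng: Optional[random.Random] = None) -> bool:
    """Try once immediately, then once after each backoff delay.
    Returns True on success, False when the retry budget is exhausted.
    Local TCP listeners are assumed to live outside this function and stay bound."""
    if connect():
        return True
    for delay in backoff_delays(base, max_delay, retry_budget, rng):
        sleep(delay)
        if connect():
            return True
    return False
```

For example, with `base=0.5` and `max_delay=4.0` (no jitter) the delays are 0.5, 1, 2, 4, 4, ... seconds until the budget runs out.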
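
For item 3, one possible shape is a per-resolver circuit breaker. In this sketch, the class names, the `failure_threshold` and `cooldown` parameters, and the success-rate ranking are all assumptions rather than existing code. A resolver's circuit opens after several consecutive failures, stays out of rotation for a cooldown period, then moves to half-open so trial queries can test it; a success closes the circuit and a failure reopens it. `pick()` fails over to the usable resolver with the best observed success rate, preferring the configured order on ties.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

CLOSED, OPEN, HALF_OPEN = "closed", "open", "half-open"


@dataclass
class ResolverHealth:
    successes: int = 0
    failures: int = 0
    consecutive_failures: int = 0
    state: str = CLOSED
    opened_at: float = 0.0


class ResolverPool:
    """Hypothetical per-resolver health tracker with a circuit breaker."""

    def __init__(self, resolvers: List[str], failure_threshold: int = 3, cooldown: float = 30.0):
        self.order = list(resolvers)
        self.health: Dict[str, ResolverHealth] = {r: ResolverHealth() for r in resolvers}
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown  # seconds an open circuit stays out of rotation

    def record(self, resolver: str, ok: bool, now: float) -> None:
        h = self.health[resolver]
        if ok:
            h.successes += 1
            h.consecutive_failures = 0
            h.state = CLOSED
        else:
            h.failures += 1
            h.consecutive_failures += 1
            # a failed trial in half-open, or too many failures in a row, opens the circuit
            if h.state == HALF_OPEN or h.consecutive_failures >= self.failure_threshold:
                h.state = OPEN
                h.opened_at = now

    def success_rate(self, resolver: str) -> float:
        h = self.health[resolver]
        total = h.successes + h.failures
        return h.successes / total if total else 1.0

    def pick(self, now: float) -> Optional[str]:
        """Return the healthiest usable resolver, or None if every circuit is open."""
        usable = []
        for r in self.order:
            h = self.health[r]
            if h.state == OPEN and now - h.opened_at >= self.cooldown:
                h.state = HALF_OPEN  # cooldown elapsed: allow trial queries
            if h.state != OPEN:
                usable.append(r)
        if not usable:
            return None
        # max() returns the first maximal element, so ties keep the configured order
        return max(usable, key=self.success_rate)
```

When `pick()` returns `None`, the client could fall back to the reconnection backoff instead of hammering resolvers that are all known to be failing.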
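
Item 4 could reuse the RTT and loss figures the client already collects. The sketch assumes each path exposes a smoothed RTT in milliseconds and a loss fraction; the default thresholds, the `hysteresis` margin, and the cost formula (RTT inflated by expected retransmissions, assuming independent losses) are illustrative choices, not measured or existing values. The hysteresis margin keeps the client from flapping between two paths of similar quality.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PathQuality:
    srtt_ms: float  # smoothed RTT, assumed to come from the existing path metrics
    loss: float     # loss fraction in [0, 1]


@dataclass
class Thresholds:
    max_rtt_ms: float = 500.0  # illustrative defaults, not tuned values
    max_loss: float = 0.10
    hysteresis: float = 0.2    # a candidate must be 20% cheaper to trigger a switch


def degraded(q: PathQuality, t: Thresholds) -> bool:
    return q.srtt_ms > t.max_rtt_ms or q.loss > t.max_loss


def cost(q: PathQuality) -> float:
    # RTT times the expected transmissions per delivered packet, 1 / (1 - loss)
    return q.srtt_ms / max(1e-6, 1.0 - q.loss)


def choose_path(current: str, paths: Dict[str, PathQuality], t: Thresholds) -> str:
    """Stay on `current` unless it is degraded and a healthy alternative is clearly better."""
    if not degraded(paths[current], t):
        return current
    healthy = [p for p, q in paths.items() if p != current and not degraded(q, t)]
    if not healthy:
        return current
    best = min(healthy, key=lambda p: cost(paths[p]))
    if cost(paths[best]) < cost(paths[current]) * (1.0 - t.hysteresis):
        return best
    return current
```

With these defaults, a degraded path at 520 ms would not be abandoned for one at 480 ms (480 is not below 0.8 × 520 = 416), but it would be abandoned for one at 120 ms.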

#### Server-Side (Optional)

- Session idle timeout tracking (already partially exists in UDP fallback)
- Metrics/logging for client health events
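
On the server side, idle-session tracking could be generalized from the existing UDP-fallback behavior. This is a minimal sketch, assuming sessions are keyed by some session ID and that the server calls `touch()` for every packet it receives; `SessionIdleTracker`, `touch()`, `sweep()`, and the logger name are hypothetical. Each eviction is logged, which also covers the metrics/logging point above.

```python
import logging
import time
from typing import Callable, Dict, List

log = logging.getLogger("tunnel.server")  # hypothetical logger name


class SessionIdleTracker:
    """Hypothetical tracker: a session with no traffic for `idle_timeout` seconds is evicted."""

    def __init__(self, idle_timeout: float, clock: Callable[[], float] = time.monotonic):
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def touch(self, session_id: str) -> None:
        """Call on every packet received for `session_id`."""
        self.last_seen[session_id] = self.clock()

    def sweep(self) -> List[str]:
        """Evict and return sessions idle for at least `idle_timeout`; call periodically."""
        now = self.clock()
        expired = [s for s, seen in self.last_seen.items() if now - seen >= self.idle_timeout]
        for s in expired:
            log.info("session %s idle for %.1f s, closing", s, now - self.last_seen.pop(s))
        return expired
```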

---

We'd love to hear your feedback on these proposed solutions. We're happy to contribute to the implementation, but we wanted to align on the overall strategy and approach before diving into the code.

