Description
Problem
When the server is completely unreachable at the TCP level (e.g. brief network blip, server restart), the WebSocket reconnect logic in src/lib/ws-client.ts has two compounding issues:
- Unbounded exponential backoff: delays grow as `1000 * 2^n` ms, reaching 512s (~8.5 min) by the 9th attempt. Most implementations cap at 30-60s.
- Hard 10-attempt limit with no recovery: after 10 failed attempts (~17 min total), `scheduleReconnect()` logs `max reconnect attempts reached` and returns. The client is permanently dead until page reload.
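
To make the numbers above concrete, here is a rough sketch of the arithmetic. It assumes `n` is 0-based (attempts 0 through 9), which is the reading that matches the ~17 min total. The helper `uncappedDelayMs` is hypothetical and is not taken from `ws-client.ts`; only the two constants mirror values quoted in this issue.

```typescript
// Constants mirroring the values quoted in this issue.
const baseReconnectDelay = 1000; // ms
const maxReconnectAttempts = 10;

// Hypothetical helper: delay before attempt n, assuming n starts at 0.
function uncappedDelayMs(attempt: number): number {
  return baseReconnectDelay * 2 ** attempt;
}

// Delays for attempts 0..9 and their sum.
const delays: number[] = Array.from(
  { length: maxReconnectAttempts },
  (_, n) => uncappedDelayMs(n),
);
const totalMs: number = delays.reduce((sum, d) => sum + d, 0);

console.log(delays[9] / 1000); // 512 (seconds, about 8.5 min)
console.log(totalMs / 1000); // 1023 (seconds, about 17 min)
```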
Note: reconnectAttempts resets in ws.onopen (line 172), so this only affects scenarios where the server is totally unreachable (TCP connect fails before onopen). When the server is reachable but drops post-handshake, the counter resets and it retries at 1s indefinitely — that path is fine.
Observed behavior
Client error logs show repeated `[WsClient] reconnect failed WebSocket error` entries with increasing gaps between attempts. The HTTP endpoint (`/api/logs/client`) was reachable the whole time, suggesting the outage was brief but the client had already burned through its attempts or was waiting on a very long backoff delay.
Suggested fix
- Cap max backoff delay (e.g. 30s)
- After exhausting fast attempts, switch to a slow poll (e.g. every 60s) instead of giving up entirely
- Optionally: retry immediately on `document.visibilitychange` (user returns to tab)
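
The three suggestions above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual `scheduleReconnect()` code: `maxBackoffDelay`, `fastAttempts`, `slowPollDelay`, `nextReconnectDelay`, `VisibilitySource`, and `retryWhenVisible` are all hypothetical names, and the 30s / 60s values are the examples from the list.

```typescript
// Hypothetical constants; only baseReconnectDelay mirrors a value quoted in this issue.
const baseReconnectDelay = 1000; // ms
const maxBackoffDelay = 30000; // ms, cap on exponential backoff
const fastAttempts = 10; // attempts before switching to the slow poll
const slowPollDelay = 60000; // ms, fixed interval once fast attempts are used up

// Delay before the given attempt (0-based). Never gives up:
// after the fast attempts it keeps returning the slow-poll interval.
function nextReconnectDelay(attempt: number): number {
  if (attempt >= fastAttempts) {
    return slowPollDelay;
  }
  return Math.min(baseReconnectDelay * 2 ** attempt, maxBackoffDelay);
}

// Minimal structural type so the sketch does not depend on DOM typings.
interface VisibilitySource {
  visibilityState: string;
  addEventListener(type: string, listener: () => void): void;
}

// Calls reconnectNow whenever the source becomes visible again.
function retryWhenVisible(source: VisibilitySource, reconnectNow: () => void): void {
  source.addEventListener("visibilitychange", () => {
    if (source.visibilityState === "visible") {
      reconnectNow();
    }
  });
}
```

In a browser, `retryWhenVisible` would presumably be called once with `document` and a callback that cancels any pending timer and reconnects immediately. Whether the attempt counter should also reset at that point is a design choice left open here.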
Relevant code
- `src/lib/ws-client.ts` lines 401-425 (`scheduleReconnect`)
- `src/lib/ws-client.ts` line 94 (`maxReconnectAttempts = 10`)
- `src/lib/ws-client.ts` line 95 (`baseReconnectDelay = 1000`)