Parent
Part of #204 (Phase 3: Multi-Rig + Scaling)
Problem
The xterm.js terminal in the Gastown dashboard has three recurring rendering/stability issues that degrade the experience significantly:
- UI becomes completely unresponsive — blinking cursor, no keyboard input accepted, requires a full page refresh
- "Blacked out" artifacts — certain row/col coordinates render as blank/corrupt cells, making text unreadable
- Raw JSON overlay in chat box: {"cursor":<number>} text appears overlaid in the terminal, interfering with the TUI's own rendering
Issues 2 and 3 can be temporarily fixed by running a command that refreshes the TUI (e.g., /status). Issue 1 requires a page refresh.
Root Causes
Issue 1: Unresponsive terminal — WebSocket disconnects with no reconnection
Files: useXtermPty.ts:187-191, TerminalBar.tsx:779-783
When the PTY WebSocket closes (container restart, network blip, container sleep after 30min idle), the handler sets connected=false but never attempts to reconnect. The xterm instance stays mounted with cursorBlink: true (hence the blinking cursor), but term.onData silently drops keystrokes because ws.readyState !== WebSocket.OPEN.
The alarm status WebSocket (useAlarmStatusWs at TerminalBar.tsx:340) does have 3-second reconnection logic. The PTY WebSockets do not.
Contributing factors:
- TownContainerDO.sleepAfter = '30m': container sleeping kills all WebSocket connections
- If the Mayor agent restarts (new session), the PTY session ID becomes stale but the terminal component doesn't detect this unless the mayorAgentId changes
- No visual indicator that the WebSocket is disconnected — the user just sees a dead terminal
Issue 2: Blacked out artifacts — PTY/xterm dimension mismatch during resize
Files: useXtermPty.ts:117-126, TerminalBar.tsx:796
ResizeObserver calls fitAddon.fit() with no debounce. During CSS transitions (sidebar expand/collapse, terminal bar resize), this fires many rapid resize events. Each triggers:
1. fitAddon.fit(): synchronously resizes xterm's viewport (instant)
2. resizePtySession tRPC mutation: async, browser → tRPC → Worker → TownContainerDO → Container → SDK PUT /pty/:id → process.resize(cols, rows) (network latency)
Between steps 1 and 2, the TUI renders to the old PTY dimensions while xterm expects the new dimensions. Cells that the TUI didn't repaint appear as "blacked out" — the xterm viewport grew but the TUI hasn't redrawn those cells yet.
The setTimeout(() => fit(), 50) in TerminalBar.tsx:822-823 partially mitigates this for sidebar changes but not for ResizeObserver events.
Issue 3: {"cursor":N} JSON overlay — missing control frame filter
Files: useXtermPty.ts:170-184, TerminalBar.tsx:763-776
The Kilo SDK's PTY module sends cursor metadata as a binary WebSocket frame with a 0x00 prefix byte (pty/index.ts:27-34):
// SDK sends: [0x00, ...JSON.stringify({cursor: N})]
The native Kilo desktop app correctly filters these (packages/app/src/components/terminal.tsx:472-486):
if (bytes[0] !== 0) return; // Not a control frame — skip
const json = decoder.decode(bytes.subarray(1)); // Parse metadata
The Gastown browser code does not check for the 0x00 prefix. It writes all binary data directly to xterm:
if (e.data instanceof ArrayBuffer) {
  term.write(new Uint8Array(e.data)); // ← No 0x00 check!
}
The NUL byte is ignored by xterm.js, but {"cursor":123} renders as visible text in the terminal viewport, overlapping with the TUI's own output.
A secondary path exists for when the proxy chain converts the binary frame to a string. The filter at line 175 (e.data.startsWith('{') + JSON.parse) catches this case but is fragile — leading whitespace or a preserved NUL byte would bypass it.
Fixes
Fix 1: WebSocket reconnection with exponential backoff
Add reconnection logic to the PTY WebSockets. useAlarmStatusWs already reconnects on close with a fixed 3-second delay; the PTY sockets should follow the same reconnect-on-close pattern, but with exponential backoff:
- On ws.onclose: if not intentionally closed, attempt reconnection after 1s → 2s → 4s → 8s (capped)
- Before reconnecting, check if the PTY session still exists (GET /agents/:id/pty/:ptyId/status)
- If the PTY session is gone (container restarted), create a new PTY session and reconnect
- Show a visual "Reconnecting..." indicator in the terminal bar
- If reconnection fails after N attempts, show "Connection lost — click to reconnect" with a manual retry button
Apply to both useXtermPty.ts and the duplicated logic in TerminalBar.tsx (or deduplicate — see Fix 5).
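The reconnection steps above could be sketched roughly as follows. This is a minimal illustration, not the Gastown implementation: backoffDelayMs, reconnectPty, and every field of ReconnectDeps are hypothetical names, the dependencies are injected so the loop stays independent of tRPC and WebSocket details, and the default of 5 attempts is an arbitrary stand-in for "N".

```typescript
// Hypothetical dependency bundle; none of these names come from the codebase.
type ReconnectDeps = {
  sessionExists: (ptyId: string) => Promise<boolean>; // e.g. wraps GET /agents/:id/pty/:ptyId/status
  createSession: () => Promise<string>;               // returns a new PTY session id
  connect: (ptyId: string) => Promise<void>;          // resolves once the WebSocket is open
  sleep: (ms: number) => Promise<void>;
  onStatus: (status: "reconnecting" | "connected" | "disconnected") => void;
};

// Delay before the given 0-based attempt: 1s, 2s, 4s, 8s, then capped at 8s.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** Math.max(0, attempt));
}

// Returns the PTY session id that was reconnected to, or null after giving up.
async function reconnectPty(
  ptyId: string,
  deps: ReconnectDeps,
  maxAttempts = 5, // assumption: the issue leaves "N attempts" unspecified
): Promise<string | null> {
  deps.onStatus("reconnecting");
  let currentId = ptyId;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    await deps.sleep(backoffDelayMs(attempt));
    try {
      // If the old session is gone (e.g. container restarted), start a new one.
      if (!(await deps.sessionExists(currentId))) {
        currentId = await deps.createSession();
      }
      await deps.connect(currentId);
      deps.onStatus("connected");
      return currentId;
    } catch {
      // Treat any failure as transient and fall through to the next attempt.
    }
  }
  deps.onStatus("disconnected"); // caller can now show the manual retry button
  return null;
}
```

A caller would invoke reconnectPty from ws.onclose only when the close was not intentional, and call it again from the manual retry button once the status reaches "disconnected".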
Fix 2: Debounce resize events
Wrap the ResizeObserver callback and fitAddon.fit() calls with a debounce (e.g., 150ms). Only send the resizePtySession tRPC mutation after the debounce settles. This prevents resize storms during CSS transitions and ensures the PTY gets a single, final resize rather than dozens of intermediate ones.
Additionally, after sending the resize mutation, wait for it to complete before allowing the next resize. This ensures the PTY dimensions and xterm dimensions stay in sync.
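One way to combine the debounce with the "one resize in flight at a time" rule is sketched below. It is only an illustration under assumptions: createResizeCoalescer and the ResizeDeps fields are hypothetical, fitAndMeasure stands in for calling fitAddon.fit() and reading the terminal's cols and rows, and sendResize stands in for the resizePtySession mutation. Timers are injected so the logic can be exercised without a browser.

```typescript
type Size = { cols: number; rows: number };

// Hypothetical dependency bundle; names are illustrative only.
type ResizeDeps = {
  fitAndMeasure: () => Size;                 // e.g. fitAddon.fit(), then read term.cols / term.rows
  sendResize: (size: Size) => Promise<void>; // e.g. wraps the resizePtySession mutation
  setTimer: (fn: () => void, ms: number) => unknown;
  clearTimer: (handle: unknown) => void;
};

function createResizeCoalescer(deps: ResizeDeps, debounceMs = 150) {
  let timer: unknown = null;
  let pending: Size | null = null;
  let inFlight = false;

  const flush = (): void => {
    // While a resize is in flight, keep only the latest pending size;
    // the completion handler below calls flush() again.
    if (inFlight || pending === null) return;
    const size = pending;
    pending = null;
    inFlight = true;
    deps
      .sendResize(size)
      .catch(() => {
        // Assumption: a failed resize is simply superseded by the next one.
      })
      .then(() => {
        inFlight = false;
        flush();
      });
  };

  return {
    // Call from the ResizeObserver callback; rapid calls collapse into one fit.
    request(): void {
      if (timer !== null) deps.clearTimer(timer);
      timer = deps.setTimer(() => {
        timer = null;
        pending = deps.fitAndMeasure();
        flush();
      }, debounceMs);
    },
  };
}
```

In the browser, setTimer and clearTimer would be thin wrappers around setTimeout and clearTimeout. Because fit() only runs when the timer fires, xterm and the PTY both skip the intermediate sizes produced during a CSS transition.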
Fix 3: Filter 0x00 control frames in the WebSocket message handler
Before writing binary data to xterm, check for the control frame prefix:
if (e.data instanceof ArrayBuffer) {
  const bytes = new Uint8Array(e.data);
  if (bytes.length > 0 && bytes[0] === 0) {
    // Control frame: parse metadata, don't write to terminal
    return;
  }
  term.write(bytes);
}
This matches the native Kilo app's implementation at packages/app/src/components/terminal.tsx:472-486.
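To cover both the binary path and the fragile string path described under Issue 3, the check could be pulled into a single classifier that the message handler consults before writing to xterm. The sketch below is an assumption-laden illustration: classifyPtyFrame, parseCursor, and PtyFrame are hypothetical names, and the control payload is assumed to be a JSON object with a numeric cursor field, as in the {"cursor":N} example.

```typescript
type PtyFrame =
  | { kind: "control"; cursor: number | null }
  | { kind: "output"; data: Uint8Array | string };

const frameDecoder = new TextDecoder();

// Returns the cursor value if the text is a JSON object with a numeric
// "cursor" field, otherwise null.
function parseCursor(json: string): number | null {
  try {
    const parsed: unknown = JSON.parse(json);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as { cursor?: unknown }).cursor === "number"
    ) {
      return (parsed as { cursor: number }).cursor;
    }
  } catch {
    // Not JSON: fall through and report "no cursor".
  }
  return null;
}

function classifyPtyFrame(data: ArrayBuffer | string): PtyFrame {
  if (typeof data !== "string") {
    const bytes = new Uint8Array(data);
    // Binary path: a leading 0x00 marks a control frame.
    if (bytes.length > 0 && bytes[0] === 0) {
      return { kind: "control", cursor: parseCursor(frameDecoder.decode(bytes.subarray(1))) };
    }
    return { kind: "output", data: bytes };
  }
  // String path (binary frame converted by the proxy chain): tolerate a
  // preserved NUL byte and leading whitespace before the JSON object.
  const trimmed = data.replace(/^[\u0000\s]+/, "");
  if (trimmed.startsWith("{")) {
    const cursor = parseCursor(trimmed);
    if (cursor !== null) return { kind: "control", cursor };
  }
  return { kind: "output", data };
}
```

The handler would then call term.write only for frames of kind "output". One trade-off of the string path: genuine terminal output that happens to be exactly a {"cursor":N} object would also be swallowed, which the existing startsWith('{') filter already risks.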
Fix 4: Visual connection status indicator
Add a small status badge to the terminal bar showing WebSocket state:
- Connected (green dot — default, unobtrusive)
- Reconnecting (yellow dot + "Reconnecting...")
- Disconnected (red dot + "Connection lost — click to reconnect")
This gives users immediate feedback instead of a mysteriously dead terminal.
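The three states map naturally onto a small lookup that the terminal bar can render. The following is a hypothetical sketch: ConnectionStatus, Badge, and badgeFor are illustrative names, and the labels are copied from the list above.

```typescript
type ConnectionStatus = "connected" | "reconnecting" | "disconnected";

type Badge = {
  color: "green" | "yellow" | "red";
  label: string | null; // null means: show the dot only
  clickable: boolean;   // true means: clicking triggers a manual reconnect
};

function badgeFor(status: ConnectionStatus): Badge {
  switch (status) {
    case "connected":
      return { color: "green", label: null, clickable: false };
    case "reconnecting":
      return { color: "yellow", label: "Reconnecting...", clickable: false };
    case "disconnected":
      return { color: "red", label: "Connection lost — click to reconnect", clickable: true };
  }
}
```

Keeping the mapping in one pure function means both terminal panes render the same badge for the same state.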
Fix 5: Deduplicate terminal setup code
MayorTerminalPane in TerminalBar.tsx:614-836 duplicates the entire xterm/WebSocket/resize setup from useXtermPty.ts. This means every fix needs to be applied in two places. Extract the shared logic into useXtermPty (or a new shared hook) and use it from both MayorTerminalPane and AgentTerminalPane. This is not just cleanup — the duplicated code paths can diverge, and any fix applied to one but not the other creates a new inconsistency.
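Acceptance Criteria
- 0x00 control frame prefix checked on all binary WebSocket messages, so {"cursor":N} is never written to xterm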
Notes
- The {"cursor":N} fix (Fix 3) is a few-line change with a clear reference implementation in the native Kilo app
- The WebSocket reconnection (Fix 1) is the highest-impact fix — it addresses the most annoying symptom (permanent freeze requiring page refresh)
- The dimension mismatch that the resize debounce (Fix 2) addresses explains why running /status fixes the rendering: the TUI modal redraws the full viewport, overwriting the stale cells
- The code deduplication (Fix 5) should be done first to avoid applying fixes in two places