Sessions permanently stuck after server restart or stream interruption — no startup recovery for orphaned messages/tool parts

### Description

When the OpenCode server restarts (or the process crashes) while a session is actively executing tool calls, the session gets permanently stuck in a "Thinking" state. The root cause is that there is **no startup recovery** that cleans up orphaned assistant messages and tool parts.

#### What happens

1. A session is actively executing tool calls (e.g., bash commands)
2. The server restarts or crashes
3. The in-memory session state (`SessionStatus`) is lost — the session is no longer "busy"
4. But the **database state is stale**: the last assistant message has `time.completed = undefined` (never completed) and tool parts remain in `status: "running"` forever
5. The UI sees the incomplete assistant message and shows a permanent "Thinking" spinner
6. The session cannot recover — sending a new message creates a new loop iteration, but the old orphaned message still exists
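
To make the stale state in steps 3 and 4 concrete, here is a minimal sketch. The types are simplified and hypothetical (the real schema has more fields), and `busySessions` and `isOrphaned` are illustrative names, not existing OpenCode code.

```typescript
// Hypothetical, simplified shapes; the real OpenCode schema has more fields.
type ToolStatus = "pending" | "running" | "completed" | "error"

interface ToolPart {
  type: "tool"
  tool: string
  status: ToolStatus
}

interface AssistantMessage {
  id: string
  role: "assistant"
  time: { created: number; completed?: number }
  parts: ToolPart[]
}

// Stand-in for the in-memory SessionStatus map: empty after a restart,
// so no session is considered busy.
const busySessions = new Map<string, "busy">()

// A message is orphaned when it never completed and no in-memory
// session state claims to be working on it.
function isOrphaned(sessionID: string, msg: AssistantMessage): boolean {
  return msg.time.completed === undefined && !busySessions.has(sessionID)
}

// What the database still holds after the crash.
const stale: AssistantMessage = {
  id: "msg_example",
  role: "assistant",
  time: { created: 1 },
  parts: [{ type: "tool", tool: "bash", status: "running" }],
}
```

With an empty `busySessions`, `isOrphaned("ses_example", stale)` is true: nothing in memory owns the message, yet the stored record still looks in progress.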

#### Root cause analysis

The existing cleanup in `processor.ts:402-417` correctly handles the **normal case**: when the stream ends (normally, with an error, or via abort), it force-sets any non-terminal tool parts to `status: "error"`. However, this cleanup **only runs if the process survives long enough** to reach it.

There is **zero recovery at startup**:
- `Session.initialize()` does not scan for orphaned messages
- `SessionStatus` (in-memory map) is empty after restart — no stale detection
- No background watchdog checks for sessions stuck in busy state

The only defense is in `toModelMessages()` (`message-v2.ts:740-746`), which converts `pending`/`running` tool parts into `"[Tool execution was interrupted]"` when building the next LLM prompt. This lets the model recover context if the user sends a new message, but it does not change the stored state, so the UI still shows the session as stuck because the orphaned assistant message has no `time.completed`.
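
The sketch below shows why a prompt-time conversion alone does not unstick the UI. It is not the actual `toModelMessages()` code: `StoredToolPart` and `toolResultForPrompt` are hypothetical stand-ins that assume the conversion is a read-only mapping over stored parts.

```typescript
type ToolStatus = "pending" | "running" | "completed" | "error"

interface StoredToolPart {
  tool: string
  status: ToolStatus
  output?: string
  error?: string
}

// Hypothetical stand-in for the prompt-building step: it only decides what
// text the model sees and never writes back to the stored part.
function toolResultForPrompt(part: StoredToolPart): string {
  if (part.status === "pending" || part.status === "running") {
    return "[Tool execution was interrupted]"
  }
  if (part.status === "error") return part.error ?? "unknown error"
  return part.output ?? ""
}

const part: StoredToolPart = { tool: "bash", status: "running" }
const promptText = toolResultForPrompt(part)
// promptText is the interruption notice, but part.status is still "running",
// which is what the UI reads.
```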

#### Observed in production

- Session `ses_2f4299f5cffeVZfxCt3ViZ7eVJ` stuck for 3+ hours with a `git log` tool part permanently in `"running"` status
- Session `ses_2e9127723ffeKJ1JpjLNS35B4z` showed a similar pattern. This one turned out to be still running a long k8s test, but it demonstrates the same vulnerability.

#### Relation to existing issues

This is the **backend root cause** behind several reported symptoms:
- #17680 — Web UI permanent Thinking spinner after stream interruption
- #16856 — TUI stuck QUEUED badges from orphan assistant messages
- #14769 — Intermittent hang: session stays running forever
- #11865 — Subagents stuck with no timeout/retry
- #13841 — Explore subagent hangs indefinitely with no recovery

Open PRs #16907 and #17593 address **frontend symptoms** (making the UI more defensive about stale state), but neither fixes the **backend root cause** — orphaned messages and tool parts in the database.

### Proposed fix

**Startup recovery** in `Session` or app bootstrap:

1. On server start, query all messages where `role = "assistant"` and `time.completed IS NULL`
2. For each orphaned message:
   - Set `time.completed = Date.now()`
   - Set all tool parts with `status = "running"` or `status = "pending"` to `status = "error"` with `error = "Tool execution was interrupted (server restart)"`
   - Emit Bus events so connected frontends update

This is a small, safe change: the cleanup logic already exists in `processor.ts:402-417` and only needs to be callable from a recovery path at startup.
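
A minimal sketch of the recovery pass described above, under stated assumptions: the `messages` array stands in for the database query, `publish` stands in for the Bus, and the event names and the `recoverOrphans` function are hypothetical rather than existing OpenCode APIs.

```typescript
type ToolStatus = "pending" | "running" | "completed" | "error"

interface ToolPart {
  id: string
  status: ToolStatus
  error?: string
}

interface Message {
  id: string
  role: "user" | "assistant"
  time: { created: number; completed?: number }
  parts: ToolPart[]
}

// Hypothetical event shapes; the real Bus events may differ.
type RecoveryEvent =
  | { type: "message.updated"; messageID: string }
  | { type: "part.updated"; messageID: string; partID: string }

const INTERRUPTED = "Tool execution was interrupted (server restart)"

// Closes every assistant message that never completed and marks its
// non-terminal tool parts as errored. Returns the number of messages fixed.
function recoverOrphans(
  messages: Message[],
  publish: (event: RecoveryEvent) => void,
  now: number = Date.now(),
): number {
  let recovered = 0
  for (const msg of messages) {
    if (msg.role !== "assistant" || msg.time.completed !== undefined) continue
    for (const part of msg.parts) {
      if (part.status !== "pending" && part.status !== "running") continue
      part.status = "error"
      part.error = INTERRUPTED
      publish({ type: "part.updated", messageID: msg.id, partID: part.id })
    }
    msg.time.completed = now
    publish({ type: "message.updated", messageID: msg.id })
    recovered++
  }
  return recovered
}
```

Because the pass only touches messages without `time.completed`, running it twice is harmless: the second run finds nothing to fix. In this sketch it is meant to run once at startup, before any session loop begins, so that it cannot race with a message that is legitimately in progress.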

### Steps to reproduce

1. Start `opencode serve`
2. Start a session that uses tool calls (e.g., ask it to run tests)
3. Kill the server process while tools are executing (`kill -9`)
4. Restart the server
5. Open the session in the UI: it shows a permanent "Thinking" spinner
6. Session status API returns `{}` (idle) but the UI is stuck

### Environment

- opencode serve (long-running, multiple sessions)
- macOS / Linux
- Any provider (observed with gpt-5.3-codex via github-copilot)

### OpenCode version

Latest dev branch (commit `814a515a8`)
