
EPIC: Dialogue — structured chat UI for agent sessions #250

@kirich1409


Dialogue — structured chat UI for agent sessions

Target: macOS 26+ native client. First supported agent: Claude Code. The architecture leaves a path open to ACP, Codex, and other agents post-MVP.


1. Problem & motivation

Today every agent session in Relay is a raw PTY rendered via SwiftTerm. For shell sessions this is correct. For AI coding agents it is a mismatch: the agent produces structured output (messages, tool calls, thinking, plugin invocations, diffs), and rendering it as undifferentiated ANSI text discards meaning, hurts scannability, and rules out UX polish (markdown, syntax-highlighted code, collapsible tool output, approval sheets, live progress).

Dialogue introduces a second presentation surface for agent sessions — a structured chat UI that parses the agent's stream and renders messages, tool calls, and thinking as first-class UI primitives. The raw terminal stays available for sessions that need it (interactive TUI prompts, /login, shell-like flows).

The surface is chosen at session creation, not toggled live: Claude Code's interactive TTY and -p --output-format stream-json are different CLI entrypoints with incompatible stdout contracts. To meet the user's "switch mode" expectation, an Open as Terminal / Open as Dialogue action creates a sibling session via claude --resume <session_id>, preserving context.

2. User stories

  • As a user running a Claude Code session on a task, I want the agent's answer rendered as formatted markdown with tool invocations shown as cards, so I can scan the work done without re-reading raw terminal output.
  • As a user, I want to see live progress while the agent is working — which tool is currently running, what thinking is happening, when a message is being streamed — so I feel the agent is actively working for me.
  • As a user, I want to switch a session from Dialogue to raw Terminal without losing the conversation context, so that when the agent drops into a TUI flow I can still finish it.
  • As a user with accessibility needs, I want every interactive element to be keyboard-reachable and VoiceOver-labelled, every animation to respect Reduce Motion, and every status signal to be paired with an icon (never color alone).

3. Naming

Feature name: Dialogue. In code: SessionSurface.agentDialogue(.claudeCode), DialogueFeature (TCA reducer), DialogueView (SwiftUI), DialogueMarkdownView (markdown renderer wrapper). User-facing strings: "Dialogue", "New Dialogue Session", "Open as Dialogue".

Considered and rejected: Parley, Atelier, Converse, Scribe, Studio. Dialogue wins on symmetry with Terminal, cross-lingual clarity, and no conflict with Claude / Anthropic brand terms.

4. Architecture

4.1 Packages

| Package | Layer | Content |
| --- | --- | --- |
| AgentChat (new) | Domain + TCA | AgentMessage, ToolCall, AgentStreamEvent, AgentChatSession protocol, ClaudeStreamJSONParser, AgentChatFeature reducer. No SwiftUI, no gRPC, no SwiftTerm. Pure Swift + Foundation. |
| AgentChatUI (new) | SwiftUI presentation | DialogueView, UserMessageView, AssistantMessageView, ToolCallCardView, specialized tool widgets, ThinkingIndicatorView, ApprovalPromptView, DialogueMarkdownView, StreamInspectorView. |
| TerminalAbstraction | Modified | TerminalSession protocol gains rawStdout: AsyncStream<Data>, a side channel of raw bytes before VT parsing. Default impl provided. |
| TerminalSwiftTerm / RemoteTerminal | Modified | Implement rawStdout. Fork PTY-master bytes (local) or ServerMessage.stdout_data (gRPC) into the stream. |
| AgentOrchestrator / ClaudeCommandBuilder | Modified | Accept surface: SessionSurface in build; for .agentDialogue(.claudeCode) append -p --output-format stream-json --verbose --include-partial-messages --input-format stream-json --bare --permission-mode bypassPermissions. |
| SharedModels | Modified | New SessionSurface enum: .shell / .agentTerminal(AgentKind) / .agentDialogue(AgentKind). Immutable after session creation. |
| PaneManager | Modified | Tab.surface: SessionSurface. Routing. New session actions with surface choice. |
| DesignSystem | Modified | New primitives: Card, StatusIndicator, BlockCodeContainer. No markdown (ISP). |
| Relay (app target) | Modified | MainFeature view routes by tab.surface. Settings UI for preferredClaudeSurface. Menu actions. |
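A minimal sketch of how ClaudeCommandBuilder could map a surface to CLI arguments. The flag list is copied from the table row above; the function name `claudeArguments(for:resumeSessionID:)` and the placement of `--resume` before the other flags are assumptions for illustration, and the enums are redeclared so the snippet compiles on its own.

```swift
import Foundation

// Redeclared from section 5 so this sketch is self-contained.
enum AgentKind: String, Equatable, Sendable, Codable {
    case claudeCode
}

enum SessionSurface: Equatable, Sendable, Codable {
    case shell
    case agentTerminal(AgentKind)
    case agentDialogue(AgentKind)
}

/// Hypothetical helper: extra `claude` arguments for a given surface.
/// `resumeSessionID` models the "Open as X" flow (`claude --resume <session_id>`).
func claudeArguments(for surface: SessionSurface, resumeSessionID: String? = nil) -> [String] {
    var args: [String] = []
    if let id = resumeSessionID {
        args += ["--resume", id]
    }
    switch surface {
    case .agentDialogue(.claudeCode):
        // Headless stream-json mode, flags as listed in the package table.
        args += [
            "-p",
            "--output-format", "stream-json",
            "--verbose",
            "--include-partial-messages",
            "--input-format", "stream-json",
            "--bare",
            "--permission-mode", "bypassPermissions",
        ]
    case .agentTerminal(.claudeCode), .shell:
        // Interactive TTY gets no stream-json flags; shell sessions never launch claude.
        break
    }
    return args
}
```

Keeping the flag list in one pure function makes it cheap to unit-test the CLI contract whenever the stream-json flags change.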

4.2 Dependency graph (no cycles)

                Relay.app
                    │
            MainFeature
      ┌─────────┼──────────┐
      ▼         ▼          ▼
  TabFeature  AgentOrchestrator  ServerList…
                    │
            AgentSessionFeature
                    │ (Presentation enum)
          ┌─────────┴──────────┐
          ▼                    ▼
   TerminalFeature       DialogueFeature (AgentChat)
          │                    │
          ▼                    │
  [SwiftTerm, RemoteTerminal]  │
          │                    │
          ▼                    │
  TerminalAbstraction ◄────────┘ (rawStdout)
          │
          ▼
    SharedModels
          ▲
          │
 AgentChatUI ──► AgentChat
          │
          ▼
    DesignSystem
          │
          ▼
  apple/swift-markdown (transitive, only via AgentChatUI)
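The `rawStdout` edge into TerminalAbstraction can be sketched as a protocol requirement with a default implementation, so existing conformers keep compiling. The protocol below is a reduced, hypothetical stand-in for TerminalSession, `FakeLocalSession` is an illustrative conformer, and `AsyncStream.makeStream(of:)` assumes Swift 5.9 or later.

```swift
import Foundation

/// Reduced stand-in for TerminalSession: only what this sketch needs.
protocol TerminalSession {
    func write(_ data: Data)
    /// Raw stdout bytes before VT parsing (the new side channel).
    var rawStdout: AsyncStream<Data> { get }
}

extension TerminalSession {
    /// Default implementation: an already-finished stream, so conformers
    /// that don't opt in compile unchanged and simply never emit bytes.
    var rawStdout: AsyncStream<Data> {
        AsyncStream { continuation in continuation.finish() }
    }
}

/// Hypothetical local session that forks PTY-master bytes into the side channel.
final class FakeLocalSession: TerminalSession {
    let rawStdout: AsyncStream<Data>
    private let continuation: AsyncStream<Data>.Continuation

    init() {
        let pair = AsyncStream.makeStream(of: Data.self)
        rawStdout = pair.stream
        continuation = pair.continuation
    }

    func write(_ data: Data) {}

    /// Called wherever the PTY master is read; the same bytes also go to the VT parser.
    func didReadFromPTY(_ bytes: Data) {
        continuation.yield(bytes)
    }

    /// Process exit finishes the stream, so subscribers' `for await` loops end.
    func processExited() {
        continuation.finish()
    }
}
```

AsyncStream supports a single consumer, so if several subscribers (the parser and the Stream Inspector, for example) need the same bytes, a real implementation would have to fan them out.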

4.3 Data flow (a Claude Dialogue turn)

  1. User types prompt in DialogueView input.
  2. AgentChatFeature sends bytes to TerminalSession.write(_:) as stream-json ClientMessage.
  3. Claude process writes NDJSON to stdout → gRPC ServerMessage.stdout_data (remote) or PTY-master bytes (local) → TerminalSession.rawStdout.
  4. ClaudeCodeJSONLSession subscriber reads rawStdout, feeds ClaudeStreamJSONParser.
  5. Parser emits AgentStreamEvent values — messageStarted, textDelta, toolCallStarted, toolCallCompleted, thinkingDelta, sessionEnd, etc.
  6. AgentChatFeature reducer updates transcript: IdentifiedArrayOf<AgentMessage> with dedup by message_id.
  7. DialogueView re-renders — bubble grows with live caret, tool cards flip through lifecycle states, status strip reflects current activity.
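Steps 2 to 4 depend on line framing: a chunk from rawStdout can end mid-line, so the subscriber must buffer until a newline arrives before handing a line to ClaudeStreamJSONParser. A minimal sketch follows. All type names are hypothetical, and the outgoing `UserInputLine` shape is an assumption about the `--input-format stream-json` contract that should be checked against the Claude Code docs.

```swift
import Foundation

/// Reassembles newline-delimited JSON from arbitrary byte chunks.
struct NDJSONLineBuffer {
    private var pending = Data()

    /// Appends a chunk and returns every complete line (newline stripped).
    /// An unterminated remainder is kept until a later chunk completes it.
    mutating func append(_ chunk: Data) -> [Data] {
        pending.append(chunk)
        var lines: [Data] = []
        while let newline = pending.firstIndex(of: UInt8(ascii: "\n")) {
            let line = pending[pending.startIndex..<newline]
            if !line.isEmpty { lines.append(Data(line)) }
            pending.removeSubrange(pending.startIndex...newline)
        }
        return lines
    }
}

/// Assumed shape of one stream-json user message written to stdin.
struct UserInputLine: Encodable {
    struct Message: Encodable {
        let role = "user"
        let content: String
    }
    let type = "user"
    let message: Message
}

/// Encodes a prompt as a single NDJSON line for TerminalSession.write(_:).
func encodeUserLine(_ prompt: String) throws -> Data {
    var data = try JSONEncoder().encode(UserInputLine(message: .init(content: prompt)))
    data.append(UInt8(ascii: "\n"))
    return data
}
```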

4.4 Why client-side parsing in MVP (vs server-side vs ACP)

Evaluated three topologies:

  • Client-side (chosen): runner stays agent-agnostic; zero proto changes; breaking-change count = 0; parser iteration independent of runner release cycle.
  • Server-side (post-MVP): runner parses JSONL, emits structured gRPC events; solves reconnect (structured transcript is first-class); closes persistence question (runner SQLite). Post-MVP migration, synchronous with ACP adoption.
  • ACP (post-MVP): unifies N agents, first-class permissions + diff blocks; no Swift SDK yet (~3000–4500 LOC to write); Claude works via Node-adapter @zed-industries/claude-agent-acp which reduces CLI feature parity. Trigger for adoption: second chat-agent in roadmap OR ACP 1.0 OR Anthropic officially accepts ACP.

The AgentStreamEvent internal model is designed close to ACP's ToolCall/ContentBlock shape so that future migration is a parser swap, not a reducer/UI rewrite.
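One way an ACP-leaning internal model could look, assuming the tool-call lifecycle used throughout this epic (pending, in_progress, completed, failed); to our understanding ACP's tool-call status uses the same four values. The event case names come from the data-flow list in 4.3, while their payloads, the `ToolCall` fields, and the transition rule are illustrative assumptions rather than the committed AgentChat API.

```swift
/// Tool-call lifecycle, raw values in the snake_case used on the wire.
enum ToolCallStatus: String, Equatable, Sendable {
    case pending
    case inProgress = "in_progress"
    case completed
    case failed

    /// Only forward moves are legal; terminal states never change again.
    func canTransition(to next: ToolCallStatus) -> Bool {
        switch (self, next) {
        case (.pending, .inProgress), (.pending, .completed), (.pending, .failed),
             (.inProgress, .completed), (.inProgress, .failed):
            return true
        default:
            return false
        }
    }
}

struct ToolCall: Identifiable, Equatable, Sendable {
    let id: String            // tool_use_id
    let name: String          // e.g. "Bash" or "mcp__github__create_issue"
    var status: ToolCallStatus = .pending
    var inputJSON: String = ""
    var output: String?

    /// Applies a status update only if it moves the lifecycle forward.
    mutating func apply(_ next: ToolCallStatus) {
        if status.canTransition(to: next) { status = next }
    }
}

/// Hypothetical payloads for the events named in 4.3.
enum AgentStreamEvent: Equatable, Sendable {
    case messageStarted(messageID: String)
    case textDelta(messageID: String, text: String)
    case thinkingDelta(messageID: String, text: String)
    case toolCallStarted(ToolCall)
    case toolCallCompleted(id: String, output: String, failed: Bool)
    case sessionEnd
}
```

Rejecting backward transitions in one place means a replayed or duplicated event can never flip a completed card back to in-progress.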

5. Session surface model

New enum in SharedModels:

public enum SessionSurface: Equatable, Sendable, Codable {
    case shell
    case agentTerminal(AgentKind)
    case agentDialogue(AgentKind)
}

public enum AgentKind: String, Equatable, Sendable, Codable {
    case claudeCode
    // future: codex, aider, acpGeneric
}

Immutable after session creation. AgentSessionFeature.State.presentation holds a Presentation enum:

public enum Presentation: Equatable {
    case terminal(TerminalFeature.State)
    case dialogue(DialogueFeature.State)
}

Scoped via ifCaseLet. TerminalFeature.State stays pure-terminal; Dialogue state lives in a sibling reducer.

Mode selection UX:

  • New Session creation sheet shows explicit picker — Terminal / Dialogue — when creating an agent session.
  • preferredClaudeSurface in SettingsStore sets the default (initial value: .agentTerminal(.claudeCode); flipped to .agentDialogue in a later release after positive feedback).
  • Keyboard shortcuts: ⌘⇧N — New Dialogue Session; ⌘⌥N — New Terminal Session.
  • Cross-surface resume: a menu item on a running session, Open as Terminal / Open as Dialogue (⌘⇧T / ⌘⇧D), creates a sibling tab with the same session_id via claude --resume <session_id>. Context is preserved, but the CLI process is physically killed and respawned in the other mode. The user's "switch mode" expectation is met without breaking the process-level separation.
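The cross-surface resume action reduces to a small mapping: keep the session_id, flip the surface, and let the command builder add `--resume <session_id>`. The `sibling` property, `ResumeRequest`, and `openAsSibling` are hypothetical names used only for this sketch.

```swift
enum AgentKind: String, Equatable, Sendable, Codable {
    case claudeCode
}

enum SessionSurface: Equatable, Sendable, Codable {
    case shell
    case agentTerminal(AgentKind)
    case agentDialogue(AgentKind)

    /// Surface created by "Open as Terminal / Open as Dialogue".
    /// Shell sessions have no agent conversation to resume.
    var sibling: SessionSurface? {
        switch self {
        case .shell: return nil
        case .agentTerminal(let kind): return .agentDialogue(kind)
        case .agentDialogue(let kind): return .agentTerminal(kind)
        }
    }
}

/// What the tab layer could hand to the orchestrator: same session_id,
/// opposite surface, launched via `claude --resume <session_id>`.
struct ResumeRequest: Equatable {
    let sessionID: String
    let surface: SessionSurface
}

func openAsSibling(sessionID: String, current: SessionSurface) -> ResumeRequest? {
    guard let target = current.sibling else { return nil }
    return ResumeRequest(sessionID: sessionID, surface: target)
}
```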

6. UI concept — live feel

Multi-layer live signals (all behind accessibilityReduceMotion guards):

Token level: mini-caret at the end of streaming text; text_delta throttled to 30fps (~33ms batching of deltas per run loop); optional character-by-character smoothing when tokens arrive in bursts.

Message level: soft fade-in (DS.Motion.fast, 120ms); auto-scroll anchored to the bottom only when the user is already at the bottom, otherwise a floating ↓ New message chip; subtle glow pulse (1-2% opacity) on the active streaming bubble.

Tool-call level: card appears instantly with pending status (shimmer skeleton); pending → in_progress pill opacity pulse; completed checkmark spring (scale 0.8 → 1.0); failed shake (±4pt, 2 cycles); live duration tick every 1s.

Session level: status strip above the input showing idle / Thinking… / Responding… / Running: Bash (npm test) / error banner with retry info. Ambient tab-icon pulse during streaming.
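The token-level throttle (about 33 ms per frame at 30 fps) can be modelled as a coalescer that accumulates text_delta fragments and releases them at most once per frame. `DeltaCoalescer` is a hypothetical name; time is injected as plain seconds so the logic stays deterministic, whereas the real reducer would more likely drive it from a TCA clock or a run-loop timer.

```swift
/// Batches text_delta fragments so the UI updates at most `framesPerSecond` times per second.
struct DeltaCoalescer {
    let interval: Double              // seconds per frame, 1/30 ≈ 0.033
    private var buffer = ""
    private var lastFlush: Double?

    init(framesPerSecond: Double = 30) {
        interval = 1.0 / framesPerSecond
    }

    /// Adds a delta; returns the accumulated text if a frame is due, else nil.
    mutating func add(_ delta: String, at now: Double) -> String? {
        buffer += delta
        if let last = lastFlush, now - last < interval { return nil }
        return flush(at: now)
    }

    /// Releases whatever is buffered (e.g. on content_block_stop).
    mutating func flush(at now: Double) -> String? {
        guard !buffer.isEmpty else { return nil }
        let out = buffer
        buffer = ""
        lastFlush = now
        return out
    }
}
```

A trailing flush (on content_block_stop, or a timer firing one interval after the last delta) is still required; otherwise the tail of a burst would wait for the next delta to arrive.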

Specialized tool widgets (full catalog in events spec §3):

| Tool | Widget pattern |
| --- | --- |
| Read / Grep / Glob / LS | Path + count preview; expanded = content / match list |
| Edit / Write / MultiEdit | Inline DiffView (green +, red −) with syntax accents |
| Bash / BashOutput | Command pill + mini-terminal (ANSI), exit code, duration; long-running has Abort |
| Task (sub-agent) | Nested DialogueView inside the card (routed via parent_tool_use_id) |
| TodoWrite | Live checklist with pending / in_progress / done pills |
| WebFetch / WebSearch | URL / query preview + OG / result list on expand |
| mcp__<server>__<tool> | Generic MCP card with service badge + input/output JSON tabs |
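Routing a tool_use name to one of the widgets in the table is a plain lookup plus a parse of the `mcp__<server>__<tool>` convention. The `ToolWidget` cases are hypothetical names for the table rows, and the sketch splits on the first `__` after the prefix, which assumes server names never contain a double underscore.

```swift
import Foundation

/// Hypothetical widget kinds, one per row of the table above.
enum ToolWidget: Equatable {
    case fileQuery                     // Read / Grep / Glob / LS
    case diff                          // Edit / Write / MultiEdit
    case shell                         // Bash / BashOutput
    case subAgent                      // Task
    case todoList                      // TodoWrite
    case web                           // WebFetch / WebSearch
    case mcp(server: String, tool: String)
    case generic(name: String)         // anything unrecognized
}

func widget(forToolNamed name: String) -> ToolWidget {
    switch name {
    case "Read", "Grep", "Glob", "LS": return .fileQuery
    case "Edit", "Write", "MultiEdit": return .diff
    case "Bash", "BashOutput": return .shell
    case "Task": return .subAgent
    case "TodoWrite": return .todoList
    case "WebFetch", "WebSearch": return .web
    default: break
    }
    let prefix = "mcp__"
    if name.hasPrefix(prefix) {
        let rest = name.dropFirst(prefix.count)
        // Split "<server>__<tool>" on the first double underscore.
        if let separator = rest.range(of: "__") {
            let server = String(rest[..<separator.lowerBound])
            let tool = String(rest[separator.upperBound...])
            if !server.isEmpty && !tool.isEmpty {
                return .mcp(server: server, tool: tool)
            }
        }
    }
    return .generic(name: name)
}
```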

7. Events — source of truth

Full event catalog with wire-level detail — docs/architecture/dialogue-events.md (created in this epic). Summary:

Top-level types: system (subtypes: init, api_retry, plugin_install, compact_boundary, rate_limit, unknown), assistant (complete message), user (tool_result OR echo), result (exactly one, final), stream_event (a wrapped raw Anthropic SSE event, present only with --include-partial-messages).

stream_event.event.type: message_start, content_block_start, content_block_delta (delta.type: text_delta | input_json_delta | thinking_delta | signature_delta | citations_delta), content_block_stop, message_delta, message_stop, ping.
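The "unknown event types never crash" rule can be met by decoding only the discriminator fields first and mapping anything unrecognized to an explicit case. A minimal sketch with hypothetical type names; a real parser would go on to decode the full payload for known types.

```swift
import Foundation

/// Classification of one stream-json line by its `type` (and `subtype` for system).
enum TopLevelLine: Equatable {
    case system(subtype: String)
    case assistant
    case user
    case result
    case streamEvent
    case unknown(type: String)
}

/// Only the discriminators; extra keys are ignored and a missing
/// `subtype` decodes as nil (synthesized decodeIfPresent).
private struct Envelope: Decodable {
    let type: String
    let subtype: String?
}

/// Returns nil for malformed or partial lines so the caller can log and continue.
func classify(_ line: Data) -> TopLevelLine? {
    guard let envelope = try? JSONDecoder().decode(Envelope.self, from: line) else { return nil }
    switch envelope.type {
    case "system": return .system(subtype: envelope.subtype ?? "unknown")
    case "assistant": return .assistant
    case "user": return .user
    case "result": return .result
    case "stream_event": return .streamEvent
    default: return .unknown(type: envelope.type)
    }
}
```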

Dedup: stream_event by (session_id, event.index, event.type, uuid); assistant by message.id; user by tool_use_id; system by uuid. The assistant snapshot is authoritative over partial state.
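The dedup and snapshot rules translate into typed keys plus a draft that stops accepting deltas once the authoritative assistant snapshot lands. `DedupKey`, `Deduplicator`, and `MessageDraft` are hypothetical names; the index is optional because message-level stream events such as message_start carry no content-block index.

```swift
/// One key per dedup rule above.
enum DedupKey: Hashable {
    case streamEvent(sessionID: String, index: Int?, eventType: String, uuid: String)
    case assistant(messageID: String)
    case user(toolUseID: String)
    case system(uuid: String)
}

struct Deduplicator {
    private var seen = Set<DedupKey>()

    /// True the first time a key is seen; false for replays (e.g. VT-snapshot reconnect).
    mutating func admit(_ key: DedupKey) -> Bool {
        seen.insert(key).inserted
    }
}

/// Partial text built from deltas until the assistant snapshot replaces it.
struct MessageDraft {
    var text = ""
    var isFinal = false

    mutating func applyDelta(_ delta: String) {
        guard !isFinal else { return }   // late partials never override the snapshot
        text += delta
    }

    mutating func applySnapshot(_ fullText: String) {
        text = fullText                  // snapshot wins over accumulated deltas
        isFinal = true
    }
}
```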

Edge cases covered: CLI crash mid-turn, reconnect + VT-snapshot duplicates, unknown event types (never crash), partial JSON parse, long thinking content, large tool_result (>1 MiB truncation), cost thresholds, user interrupt (⌘. via streaming-input ControlRequest → SIGINT fallback).

8. External dependencies

New SPM dependencies (all require explicit approval before adoption):

  • apple/swift-markdown (Apache-2.0) — AST parser for markdown rendering. Used as the AST provider; inline rendering is delegated to Apple's AttributedString(markdown:options:) with interpretedSyntax: .inlineOnlyPreservingWhitespace; block rendering is a custom MarkupVisitor. Maintained by Apple (no bus-factor risk); streaming control stays with us; covers all chat-relevant markdown.
  • (Optional, v1.1+) JohnSundell/Splash or equivalent for syntax-highlighted code fences. Not in MVP — MVP renders code fences as monochrome monospaced blocks.
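Because the AST parser only sees whatever text has arrived so far, streaming control comes down to deciding which part of a message is already finished. One possible sketch of the block-boundary stable/tail split listed under risks: everything up to the last blank line outside an open code fence is stable (parse once, cache), and the remainder is re-parsed on each delta. `splitStableTail` is a hypothetical helper, and its boundary rule is a simplifying assumption that ignores lists, indented code, and other block types a full renderer must respect.

```swift
/// Splits streaming markdown into a stable prefix and a still-growing tail.
func splitStableTail(_ source: String) -> (stable: String, tail: String) {
    let fenceMarker = String(repeating: "`", count: 3)
    var inFence = false
    var boundary = source.startIndex      // end of the last complete block
    var lineStart = source.startIndex

    // Only newline-terminated lines are inspected; a trailing partial line is always tail.
    while let newline = source[lineStart...].firstIndex(of: "\n") {
        let line = source[lineStart..<newline]
        let afterLine = source.index(after: newline)
        if line.drop(while: { $0 == " " }).hasPrefix(fenceMarker) {
            inFence.toggle()
            // A closing fence completes the code block.
            if !inFence { boundary = afterLine }
        } else if !inFence && line.allSatisfy({ $0 == " " }) {
            // A blank line outside a fence ends the previous block.
            boundary = afterLine
        }
        lineStart = afterLine
    }
    return (String(source[..<boundary]), String(source[boundary...]))
}
```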

Explicitly rejected for this feature:

9. MVP acceptance criteria

Before Dialogue MVP is considered done:

  1. A .agentDialogue(.claudeCode) session can be created from the UI; the CLI launches with the expected stream-json flags and JSONL is parsed into AgentChatFeature.State.transcript.
  2. Progressive markdown renders assistant messages with 30fps throttling — no visible flicker on a typical 2000-token answer.
  3. Tool calls appear as cards with lifecycle state (pending → in_progress → completed / failed) and specialized widgets for Read / Edit / Bash / Grep / WebFetch / WebSearch / mcp__*. Output over 2KB truncates with "Show full".
  4. Parser errors do not crash the UI — unknown events log + surface in debug panel, session continues.
  5. Graceful terminate — process exit closes AsyncStreams; no zombie processes.
  6. Pre-flight check: missing auth or a required interactive /login flow shows an explicit error screen, "Needs interactive setup. Open as Terminal".
  7. Open as Terminal action (⌘⇧T) creates a sibling tab via claude --resume <session_id> with preserved context. The reverse, Open as Dialogue (⌘⇧D), works the same way.
  8. A11y — all 9 DS checklist points satisfied for every Dialogue view; Stream Inspector is VoiceOver-navigable; all animations guarded by accessibilityReduceMotion.
  9. Stream Inspector (⌘⌥R) reveals raw JSONL stream for debugging parser issues.

10. Out of MVP (deliberately deferred)

  • --include-hook-events — undocumented schema, silently skipped.
  • AskUserQuestion tool — requires streaming-input + ControlResponse; MVP uses bypassPermissions.
  • Per-tool permission callback UI sheet — MVP uses global --permission-mode.
  • Plan Mode as live feature — dividers rendered, no "Accept plan" UI.
  • Streaming thinking realtime — accumulated, rendered only on expand.
  • MCP tool discovery progress.
  • Slash-command authoring / editing.
  • Attachment (image / file) input.
  • Subagent as a separate tab — nested inside TaskWidget, not a tab.
  • Structured output streaming (--json-schema is final-only in result.structured_output).
  • Extended thinking with max_thinking_tokens (disables stream_events).
  • Client-side adaptive rate-limit back-off.
  • Background tasks (system/task_*, CronCreate).
  • Syntax highlighting for code fences (v1.1).
  • Chat transcript persistence across app restarts (v1.1 / post-MVP server-side migration).

11. Risks (top)

| Risk | Severity | Mitigation |
| --- | --- | --- |
| Reconnect via VT-snapshot replay breaks JSONL mid-line | Critical | Parser robust to partial lines; dedup by Claude message_id. Post-MVP, the server-side topology solves it first-class. |
| Agent drops to interactive TUI (/login, OAuth) that Dialogue can't show | Major | Pre-flight claude doctor / auth check; if missing, show an explicit "Open as Terminal" screen. |
| Claude stream-json schema drift | Major | Tolerant Codable (decodeIfPresent); unknown event → log + debug panel + continue. CI test against live CLI. |
| User confusion from missing toggle ("I said switch mode") | Major | Explicit picker at creation + "Open as X" actions + release notes. First release: .agentTerminal is default; .agentDialogue is opt-in. |
| Large tool_output hangs card UI | Major | Truncate at 2KB / 40 lines + "Show full" modal. Very large → binary summary. |
| apple/swift-markdown types not Sendable under strict concurrency | Major | Parse on @MainActor; pass only RenderedBlock / AttributedString across actor boundaries. |
| SwiftUI rerender performance on long transcript | Major | LRU [hash(blockSource): RenderedBlock] cache; LazyVStack; block-boundary stable/tail split. Profile post-MVP; escalate to performance-expert if red. |
| AttributedString(markdown:) inline parser differs from cmark-gfm on edge cases | Minor | Fallback: write own inline visitor (~1.5 days). |
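The 2KB / 40-line truncation from the risk table can be sketched as a pure function that cuts on character boundaries, so a multi-byte UTF-8 sequence is never split. Treating 2KB as 2048 bytes and the `OutputPreview` / `makePreview` names are assumptions for this sketch.

```swift
/// Card preview of tool output; `isTruncated` drives the "Show full" button.
struct OutputPreview: Equatable {
    let text: String
    let isTruncated: Bool
}

func makePreview(_ output: String, maxBytes: Int = 2048, maxLines: Int = 40) -> OutputPreview {
    var text = ""
    var bytes = 0
    var newlines = 0
    for ch in output {
        let size = String(ch).utf8.count
        // Stop before the first character of line maxLines + 1,
        // or before a character that would exceed the byte budget.
        if newlines == maxLines || bytes + size > maxBytes {
            return OutputPreview(text: text, isTruncated: true)
        }
        text.append(ch)
        bytes += size
        if ch == "\n" { newlines += 1 }
    }
    return OutputPreview(text: text, isTruncated: false)
}
```

The loop exits as soon as a limit is hit, so the cost is bounded by the preview size rather than by the full tool_result.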

12. Wave breakdown

Tasks decomposed across six waves (0 = prerequisites, 5 = polish). Sub-issue list at the bottom of this epic is populated when child tasks are created. See each child for full acceptance criteria.

| Wave | Focus | Unblocks |
| --- | --- | --- |
| 0 | DesignSystem primitives (ds-api) | 3, 4 |
| 1 | rawStdout hook, SessionSurface, docs commit | 2 |
| 2 | AgentChat domain + parser + reducer (no UI) | 3, 4 |
| 3 | AgentChatUI views, markdown renderer, widgets | 4 |
| 4 | Integration: routing, command builder, settings, "Open as X" | 5 |
| 5 | Polish: a11y sweep, tests, profiling, docs | |

Parallelism within a wave is encouraged; cross-wave is mostly strict (see Blocked by / Blocks fields on each child).

13. References

  • Research report (frozen): swarm-report/chat-mode-ui-research.md (in research worktree; summary committed to docs/architecture/dialogue.md as part of wave 1).
  • Events spec: swarm-report/chat-mode-spec-events.md → committed to docs/architecture/dialogue-events.md in wave 1.
  • External: Claude Code headless, streaming output, Anthropic streaming API, ACP.

Sub-issues

Full breakdown across 6 waves. Sub-issue links via GitHub parent/child relationship; blockers documented per-issue via Blocked by references. Status on the Relay project board.

Wave 0 — DesignSystem primitives (ds-api)

Wave 1 — Foundation

Wave 2 — Domain & Parser

Wave 3 — AgentChatUI

Wave 4 — Integration

Wave 5 — Polish


Dependency graph (critical path)

Wave 0 (DS primitives: 251, 252, 253)   ──► Wave 3 UI
Wave 1 (254 docs, 255 rawStdout, 256 surface)
  │                                       
  ├─ 254 docs ───► Wave 2 domain
  ├─ 255 rawStdout ─► 259 adapter (Wave 2)
  └─ 256 surface ──► 270 ClaudeCommandBuilder (Wave 4)

Wave 2 (257 AgentChat → 258 Parser → 259 Adapter → 260 Reducer → 261 Lifecycle)

Wave 3 (262 AgentChatUI pkg → 263 Markdown → 264 Messages, 265 Card, 267 Thinking → 266 Widgets → 268 DialogueView → 269 Inspector)

Wave 4 (270 Builder → 271 Presentation → 272 Tab.surface → 273 Routing, 274 Settings, 275 Open as X, 276 Preflight)

Wave 5 (277 a11y → 278 tests → 279 perf → 280 docs)

Parallelism hints

Within each wave:

Typical single-developer calendar: 4–6 weeks end-to-end.

Metadata


Assignees

No one assigned

    Labels

    complexity:L, dialogue (Dialogue feature: structured chat UI for agent sessions), epic (Epic / umbrella tracking issue), frontend

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests
