Implement LLM Agent UX (per spec from #11) #377

@intendednull

Description

Spec

Summary

Make LLM-powered agents first-class participants in Willow servers. Agents join channels, read messages, and reply like any other member, but their identity (PeerKind), capabilities, presence, and UI treatment are distinct (bot badge, "Agent" tag, streaming responses, thinking indicators, data-policy disclosure). Implementation introduces a new crates/bot/ worker (separate from the existing willow-agent MCP crate), additive Profile/ProfileDelta fields, ephemeral WireMessage variants for streaming/thinking, an admin-only SetAgentConfig event, and — as a prerequisite — a versioned-envelope forward-compat layer so future EventKind/WireMessage variants don't force flag-day upgrades.

Build phases

Phases follow §12 of the spec.

  • Phase 0 — Forward-compat foundation (prerequisite). Versioned-envelope (kind_tag: u32, payload: Vec<u8>) for both EventKind and WireMessage, with Unknown no-op fallback in apply_event. Without this, every later phase is a breaking wire change.
  • Phase 1 — Identity & UI (foundation).
    • Extend Profile and ProfileDelta with peer_kind: Option<PeerKind> and data_policy: Option<DataPolicy> (additive #[serde(default)] on Profile; Option<Option<_>> on ProfileDelta).
    • Update apply_event(UpdateProfile) to overlay the new fields.
    • Wire PeerKind and DataPolicy through profile UI (member list, profile card).
    • Insert "Agents" section into member list between Infrastructure and Members.
    • Add bot badge and "Agent" tag to message rendering.
    • Add collapsible long messages (>20 lines) with "Show more" toggle in crates/web/src/components/message.rs (applies to all messages, not just agents).
  • Phase 2 — Interaction (core loop).
    • Create new crates/bot/ crate shipping willow-bot binary (separate from crates/agent/ MCP crate).
    • Add BotWorker implementing WorkerRole + InferenceBackend trait.
    • Add new WorkerRoleInfo::Bot { inference_in_flight, inference_capacity } variant; update WorkerRoleInfo::role_name() arm and round-trip tests.
    • Implement @mention detection and routing.
    • Add agent TOML config loader.
    • Build thread-scoped context by walking the reply_to: Option<EventHash> chain.
  • Phase 3 — Streaming (polish).
    • Add WireMessage::StreamStart / StreamChunk / StreamEnd / StreamCancel ephemeral variants.
    • Streaming message rendering in web UI (blinking cursor, scroll-locked, "New messages" pill).
    • Stop button + cancel protocol (cancel emits StreamEnd + final EventKind::Message with partial body).
    • Add WireMessage::Thinking / StoppedThinking ephemeral indicators with 30s UI auto-clear timeout.
  • Phase 4 — Configuration & Privacy (trust).
    • New admin-only EventKind::SetAgentConfig { peer_id: EndpointId, system_prompt, auto_respond_channels, rate_limit } (gated via state.is_admin(author) in the dedicated admin block of check_permission, NOT via required_permission()).
    • Server settings > Agents UI (per-agent system prompt, default channels, "Add Agent" / "Remove Agent").
    • Per-channel agent enable / auto-respond toggles.
    • Rate-limit enforcement (server default 10/min, per-channel override).
    • Data-policy disclosure tooltip + one-time confirmation dialog when first @mentioning a CloudProvider agent.
    • Encryption transparency notice on agent messages in encrypted channels.
  • Phase 5 — Slash commands & discovery.
    • Agent-registered command metadata in profile.
    • Command palette integration (/ trigger).
    • Regenerate action on agent messages (emits DeleteMessage of agent's prior response + new EventKind::Message preserving reply_to).
  • Phase 6 (optional) — Atomic Kick+Rotate. If the multi-event Propose-KickMember → votes → per-channel RotateChannelKey sequence proves error-prone, add either ProposedAction::KickAndRotate (preferred — preserves governance) or top-level EventKind::KickAndRotate (admin-only, faster). Pick exactly one shape.
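
To make the Phase 0 idea concrete, here is a minimal, std-only sketch of a versioned envelope with an Unknown no-op fallback. Only kind_tag, payload, Unknown, EventKind and apply_event come from the text above; the tag value, the Outcome enum, the State struct and the decode helper are hypothetical, and the real crates presumably use a serializer rather than raw UTF-8 bytes.

```rust
/// Wire form shared by every event: a numeric tag plus opaque payload bytes.
#[derive(Debug, Clone, PartialEq)]
pub struct Envelope {
    pub kind_tag: u32,
    pub payload: Vec<u8>,
}

/// Decoded view of an envelope. Only one known variant is modelled here.
#[derive(Debug, Clone, PartialEq)]
pub enum EventKind {
    Message { body: String },
    /// Any tag this build does not understand; bytes are kept untouched.
    Unknown { kind_tag: u32, payload: Vec<u8> },
}

const TAG_MESSAGE: u32 = 1; // assumed tag value, for illustration only

fn unknown(env: &Envelope) -> EventKind {
    EventKind::Unknown { kind_tag: env.kind_tag, payload: env.payload.clone() }
}

/// Decoding never fails: unrecognised tags become `Unknown`.
pub fn decode(env: &Envelope) -> EventKind {
    match env.kind_tag {
        TAG_MESSAGE => match String::from_utf8(env.payload.clone()) {
            Ok(body) => EventKind::Message { body },
            // Sketch choice: a malformed payload also degrades to Unknown.
            Err(_) => unknown(env),
        },
        _ => unknown(env),
    }
}

/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Applied,
    Ignored,
}

/// Hypothetical, heavily reduced materialized state.
#[derive(Debug, Default)]
pub struct State {
    pub messages: Vec<String>,
}

/// Unknown events are a no-op, so an old peer keeps running.
pub fn apply_event(state: &mut State, env: &Envelope) -> Outcome {
    match decode(env) {
        EventKind::Message { body } => {
            state.messages.push(body);
            Outcome::Applied
        }
        EventKind::Unknown { .. } => Outcome::Ignored,
    }
}
```

Because decoding is total, a peer built before a new variant existed can still store and relay the raw envelope while skipping its effect, which is the behaviour the first acceptance criterion asks for.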

Acceptance criteria

  • Old peers running pre-Phase-0 builds continue to function (with reduced capability) when a new peer broadcasts unknown EventKind / WireMessage variants — no panics, no deserialization failures.
  • Profile / ProfileDelta round-trip tests cover the new peer_kind and data_policy fields and confirm absent fields decode as None (#[serde(default)] semantics).
  • Member list renders three ordered sections: Infrastructure, Agents, Members. Agents show bot badge, provider tag, and Online/Thinking/Offline indicator.
  • Agent messages render with bot badge and "Agent" tag; messages over 20 lines (any author) auto-collapse with a "Show more" toggle.
  • willow-bot binary in crates/bot/ connects via the standard worker runtime, subscribes to SERVER_OPS_TOPIC, WORKERS_TOPIC, and per-channel channel_topic(server_id, channel_id) topics, and announces WorkerRoleInfo::Bot { .. } in heartbeats.
  • @mention triggers an inference call; the agent responds via WireMessage::StreamStart/Chunk/End plus a final EventKind::Message carrying the full body, with reply_to set to the invoking event's EventHash.
  • WireMessage::StreamCancel from a user causes the agent to emit StreamEnd plus a final EventKind::Message containing the partial body generated so far.
  • Stream chunks do not appear in the event store — replay materializes only the final EventKind::Message.
  • Thinking / StoppedThinking indicators render in the message-input bar and member list, and auto-clear after 30s without a StoppedThinking.
  • EventKind::SetAgentConfig is rejected (ApplyResult::Rejected) when state.is_admin(author) is false, and accepted when it is true.
  • Granting ManageRoles to an agent and proposing ProposedAction::GrantAdmin { peer_id: <agent> } both surface high-friction confirmation dialogs explaining the multi-vote nature on multi-admin servers.
  • One-time CloudProvider confirmation dialog fires on first @mention of a cloud-backed agent and is remembered per agent thereafter.
  • Rate-limit hits surface an "Agent rate limited" system message; the excess messages themselves are dropped without further notice.
  • All new code passes just check (fmt + clippy + test + WASM) with zero warnings; new behaviour is covered at the lowest viable test tier per CLAUDE.md (state > client > browser > Playwright).
  • crates/agent/ (the MCP/JSON-RPC bridge) is unchanged in scope; the new in-protocol bot lives in crates/bot/ to avoid scope collision.
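
The Profile / ProfileDelta criterion is easier to test against with the overlay semantics spelled out. The sketch below assumes the usual reading of Option<Option<_>> on a delta: None leaves the field alone, Some(None) clears it, Some(Some(v)) sets it. The variant names of PeerKind and DataPolicy, the display_name field, and the overlay function are placeholders, and serde attributes are omitted to keep the example std-only.

```rust
// Variant names are assumptions; only the type names appear in the issue.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PeerKind {
    Human,
    Agent,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataPolicy {
    LocalOnly,
    CloudProvider,
}

/// Reduced profile; the real struct would carry #[serde(default)] on the
/// two optional fields so that absent data decodes as None.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Profile {
    pub display_name: String, // placeholder for the existing fields
    pub peer_kind: Option<PeerKind>,
    pub data_policy: Option<DataPolicy>,
}

/// Outer Option: "does this delta touch the field at all?"
/// Inner Option: the new value, where None means "clear it".
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ProfileDelta {
    pub peer_kind: Option<Option<PeerKind>>,
    pub data_policy: Option<Option<DataPolicy>>,
}

/// Hypothetical stand-in for the UpdateProfile arm of apply_event.
pub fn overlay(profile: &mut Profile, delta: &ProfileDelta) {
    if let Some(kind) = delta.peer_kind {
        profile.peer_kind = kind;
    }
    if let Some(policy) = delta.data_policy {
        profile.data_policy = policy;
    }
}
```

A delta that only sets data_policy therefore cannot accidentally erase peer_kind, which is the property the round-trip tests would want to pin down.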

Out of scope

  • File / attachment sharing by agents. The spec notes file attachment is not yet a state event; would require a new EventKind::AttachFile and an associated permission. Not part of this work.
  • Granting agents the ability to delete other users' messages. The spec notes this would require a new ModerateMessages permission and gating DeleteMessage by author-or-moderator; explicitly out of scope for shipping Regenerate.
  • Adding ReadMessages, ManageMessages, or BanMembers permissions. None exist today; read access remains implicit at the gossip layer.
  • Hard-deleting messages (the DeleteMessage tombstone is soft — replay still sees the original event; this stays as-is).
  • Persisting stream chunks in the event store. Stream chunks remain ephemeral / wire-only by design.
  • Reusing crates/agent/ for the in-protocol LLM bot. The MCP bridge and the in-protocol bot are deliberately separated into two crates.
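
The split between ephemeral stream chunks and the single persisted message can be sketched as a small agent-side state machine. This is an illustration under assumptions, not the spec's design: stream_id, the Emit enum, AgentStream and persistable are hypothetical names, and FinalMessage stands in for the final EventKind::Message.

```rust
/// Ephemeral wire variants named in the issue; the fields are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum WireMessage {
    StreamStart { stream_id: u64 },
    StreamChunk { stream_id: u64, text: String },
    StreamEnd { stream_id: u64 },
}

/// What the agent sends out. Only FinalMessage may reach the event store.
#[derive(Debug, Clone, PartialEq)]
pub enum Emit {
    Wire(WireMessage),
    FinalMessage { body: String },
}

/// Hypothetical per-response accumulator on the agent side.
pub struct AgentStream {
    stream_id: u64,
    body: String,
    closed: bool,
}

impl AgentStream {
    pub fn start(stream_id: u64) -> (Self, Emit) {
        let stream = AgentStream { stream_id, body: String::new(), closed: false };
        (stream, Emit::Wire(WireMessage::StreamStart { stream_id }))
    }

    /// Append generated text; ignored once the stream is closed.
    pub fn push(&mut self, text: &str) -> Option<Emit> {
        if self.closed {
            return None;
        }
        self.body.push_str(text);
        Some(Emit::Wire(WireMessage::StreamChunk {
            stream_id: self.stream_id,
            text: text.to_string(),
        }))
    }

    /// Called on normal completion and on a user's StreamCancel alike:
    /// emits StreamEnd plus one final message with whatever was generated.
    pub fn close(&mut self) -> Vec<Emit> {
        if self.closed {
            return Vec::new();
        }
        self.closed = true;
        vec![
            Emit::Wire(WireMessage::StreamEnd { stream_id: self.stream_id }),
            Emit::FinalMessage { body: self.body.clone() },
        ]
    }
}

/// Filter that models "chunks never appear in the event store".
pub fn persistable(emits: &[Emit]) -> Vec<String> {
    emits
        .iter()
        .filter_map(|e| match e {
            Emit::FinalMessage { body } => Some(body.clone()),
            Emit::Wire(_) => None,
        })
        .collect()
}
```

Treating cancel as an early close keeps one code path for both outcomes, so replay only ever sees a single message per response, whether or not it was cut short.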

Open questions

  • E2E encryption path for agent messages (§3 / §8). EventKind::Message.body is plain String today and travels in the state DAG; encrypted human messages travel through the parallel Content/SealedContent path in crates/messaging + crates/crypto. Pick before shipping Phase 4:
    • (a) Treat EventKind::Message.body as opaque ciphertext (e.g. base64 SealedContent) and decrypt in the rendering layer — uniform event-sourced path but breaking schema change.
    • (b) Route agent output through the Content-based message path and reserve EventKind::Message for plaintext-only channels — preserves existing encryption pipeline but bifurcates how agents and humans publish.
  • Atomic Kick+Rotate shape (§4 / Phase 6). If we ship Phase 6, pick exactly one of ProposedAction::KickAndRotate (preserves governance, preferred) vs. top-level EventKind::KickAndRotate (admin-only, faster, loses multi-admin checkpoint). Shipping both would leave two ways to express the same operation.
  • Forward-compat strategy for EventKind / WireMessage (§11). Spec recommends Option B (versioned envelope { kind_tag: u32, payload: Vec<u8> }) over Option A (custom Deserialize with hand-managed tag table). Confirm Option B before starting Phase 0 — Option A is documented for completeness but is brittle under variant reordering.
  • Pin-management permission tightness (§4). PinMessage / UnpinMessage are unrestricted in required_permission() today. Decide whether agents pinning messages requires adding a new ManagePins permission, or remains implicit.
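
The pin-management question is easier to weigh with the permission flow in view. The sketch below is a guess at the shape described in this issue: a dedicated admin block inside check_permission that handles SetAgentConfig, with everything else going through required_permission(), where PinMessage currently maps to no permission. PeerId stands in for EndpointId; AssignRole, the grants map, and the bool return type are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

pub type PeerId = u32; // stand-in for EndpointId

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Permission {
    ManageRoles,
}

/// Reduced event set; AssignRole is an invented example of a gated event.
pub enum EventKind {
    AssignRole,
    PinMessage,
    SetAgentConfig,
}

#[derive(Default)]
pub struct State {
    pub admins: HashSet<PeerId>,
    pub grants: HashMap<PeerId, HashSet<Permission>>,
}

impl State {
    pub fn is_admin(&self, peer: PeerId) -> bool {
        self.admins.contains(&peer)
    }

    fn has(&self, peer: PeerId, perm: Permission) -> bool {
        self.grants.get(&peer).map_or(false, |set| set.contains(&perm))
    }
}

/// Ordinary permission table. None means "unrestricted".
pub fn required_permission(kind: &EventKind) -> Option<Permission> {
    match kind {
        EventKind::AssignRole => Some(Permission::ManageRoles),
        EventKind::PinMessage => None, // unrestricted today, per the issue
        EventKind::SetAgentConfig => None, // never reached; see admin block
    }
}

/// Returns true when the author may apply the event.
pub fn check_permission(state: &State, author: PeerId, kind: &EventKind) -> bool {
    // Dedicated admin block: not expressible as a grantable permission.
    if matches!(kind, EventKind::SetAgentConfig) {
        return state.is_admin(author);
    }
    match required_permission(kind) {
        None => true,
        Some(perm) => state.has(author, perm),
    }
}
```

Under this shape, adding a ManagePins permission would be a one-line change to the required_permission() table, whereas leaving it implicit means any member, agents included, can pin.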
