From 82c5833bfb98b37ed094cda949a806663adb6dd9 Mon Sep 17 00:00:00 2001
From: Daniel Franklin
Date: Sat, 16 May 2026 10:10:47 -0400
Subject: [PATCH 1/2] docs: add runtime and provider onboarding milestone

---
 README.md                                     |   1 +
 ...008-dashboard-first-provider-onboarding.md |  90 +++++
 docs/configuration.md                         |  22 ++
 docs/milestones.md                            |   2 +
 docs/runtime-and-tui-milestone.md             | 338 ++++++++++++++++++
 5 files changed, 453 insertions(+)
 create mode 100644 docs/adr/008-dashboard-first-provider-onboarding.md
 create mode 100644 docs/runtime-and-tui-milestone.md

diff --git a/README.md b/README.md
index fb28796..bd923eb 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,7 @@ agent007 runs as an MCP server that gives your AI editor a broad orchestration t
 - **Learning** — passive feedback recording → future PromptOptimizer
 - **Git agent** — AI-powered branch, commit, PR, and impact analysis
 - **Web dashboard** — live run/task/memory inspector at `http://localhost:8007`, with standalone task execution when a local provider such as Ollama is configured
+- **Dashboard-led provider UX (planned)** — provider health, setup validation, and onboarding will be centered in the web dashboard while preserving config/env-based setup for headless use
 - **LSP context controls** — configure LSP servers + category injection from config and dashboard (`/api/lsp/config`)
 - **ETR built-ins** — low-latency deterministic extraction/query/metrics tools to reduce shell+parsing overhead
 
diff --git a/docs/adr/008-dashboard-first-provider-onboarding.md b/docs/adr/008-dashboard-first-provider-onboarding.md
new file mode 100644
index 0000000..31eb095
--- /dev/null
+++ b/docs/adr/008-dashboard-first-provider-onboarding.md
@@ -0,0 +1,90 @@
+# ADR-008: Dashboard-First Provider Onboarding
+
+**Date:** 2026-05-16
+**Status:** Accepted
+**Deciders:** agent007 core team
+
+## Context
+
+agent007 currently supports standalone model execution through:
+
+1. `ANTHROPIC_API_KEY`
+2. `OPENAI_API_KEY`
+3. `[models.ollama]` in `~/.agent007/config.toml`
+
+If none of those are configured, agent007 runs in **hosted-MCP** mode and depends on the host editor/LLM session.
+
+This works, but the current setup model is still operator-heavy:
+
+- users must understand environment variables and config layout
+- provider health is not surfaced clearly enough
+- local/self-hosted endpoint setup is not guided
+- there is no unified place to understand why standalone mode is unavailable
+
+At the same time, agent007 already has a web dashboard and runtime status surface. That dashboard is the natural place to make provider setup more discoverable.
+
+## Decision
+
+Provider onboarding in agent007 will be **dashboard-first**, while preserving file/env compatibility.
+
+This means:
+
+1. The primary user-facing setup flow will live in the web dashboard.
+2. Dashboard actions will write or validate the same underlying configuration model (`config.toml`, env-backed provider detection) rather than creating a separate secrets/config system.
+3. Manual configuration remains supported and documented for headless, scripted, and advanced setups.
+4. OAuth/account-backed login flows are **not** the first slice. The first slice focuses on:
+   - status/health visibility
+   - guided setup for current providers
+   - OpenAI-compatible/local endpoint configuration
+   - better error reporting and validation
+
+## Rationale
+
+- **Matches current product shape**: agent007 already has a dashboard; users should not need a separate auth-only CLI surface for the first usability improvement.
+- **Reduces duplicate configuration paths**: the dashboard should not invent a second provider model. It should manage the same provider configuration the runtime already reads.
+- **Keeps automation intact**: CI, remote boxes, and power users still need env/config-based setup.
+- **Safer first implementation**: health checks, config writing, and explicit validation are much smaller and lower-risk than implementing many provider-specific OAuth/device flows.
+- **Supports future expansion**: if selected OAuth providers are later added, the dashboard can host them cleanly without invalidating config-based setups.
+
+## Consequences
+
+### Positive
+
+- Users get a visible provider status surface tied to runtime mode.
+- Local/self-hosted endpoint setup becomes easier to validate.
+- Hosted-MCP vs standalone mode becomes easier to understand.
+- The same configuration remains usable from CLI, files, and dashboard.
+
+### Negative / Trade-offs
+
+- agent007 will still lag tools like jcode on multi-provider OAuth breadth in the short term.
+- Dashboard-first onboarding increases dependence on the web surface for the best UX.
+- Provider-specific OAuth support, if added later, will still require careful credential storage and revocation design.
+
+## First Slice
+
+1. Provider status panel in dashboard
+2. Guided validation/setup for:
+   - Claude env/API-key path
+   - OpenAI/Codex env/API-key path
+   - Ollama local endpoint
+   - OpenAI-compatible endpoint
+3. Health and failure explanations
+4. Documentation updates that clearly state:
+   - dashboard-first onboarding
+   - config/env compatibility remains
+   - hosted-MCP remains a first-class mode
+
+## Alternatives Considered
+
+| Alternative | Reason Not Chosen |
+|-------------|------------------|
+| **CLI-first provider login (`agent007 login --provider ...`)** | Adds a second setup UX before the dashboard/provider-state UX is mature; less aligned with agent007’s existing operator surface |
+| **OAuth-first implementation** | Higher complexity, provider-specific maintenance, and secret/session handling burden before basic setup visibility is solved |
+| **Keep config/env only** | Lowest implementation cost, but continues the current usability gap and hides runtime/provider problems from normal users |
+
+## Related ADRs
+
+- ADR-002 — MCP stdio transport
+- ADR-004 — Hosted-MCP workflow execution mode
+- ADR-005 — Skills as Markdown with frontmatter
diff --git a/docs/configuration.md b/docs/configuration.md
index 82e57a0..d6539c3 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -4,6 +4,12 @@
 
 Main config at `~/.agent007/config.toml`. Created by `agent007 init`.
 
+Provider onboarding direction:
+
+- **Current:** direct config/env setup is supported and remains valid
+- **Planned:** dashboard-first provider onboarding/validation will manage the same runtime configuration model instead of replacing it
+- **Always supported:** headless/manual setups via `config.toml` and environment variables
+
 ```toml
 [core]
 max_agents = 8 # Maximum concurrent agents
@@ -33,6 +39,22 @@ default = "claude" # Fallback
 
 All fields are optional — agent007 uses sensible defaults if omitted.
 
+## Provider setup modes
+
+Today, standalone runtime availability is determined from:
+
+1. `ANTHROPIC_API_KEY`
+2. `OPENAI_API_KEY`
+3. reachable `[models.ollama]` config
+
+If none are available, agent007 remains usable in **hosted-MCP** mode, where the connected host/editor LLM executes reasoning and tool orchestration through MCP.
+
+Planned UX direction:
+
+- the **web dashboard** becomes the primary setup and validation surface for providers
+- manual config/env setup remains compatible
+- future OpenAI-compatible endpoint setup should also be manageable from dashboard
+
 ---
 
 ## hooks.toml
diff --git a/docs/milestones.md b/docs/milestones.md
index 815a345..6bdc61c 100644
--- a/docs/milestones.md
+++ b/docs/milestones.md
@@ -4,6 +4,7 @@
 1. M1 Core Runtime Reliability
 2. M2 Visibility and Productization
 3. M3 Controlled Rollout and Quality Gate
+4. M4 Runtime Sessions, Agent Collaboration, and TUI Usability
 
 ## Milestone Table
 | Milestone | Status | Goal | Key Features | Dependencies | Exit Criteria |
@@ -11,6 +12,7 @@
 | M1 | ✅ Complete | Consistent retrieval + execution behavior | Warmup indexing bounds, shared skill executor path, telemetry artifact generation, persona policy enforcement | none | core paths green + artifacts persisted |
 | M2 | ✅ Complete | User-facing observability | run-detail API extension, dashboard telemetry/policy/token cards, docs updates | M1 | artifacts visible in UI and validated |
 | M3 | 📋 Planned | Safe rollout | strict-mode rollout matrix, KPI baseline tracking, rollback playbook | M2 | measured rollout decision gates |
+| M4 | 📋 Planned | Stronger long-lived runtime and operator UX | session lifecycle model, compact runtime status, usable TUI, mock/diagram viewer, agent messaging, provider/browser health UX, memory lifecycle improvements | M2 | sessions resumable, runtime visible in dashboard/TUI, first operator-grade terminal flow usable, visual artifacts reviewable in dashboard |
 
 ## Parallel Workstreams
 1. Backend: retriever/executor/policy and artifact persistence.
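Reviewer sketch (not part of the patch): the "Provider setup modes" text added to `docs/configuration.md` above could be read as the following decision. This is a minimal illustration, assuming the numbered list is also a precedence order, which the docs do not state. `RuntimeMode`, `Provider`, `is_set`, and `detect_mode` are hypothetical names and are not taken from the agent007 codebase.

```rust
/// Runtime modes named in the docs. The type itself is a hypothetical sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum RuntimeMode {
    /// A provider is available, so agent007 can execute standalone.
    Standalone(Provider),
    /// No provider found: the host editor/LLM drives execution over MCP.
    HostedMcp,
}

/// Provider sources listed in the docs (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum Provider {
    Anthropic,
    OpenAi,
    Ollama,
}

/// Treats unset or blank values as "not configured" (an assumption).
fn is_set(value: Option<&str>) -> bool {
    value.map_or(false, |v| !v.trim().is_empty())
}

/// Picks a runtime mode from the three documented inputs.
/// Assumed order: ANTHROPIC_API_KEY, then OPENAI_API_KEY, then Ollama.
pub fn detect_mode(
    anthropic_key: Option<&str>,
    openai_key: Option<&str>,
    ollama_reachable: bool,
) -> RuntimeMode {
    if is_set(anthropic_key) {
        RuntimeMode::Standalone(Provider::Anthropic)
    } else if is_set(openai_key) {
        RuntimeMode::Standalone(Provider::OpenAi)
    } else if ollama_reachable {
        RuntimeMode::Standalone(Provider::Ollama)
    } else {
        // Documented fallback when nothing is configured.
        RuntimeMode::HostedMcp
    }
}
```

A caller would typically pass `std::env::var("ANTHROPIC_API_KEY").ok().as_deref()` for the key arguments. Keeping the function free of environment access makes the fallback to hosted-MCP easy to test and to surface in a dashboard status card.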
diff --git a/docs/runtime-and-tui-milestone.md b/docs/runtime-and-tui-milestone.md
new file mode 100644
index 0000000..b4d3e6f
--- /dev/null
+++ b/docs/runtime-and-tui-milestone.md
@@ -0,0 +1,338 @@
+# M4 — Runtime Sessions, Agent Collaboration, and TUI Usability
+
+## Goal
+Strengthen agent007 as a long-lived orchestration runtime by improving session persistence, agent-to-agent coordination, memory flow, browser/provider UX, compact runtime visibility, and terminal usability without changing the product into a terminal-first clone of jcode.
+
+## Non-Goals
+1. Rebuild agent007 around a custom terminal renderer.
+2. Replace the web dashboard as the primary control surface.
+3. Chase full parity with jcode swarm/runtime features.
+4. Introduce speculative distributed execution before single-host session semantics are solid.
+
+## Why This Milestone Exists
+agent007 already has:
+- a hosted workflow engine
+- MCP server and dashboard
+- ETR built-ins for deterministic work
+- memory, workflows, and personas
+
+What it lacks is a stronger **runtime layer**:
+- better long-lived session semantics
+- explicit agent messaging
+- tighter memory lifecycle
+- simpler provider/browser setup
+- compact live status in both web and terminal views
+- a TUI that is usable for daily runtime monitoring
+
+## Workstreams
+
+### W1 — Session Server Model
+Add a clearer long-lived session model around the existing MCP/runtime surfaces.
+
+**Scope**
+- candidate crates/files:
+  - `/Users/neo/workspace/agent007/crates/cli/src/commands/serve.rs`
+  - `/Users/neo/workspace/agent007/crates/workflows/`
+  - `/Users/neo/workspace/agent007/crates/web/src/api.rs`
+  - `/Users/neo/workspace/agent007/crates/web/frontend/src/views/`
+
+**Deliverables**
+1. Session inventory API:
+   - active sessions
+   - last heartbeat
+   - owning workflow/run
+   - current status
+2. Session resume semantics:
+   - reconnect without losing step state
+   - explicit stale/orphaned session handling
+3. Session lifecycle rules:
+   - created
+   - active
+   - idle
+   - awaiting approval
+   - stale
+   - completed
+
+**Acceptance**
+- sessions survive routine client reconnects
+- stale sessions are visible and recoverable
+- hosted workflow state is inspectable without digging through raw files
+
+### W2 — Agent-to-Agent Messaging
+Make collaboration explicit instead of implicit via only workflow state.
+
+**Scope**
+- candidate crates/files:
+  - `/Users/neo/workspace/agent007/crates/workflows/`
+  - `/Users/neo/workspace/agent007/crates/core/`
+  - `/Users/neo/workspace/agent007/crates/web/src/api.rs`
+  - dashboard session/run detail views
+
+**Deliverables**
+1. Internal message envelope:
+   - from
+   - to
+   - session/run
+   - message kind
+   - payload
+   - timestamp
+2. Message classes:
+   - request
+   - handoff
+   - progress note
+   - warning/blocker
+   - result summary
+3. Compact UI surface showing:
+   - last N messages
+   - blocked handoffs
+   - unacknowledged requests
+
+**Acceptance**
+- workflow steps can exchange structured messages
+- users can inspect handoffs in dashboard/TUI
+- blocked coordination is visible without opening raw traces
+
+### W3 — Memory Architecture Improvements
+Improve how useful memory is captured, compacted, and reused.
+
+**Scope**
+- candidate crates/files:
+  - `/Users/neo/workspace/agent007/crates/memory/`
+  - `/Users/neo/workspace/agent007/crates/learning/`
+  - `/Users/neo/workspace/agent007/crates/web/src/api.rs`
+  - memory views in frontend
+
+**Deliverables**
+1. Memory classes:
+   - explicit saved note
+   - run artifact summary
+   - reusable skill/workflow output
+   - ephemeral session memory
+2. Save-path rules:
+   - what is auto-recorded
+   - what requires explicit promotion
+3. Compact retrieval summaries:
+   - high-signal snippets
+   - source attribution
+   - freshness/age markers
+
+**Acceptance**
+- repeated sessions reuse relevant prior outcomes with lower context bloat
+- users can tell why a memory item exists
+- memory retrieval is inspectable and suppressible
+
+### W4 — Browser / Provider UX
+Reduce friction for setup and day-to-day usage.
+
+**Scope**
+- candidate crates/files:
+  - `/Users/neo/workspace/agent007/crates/web/frontend/src/views/`
+  - `/Users/neo/workspace/agent007/crates/web/src/api.rs`
+  - provider/browser config surfaces
+
+**Operating model**
+- provider onboarding is **dashboard-first**
+- CLI/env/config setup remains supported
+- dashboard actions should write or validate the same underlying config surface instead of inventing a second runtime model
+- hosted-MCP mode remains valid even when no standalone provider is configured
+
+**Deliverables**
+1. Provider status card:
+   - configured / missing / degraded
+   - endpoint in use
+   - auth state
+   - runtime mode impact (hosted-mcp / standalone / local-ollama)
+2. Browser capability card:
+   - available backends
+   - health
+   - quick test action
+3. Better setup paths:
+   - dashboard onboarding/wizard for supported provider types
+   - concise validation messages
+   - direct fix hints
+   - no silent failures
+4. Provider classes to support incrementally:
+   - env-backed API providers already in repo (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`)
+   - local/self-hosted endpoints (`[models.ollama]`)
+   - OpenAI-compatible endpoint configuration
+   - later: selected OAuth/account-backed providers if worth owning
+5. Compatibility rules:
+   - preserve manual `config.toml` editing
+   - preserve env var overrides
+   - never require dashboard usage for automation/headless setups
+
+**Acceptance**
+- users can set up or validate a provider from the dashboard
+- users can tell why a provider/browser feature is unavailable
+- setup failures are actionable
+- quick validation works from dashboard
+- manual config/env workflows continue to work unchanged
+
+### W5 — Compact Runtime Visibility
+Expose runtime state in a compact, operator-friendly way.
+
+**Scope**
+- web dashboard + TUI surfaces
+- ETR/runtime status helpers
+
+**Deliverables**
+1. Compact session cards:
+   - workflow/run
+   - phase
+   - progress
+   - last tool
+   - pending approval
+   - last error
+2. Runtime summary endpoints for:
+   - active sessions
+   - blocked sessions
+   - stale sessions
+   - recent failures
+3. ETR-backed inspection where possible for low-noise summaries
+
+**Acceptance**
+- users can understand runtime health in one screen
+- noisy raw JSON is not required for normal triage
+- terminal and dashboard surfaces show aligned status
+
+### W6 — TUI Usability
+Make the terminal experience genuinely usable for monitoring and control.
+
+**Scope**
+- candidate CLI/TUI surfaces in:
+  - `/Users/neo/workspace/agent007/crates/cli/`
+  - dashboard parity for status concepts
+
+**Deliverables**
+1. TUI views:
+   - sessions list
+   - run detail
+   - approvals queue
+   - recent errors
+2. Keyboard actions:
+   - inspect
+   - retry
+   - approve/deny
+   - copy summary
+3. Compact layout rules:
+   - no giant JSON dumps
+   - stable widths
+   - truncation with drill-in
+
+**Acceptance**
+- TUI is usable for day-to-day monitoring
+- approvals and failures can be handled without leaving terminal
+- status is readable on normal laptop terminal sizes
+
+### W7 — Mock Viewer and Diagram Preview
+Make generated visual/design artifacts first-class in the dashboard so users can review them without leaving agent007.
+
+**Scope**
+- candidate crates/files:
+  - `/Users/neo/workspace/agent007/crates/web/src/api.rs`
+  - `/Users/neo/workspace/agent007/crates/web/frontend/src/views/`
+  - `/Users/neo/workspace/agent007/crates/web/frontend/src/components/`
+  - candidate artifact-serving helpers in web/runtime crates
+
+**Why this belongs here**
+- the dashboard already exists
+- runs/workflows already produce artifacts
+- agent007 increasingly handles UI/UX, architecture, and design-adjacent tasks
+- users should be able to render and review outputs directly instead of manually opening files elsewhere
+
+**Primary use cases**
+1. design viewing in the web dashboard
+2. rendering generated UI/UX mock outputs during tasks
+3. rendering flow/mermaid/architecture diagrams
+
+**Deliverables**
+1. Viewer surface in dashboard:
+   - modal, drawer, or dedicated artifact pane
+   - linked from run/workflow outputs
+2. Render modes:
+   - Mermaid text → rendered diagram
+   - static image preview (PNG/SVG/WebP)
+   - HTML/CSS mock preview in a sandboxed iframe
+   - raw source fallback
+3. Artifact metadata:
+   - type
+   - size
+   - source run/session
+   - renderability flags
+4. Review actions:
+   - open
+   - copy raw source
+   - download artifact
+   - open related run/session context
+
+**Explicit v1 boundaries**
+- not a full design editor
+- not a Figma replacement
+- not an arbitrary JS app host
+- not browser automation embedded into the viewer
+- not unrestricted execution of generated code
+
+**Acceptance**
+- users can render Mermaid outputs directly in dashboard
+- users can preview generated mock/image artifacts without leaving agent007
+- HTML/CSS previews are sandboxed
+- non-renderable artifacts degrade cleanly to raw/source view
+- viewer integrates with workflow/run outputs instead of being a disconnected file browser
+
+## Recommended Order
+1. **W1 Session Server Model**
+2. **W5 Compact Runtime Visibility**
+3. **W6 TUI Usability**
+4. **W7 Mock Viewer and Diagram Preview**
+5. **W2 Agent-to-Agent Messaging**
+6. **W4 Browser / Provider UX**
+7. **W3 Memory Architecture Improvements**
+
+## Suggested PR Slices
+
+### Slice A — Session Inventory and Lifecycle
+- add session list/status API
+- add stale/orphan detection
+- add dashboard session summary view
+
+### Slice B — Compact Runtime Summary
+- add compact status endpoints/helpers
+- add dashboard runtime cards
+- align summary shape with terminal output
+
+### Slice C — First Usable TUI
+- sessions list
+- run detail
+- approval queue
+
+### Slice D — Agent Messaging Core
+- message envelope
+- persistence
+- message inspection UI
+
+### Slice E — Provider / Browser UX
+- provider health/status
+- browser health/status
+- setup validation and fix hints
+
+### Slice F — Memory Lifecycle
+- memory classes
+- promotion rules
+- retrieval summary visibility
+
+### Slice G — Mock Viewer and Diagram Preview
+- artifact viewer panel/modal
+- Mermaid renderer
+- static image preview
+- sandboxed HTML/CSS mock preview
+- raw source fallback
+- link viewer from workflow/run outputs
+
+## Definition of Done
+1. Long-lived sessions are visible, resumable, and recoverable.
+2. Runtime status is compact in both dashboard and terminal views.
+3. TUI supports normal operator tasks without raw JSON dependence.
+4. Agent handoffs are inspectable.
+5. Provider/browser failures are diagnosable from UI.
+6. Memory reuse is more transparent and less noisy.
+7. Generated visual artifacts can be reviewed directly in dashboard.
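Reviewer sketch (not part of the patch): the W1 session lifecycle rules in `docs/runtime-and-tui-milestone.md` list six states but leave the transitions open. One possible reading is shown here, assuming staleness is derived from heartbeat age and that a client reconnect revives idle or stale sessions. `SessionState`, `classify`, `on_reconnect`, and both threshold parameters are hypothetical and are not defined by the milestone document.

```rust
use std::time::Duration;

/// Lifecycle states listed under W1. The enum and helpers are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionState {
    Created,
    Active,
    Idle,
    AwaitingApproval,
    Stale,
    Completed,
}

/// Re-evaluates a session's state from the time since its last heartbeat.
/// `idle_after` and `stale_after` are hypothetical thresholds.
pub fn classify(
    current: SessionState,
    since_heartbeat: Duration,
    idle_after: Duration,
    stale_after: Duration,
) -> SessionState {
    match current {
        // Completed is terminal: heartbeat age no longer matters.
        SessionState::Completed => SessionState::Completed,
        // Any unfinished session that stops heartbeating becomes stale.
        _ if since_heartbeat >= stale_after => SessionState::Stale,
        // Only an active session decays to idle.
        SessionState::Active if since_heartbeat >= idle_after => SessionState::Idle,
        // Everything else keeps its current state.
        other => other,
    }
}

/// A reconnecting client revives idle or stale sessions; other states are kept.
pub fn on_reconnect(current: SessionState) -> SessionState {
    match current {
        SessionState::Idle | SessionState::Stale => SessionState::Active,
        other => other,
    }
}
```

Keeping `Stale` distinct from `Completed` is what makes the "stale sessions are visible and recoverable" acceptance item checkable: a stale session can still return to `Active`, while a completed one cannot.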
From c6c259a9ff60aca1d0bbdee996f47fc33d18d04c Mon Sep 17 00:00:00 2001
From: Daniel Franklin
Date: Sat, 16 May 2026 10:15:59 -0400
Subject: [PATCH 2/2] docs: address runtime milestone review feedback

---
 README.md                         |  2 +-
 docs/configuration.md             |  4 ++--
 docs/runtime-and-tui-milestone.md | 36 +++++++++++++++----------------
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/README.md b/README.md
index bd923eb..160016c 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ agent007 runs as an MCP server that gives your AI editor a broad orchestration t
 - **Learning** — passive feedback recording → future PromptOptimizer
 - **Git agent** — AI-powered branch, commit, PR, and impact analysis
 - **Web dashboard** — live run/task/memory inspector at `http://localhost:8007`, with standalone task execution when a local provider such as Ollama is configured
-- **Dashboard-led provider UX (planned)** — provider health, setup validation, and onboarding will be centered in the web dashboard while preserving config/env-based setup for headless use
+- **Dashboard-first provider UX (planned)** — provider health, setup validation, and onboarding will be centered in the web dashboard while preserving config/env-based setup for headless use
 - **LSP context controls** — configure LSP servers + category injection from config and dashboard (`/api/lsp/config`)
 - **ETR built-ins** — low-latency deterministic extraction/query/metrics tools to reduce shell+parsing overhead
 
diff --git a/docs/configuration.md b/docs/configuration.md
index d6539c3..894b533 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -41,13 +41,13 @@ All fields are optional — agent007 uses sensible defaults if omitted.
 
 ## Provider setup modes
 
-Today, standalone runtime availability is determined from:
+Today, real provider-backed standalone runtime availability is determined from:
 
 1. `ANTHROPIC_API_KEY`
 2. `OPENAI_API_KEY`
 3. reachable `[models.ollama]` config
 
-If none are available, agent007 remains usable in **hosted-MCP** mode, where the connected host/editor LLM executes reasoning and tool orchestration through MCP.
+If none are available, agent007 remains usable in **hosted-MCP** mode, where the connected host/editor LLM executes reasoning and tool orchestration through MCP. For tests and demos, `AGENT007_DRY_RUN=1` can also enable standalone execution with the mock provider; it is not a real model provider setup.
 
 Planned UX direction:
 
diff --git a/docs/runtime-and-tui-milestone.md b/docs/runtime-and-tui-milestone.md
index b4d3e6f..7629743 100644
--- a/docs/runtime-and-tui-milestone.md
+++ b/docs/runtime-and-tui-milestone.md
@@ -1,12 +1,12 @@
 # M4 — Runtime Sessions, Agent Collaboration, and TUI Usability
 
 ## Goal
-Strengthen agent007 as a long-lived orchestration runtime by improving session persistence, agent-to-agent coordination, memory flow, browser/provider UX, compact runtime visibility, and terminal usability without changing the product into a terminal-first clone of jcode.
+Strengthen agent007 as a long-lived orchestration runtime by improving session persistence, agent-to-agent coordination, memory flow, browser/provider UX, compact runtime visibility, and terminal usability without changing the product into a terminal-first clone of jcode (a neighboring AI coding-agent harness focused on persistent terminal sessions).
 
 ## Non-Goals
 1. Rebuild agent007 around a custom terminal renderer.
 2. Replace the web dashboard as the primary control surface.
-3. Chase full parity with jcode swarm/runtime features.
+3. Chase full parity with jcode-style swarm/runtime features.
 4. Introduce speculative distributed execution before single-host session semantics are solid.
 
 ## Why This Milestone Exists
@@ -31,10 +31,10 @@ Add a clearer long-lived session model around the existing MCP/runtime surfaces.
 
 **Scope**
 - candidate crates/files:
-  - `/Users/neo/workspace/agent007/crates/cli/src/commands/serve.rs`
-  - `/Users/neo/workspace/agent007/crates/workflows/`
-  - `/Users/neo/workspace/agent007/crates/web/src/api.rs`
-  - `/Users/neo/workspace/agent007/crates/web/frontend/src/views/`
+  - `crates/cli/src/commands/serve.rs`
+  - `crates/workflows/`
+  - `crates/web/src/api.rs`
+  - `crates/web/frontend/src/views/`
 
 **Deliverables**
 1. Session inventory API:
@@ -63,9 +63,9 @@ Make collaboration explicit instead of implicit via only workflow state.
 
 **Scope**
 - candidate crates/files:
-  - `/Users/neo/workspace/agent007/crates/workflows/`
-  - `/Users/neo/workspace/agent007/crates/core/`
-  - `/Users/neo/workspace/agent007/crates/web/src/api.rs`
+  - `crates/workflows/`
+  - `crates/core/`
+  - `crates/web/src/api.rs`
   - dashboard session/run detail views
 
 **Deliverables**
@@ -97,9 +97,9 @@ Improve how useful memory is captured, compacted, and reused.
 
 **Scope**
 - candidate crates/files:
-  - `/Users/neo/workspace/agent007/crates/memory/`
-  - `/Users/neo/workspace/agent007/crates/learning/`
-  - `/Users/neo/workspace/agent007/crates/web/src/api.rs`
+  - `crates/memory/`
+  - `crates/learning/`
+  - `crates/web/src/api.rs`
   - memory views in frontend
 
 **Deliverables**
@@ -126,8 +126,8 @@ Reduce friction for setup and day-to-day usage.
 
 **Scope**
 - candidate crates/files:
-  - `/Users/neo/workspace/agent007/crates/web/frontend/src/views/`
-  - `/Users/neo/workspace/agent007/crates/web/src/api.rs`
+  - `crates/web/frontend/src/views/`
+  - `crates/web/src/api.rs`
   - provider/browser config surfaces
 
 **Operating model**
@@ -200,7 +200,7 @@ Make the terminal experience genuinely usable for monitoring and control.
 
 **Scope**
 - candidate CLI/TUI surfaces in:
-  - `/Users/neo/workspace/agent007/crates/cli/`
+  - `crates/cli/`
   - dashboard parity for status concepts
 
 **Deliverables**
@@ -229,9 +229,9 @@ Make generated visual/design artifacts first-class in the dashboard so users can
 
 **Scope**
 - candidate crates/files:
-  - `/Users/neo/workspace/agent007/crates/web/src/api.rs`
-  - `/Users/neo/workspace/agent007/crates/web/frontend/src/views/`
-  - `/Users/neo/workspace/agent007/crates/web/frontend/src/components/`
+  - `crates/web/src/api.rs`
+  - `crates/web/frontend/src/views/`
+  - `crates/web/frontend/src/components/`
   - candidate artifact-serving helpers in web/runtime crates
 
 **Why this belongs here**
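Reviewer sketch (not part of the patch): the W7 section touched by the last hunk lists four render modes (Mermaid, static image, sandboxed HTML/CSS, raw source fallback). A small dispatch on file extension is one way the viewer could choose between them. The milestone document does not say how artifacts are typed, so the extension mapping below, including the `.mmd` and `.mermaid` suffixes, is an assumption, and `RenderMode` and `render_mode_for` are hypothetical names.

```rust
use std::path::Path;

/// Render modes listed under W7 (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum RenderMode {
    Mermaid,
    Image,
    SandboxedHtml,
    RawSource,
}

/// Chooses a render mode from the artifact's file name.
/// Unknown or missing extensions fall back to the raw source view.
pub fn render_mode_for(file_name: &str) -> RenderMode {
    // Lowercase the extension so "mock.PNG" and "mock.png" behave the same.
    let ext = Path::new(file_name)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());

    match ext.as_deref() {
        Some("mmd") | Some("mermaid") => RenderMode::Mermaid,
        Some("png") | Some("svg") | Some("webp") => RenderMode::Image,
        // HTML is the only mode that must go through a sandboxed iframe.
        Some("html") | Some("htm") => RenderMode::SandboxedHtml,
        _ => RenderMode::RawSource,
    }
}
```

Making `RawSource` the catch-all arm mirrors the acceptance item that non-renderable artifacts degrade cleanly to a raw/source view rather than failing.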