feat: add multi-agent team coordination primitives #18753

Closed
pamelia wants to merge 7 commits into anomalyco:dev from pamelia:feat/teams-agents-improvements

pamelia commented Mar 23, 2026

Issue for this PR

Closes #12711
Closes #12661
Closes #15035
Closes #17994

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Adds multi-agent team coordination to opencode so a lead agent can orchestrate parallel specialist sub-agents that communicate findings and produce synthesized results. This is the infrastructure that powers skills like spec-review and self-review-loop.

Core subsystems added:

  • Team management (src/team/) — DB-backed teams with lead/member lifecycle. Members get auto-disambiguated names when duplicate agent types are spawned (e.g., general, general-2). Create and disband are transactional. Members have explicit status transitions: active → completed (normal exit), failed (crash), cancelled (team disbanded).

  • Background agent execution (src/tool/task.ts) — Extended TaskTool with background and team_id params. Instance.bind() preserves ALS context. Process-wide concurrency limit (team.max_agents, default 10). When a background agent crashes, the lead gets an injected [AGENT FAILURE] notification and the agent's tasks cascade to failed.

  • Cross-session messaging (src/session/inject.ts) — Synthetic user messages written to target session's DB. The prompt loop picks them up via typed Bus.subscribe. Race-safe: subscribes to Bus first, then checks DB for already-pending messages. Target session's agent type is resolved dynamically (not hardcoded).

  • Agent memory (src/agent/memory.ts) — Persistent per-agent per-project memory in SQLite. 100KB cap with byte-safe UTF-8 truncation. Injected into system prompt when memory: "local" is set on agent config.

  • Team task board (src/team/task.ts) — Shared task state scoped to a team. Tools validate team exists and is active before operating.

  • Prompt loop changes (src/session/prompt.ts) — Background team members wait for injected messages (configurable timeout via team.member_timeout). Members mark themselves completed on normal exit. Session cancellation auto-disbands owned teams. Server startup reconciles stale active teams.

  • TUI sidebar — New collapsible Teams section shows active teams with member agents and status indicators (working/done/failed/cancelled). Wired to Bus events for real-time updates.

  • 5 new tools: team_create, team_delete, team_task, send_message, agent_memory — all permission-gated at execution time.

  • Config: New team section with max_agents (int, default 10) and member_timeout (ms, default 300000).
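
To make the config and concurrency bullets above concrete, here is a minimal sketch. The field names and defaults (`max_agents` = 10, `member_timeout` = 300000 ms) come from the description; `TeamConfig`, `resolveTeamConfig`, and `AgentGate` are hypothetical names invented for illustration and are not taken from the PR's code. The gate is one plausible way to enforce a process-wide limit with a running counter.

```typescript
// Hypothetical shape of the `team` config section. Field names and defaults
// come from the PR description; the helper names are illustrative only.
interface TeamConfig {
  max_agents: number // process-wide cap on concurrently running background agents
  member_timeout: number // ms a background member waits for an injected message
}

const TEAM_DEFAULTS: TeamConfig = { max_agents: 10, member_timeout: 300_000 }

// Overlay user-supplied values on the defaults.
function resolveTeamConfig(user: Partial<TeamConfig> = {}): TeamConfig {
  return { ...TEAM_DEFAULTS, ...user }
}

// Counter-based gate (an assumed design, not the PR's implementation).
class AgentGate {
  private running = 0
  private readonly limit: number

  constructor(limit: number) {
    this.limit = limit
  }

  // Returns a release function when a slot is free, or null at the cap.
  // The release function is idempotent, so calling it twice cannot
  // decrement the counter twice.
  acquire(): (() => void) | null {
    if (this.running >= this.limit) return null
    this.running++
    let released = false
    return () => {
      if (released) return
      released = true
      this.running--
    }
  }

  get active(): number {
    return this.running
  }
}
```

A caller would typically acquire a slot before starting a background agent and invoke the returned release function in a `.finally()` handler, so the slot is freed on both success and failure.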

The design follows existing opencode patterns: DB-backed state, Bus pub/sub, Instance.bind for ALS, permission gating. Migration is additive only (4 new tables).
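
The race-safe ordering described for cross-session messaging (subscribe to the bus first, then check the store for already-pending messages) can be sketched as follows. This is an in-memory illustration under stated assumptions: `TinyBus`, `pending`, `inject`, and this `waitForInjection` are stand-ins written for the example, not opencode's `Bus` API or the PR's code, and a `Map` replaces the session DB.

```typescript
type Listener = (sessionID: string) => void

// Stand-in for a pub/sub bus (hypothetical, not opencode's Bus).
class TinyBus {
  private listeners = new Set<Listener>()

  subscribe(fn: Listener): () => void {
    this.listeners.add(fn)
    return () => {
      this.listeners.delete(fn)
    }
  }

  publish(sessionID: string): void {
    // Iterate over a copy so listeners may unsubscribe while being notified.
    for (const fn of [...this.listeners]) fn(sessionID)
  }
}

// Stand-in for the DB table of pending injected messages.
const pending = new Map<string, string[]>()
const bus = new TinyBus()

// Persist the message first, then notify subscribers.
function inject(sessionID: string, text: string): void {
  const queue = pending.get(sessionID) ?? []
  queue.push(text)
  pending.set(sessionID, queue)
  bus.publish(sessionID)
}

// Resolves with the next injected message, or undefined on timeout.
function waitForInjection(sessionID: string, timeoutMs: number): Promise<string | undefined> {
  return new Promise((resolve) => {
    let done = false
    const take = () => pending.get(sessionID)?.shift()
    const finish = (value: string | undefined) => {
      if (done) return
      done = true
      clearTimeout(timer)
      unsubscribe()
      resolve(value)
    }
    // 1. Subscribe first, so a message injected from now on is not missed.
    const unsubscribe = bus.subscribe((id) => {
      if (id !== sessionID) return
      const msg = take()
      if (msg !== undefined) finish(msg)
    })
    const timer = setTimeout(() => finish(undefined), timeoutMs)
    // 2. Then check for messages that were written before we subscribed.
    const existing = take()
    if (existing !== undefined) finish(existing)
  })
}
```

Checking the store before subscribing would leave a window in which a message is written and published after the check but before the subscription exists; reversing the order closes that window, and the `done` flag keeps the waiter from resolving twice.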

How did you verify your code works?

  • Ran bun typecheck from packages/opencode — passes (only pre-existing error in index.tsx unrelated to this PR)
  • Tested the full spec-review workflow: created a team, spawned 5 specialist agents in parallel, they reviewed a spec, sent findings via send_message, lead mediated cross-review, produced synthesized output
  • Tested the self-review-loop workflow: spawned fresh code-review sub-agents across 3 turns, triaged findings, applied fixes, committed and pushed
  • Tested failure handling: verified crashed agents notify the lead and cascade task status
  • Tested team lifecycle: session cancellation disbands teams, server restart reconciles stale teams
  • Tested TUI: sidebar updates in real-time as teams are created, members join, and statuses change

Screenshots / recordings

N/A — this is backend infrastructure + a small TUI sidebar addition

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

@github-actions
Contributor

The following comment was made by an LLM, it may be inaccurate:

Found 2 potential duplicate PRs:

  1. PR #12731: feat: add team tools, HTTP routes, and tool registry integration

  2. PR #15205: feat(opencode): add agent teams — DB-based parallel multi-session collaboration

These older PRs appear to be addressing similar functionality around team management and multi-agent coordination. Recommend reviewing their status and implementation to check for overlap or conflicts with PR #18753.

pamelia added 6 commits March 25, 2026 21:07
Add teams & agents subsystem enabling parallel multi-agent workflows:
- Team management with lead/member lifecycle, auto-disambiguation of duplicate agent names
- Background agent execution with configurable concurrency limit (max_agents)
- Cross-session message injection with race-safe Bus subscription
- Agent memory with 100KB cap, team task board, 5 new tools
- Failure propagation: crashed agents notify lead, cascade to tasks
- Member status updates on normal exit (completed) and crash (failed)
- Session-end auto-disband, server-startup reconciliation
- TUI sidebar shows active teams and agent status
- Spec updated to document all lifecycle, cleanup, and config behaviors
- Fix async Promise executor in waitForInjection (blocker: silent error swallowing)
- Fix running counter leak with try/catch around Instance.bind (blocker)
- Fix completeMember TOCTOU race using RETURNING on UPDATE
- Fix memory truncation to use byte-based slicing for UTF-8 safety
- Add error logging for team reconciliation and disband failures
- Fix double-decrement bug: move Instance.bind outside try/catch, only catch bound() throw
- Clean up abort listener on background agent completion via .finally()
- Fix misleading comment on running counter (process-wide, not per-instance)
- Move running++ after Instance.bind to avoid leak if bind construction fails
- Log errors instead of swallowing them in completeMember and disband paths
- Remove unused variable in reconcile()
- Move abort listener setup before bound() to ensure cancellation works immediately
- Fix race in waitForInjection: subscribe to Bus before checking DB for pending messages
- Wrap addMember disambiguation in Database.transaction to prevent TOCTOU
- Fix team-delete.txt description: members are cancelled, not completed
- agent-memory: make metadata.updated consistently typed across branches
- task: add background/teamID fields to synchronous return path
- team-task: use shared Meta type for consistent task/tasks metadata shape
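
As an illustration of the "byte-based slicing for UTF-8 safety" fix listed above, here is a minimal sketch of truncating a string to a byte budget without splitting a multi-byte character. `truncateUtf8` and `MEMORY_CAP_BYTES` are hypothetical names, and treating 100KB as 100 × 1024 bytes is an assumption; the PR's actual implementation may differ.

```typescript
// Assumption: "100KB" is interpreted as 100 * 1024 bytes.
const MEMORY_CAP_BYTES = 100 * 1024

// A UTF-8 continuation byte has the bit pattern 10xxxxxx.
const isContinuation = (b: number | undefined): boolean =>
  b !== undefined && (b & 0xc0) === 0x80

// Truncate `text` so its UTF-8 encoding is at most `maxBytes` long,
// cutting only on a character boundary.
function truncateUtf8(text: string, maxBytes: number): string {
  const bytes = new TextEncoder().encode(text)
  if (bytes.length <= maxBytes) return text
  let end = maxBytes
  // If the first excluded byte is a continuation byte, the cut would land
  // inside a character, so back up to that character's lead byte.
  while (end > 0 && isContinuation(bytes[end])) end--
  return new TextDecoder().decode(bytes.subarray(0, end))
}
```

Slicing by `string.length` would count UTF-16 code units rather than bytes, so a cap enforced that way could overshoot the byte budget on non-ASCII text.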
@pamelia pamelia force-pushed the feat/teams-agents-improvements branch from de4879e to a16695c Compare March 25, 2026 20:08
- 14 tests for Team (create, disband, addMember disambiguation, failMember, completeMember, reconcile, disbandBySession, findMemberSession)
- 8 tests for TeamTask (CRUD, disband cascade, failMember cascade)
- 8 tests for AgentMemory (read, write, append, 100KB cap, per-agent scoping)
- Remove teams-agents-feature.md from repo root (design doc doesn't belong in PR)
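
The `addMember` disambiguation covered by these tests (for example `general`, then `general-2`) can be sketched roughly as below. `disambiguate` and this in-memory `addMember` are hypothetical helpers written for illustration; the PR stores members in the database and wraps the check and insert in a transaction, which the synchronous check-then-add here only imitates.

```typescript
// Pick the first free name: `base`, then `base-2`, `base-3`, ...
function disambiguate(base: string, taken: ReadonlySet<string>): string {
  if (!taken.has(base)) return base
  for (let n = 2; ; n++) {
    const candidate = `${base}-${n}`
    if (!taken.has(candidate)) return candidate
  }
}

// In-memory stand-in for adding a member to a team. The lookup and the
// insert happen in one synchronous step, so no other caller can claim the
// same name in between (the role a DB transaction plays in the PR).
function addMember(members: Set<string>, agentType: string): string {
  const name = disambiguate(agentType, members)
  members.add(name)
  return name
}
```

For example, adding `general` three times to an empty team yields `general`, `general-2`, and `general-3`.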

pamelia commented Mar 26, 2026

Closing this — I should have gone through design review before opening a feature PR of this size. Will open a design issue first and, if approved, split into smaller focused PRs. The implementation is available on pamelia/opencode@feat/teams-agents-improvements with 30 tests for reference.

@pamelia pamelia closed this Mar 26, 2026