
PRD: Copilot SDK Hooks Lifecycle Plugin — Composable Governance via Squad Marketplace #152

@diberry


PRD: Copilot SDK Hooks Lifecycle Plugin for Squad

Executive Summary

This PRD proposes a Squad plugin (not a core product change) that implements Copilot SDK lifecycle hooks as a composable, installable package. Unlike PRD #151, which embeds hooks into Squad's core, this approach treats the hooks lifecycle as an opt-in plugin distributed via the Squad marketplace.

The plugin registers onSessionStart, onUserPromptSubmitted, onPreToolUse, onPostToolUse, onSessionEnd, and onErrorOccurred handlers using Squad's existing HookPipeline and SquadSessionHooks APIs. Users install it, configure what they want, and Squad's plugin system activates the hooks at runtime.

Why a plugin instead of core?

  • Not every team needs every hook — a plugin lets teams opt into exactly what they want
  • Faster iteration — plugin ships independently of Squad's release cycle
  • Marketplace showcase — demonstrates that Squad's extension system can handle real governance
  • Lower risk — doesn't touch Squad's critical path; users can uninstall if hooks misbehave
  • Composable — teams can fork the plugin and customize hooks for their domain

How This Fits Squad's Extension Architecture

Squad has a three-layer architecture for extensibility:

| Layer | What Lives Here | This Plugin's Relationship |
|---|---|---|
| Squad Core | HookPipeline, SquadSessionHooks, adapter layer | Plugin consumes these APIs; it does NOT modify them |
| Squad Extensions (plugins) | Skills, ceremonies, hook packages | This plugin lives here; installed via marketplace |
| Team Config | squad.config.ts, .squad/policies/ | Plugin reads config to know which hooks to activate |

The plugin uses publicly exported APIs:

  • HookPipeline from @bradygaster/squad-sdk/hooks (addPreToolHook, addPostToolHook)
  • SquadSessionHooks from adapter/types.ts (onSessionStart, onSessionEnd, onErrorOccurred, onUserPromptSubmitted)
  • SquadSessionConfig for session-level hook registration

No core changes required — the plugin works with Squad as it ships today.


Plugin Structure

squad-hooks-lifecycle/
├── package.json
├── README.md
├── src/
│   ├── index.ts              # Plugin entry — exports registerHooks()
│   ├── hooks/
│   │   ├── session-start.ts   # onSessionStart — auto-context loading
│   │   ├── prompt-guard.ts    # onUserPromptSubmitted — directive capture, prompt enhancement
│   │   ├── tool-guard.ts      # onPreToolUse — dynamic policies from decisions.md
│   │   ├── tool-audit.ts      # onPostToolUse — auto-Scribe, audit trail
│   │   ├── session-end.ts     # onSessionEnd — cleanup, metrics, Ralph trigger
│   │   └── error-recovery.ts  # onErrorOccurred — model fallback, rate limit backoff
│   ├── parsers/
│   │   ├── decision-parser.ts # Extracts enforceable rules from decisions.md
│   │   └── directive-detector.ts # Regex + heuristics for "always/never" signals
│   └── config.ts             # Plugin configuration schema
├── skill/
│   └── SKILL.md              # Squad skill teaching agents about the hooks
└── test/
    └── *.test.ts             # Tests for each hook

Registration API

// squad.config.ts — user's project
import { defineSquad } from '@bradygaster/squad-sdk';
import { registerHooks } from 'squad-hooks-lifecycle';

export default defineSquad({
  plugins: [
    registerHooks({
      // Toggle individual hooks on/off
      sessionStart: {
        enabled: true,
        autoLoadHistory: true,     // inject history.md at session start
        autoLoadDecisions: true,   // inject decisions.md snapshot
        autoLoadSkills: true,      // inject matching skills
        maxHistoryEntries: 20,     // limit context size
      },
      promptGuard: {
        enabled: true,
        captureDirectives: true,   // auto-detect "always/never" patterns
        injectDecisions: true,     // add relevant decisions to prompt context
      },
      toolGuard: {
        enabled: true,
        enforceDecisions: true,    // parse decisions.md into tool policies
        agentBoundaries: true,     // restrict writes to charter scope
        decisionFormat: 'structured', // 'structured' | 'prose-inferred'
      },
      toolAudit: {
        enabled: true,
        autoScribe: true,          // trigger Scribe on .squad/ writes
        auditLog: true,            // log all tool calls
        logPath: '.squad/audit/',  // where audit logs go
      },
      sessionEnd: {
        enabled: true,
        autoArchiveHistory: true,  // summarize history.md if over threshold
        captureMetrics: true,      // write session metrics
        ralphTrigger: true,        // notify Ralph of remaining work
      },
      errorRecovery: {
        enabled: true,
        modelFallback: true,       // automatic model fallback chains
        rateLimitBackoff: true,    // exponential backoff on rate limits
        contextOverflowRecovery: true, // summarize + retry on overflow
        fallbackChain: {
          premium: ['claude-opus-4.6', 'claude-opus-4.5', 'claude-sonnet-4.6'],
          standard: ['claude-sonnet-4.6', 'claude-sonnet-4.5', 'gpt-5.4'],
          fast: ['claude-haiku-4.5', 'gpt-5.4-mini', 'gpt-4.1'],
        },
      },
    }),
  ],
});
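The options above imply a typed schema in config.ts. A minimal sketch, assuming per-hook option interfaces and a shallow merge over defaults; the default values are illustrative (chosen so a zero-config install turns on auto-context and error recovery, as the Success Metrics describe), and only two of the six hooks are shown:

```typescript
// Hypothetical config.ts sketch: typed options per hook plus defaults.
// Only sessionStart and errorRecovery are shown; other hooks follow the same pattern.
interface SessionStartOptions {
  enabled: boolean;
  autoLoadHistory: boolean;
  autoLoadDecisions: boolean;
  autoLoadSkills: boolean;
  maxHistoryEntries: number;
}

interface ErrorRecoveryOptions {
  enabled: boolean;
  modelFallback: boolean;
  rateLimitBackoff: boolean;
  contextOverflowRecovery: boolean;
  fallbackChain: Record<string, string[]>;
}

interface HooksConfig {
  sessionStart: SessionStartOptions;
  errorRecovery: ErrorRecoveryOptions;
}

// What users pass to registerHooks(): every hook and every option is optional.
type UserHooksConfig = { [K in keyof HooksConfig]?: Partial<HooksConfig[K]> };

// Illustrative defaults, not values taken from the plugin.
const DEFAULTS: HooksConfig = {
  sessionStart: {
    enabled: true,
    autoLoadHistory: true,
    autoLoadDecisions: true,
    autoLoadSkills: true,
    maxHistoryEntries: 20,
  },
  errorRecovery: {
    enabled: true,
    modelFallback: true,
    rateLimitBackoff: true,
    contextOverflowRecovery: true,
    fallbackChain: { standard: ["claude-sonnet-4.6", "claude-sonnet-4.5", "gpt-5.4"] },
  },
};

// Shallow merge per hook: user values override defaults, omitted keys keep defaults.
function resolveConfig(user: UserHooksConfig = {}): HooksConfig {
  return {
    sessionStart: { ...DEFAULTS.sessionStart, ...user.sessionStart },
    errorRecovery: { ...DEFAULTS.errorRecovery, ...user.errorRecovery },
  };
}
```

Because the merge is shallow, a user-supplied fallbackChain replaces the default chain as a whole rather than merging tier by tier; a deep merge is the alternative if teams should be able to override one tier only.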

Real-World Use Cases

Use Cases in Manual Copilot CLI Chat Mode

These are scenarios where a human is sitting at the terminal, chatting with Squad via Copilot CLI.


UC-1: "My agents keep forgetting team decisions"

Persona: Solo developer using Squad on a side project. Has 10+ decisions in decisions.md but agents routinely ignore them.

Hook: onSessionStart (auto-context loading)

Before plugin: User starts a session. Agent spawns. Coordinator prompt says "read decisions.md." Constrained by its context budget, the agent reads only 3 of 12 decisions and misses the one about "always use Tailwind". Agent writes Bootstrap CSS. User is frustrated.

With plugin: onSessionStart hook fires, reads decisions.md, and injects the full decision set as structured context. Agent sees all 12 decisions before it writes a single line. Bootstrap is never introduced.
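A minimal sketch of the injection step, assuming each decision in decisions.md begins with a "### " heading (the real file layout may differ). It parses every entry and renders all of them into one context block, so no decision is skipped:

```typescript
// Hypothetical decision parser for onSessionStart.
// Assumes each decision starts with a "### " heading followed by its body.
interface Decision {
  title: string;
  body: string;
}

function parseDecisions(markdown: string): Decision[] {
  const decisions: Decision[] = [];
  let current: Decision | null = null;
  for (const line of markdown.split("\n")) {
    if (line.startsWith("### ")) {
      if (current) decisions.push(current);
      current = { title: line.slice(4).trim(), body: "" };
    } else if (current) {
      current.body += line + "\n"; // text before the first heading is ignored
    }
  }
  if (current) decisions.push(current);
  return decisions.map((d) => ({ ...d, body: d.body.trim() }));
}

// Render every decision as a numbered list for injection into the session context.
function buildDecisionContext(decisions: Decision[]): string {
  const lines = decisions.map((d, i) => `${i + 1}. ${d.title}: ${d.body}`);
  return `Team decisions (${decisions.length} total, all binding):\n${lines.join("\n")}`;
}
```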

Business value:

Product Squad: Proves that Squad's governance model works in practice, not just on paper. Turns decisions.md from a suggestion box into an enforcement layer — a key differentiator in the AI team framework space.
Customer: Decisions actually stick. The team's institutional memory works every session, not just when agents happen to read the right file.


UC-2: "I said 'always use async/await' last week and the agent used callbacks today"

Persona: Tech lead using Squad for a Node.js project. Gave a directive in a previous session that wasn't carried forward.

Hook: onUserPromptSubmitted (directive capture)

Before plugin: User says "always use async/await, never callbacks." Coordinator captures it in decisions/inbox/ (maybe). Next session, different coordinator context, directive is lost. Agent writes callback-style code.

With plugin: onUserPromptSubmitted hook fires on every message, detects the "always/never" pattern, and auto-writes the directive to decisions/inbox/. At the start of the next session, in any context, onSessionStart loads it along with the team's other decisions. The directive persists across sessions.
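The detection step can be sketched as a regex heuristic in the spirit of directive-detector.ts from the plugin layout. This assumes clause-level splitting is good enough: it will miss paraphrases such as "don't ever use callbacks" and can misfire on questions. The inbox entry format is also an assumption:

```typescript
// Hypothetical directive detector for onUserPromptSubmitted.
interface Directive {
  kind: "always" | "never";
  text: string;
}

const DIRECTIVE_PATTERN = /\b(always|never)\b/i; // no "g" flag, so exec() is stateless

function detectDirectives(prompt: string): Directive[] {
  const directives: Directive[] = [];
  // Split into clauses so "always X, never Y" yields two directives.
  for (const clause of prompt.split(/[.!?;,\n]+/)) {
    const match = DIRECTIVE_PATTERN.exec(clause);
    if (match) {
      directives.push({
        kind: match[1].toLowerCase() as "always" | "never",
        text: clause.slice(match.index).trim(), // keep text from the keyword onward
      });
    }
  }
  return directives;
}

// Render one directive as a decisions/inbox/ entry (format assumed).
function toInboxEntry(d: Directive, date: string): string {
  return `### ${date}: User directive (${d.kind})\n${d.text}\n`;
}
```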

Business value:

Product Squad: Demonstrates that Squad has memory across sessions — not just within a session. This is a top-3 user complaint ("my agents forget things") and the plugin solves it mechanically.
Customer: Say it once, it sticks forever. No more repeating yourself across sessions.


UC-3: "An agent edited a file it shouldn't have touched"

Persona: Team of 3 using Squad with role-based agents. Backend agent modified a frontend component.

Hook: onPreToolUse (dynamic policies + agent boundaries)

Before plugin: Backend agent "Fenster" is told to fix an API bug. While exploring, it edits src/components/LoginForm.tsx — a frontend file outside its scope. PR review catches it, but time is wasted.

With plugin: onPreToolUse hook fires before every file write and checks the agent's charter scope (backend: src/api/**, src/services/**). LoginForm.tsx is outside that scope, so the write is blocked with the reason: "File outside your charter scope. Route to the Frontend agent." The agent self-corrects or the coordinator re-routes.
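A minimal sketch of the scope check, assuming charter scopes are expressed as globs like the ones above. The hand-rolled matcher supports only ** (any depth) and * (one path segment); a real plugin would more likely use a glob library such as picomatch or minimatch:

```typescript
// Hypothetical charter-scope check for onPreToolUse.
type GuardResult = { action: "allow" } | { action: "block"; reason: string };

// Convert a simple glob into an anchored regex.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*" && glob[i + 1] === "*") {
      pattern += ".*"; // "**" crosses directory boundaries
      i++; // skip the second "*"
    } else if (ch === "*") {
      pattern += "[^/]*"; // "*" stays within one path segment
    } else {
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${pattern}$`);
}

function checkCharterScope(path: string, charterScope: string[]): GuardResult {
  const inScope = charterScope.some((glob) => globToRegExp(glob).test(path));
  return inScope
    ? { action: "allow" }
    : { action: "block", reason: `${path} is outside your charter scope (${charterScope.join(", ")}).` };
}
```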

Business value:

Product Squad: Enforces the agent boundary model that Squad promises but currently only advises. Moves from "agents should stay in scope" to "agents cannot leave scope."
Customer: Clean PRs with no scope creep. Code review effort drops because agents physically can't touch files outside their domain.


UC-4: "I want to know what my agents actually did"

Persona: Engineering manager who runs Squad sessions but wants a trail of what happened for compliance.

Hook: onPostToolUse (audit trail)

Before plugin: Session ends. Manager asks "what files did agents touch?" Answer: check orchestration logs (if Scribe ran), check git diff (if agents committed), or re-read the session transcript. Tedious.

With plugin: Every tool call is logged to .squad/audit/ with timestamp, agent name, tool name, arguments summary, and result status. Manager runs cat .squad/audit/2026-04-14*.jsonl | jq '.toolName' | sort | uniq -c and gets a complete picture in seconds.
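A sketch of the audit record, assuming one JSON object per line with the fields listed above (the field names are illustrative). countByTool produces the same counts as the jq pipeline:

```typescript
// Hypothetical audit record for onPostToolUse: one JSON object per line (JSONL).
interface AuditEntry {
  timestamp: string;
  agentName: string;
  toolName: string;
  argsSummary: string;
  status: "success" | "error" | "blocked";
}

// Truncate arguments so large file contents don't bloat the log.
function summarizeArgs(args: Record<string, unknown>, maxLength = 120): string {
  const text = JSON.stringify(args);
  return text.length > maxLength ? text.slice(0, maxLength) + "..." : text;
}

function toJsonlLine(entry: AuditEntry): string {
  return JSON.stringify(entry) + "\n";
}

// Same counts as: jq '.toolName' | sort | uniq -c
function countByTool(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const entry = JSON.parse(line) as AuditEntry;
    counts.set(entry.toolName, (counts.get(entry.toolName) ?? 0) + 1);
  }
  return counts;
}
```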

Business value:

Product Squad: Opens Squad to enterprise customers who require audit trails. Compliance-ready logging is a gate for regulated industries (fintech, healthcare, government).
Customer: Complete visibility without effort. "What happened?" is answered by a JSONL file, not a detective mission through logs.


UC-5: "My session crashed because the model hit a rate limit"

Persona: Developer running a large fan-out (5 agents in parallel). Third agent hits a rate limit and the session dies.

Hook: onErrorOccurred (model fallback + rate limit backoff)

Before plugin: Agent 3 of 5 hits a rate limit. The error surfaces to the coordinator. The coordinator's prose instructions say "retry with next model," but its context is already under pressure from 2 completed agents. The coordinator fumbles the fallback. Session stalls.

With plugin: onErrorOccurred hook fires immediately and detects the rate-limit error code. It applies exponential backoff (1s, 2s, 4s); if still rate-limited, it falls back to the next model in the chain (claude-sonnet-4.6 -> claude-sonnet-4.5 -> gpt-5.4). Agent 3 continues on the fallback model. User never notices.
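A minimal sketch of the backoff-then-fallback loop, assuming errors expose an HTTP-style status of 429 for rate limits and that the hook can re-issue the call through a callback; neither is a confirmed Copilot SDK interface. sleep is injectable so the schedule can be tested without real waits:

```typescript
// Hypothetical backoff-then-fallback helper for onErrorOccurred.
type Sleep = (ms: number) => Promise<void>;
const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: rate-limit errors carry an HTTP-style status of 429.
function isRateLimited(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 429;
}

async function withFallback<T>(
  chain: string[],
  call: (model: string) => Promise<T>,
  { baseDelayMs = 1000, retriesPerModel = 3, sleep = realSleep } = {},
): Promise<{ model: string; result: T }> {
  for (const model of chain) {
    for (let attempt = 0; attempt <= retriesPerModel; attempt++) {
      try {
        return { model, result: await call(model) };
      } catch (err) {
        if (!isRateLimited(err)) throw err; // only rate limits are retried here
        // Back off 1s, 2s, 4s between retries on the same model.
        if (attempt < retriesPerModel) await sleep(baseDelayMs * 2 ** attempt);
      }
    }
    // Still rate-limited after all retries: fall through to the next model.
  }
  throw new Error(`Every model in the chain is rate-limited: ${chain.join(" -> ")}`);
}
```

Retrying the same model first keeps quality stable for short spikes; switching models only after the backoff budget is spent trades some quality for keeping the agent alive.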

Business value:

Product Squad: Directly addresses the #1 reliability complaint — sessions dying mid-work. Makes Squad viable for heavy workloads (5+ parallel agents) that stress model rate limits.
Customer: Sessions don't crash. Rate limits are invisible. The work gets done even when infrastructure hiccups.


UC-6: "History.md is 50KB and my agents are slow"

Persona: Long-running project (3 months). History files have grown to 30-50KB per agent. Session starts are sluggish.

Hook: onSessionEnd (auto-archive + metrics)

Before plugin: User must remember to ask Scribe to summarize. If they forget, history grows unbounded. Eventually agents burn 2000+ tokens just reading their own history. Session startup takes 45+ seconds.

With plugin: onSessionEnd hook fires every time a session ends. Checks each agent's history.md size. If over 15KB, triggers summarization — moves old entries to history-archive.md, keeps a compressed summary. Next session starts with a lean 3KB history. Metrics written to .squad/metrics/session-2026-04-14.json.
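A sketch of the size check, assuming history.md entries start with "## " headings (newest last) and using character count as a rough proxy for bytes. The LLM summarization step is stubbed out as a one-line note:

```typescript
// Hypothetical archive planner for onSessionEnd.
const ARCHIVE_THRESHOLD_CHARS = 15 * 1024;

interface ArchivePlan {
  history: string; // new, shorter history.md
  archive: string; // text to append to history-archive.md
}

function planArchive(historyMd: string, keepEntries = 10): ArchivePlan | null {
  if (historyMd.length <= ARCHIVE_THRESHOLD_CHARS) return null;

  // Split before every "## " heading, keeping each heading with its entry.
  const parts = historyMd.split(/^(?=## )/m);
  const preamble = parts[0].startsWith("## ") ? "" : parts.shift() ?? "";
  const older = parts.slice(0, Math.max(parts.length - keepEntries, 0));
  const recent = parts.slice(older.length);
  if (older.length === 0) return null;

  // Placeholder for the LLM-generated summary of the archived entries.
  const summary = `> ${older.length} older entries summarized and moved to history-archive.md\n\n`;
  return { history: preamble + summary + recent.join(""), archive: older.join("") };
}
```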

Business value:

Product Squad: Solves "Squad gets slower over time" — a retention killer. Makes Squad viable for long-running projects without manual maintenance.
Customer: Zero-maintenance history hygiene. Sessions stay fast month after month. Metrics show token trends so teams can optimize.


Use Cases in Agentic SDK Mode

These are scenarios where Squad is embedded in an application via @bradygaster/squad-sdk, running autonomously without a human at the keyboard.


UC-7: "CI pipeline spawns agents to review PRs — needs guardrails"

Persona: Platform team running Squad in CI. GitHub Action triggers Squad SDK to review every PR with 3 agents (security, architecture, tests).

Hook: onPreToolUse (tool guard) + onSessionStart (auto-context) + onErrorOccurred (fallback)

Scenario: PR bradygaster#247 is opened. CI triggers Squad SDK. Three review agents spawn. Security agent needs to read .env.example but must never read .env. Architecture agent should only read, never write. Test agent can write test files only.

import { CopilotClient } from '@github/copilot-sdk';
import { registerHooks } from 'squad-hooks-lifecycle';

const hooks = registerHooks({
  sessionStart: { enabled: true, autoLoadDecisions: true },
  toolGuard: {
    enabled: true,
    agentBoundaries: true,
    // Security agent: read-only, no .env
    // Architecture agent: read-only
    // Test agent: write to test/** only
  },
  errorRecovery: { enabled: true, modelFallback: true },
});

const client = new CopilotClient();
const session = await client.createSession({ hooks: hooks.asSessionHooks() });
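The toolGuard block above only describes the three reviewers' rules in comments. One way to express them is a single pre-tool policy function; the context shape, agent names (security, architecture, tests), and tool names below are assumptions for illustration, and the .env rule is applied to every agent rather than only the security reviewer:

```typescript
// Hypothetical pre-tool policy for the three CI review agents.
interface PreToolContext {
  agentName: string;
  toolName: string;
  arguments: { path?: string };
}

type GuardResult = { action: "allow" } | { action: "block"; reason: string };

const WRITE_TOOLS = new Set(["edit", "create", "write"]);

function reviewerPolicy(ctx: PreToolContext): GuardResult {
  const path = ctx.arguments.path ?? "";
  const isWrite = WRITE_TOOLS.has(ctx.toolName);

  // ".env" at any depth is off-limits to every agent; ".env.example" is allowed.
  if (/(^|\/)\.env$/.test(path)) {
    return { action: "block", reason: "Secrets file: .env may never be read or written." };
  }

  switch (ctx.agentName) {
    case "security":
    case "architecture":
      // Both reviewers are read-only.
      return isWrite
        ? { action: "block", reason: `${ctx.agentName} reviewer is read-only.` }
        : { action: "allow" };
    case "tests":
      // The test agent may write, but only under test/.
      return isWrite && !path.startsWith("test/")
        ? { action: "block", reason: "Test agent may only write under test/." }
        : { action: "allow" };
    default:
      // Fail closed for agents without an explicit policy.
      return { action: "block", reason: `No policy defined for agent "${ctx.agentName}".` };
  }
}
```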

Business value:

Product Squad: Proves Squad SDK works in headless/CI environments — not just interactive chat. This is the enterprise adoption vector: automated code review pipelines powered by Squad agents with governance.
Customer: Automated PR review with real guardrails. Security agent can't leak secrets. Architecture agent can't introduce changes. All enforced by hooks, not prompts.


UC-8: "Nightly batch processes 50 issues — needs cost control"

Persona: Startup using Squad SDK to auto-triage and fix GitHub issues overnight. Batch of 50 issues. Budget: $20/night.

Hook: onSessionEnd (metrics) + onErrorOccurred (fallback for cost) + onPreToolUse (budget enforcement)

Scenario: Ralph-like loop processes issues. After 30 issues, token spend approaches budget. Need to downgrade models or pause.

const hooks = registerHooks({
  sessionEnd: {
    enabled: true,
    captureMetrics: true,
    // Metrics include: tokens_used, model, duration, tools_called
  },
  errorRecovery: {
    enabled: true,
    modelFallback: true,
    fallbackChain: {
      // Cheap models only: fallbacks never escalate to a pricier tier
      standard: ['claude-haiku-4.5', 'gpt-5.4-mini'],
    },
  },
  toolGuard: {
    enabled: true,
    // Custom pre-tool hook: check cumulative cost
    customPreHooks: [
      async (ctx) => {
        // readMetrics() is an illustrative helper assumed to return cumulative spend in USD
        const spent = await readMetrics();
        if (spent > 18.00) {
          return { action: 'block', reason: 'Budget limit reached ($18/$20)' };
        }
        return { action: 'allow' };
      },
    ],
  },
});
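readMetrics() is left undefined above. A minimal in-memory alternative, assuming the plugin can observe token usage per model call; the per-1K-token prices are placeholders, not real pricing:

```typescript
// Hypothetical in-memory budget guard. Prices are placeholders, not real rates.
interface Usage {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

type GuardResult = { action: "allow" } | { action: "block"; reason: string };

const PRICE_PER_1K_TOKENS: Record<string, { input: number; output: number }> = {
  "claude-haiku-4.5": { input: 0.001, output: 0.005 },
  "gpt-5.4-mini": { input: 0.0005, output: 0.002 },
};

class BudgetGuard {
  private spentUsd = 0;
  private readonly limitUsd: number;
  private readonly budgetUsd: number;

  constructor(limitUsd: number, budgetUsd: number) {
    this.limitUsd = limitUsd;
    this.budgetUsd = budgetUsd;
  }

  // Call after each model response; where usage numbers come from is an assumption.
  record(usage: Usage): void {
    const price = PRICE_PER_1K_TOKENS[usage.model];
    if (!price) throw new Error(`No price configured for model ${usage.model}`);
    this.spentUsd += (usage.inputTokens / 1000) * price.input + (usage.outputTokens / 1000) * price.output;
  }

  // Use from a custom pre-tool hook: block once spend reaches the soft limit.
  check(): GuardResult {
    if (this.spentUsd >= this.limitUsd) {
      return {
        action: "block",
        reason: `Budget limit reached ($${this.spentUsd.toFixed(2)} spent, limit $${this.limitUsd}/$${this.budgetUsd})`,
      };
    }
    return { action: "allow" };
  }
}
```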

Business value:

Product Squad: Enables cost-controlled autonomous operation, the missing piece for Squad in production. Few, if any, AI team frameworks offer per-session budget enforcement at the tool level.
Customer: Run Squad overnight without fear of runaway costs. Budget caps are enforced mechanically, not by hoping the coordinator remembers.


UC-9: "Multi-tenant SaaS — each customer gets different agent permissions"

Persona: SaaS platform embedding Squad SDK. Each customer tenant has different permissions (free tier: read-only agents, pro tier: full agents, enterprise: custom policies).

Hook: onSessionStart (inject tenant config) + onPreToolUse (tenant-scoped permissions)

Scenario: Customer A (free) starts a session. Customer B (enterprise) starts a session. Same Squad SDK, different permissions.

function createTenantHooks(tenant: Tenant) {
  return registerHooks({
    sessionStart: {
      enabled: true,
      // Inject tenant-specific context
      customContext: `Tenant: ${tenant.name}, Plan: ${tenant.plan}`,
    },
    toolGuard: {
      enabled: true,
      customPreHooks: [
        async (ctx) => {
          if (tenant.plan === 'free' && isWriteTool(ctx.toolName)) {
            return { action: 'block', reason: 'Upgrade to Pro for write access' };
          }
          if (tenant.plan !== 'enterprise' && isDangerousTool(ctx.toolName)) {
            return { action: 'block', reason: 'Enterprise only' };
          }
          return { action: 'allow' };
        },
      ],
    },
  });
}
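isWriteTool and isDangerousTool are not defined above. A sketch of both, assuming tool names such as edit and shell, with the tier rules pulled into a pure function so each plan can be tested without a live session:

```typescript
// Hypothetical helpers for UC-9; the tool names are assumptions.
type Plan = "free" | "pro" | "enterprise";
type GuardResult = { action: "allow" } | { action: "block"; reason: string };

const WRITE_TOOLS = new Set(["edit", "create", "write"]);
const DANGEROUS_TOOLS = new Set(["shell", "delete", "git_push"]);

const isWriteTool = (toolName: string): boolean => WRITE_TOOLS.has(toolName);
const isDangerousTool = (toolName: string): boolean => DANGEROUS_TOOLS.has(toolName);

// Same rules as createTenantHooks, as a pure function of plan and tool.
function tenantPolicy(plan: Plan, toolName: string): GuardResult {
  if (plan === "free" && isWriteTool(toolName)) {
    return { action: "block", reason: "Upgrade to Pro for write access" };
  }
  if (plan !== "enterprise" && isDangerousTool(toolName)) {
    return { action: "block", reason: "Enterprise only" };
  }
  return { action: "allow" };
}
```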

Business value:

Product Squad: Positions Squad SDK as embeddable in SaaS products — a new market segment. Multi-tenant governance via hooks is a compelling SDK selling point.
Customer: Build AI-powered features with Squad SDK and offer tiered access to customers. Permission boundaries are enforced by hooks, not application logic.


UC-10: "Autonomous agent loop with human-in-the-loop approval gates"

Persona: Enterprise deploying Squad SDK for code generation. Policy: all file writes must be approved by a human reviewer before committing.

Hook: onPreToolUse (approval gate) + onPostToolUse (audit) + onSessionEnd (compliance report)

Scenario: Agent generates code. Before any file write, a webhook fires to the approval system. Human approves or rejects. Agent proceeds or stops.

const hooks = registerHooks({
  toolGuard: {
    enabled: true,
    customPreHooks: [
      async (ctx) => {
        if (isWriteTool(ctx.toolName)) {
          const approval = await requestHumanApproval({
            agent: ctx.agentName,
            tool: ctx.toolName,
            file: ctx.arguments.path,
            preview: ctx.arguments.content?.substring(0, 500),
          });
          if (approval.status === 'rejected') {
            return { action: 'block', reason: `Rejected by ${approval.reviewer}: ${approval.reason}` };
          }
        }
        return { action: 'allow' };
      },
    ],
  },
  toolAudit: {
    enabled: true,
    auditLog: true,
    // Every approval/rejection is logged for compliance
  },
  sessionEnd: {
    enabled: true,
    captureMetrics: true,
    // Generate compliance report: N writes approved, M rejected, by whom
  },
});
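requestHumanApproval is a hypothetical helper, and an approval webhook that never answers would stall the agent. A minimal sketch that fails closed on timeout and tallies decisions for the session-end compliance report; the Approval shape is an assumption:

```typescript
// Hypothetical approval record returned by the approval system.
interface Approval {
  status: "approved" | "rejected";
  reviewer: string;
  reason?: string;
}

// Race the approval request against a timer; silence counts as rejection (fail closed).
async function approvalWithTimeout(request: Promise<Approval>, timeoutMs: number): Promise<Approval> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Approval>((resolve) => {
    timer = setTimeout(
      () => resolve({ status: "rejected", reviewer: "system", reason: `No decision within ${timeoutMs} ms` }),
      timeoutMs,
    );
  });
  try {
    return await Promise.race([request, timeout]);
  } finally {
    clearTimeout(timer); // don't keep the process alive once a decision arrives
  }
}

// Session-end summary: N writes approved, M rejected, by whom.
function complianceReport(decisions: Approval[]): string {
  const approved = decisions.filter((d) => d.status === "approved").length;
  const rejected = decisions.length - approved;
  const reviewers = Array.from(new Set(decisions.map((d) => d.reviewer))).sort();
  return `${approved} writes approved, ${rejected} rejected; reviewers: ${reviewers.join(", ")}`;
}
```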

Business value:

Product Squad: Unlocks regulated industries (finance, healthcare, defense) where autonomous AI agents require human oversight. This is the enterprise gate — without it, Squad can't enter these markets.
Customer: Full human-in-the-loop control without losing the speed of AI agents. Every action is approved, logged, and reportable for auditors.


UC-11: "Agent fleet with centralized observability"

Persona: DevOps team running 20+ Squad instances across microservices. Needs centralized monitoring of all agent activity.

Hook: onPostToolUse (telemetry) + onErrorOccurred (alerting) + onSessionEnd (aggregation)

Scenario: Each microservice repo has its own Squad. DevOps needs a single dashboard showing: which agents ran, what they did, error rates, token spend.

const hooks = registerHooks({
  toolAudit: {
    enabled: true,
    customPostHooks: [
      async (ctx) => {
        // Ship telemetry to centralized collector; a collector outage must not fail the tool call
        try {
          await fetch('https://telemetry.internal/ingest', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({
              repo: process.env.REPO_NAME,
              agent: ctx.agentName,
              tool: ctx.toolName,
              duration: ctx.duration,
              success: !ctx.error,
              timestamp: new Date().toISOString(),
            }),
          });
        } catch (err) {
          console.warn('Telemetry ingest failed:', err);
        }
        return { result: ctx.result }; // pass through unchanged
      },
    ],
  },
  errorRecovery: {
    enabled: true,
    customErrorHooks: [
      async (error) => {
        // Alert on-call; as written this pages on every error, so add rate filtering to page only on spikes
        await fetch('https://pagerduty.internal/alert', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ source: 'squad', error: error.message }),
        });
      },
    ],
  },
});
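The error hook above pages on every error. A sketch of paging only on spikes, assuming a sliding window of error timestamps and a cooldown so one spike produces one page; the window and threshold values are illustrative:

```typescript
// Hypothetical spike detector for the onErrorOccurred alerting hook.
class ErrorRateMonitor {
  private timestamps: number[] = [];
  private lastAlertAt = Number.NEGATIVE_INFINITY;
  private readonly windowMs: number;
  private readonly threshold: number;

  constructor(windowMs: number, threshold: number) {
    this.windowMs = windowMs;
    this.threshold = threshold;
  }

  // Record one error; returns true when the caller should page on-call.
  recordError(now: number = Date.now()): boolean {
    this.timestamps.push(now);
    // Drop errors that have aged out of the sliding window.
    this.timestamps = this.timestamps.filter((t) => t > now - this.windowMs);
    const spiking = this.timestamps.length >= this.threshold;
    // Cooldown: at most one page per window, so one spike produces one alert.
    if (spiking && now - this.lastAlertAt >= this.windowMs) {
      this.lastAlertAt = now;
      return true;
    }
    return false;
  }
}
```

In the customErrorHooks entry, the fetch to the alerting endpoint would then run only when recordError() returns true.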

Business value:

Product Squad: Enables fleet-scale Squad deployments. Observability is non-negotiable for platform teams managing multiple AI agent instances. This makes Squad enterprise-infrastructure-grade.
Customer: One dashboard for all Squad activity across all repos. Error spikes trigger PagerDuty. Token spend is tracked per-repo. DevOps has full visibility.


UC-12: "Auto-fix failing CI — agents respond to webhook, hooks enforce safety"

Persona: Platform team. When CI fails, a webhook triggers Squad SDK to diagnose and fix the failure autonomously.

Hook: onSessionStart (inject CI context) + onPreToolUse (safety limits) + onPostToolUse (validate fix) + onSessionEnd (report)

Scenario: CI fails on PR bradygaster#312. Webhook fires. Squad SDK spins up. Agent reads CI logs, identifies the issue, applies a fix, runs tests locally, pushes if green.

const hooks = registerHooks({
  sessionStart: {
    enabled: true,
    customContext: async () => {
      const ciLogs = await fetchCILogs(process.env.CI_RUN_ID);
      return `CI Failure Context:\n${ciLogs.substring(0, 5000)}`;
    },
  },
  toolGuard: {
    enabled: true,
    customPreHooks: [
      async (ctx) => {
        // Safety: max 3 file edits per CI-fix session
        const editCount = getSessionEditCount();
        if (isWriteTool(ctx.toolName) && editCount >= 3) {
          return { action: 'block', reason: 'Max 3 edits per CI-fix session. Escalate to human.' };
        }
        // Safety: never edit CI config files
        if (ctx.arguments?.path?.includes('.github/workflows/')) {
          return { action: 'block', reason: 'Cannot modify CI workflows autonomously.' };
        }
        return { action: 'allow' };
      },
    ],
  },
  sessionEnd: {
    enabled: true,
    captureMetrics: true,
    // Report: what was fixed, how many attempts, was it pushed
  },
});
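getSessionEditCount() is not defined above, and nothing increments the count. A sketch in which the guard owns its counter, assuming tool names like edit and create. It applies the workflow rule to writes only, so the agent can still read workflow files while diagnosing:

```typescript
// Hypothetical CI-fix guard; tool names are assumptions.
type GuardResult = { action: "allow" } | { action: "block"; reason: string };

interface ToolCall {
  toolName: string;
  arguments?: { path?: string };
}

const WRITE_TOOLS = new Set(["edit", "create", "write"]);

function createCiFixGuard(maxEdits = 3): (ctx: ToolCall) => GuardResult {
  let edits = 0; // per-guard counter: create one guard per session
  return (ctx) => {
    if (!WRITE_TOOLS.has(ctx.toolName)) return { action: "allow" }; // reads are unrestricted
    if (ctx.arguments?.path?.includes(".github/workflows/")) {
      return { action: "block", reason: "Cannot modify CI workflows autonomously." };
    }
    if (edits >= maxEdits) {
      return { action: "block", reason: `Max ${maxEdits} edits per CI-fix session. Escalate to human.` };
    }
    edits++; // counts allowed attempts, including edits that later fail
    return { action: "allow" };
  };
}
```

Checking the workflow rule before the counter means a blocked workflow edit does not consume one of the three allowed edits.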

Business value:

Product Squad: Demonstrates Squad SDK as an autonomous CI remediation engine — a high-value, high-visibility use case that sells itself. Every engineering team wants "CI fixes itself."
Customer: CI failures are auto-diagnosed and fixed with guardrails. Agent can't go rogue (max 3 edits, can't modify workflows). If it can't fix it in 3 edits, it escalates to a human.


Plugin vs. Core: Decision Matrix

| Factor | Core (PRD #151) | Plugin (This PRD) |
|---|---|---|
| Availability | All Squad users get it | Opt-in install |
| Release cycle | Tied to Squad releases | Independent; ship anytime |
| Customization | Config only | Fork + modify freely |
| Risk to core | High; touches critical path | Zero; external package |
| Enterprise appeal | "Built-in governance" | "Composable governance" |
| Cost to implement | High; needs core review | Medium; uses public APIs |
| Backward compat | Must not break existing users | Only affects users who install |
| Marketplace signal | N/A | Proves marketplace works for real governance plugins |

Recommendation: Ship the plugin FIRST. If adoption proves demand, promote the most-used hooks into core in a future release. This is the lower-risk path that still delivers all the value.


Implementation Priority

| Priority | Hook | CLI Chat Value | SDK Agentic Value |
|---|---|---|---|
| P0 | onErrorOccurred (model fallback) | Sessions stop crashing | Autonomous loops stay alive |
| P0 | onSessionStart (auto-context) | Agents remember decisions | Headless agents get full context |
| P1 | onPreToolUse (dynamic policies) | Agents stay in scope | Multi-tenant permissions |
| P1 | onPostToolUse (audit trail) | "What happened?" answered | Fleet observability |
| P2 | onSessionEnd (cleanup/metrics) | History stays lean | Cost tracking per-session |
| P2 | onUserPromptSubmitted (directives) | Directives persist | Prompt-level governance |

Success Metrics

  • Plugin installs via marketplace: 50+ repos in first quarter
  • Session failure rate for plugin users drops below 3% (from ~10%)
  • Audit log adoption: 80%+ of plugin users enable audit trail
  • Zero-config value: users who install with defaults see immediate improvement (auto-context + error recovery)

This PRD complements PRD #151 (core hooks). Both can coexist — the plugin uses Squad's public APIs, and if specific hooks prove essential, they can graduate into core.

Metadata

Assignees: No one assigned

Labels: go:needs-research (Needs investigation), squad (Squad triage inbox: Lead will assign to a member), squad:fido (Assigned to FIDO, Quality Owner)
