feat(skills): add error-recovery skill -- standard agent failure recovery patterns by tamirdresher · Pull Request #623 · bradygaster/squad

tamirdresher · 2026-03-25T21:25:31Z

Summary

Adds the `error-recovery` skill: 5 standard recovery patterns every squad agent should apply when something fails.

Battle-tested in a 13-agent production deployment across 6 months of daily use.

What it does

Defines a clear playbook so agents adapt instead of just reporting failures:

| Pattern | When |
| --- | --- |
| Retry with Backoff | Rate limits, API timeouts, transient errors |
| Fallback Alternatives | Primary tool/approach unavailable |
| Diagnose-and-Fix | Build, test, lint failures (max 3 cycles) |
| Escalate with Context | Recovery exhausted, human judgment required |
| Graceful Degradation | Non-critical step fails, task can still deliver value |
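As a rough illustration of the first pattern in the table, retry with exponential backoff could be sketched as follows. The skill itself is narrative guidance, so everything here is an assumption for illustration: the `retry_with_backoff` name, the `TransientError` class, and the default attempt count and delays are hypothetical and not taken from the skill file.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for rate limits, timeouts, and similar failures."""


def retry_with_backoff(
    action: Callable[[], T],
    max_attempts: int = 3,      # assumed default, not from the skill
    base_delay: float = 0.5,    # assumed default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action`, doubling the wait after each transient failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except TransientError:
            if attempt == max_attempts:
                # Recovery exhausted: let the caller escalate with context.
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a caller that exhausts its attempts sees the original exception and can move on to escalation.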

Also includes a selection guide mapping failure types to the right pattern.
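The selection guide is not reproduced in this conversation, but the idea of mapping a failure type to a pattern can be sketched from the "When" column above. The failure labels, the `select_pattern` helper, and the choice to escalate on unknown failures are assumptions made for this example only.

```python
from enum import Enum


class Pattern(Enum):
    RETRY = "Retry with Backoff"
    FALLBACK = "Fallback Alternatives"
    DIAGNOSE = "Diagnose-and-Fix"
    ESCALATE = "Escalate with Context"
    DEGRADE = "Graceful Degradation"


# Hypothetical failure labels derived from the "When" column of the table.
SELECTION_GUIDE = {
    "rate_limit": Pattern.RETRY,
    "api_timeout": Pattern.RETRY,
    "transient_error": Pattern.RETRY,
    "tool_unavailable": Pattern.FALLBACK,
    "build_failure": Pattern.DIAGNOSE,
    "test_failure": Pattern.DIAGNOSE,
    "lint_failure": Pattern.DIAGNOSE,
    "non_critical_step_failed": Pattern.DEGRADE,
}

MAX_DIAGNOSE_CYCLES = 3  # the "max 3 cycles" cap stated in the table


def select_pattern(failure: str, cycles_used: int = 0) -> Pattern:
    """Pick a recovery pattern; unknown failures escalate (an assumption)."""
    pattern = SELECTION_GUIDE.get(failure, Pattern.ESCALATE)
    if pattern is Pattern.DIAGNOSE and cycles_used >= MAX_DIAGNOSE_CYCLES:
        # The diagnose-and-fix budget is spent, so hand off to a human.
        return Pattern.ESCALATE
    return pattern
```

For instance, `select_pattern("build_failure", cycles_used=3)` returns `Pattern.ESCALATE`, reflecting the cycle cap rather than looping indefinitely.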

Design

Provider-agnostic — no external services required
Meant to be referenced in each agent's charter `## Error Recovery` section
Complements `iterative-retrieval` (cycle caps) and `session-recovery` (restart scenarios)

Zero breaking changes.

chore: promote dev to main (docs dark mode fix)

…ery patterns

5 recovery patterns for any squad agent:
- Retry with Backoff (transient failures, rate limits)
- Fallback Alternatives (tool/approach fails)
- Diagnose-and-Fix (build/test/lint errors)
- Escalate with Context (exhausted recovery)
- Graceful Degradation (non-critical step fails)

Includes selection guide mapping failure types to patterns. Battle-tested in production multi-agent deployment. Zero breaking changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

bradygaster · 2026-03-26T06:15:46Z

🔍 Squad Team Review

Reviewed by: Procedures (Prompt Architect) — Pass 3 of consensus ceremony
Action: Hold — proposal-first policy applies
Team Consensus: 5/5 AGREE

Review Summary

This PR adds error-recovery mechanisms that enable skills to gracefully handle and recover from failures.

Process Note

Per our team's documented proposal-first policy (.squad/decisions.md): "Meaningful changes require a proposal in docs/proposals/ before execution." This is a feature-class change that needs a proposal document before merge.

What's Needed

Create a proposal in docs/proposals/ describing the skill's purpose, integration points, and expected behavior
Retarget this PR from main to dev (our staging branch)
Once proposal is reviewed and approved, this PR can proceed

Thank you for the contribution! The implementation looks solid — we just need the process alignment. 🙏

⏸️ Action: HOLD — awaiting proposal + retarget to dev

bradygaster

FIDO Quality Review — PR #623: error-recovery skill

Verdict: ⚠️ NEEDS CHANGES

Blocking Issue

❌ Wrong package names in changeset: The .changeset/error-recovery-skill.md uses squad-cli and squad-sdk instead of @bradygaster/squad-cli and @bradygaster/squad-sdk. This will cause the changeset to fail — it won't match any workspace packages. All other existing changesets in this repo use the scoped @bradygaster/ prefix.

Fix: Change the changeset frontmatter to:
```yaml
"@bradygaster/squad-cli": minor
"@bradygaster/squad-sdk": minor
```
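The mismatch described in the blocking issue can be caught with a simple pre-merge check. The sketch below is not how the changesets tooling validates files; it is a minimal stand-in that reads `"package": bump` lines and compares the names against a known set of workspace packages. The helper names and the idea of passing the workspace names as a set are assumptions.

```python
VALID_BUMPS = {"patch", "minor", "major"}


def frontmatter_packages(frontmatter: str) -> list[str]:
    """Return package names from `"name": bump` lines of changeset frontmatter."""
    names = []
    for raw in frontmatter.splitlines():
        line = raw.strip()
        if not line or line == "---" or ":" not in line:
            continue
        # Split on the last colon so scoped names like @scope/pkg stay intact.
        name, _, bump = line.rpartition(":")
        if bump.strip() in VALID_BUMPS:
            names.append(name.strip().strip("\"'"))
    return names


def unknown_packages(frontmatter: str, workspace: set[str]) -> list[str]:
    """List packages named in the frontmatter that the workspace lacks."""
    return [n for n in frontmatter_packages(frontmatter) if n not in workspace]
```

With a workspace of `{"@bradygaster/squad-cli", "@bradygaster/squad-sdk"}`, unscoped entries such as `squad-cli: minor` are reported as unknown, while the scoped names from the fix produce an empty list.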

Non-Blocking Notes

Target branch: Targets main — other skill PRs in this batch target main too, but #625 targets dev. Consider aligning to dev.
Frontmatter deviations: Has license: MIT field (not in template) and missing source field. Non-breaking but inconsistent with other skills.
Content quality: Excellent. The 5 recovery patterns (retry, fallback, diagnose, escalate, degrade) are practical and well-structured. The selection guide table is a nice touch.

Once the changeset package names are fixed, this is ready to merge.

…ster#605, bradygaster#606, bradygaster#607) (bradygaster#623)

* test: add first-run gating tests (issue bradygaster#607)

  25 tests across 6 categories enforcing Init Mode gating:
  - Banner renders exactly once (not duplicated)
  - First-run hint appears on initial session only
  - Console output contains no raw Node warnings
  - 'Your squad is assembled' requires non-empty roster
  - Session-scoped Static keys prevent cross-session collisions
  - Terminal clear runs before Ink render

  Closes bradygaster#607

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: standardize elapsed time placement inline after agent messages (bradygaster#605)

  Elapsed time annotations were inconsistent: the Static scrollback block in App.tsx rendered completed agent messages without any duration, while MessageStream had duration code that was dead (receives messages=[]).

  Changes:
  - Export formatDuration from MessageStream.tsx for shared use
  - Add inline (duration) display to App.tsx Static block for agent messages
  - Duration computed by finding preceding user message timestamp
  - Format: inline dimColor after message content, e.g. (4.4s)

  Closes bradygaster#605

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: update cheritto history and decision for bradygaster#605

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(shell): remove duplicate prompt hint that conflicts with header (bradygaster#606)

  The InputPrompt placeholder showed 'Type @agent or /help' for the first 5 messages, duplicating guidance already present in the header banner. This created visual noise with two competing prompt elements. Consolidated to two hint tiers: tab/history (messageCount < 10) and command hints (>= 10). The header banner remains the single source for @agent and /help guidance.

  Closes bradygaster#606

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: update kovash history and decision for bradygaster#606 prompt consolidation

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

tamirdresher · 2026-03-26T08:17:01Z

Fixed per FIDO review: corrected changeset package names from squad-cli/squad-sdk to @bradygaster/squad-cli/@bradygaster/squad-sdk. Also retargeting to dev branch.

bradygaster · 2026-03-26T08:22:32Z

Thank you for this work, Tamir! 🚀

To get this PR ready for team review and merge, we need two quick adjustments:

Retarget to `dev`: All feature work merges to the `dev` branch first (not `main`). You can change the base branch in the PR settings.
Add a proposal document: Feature-class skills need a brief proposal in `docs/proposals/` before merging. Include:
- What the skill does
- Why it's needed
- How it fits into our existing skill architecture

Once those are in place, the team will take another look and we can get this merged!

Thanks for your continued contributions to Squad 👍

… FIDO review

tamirdresher · 2026-03-26T08:31:56Z

Fixed per FIDO review: corrected changeset package names from squad-cli/squad-sdk to @bradygaster/squad-cli/@bradygaster/squad-sdk. Also retargeted this PR to the dev branch.

tamirdresher · 2026-03-26T08:44:44Z

Addressed both review items: (1) Fixed changeset package names to @bradygaster/squad-cli and @bradygaster/squad-sdk. (2) Added docs/proposals/error-recovery.md proposal document. PR already retargeted to dev.

Copilot

Pull request overview

Adds a new “error-recovery” skill to the Squad skill template library, defining standard patterns agents should apply when tasks/tools fail, plus a supporting proposal doc and release changeset.

Changes:

Add error-recovery skill template to both SDK and CLI template trees.
Add a proposal document describing the motivation/approach for the new skill.
Add a changeset bumping @bradygaster/squad-cli and @bradygaster/squad-sdk minor versions.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
packages/squad-sdk/templates/skills/error-recovery/SKILL.md	New SDK-distributed skill content for error recovery patterns.
packages/squad-cli/templates/skills/error-recovery/SKILL.md	New CLI-distributed skill content for error recovery patterns.
docs/proposals/error-recovery.md	Proposal write-up for the new skill (motivation, approach, risks).
.changeset/error-recovery-skill.md	Changeset to release the new skill in CLI/SDK packages.

Copilot · 2026-03-28T13:53:28Z

+description: "Standard recovery patterns for all squad agents. When something fails, adapt — don't just report the failure."
+domain: "reliability, agent-coordination"
+confidence: "high"
+license: MIT


Skill frontmatter deviates from the convention used by other skills: it includes license: MIT but is missing the source: field that’s present across existing skill templates. Consider adding a source: entry (e.g., earned/manual/extracted) and dropping license if it’s not consumed anywhere, to keep metadata consistent and avoid unused fields.

Suggested change

-license: MIT
+source: manual
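A frontmatter consistency check of the kind this comment performs by eye could look roughly like the sketch below. The template key set passed in by the caller is an assumption; the review only establishes that `source` is expected and `license` is not part of the template.

```python
def frontmatter_keys(frontmatter: str) -> set[str]:
    """Collect top-level keys from simple `key: value` frontmatter lines."""
    keys = set()
    for line in frontmatter.splitlines():
        # Skip nested (indented) lines, comments, and lines without a key.
        if ":" in line and not line.startswith((" ", "\t", "#")):
            keys.add(line.split(":", 1)[0].strip())
    return keys


def compare_to_template(
    frontmatter: str, template_keys: set[str]
) -> tuple[set[str], set[str]]:
    """Return (missing, extra) keys relative to the expected template keys."""
    keys = frontmatter_keys(frontmatter)
    return template_keys - keys, keys - template_keys
```

Run against the four quoted frontmatter lines with a hypothetical template of `description`, `domain`, `confidence`, and `source`, this reports `source` as missing and `license` as extra.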

Copilot · 2026-03-28T13:53:28Z

+description: "Standard recovery patterns for all squad agents. When something fails, adapt — don't just report the failure."
+domain: "reliability, agent-coordination"
+confidence: "high"
+license: MIT


Skill frontmatter deviates from the convention used by other skills: it includes license: MIT but is missing the source: field that’s present across existing skill templates. Consider adding a source: entry (e.g., earned/manual/extracted) and dropping license if it’s not consumed anywhere, to keep metadata consistent and avoid unused fields.

Suggested change

-license: MIT
+source: manual

Copilot · 2026-03-28T13:53:29Z

+
+## Fit with Existing Architecture
+
+- **Complements** existing gent-conduct skill (which covers behavior) — this skill covers failure states


There’s a non-printable/control character before gent-conduct in this line, which will render incorrectly in Markdown and makes the skill reference harder to read/search. Replace it with plain text agent-conduct.

Suggested change

-- **Complements** existing gent-conduct skill (which covers behavior) — this skill covers failure states
+- **Complements** existing agent-conduct skill (which covers behavior) — this skill covers failure states
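Stray control characters like the one flagged here are easy to miss in review because they often render as nothing. A small scan such as the following sketch can surface them before merge. The function name is hypothetical, and the review does not say which character was present, so the usage example uses U+0007 purely as an illustration.

```python
import unicodedata

ALLOWED = {"\t", "\r"}  # whitespace controls that are normal in text files


def find_control_chars(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for each unexpected control character."""
    hits = []
    # Split on "\n" only, so other control characters are not consumed
    # as line separators before they can be reported.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cc" and ch not in ALLOWED:
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits
```

Calling `find_control_chars("existing \x07gent-conduct skill")` reports one hit on line 1 with code point `U+0007`, while clean text returns an empty list.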

Copilot · 2026-03-28T13:53:29Z

+| **degrade** | Full functionality unavailable — deliver partial result with caveat |
+
+The skill provides a selection guide table mapping error symptoms to the appropriate pattern,
+plus prompt templates for each pattern that agents can use in their reasoning.


This proposal says the skill includes “prompt templates for each pattern,” but the new SKILL.md content only contains narrative guidance/examples (no reusable prompt templates). Either add the prompt-template sections to the skill files or adjust this proposal text so it matches what’s actually being shipped.

Suggested change

-plus prompt templates for each pattern that agents can use in their reasoning.
+plus narrative guidance and example prompts for each pattern that agents can adapt in their reasoning.

bradygaster and others added 2 commits March 23, 2026 10:08

Merge pull request bradygaster#570 from bradygaster/dev

6a656b0

chore: promote dev to main (docs dark mode fix)

bradygaster requested changes Mar 26, 2026


bradygaster mentioned this pull request Mar 26, 2026

feat(skills): add directive-capture skill -- auto-detect and persist user directives #614

Closed

tamirdresher changed the base branch from main to dev March 26, 2026 08:16

fix: correct changeset package names to @bradygaster scoped names per…

868810b

… FIDO review

tamirdresher force-pushed the feat/error-recovery-skill branch from f8b8b92 to 868810b Compare March 26, 2026 08:29

feat: add proposal document for error-recovery skill per review policy

bdfaf15

Merge branch 'dev' into feat/error-recovery-skill

4b79561

Copilot AI review requested due to automatic review settings March 28, 2026 13:49

Copilot started reviewing on behalf of bradygaster March 28, 2026 13:49 View session

Copilot AI reviewed Mar 28, 2026


bradygaster merged commit 9c3156a into bradygaster:dev Mar 28, 2026
7 checks passed


bradygaster merged 5 commits into bradygaster:dev from tamirdresher:feat/error-recovery-skill


3 participants

