feat(skills): add error-recovery skill -- standard agent failure recovery patterns #623
chore: promote dev to main (docs dark mode fix)
…ery patterns

5 recovery patterns for any squad agent:
- Retry with Backoff (transient failures, rate limits)
- Fallback Alternatives (tool/approach fails)
- Diagnose-and-Fix (build/test/lint errors)
- Escalate with Context (exhausted recovery)
- Graceful Degradation (non-critical step fails)

Includes selection guide mapping failure types to patterns. Battle-tested in production multi-agent deployment. Zero breaking changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
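For readers who want something concrete, the first pattern (Retry with Backoff) could be sketched roughly as below. This is only an illustration, not the skill's own content: the names `BackoffOptions`, `backoffDelays`, and `retryWithBackoff` are hypothetical, and exponential backoff with a cap is an assumed strategy, since the skill description does not say which schedule it recommends.

```typescript
// Hypothetical options for illustration; none of these names come from the skill itself.
interface BackoffOptions {
  maxAttempts: number; // total tries, including the first one (must be >= 1)
  baseDelayMs: number; // delay before the second try
  factor: number;      // multiplier applied after each failed try
  maxDelayMs: number;  // upper bound on any single delay
}

// Compute the waits between attempts: base * factor^n, capped at maxDelayMs.
function backoffDelays(opts: BackoffOptions): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < opts.maxAttempts - 1; attempt++) {
    const raw = opts.baseDelayMs * Math.pow(opts.factor, attempt);
    delays.push(Math.min(raw, opts.maxDelayMs));
  }
  return delays;
}

// Default sleep based on setTimeout; a fake can be injected in tests.
function defaultSleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Run `task`, retrying on any thrown error until attempts are exhausted,
// then rethrow the last error so the caller can escalate with context.
async function retryWithBackoff<T>(
  task: () => Promise<T>,
  opts: BackoffOptions,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  if (opts.maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  const delays = backoffDelays(opts);
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      const delay = delays[attempt];
      if (delay === undefined) break; // no attempts left
      await sleep(delay);
    }
  }
  throw lastError;
}
```

Rethrowing the last error rather than swallowing it keeps the hand-off to the Escalate with Context pattern straightforward once retries are exhausted.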
🔍 Squad Team Review
Reviewed by: Procedures (Prompt Architect), Pass 3 of consensus ceremony
Review Summary
This PR adds error-recovery mechanisms that enable skills to gracefully handle and recover from failures.
Process Note
Per our team's documented proposal-first policy (.squad/decisions.md): "Meaningful changes require a proposal in docs/proposals/ before execution." This is a feature-class change that needs a proposal document before merge.
What's Needed
Thank you for the contribution! The implementation looks solid; we just need the process alignment. 🙏
⏸️ Action: HOLD, awaiting proposal + retarget to dev
bradygaster
left a comment
FIDO Quality Review — PR #623: error-recovery skill
Verdict:
Blocking Issue
❌ Wrong package names in changeset: The .changeset/error-recovery-skill.md uses squad-cli and squad-sdk instead of @bradygaster/squad-cli and @bradygaster/squad-sdk. This will cause the changeset to fail — it won't match any workspace packages. All other existing changesets in this repo use the scoped @bradygaster/ prefix.
Fix: Change the changeset frontmatter to:
```yaml
"@bradygaster/squad-cli": minor
"@bradygaster/squad-sdk": minor
```
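A mismatch like this could be caught before review with a small check. The sketch below is an assumption-laden illustration, not how the changesets tooling itself validates files: `parseChangesetEntries` and `findUnknownPackages` are hypothetical helpers, and the workspace package list is passed in by hand rather than read from the repo.

```typescript
type Bump = "major" | "minor" | "patch";

interface ChangesetEntry {
  name: string;
  bump: Bump;
}

// Extract `"package-name": bump` lines from changeset frontmatter text.
// Lines that do not look like an entry (such as `---`) are ignored.
function parseChangesetEntries(frontmatter: string): ChangesetEntry[] {
  const entries: ChangesetEntry[] = [];
  for (const line of frontmatter.split("\n")) {
    const match = /^\s*["']?([^"':\s]+)["']?\s*:\s*(major|minor|patch)\s*$/.exec(line);
    if (match && match[1] && match[2]) {
      entries.push({ name: match[1], bump: match[2] as Bump });
    }
  }
  return entries;
}

// Return every package named in the changeset that is not a known workspace package.
function findUnknownPackages(
  frontmatter: string,
  workspacePackages: ReadonlySet<string>,
): string[] {
  return parseChangesetEntries(frontmatter)
    .map((entry) => entry.name)
    .filter((name) => !workspacePackages.has(name));
}

// Example: the unscoped names flagged in this review are reported as unknown.
const workspace = new Set<string>(["@bradygaster/squad-cli", "@bradygaster/squad-sdk"]);
const unscoped = '---\n"squad-cli": minor\n"squad-sdk": minor\n---';
console.log(findUnknownPackages(unscoped, workspace));
```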
Non-Blocking Notes
- Target branch: Targets `main`; other skill PRs in this batch target `main` too, but #625 targets `dev`. Consider aligning to `dev`.
- Frontmatter deviations: Has `license: MIT` field (not in template) and missing `source` field. Non-breaking but inconsistent with other skills.
- Content quality: Excellent. The 5 recovery patterns (retry, fallback, diagnose, escalate, degrade) are practical and well-structured. The selection guide table is a nice touch.
Once the changeset package names are fixed, this is ready to merge.
…ster#605, bradygaster#606, bradygaster#607) (bradygaster#623)

* test: add first-run gating tests (issue bradygaster#607)

  25 tests across 6 categories enforcing Init Mode gating:
  - Banner renders exactly once (not duplicated)
  - First-run hint appears on initial session only
  - Console output contains no raw Node warnings
  - 'Your squad is assembled' requires non-empty roster
  - Session-scoped Static keys prevent cross-session collisions
  - Terminal clear runs before Ink render

  Closes bradygaster#607

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: standardize elapsed time placement inline after agent messages (bradygaster#605)

  Elapsed time annotations were inconsistent: the Static scrollback block in App.tsx rendered completed agent messages without any duration, while MessageStream had duration code that was dead (receives messages=[]).

  Changes:
  - Export formatDuration from MessageStream.tsx for shared use
  - Add inline (duration) display to App.tsx Static block for agent messages
  - Duration computed by finding preceding user message timestamp
  - Format: inline dimColor after message content, e.g. (4.4s)

  Closes bradygaster#605

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: update cheritto history and decision for bradygaster#605

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(shell): remove duplicate prompt hint that conflicts with header (bradygaster#606)

  The InputPrompt placeholder showed 'Type @agent or /help' for the first 5 messages, duplicating guidance already present in the header banner. This created visual noise with two competing prompt elements.

  Consolidated to two hint tiers: tab/history (messageCount < 10) and command hints (>= 10). The header banner remains the single source for @agent and /help guidance.

  Closes bradygaster#606

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: update kovash history and decision for bradygaster#606 prompt consolidation

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
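The elapsed-time change described in that commit (duration taken from the preceding user message, shown inline as e.g. (4.4s)) might look something like the sketch below. This is a guess at the shape of the logic, not the code in App.tsx or MessageStream.tsx: the `ChatMessage` type, `formatDurationSketch`, and `elapsedForAgentMessage` are hypothetical names, and one-decimal seconds is assumed from the single example in the commit message.

```typescript
// Hypothetical message shape; the real types in the repo may differ.
interface ChatMessage {
  role: "user" | "agent";
  content: string;
  timestamp: number; // milliseconds since epoch
}

// Format a duration in one-decimal seconds, e.g. 4400 -> "4.4s".
function formatDurationSketch(ms: number): string {
  return `${(ms / 1000).toFixed(1)}s`;
}

// For an agent message, walk backwards to the nearest preceding user message
// and report the time between the two. Returns undefined when the message is
// not from an agent or no earlier user message exists.
function elapsedForAgentMessage(
  messages: readonly ChatMessage[],
  index: number,
): string | undefined {
  const message = messages[index];
  if (!message || message.role !== "agent") return undefined;
  for (let i = index - 1; i >= 0; i--) {
    const previous = messages[i];
    if (previous && previous.role === "user") {
      return formatDurationSketch(message.timestamp - previous.timestamp);
    }
  }
  return undefined;
}
```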
Fixed per FIDO review: corrected changeset package names from squad-cli/squad-sdk to @bradygaster/squad-cli/@bradygaster/squad-sdk. Also retargeting to dev branch.
Thank you for this work, Tamir! 🚀 To get this PR ready for team review and merge, we need two quick adjustments:
Once those are in place, the team will take another look and we can get this merged! Thanks for your continued contributions to Squad 👍 |
f8b8b92 to 868810b
Addressed both review items: (1) Fixed changeset package names to @bradygaster/squad-cli and @bradygaster/squad-sdk. (2) Added docs/proposals/error-recovery.md proposal document. PR already retargeted to dev. |
Pull request overview
Adds a new “error-recovery” skill to the Squad skill template library, defining standard patterns agents should apply when tasks/tools fail, plus a supporting proposal doc and release changeset.
Changes:
- Add `error-recovery` skill template to both SDK and CLI template trees.
- Add a proposal document describing the motivation/approach for the new skill.
- Add a changeset bumping `@bradygaster/squad-cli` and `@bradygaster/squad-sdk` minor versions.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| packages/squad-sdk/templates/skills/error-recovery/SKILL.md | New SDK-distributed skill content for error recovery patterns. |
| packages/squad-cli/templates/skills/error-recovery/SKILL.md | New CLI-distributed skill content for error recovery patterns. |
| docs/proposals/error-recovery.md | Proposal write-up for the new skill (motivation, approach, risks). |
| .changeset/error-recovery-skill.md | Changeset to release the new skill in CLI/SDK packages. |
| description: "Standard recovery patterns for all squad agents. When something fails, adapt — don't just report the failure." | ||
| domain: "reliability, agent-coordination" | ||
| confidence: "high" | ||
| license: MIT |
Skill frontmatter deviates from the convention used by other skills: it includes license: MIT but is missing the source: field that’s present across existing skill templates. Consider adding a source: entry (e.g., earned/manual/extracted) and dropping license if it’s not consumed anywhere, to keep metadata consistent and avoid unused fields.
| license: MIT | |
| source: manual |
| description: "Standard recovery patterns for all squad agents. When something fails, adapt — don't just report the failure." | ||
| domain: "reliability, agent-coordination" | ||
| confidence: "high" | ||
| license: MIT |
Skill frontmatter deviates from the convention used by other skills: it includes license: MIT but is missing the source: field that’s present across existing skill templates. Consider adding a source: entry (e.g., earned/manual/extracted) and dropping license if it’s not consumed anywhere, to keep metadata consistent and avoid unused fields.
| license: MIT | |
| source: manual |
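The frontmatter convention discussed in these two comments could be checked mechanically. The sketch below is illustrative only: `checkFrontmatter` is a hypothetical helper, and the expected field list used in the example (`description`, `domain`, `confidence`, `source`) is an assumption pieced together from the snippet and the review text, not the repo's actual template definition.

```typescript
interface FrontmatterReport {
  missing: string[];    // expected keys that are absent
  unexpected: string[]; // keys present but not in the expected list
}

// Collect top-level `key:` names from frontmatter text and compare them
// against the list of fields the skill template is expected to carry.
function checkFrontmatter(
  frontmatter: string,
  expected: readonly string[],
): FrontmatterReport {
  const present = new Set<string>();
  for (const line of frontmatter.split("\n")) {
    const match = /^([A-Za-z_][\w-]*)\s*:/.exec(line);
    if (match && match[1]) present.add(match[1]);
  }
  return {
    missing: expected.filter((key) => !present.has(key)),
    unexpected: Array.from(present).filter((key) => expected.indexOf(key) === -1),
  };
}

// Example using the fields shown in the reviewed snippet.
const reviewed = [
  'description: "Standard recovery patterns for all squad agents."',
  'domain: "reliability, agent-coordination"',
  'confidence: "high"',
  "license: MIT",
].join("\n");
console.log(checkFrontmatter(reviewed, ["description", "domain", "confidence", "source"]));
```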
| ## Fit with Existing Architecture | ||
| - **Complements** existing gent-conduct skill (which covers behavior) — this skill covers failure states |
There’s a non-printable/control character before gent-conduct in this line, which will render incorrectly in Markdown and makes the skill reference harder to read/search. Replace it with plain text agent-conduct.
| - **Complements** existing gent-conduct skill (which covers behavior) — this skill covers failure states | |
| - **Complements** existing agent-conduct skill (which covers behavior) — this skill covers failure states |
| | **degrade** | Full functionality unavailable — deliver partial result with caveat | | ||
| The skill provides a selection guide table mapping error symptoms to the appropriate pattern, | ||
| plus prompt templates for each pattern that agents can use in their reasoning. |
This proposal says the skill includes “prompt templates for each pattern,” but the new SKILL.md content only contains narrative guidance/examples (no reusable prompt templates). Either add the prompt-template sections to the skill files or adjust this proposal text so it matches what’s actually being shipped.
| plus prompt templates for each pattern that agents can use in their reasoning. | |
| plus narrative guidance and example prompts for each pattern that agents can adapt in their reasoning. |
Summary
Adds the `error-recovery` skill: 5 standard recovery patterns every squad agent should apply when something fails.
Battle-tested in a 13-agent production deployment across 6 months of daily use.
What it does
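Defines a clear playbook so agents adapt instead of just reporting failures:
- Retry with Backoff (transient failures, rate limits)
- Fallback Alternatives (tool/approach fails)
- Diagnose-and-Fix (build/test/lint errors)
- Escalate with Context (exhausted recovery)
- Graceful Degradation (non-critical step fails)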
Also includes a selection guide mapping failure types to the right pattern.
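As a rough picture of what such a selection guide amounts to, the mapping can be written as a lookup table. The pairings below follow the failure types named in the commit message for this PR; the identifiers (`FailureKind`, `RecoveryPattern`, `selectPattern`) are hypothetical and the actual guide in SKILL.md is a Markdown table, not code.

```typescript
// Failure categories as named in the PR's commit message (identifiers are illustrative).
type FailureKind =
  | "transient"
  | "rate-limit"
  | "tool-or-approach-failure"
  | "build-test-lint-error"
  | "recovery-exhausted"
  | "non-critical-step-failure";

type RecoveryPattern =
  | "retry-with-backoff"
  | "fallback-alternatives"
  | "diagnose-and-fix"
  | "escalate-with-context"
  | "graceful-degradation";

// One entry per failure kind; the Record type makes the compiler
// reject the table if any kind is left unmapped.
const selectionGuide: Record<FailureKind, RecoveryPattern> = {
  "transient": "retry-with-backoff",
  "rate-limit": "retry-with-backoff",
  "tool-or-approach-failure": "fallback-alternatives",
  "build-test-lint-error": "diagnose-and-fix",
  "recovery-exhausted": "escalate-with-context",
  "non-critical-step-failure": "graceful-degradation",
};

function selectPattern(kind: FailureKind): RecoveryPattern {
  return selectionGuide[kind];
}
```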
Design
Zero breaking changes.