fix: skills schema alignment and eval - Part one of skills normalization by sytone · Pull Request #787 · bradygaster/squad

sytone · 2026-04-03T18:24:15Z

What

Aligns all skill YAML frontmatter across 3 canonical locations to the agentskills.io specification and adds a comprehensive eval framework for testing skill trigger quality.

Part one of a multi-PR skills normalization effort — this PR touches ONLY frontmatter (no skill body content changes) and adds the eval tooling.

Why

Skills used an inconsistent internal schema: domain, confidence, and source as top-level fields, tools as arrays, and some skills with no frontmatter at all
No mechanism existed to validate skill descriptions trigger correctly on user prompts
No contribution guide or review checklist existed for skill quality

How

Schema alignment (34 skills across 3 directories):

.squad/skills/ (14 skills) — team-level patterns
.copilot/skills/ (17 skills) — coordinator playbook
templates/skills/ (3 skills) — product templates

All skills now have:

name, description, license as top-level (agentskills.io required)
domain, confidence, source, triggers, roles, compatibility inside metadata: map
allowed-tools as top-level string where applicable (agentskills.io optional)
Skills without frontmatter got complete --- blocks added
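The layout above can be sketched as a small shape check. This is a minimal illustration only, not the code in validate-schema.mjs: the helper name `checkFrontmatterShape`, the sample skill objects, and the tool names in `allowed-tools` are all assumptions, and the input is taken to be frontmatter that has already been parsed into a plain object.

```javascript
// Hypothetical helper: the field lists mirror the PR description,
// not the actual contents of validate-schema.mjs.
const REQUIRED_TOP_LEVEL = ["name", "description", "license"];
const METADATA_ONLY = ["domain", "confidence", "source", "triggers", "roles", "compatibility"];

// Returns a list of human-readable problems; an empty list means the shape is fine.
function checkFrontmatterShape(frontmatter) {
  const problems = [];
  for (const field of REQUIRED_TOP_LEVEL) {
    const value = frontmatter[field];
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`missing required top-level field: ${field}`);
    }
  }
  for (const field of METADATA_ONLY) {
    if (field in frontmatter) {
      problems.push(`${field} belongs inside metadata:, not at top level`);
    }
  }
  if ("allowed-tools" in frontmatter && typeof frontmatter["allowed-tools"] !== "string") {
    problems.push("allowed-tools must be a string");
  }
  return problems;
}

// Sample objects are invented for illustration.
const aligned = {
  name: "example-skill",
  description: "Example description.",
  license: "MIT",
  "allowed-tools": "Bash Read", // assumed tool names
  metadata: { domain: "testing", confidence: "low", triggers: ["keyword1"], roles: ["developer"] },
};
const legacy = { name: "example-skill", description: "Example.", domain: "testing", tools: ["Bash"] };

console.log(checkFrontmatterShape(aligned)); // []
console.log(checkFrontmatterShape(legacy)); // two problems: missing license, misplaced domain
```

Returning a list of problems rather than throwing on the first one lets a validator report every issue in a skill in a single pass.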

Eval framework (new):

Phase 1 — run-evals.mjs: keyword-based trigger matching (fast, CI-ready). 88.9% baseline.
Phase 2 — run-llm-evals.mjs: LLM-based trigger + execution evals via Copilot CLI models. Supports --type trigger|exec|all, --runs N for nondeterminism, --split for train/validation.
Phase 3 — optimize-description.mjs: iterative description optimization loop with train/validation split.
Schema validator — validate-schema.mjs: checks frontmatter compliance across all directories.
31 trigger eval fixtures (342 test cases) + 10 execution eval fixtures with LLM-as-judge assertion grading.
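As a rough sketch of what Phase 1 keyword-based trigger matching could look like: the fixture shape (`prompt`, `shouldTrigger`), the function names, and the sample skill are assumptions, since the real `*.eval.yaml` layout and the logic of run-evals.mjs are not shown on this page. The last line only reproduces the reported baseline arithmetic (304 of 342).

```javascript
// Hypothetical fixture shape and function names, for illustration only.

// A prompt "triggers" a skill if it contains any trigger keyword (case-insensitive).
function matchesTriggers(prompt, triggers) {
  const text = prompt.toLowerCase();
  return triggers.some((t) => text.includes(t.toLowerCase()));
}

// Count the cases where the keyword match agrees with the expected outcome.
function runKeywordEval(skill, cases) {
  let passed = 0;
  for (const { prompt, shouldTrigger } of cases) {
    if (matchesTriggers(prompt, skill.triggers) === shouldTrigger) passed += 1;
  }
  return { passed, total: cases.length };
}

function formatRate(passed, total) {
  return `${((passed / total) * 100).toFixed(1)}%`;
}

const skill = { name: "release-process", triggers: ["release", "publish", "changelog"] };
const cases = [
  { prompt: "How do I publish a new version?", shouldTrigger: true },
  { prompt: "Cut the release for this sprint", shouldTrigger: true },
  { prompt: "Fix the flaky login test", shouldTrigger: false },
  { prompt: "Ship it to npm", shouldTrigger: true }, // no keyword present, so this case fails
];

const result = runKeywordEval(skill, cases);
console.log(result.passed, result.total); // 3 4
console.log(formatRate(304, 342)); // 88.9%
```

The failing fourth case shows the kind of miss a keyword matcher cannot catch, which is presumably what the LLM-based Phase 2 is meant to cover.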

New team member:

🔍 SPAN (Skill Curator) — owns skill quality, schema compliance, eval coverage, and trigger testing. Gates skill PRs on eval pass rates.

Docs:

CONTRIBUTING.md — skill creation/modification workflow, schema reference, eval guide
skill-review-checklist.md — 40+ checkpoint review gate
README.md — eval framework documentation

Testing

node .squad/skills/evals/validate-schema.mjs passes (34/34 skills valid)
node .squad/skills/evals/run-evals.mjs passes (88.9%, 304/342)
node .squad/skills/evals/run-llm-evals.mjs --dry-run works for all modes
No source code changes — npm run build and npm test not applicable

Docs

CONTRIBUTING.md for skills
Eval framework README
Skill review checklist template

Breaking Changes

None — frontmatter-only changes. The SDK's simple YAML parser flattens nested metadata, so triggers/roles inside metadata: are still accessible as top-level fields at runtime.

Waivers

No CHANGELOG entry needed (no packages/squad-sdk/src/ or packages/squad-cli/src/ changes)
File count is high (89 files) because this touches 34 skill files + 31 eval fixtures + scripts + docs — all intentional

github-actions · 2026-04-03T18:24:27Z

🛫 PR Readiness Check

⚠️ 3 item(s) to address before review

Status	Check	Details
✅	Single commit	1 commit — clean history
✅	Not in draft	Ready for review
✅	Branch up to date	Up to date with dev
❌	Copilot review	No Copilot review yet — it may still be processing
✅	Changeset present	No source files changed — changeset not required
✅	Scope clean	⚠️ PR includes 70 .squad/ file(s) — ensure these are intentional
✅	No merge conflicts	No merge conflicts
❌	Copilot threads resolved	7 unresolved Copilot thread(s) — fix and resolve before merging
❌	CI passing	No CI checks have run yet

This check runs automatically on every push. Fix any ❌ items and push again.
See CONTRIBUTING.md and PR Requirements for details.

sytone · 2026-04-03T18:25:15Z

FYI, this is still in progress. Just getting the initial PR out. This is a house-tidying pass before looking at skill content and updating the skills to be agnostic of the squad repo.

Copilot

Pull request overview

This PR begins “skills normalization” by updating many SKILL.md files to a new frontmatter shape (license/metadata/triggers/roles) and introducing a local skill schema validator + keyword-based eval fixtures to verify skill discoverability.

Changes:

Normalize skill frontmatter across .copilot/skills/, .squad/skills/, and templates/skills/ (add license, metadata, triggers, roles, allowed-tools).
Add/expand the skill eval framework: new fixtures, docs, and a new validate-schema.mjs validator.
Add SPAN (“Skill Curator”) agent + supporting templates/docs for skill quality and workflow wiring.

Reviewed changes

Copilot reviewed 83 out of 83 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
templates/skills/squad-conventions/SKILL.md	Normalize skill frontmatter for template skill.
templates/skills/rework-rate/SKILL.md	Normalize frontmatter; replace `tools:` with `allowed-tools`.
templates/skills/nap/SKILL.md	Convert markdown-only skill to YAML-frontmatter skill.
.squad/templates/skill.md	Update skill authoring template for the new schema.
.squad/templates/skill-review-checklist.md	Add a reviewer checklist for skill PRs.
.squad/templates/agents/challenger.md	Add challenger agent template.
.squad/team.md	Add SPAN to team roster.
.squad/skills/versioning-policy/SKILL.md	Normalize frontmatter + add triggers/roles.
.squad/skills/session-recovery/SKILL.md	Normalize frontmatter + introduce `allowed-tools`.
.squad/skills/release-process/SKILL.md	Normalize frontmatter + add triggers/roles.
.squad/skills/ralph-two-pass-scan/SKILL.md	Normalize frontmatter + add triggers/roles.
.squad/skills/pr-screenshots/SKILL.md	Normalize frontmatter + add triggers/roles.
.squad/skills/personal-squad/SKILL.md	Normalize frontmatter + add triggers/roles.
.squad/skills/model-selection/SKILL.md	Normalize frontmatter + add triggers/roles.
.squad/skills/humanizer/SKILL.md	Normalize frontmatter + add triggers/roles.
.squad/skills/gh-auth-isolation/SKILL.md	Normalize frontmatter + introduce `allowed-tools`.
.squad/skills/fact-checking/SKILL.md	Add new fact-checking skill.
.squad/skills/external-comms/SKILL.md	Normalize frontmatter + introduce `allowed-tools`.
.squad/skills/evals/versioning-policy.eval.yaml	Add eval fixture for versioning-policy.
.squad/skills/evals/validate-schema.mjs	Add schema validator for SKILL.md frontmatter + eval coverage reporting.
.squad/skills/evals/squad-conventions.eval.yaml	Add eval fixture for squad-conventions.
.squad/skills/evals/session-recovery.eval.yaml	Add eval fixture for session-recovery.
.squad/skills/evals/secret-handling.eval.yaml	Add eval fixture for secret-handling.
.squad/skills/evals/rework-rate.eval.yaml	Add eval fixture for rework-rate.
.squad/skills/evals/reviewer-protocol.eval.yaml	Add eval fixture for reviewer-protocol.
.squad/skills/evals/reskill.eval.yaml	Add eval fixture for reskill.
.squad/skills/evals/release-process.eval.yaml	Add eval fixture for release-process.
.squad/skills/evals/README.md	Document how to run skill evals and what fixtures mean.
.squad/skills/evals/ralph-two-pass-scan.eval.yaml	Add eval fixture for ralph-two-pass-scan.
.squad/skills/evals/pr-screenshots.eval.yaml	Add eval fixture for pr-screenshots.
.squad/skills/evals/personal-squad.eval.yaml	Add eval fixture for personal-squad.
.squad/skills/evals/nap.eval.yaml	Add eval fixture for nap.
.squad/skills/evals/model-selection.eval.yaml	Add eval fixture for model-selection.
.squad/skills/evals/init-mode.eval.yaml	Add eval fixture for init-mode.
.squad/skills/evals/humanizer.eval.yaml	Add eval fixture for humanizer.
.squad/skills/evals/history-hygiene.eval.yaml	Add eval fixture for history-hygiene.
.squad/skills/evals/github-multi-account.eval.yaml	Add eval fixture for github-multi-account.
.squad/skills/evals/git-workflow.eval.yaml	Add eval fixture for git-workflow.
.squad/skills/evals/gh-auth-isolation.eval.yaml	Add eval fixture for gh-auth-isolation.
.squad/skills/evals/fact-checking.eval.yaml	Add eval fixture for fact-checking.
.squad/skills/evals/external-comms.eval.yaml	Add eval fixture for external-comms.
.squad/skills/evals/economy-mode.eval.yaml	Add eval fixture for economy-mode.
.squad/skills/evals/distributed-mesh.eval.yaml	Add eval fixture for distributed-mesh.
.squad/skills/evals/cross-squad.eval.yaml	Add eval fixture for cross-squad.
.squad/skills/evals/cross-machine-coordination.eval.yaml	Add eval fixture for cross-machine-coordination.
.squad/skills/evals/client-compatibility.eval.yaml	Add eval fixture for client-compatibility.
.squad/skills/evals/cli-wiring.eval.yaml	Add eval fixture for cli-wiring.
.squad/skills/evals/ci-validation-gates.eval.yaml	Add eval fixture for ci-validation-gates.
.squad/skills/evals/architectural-proposals.eval.yaml	Add eval fixture for architectural-proposals.
.squad/skills/evals/agent-conduct.eval.yaml	Add eval fixture for agent-conduct.
.squad/skills/evals/agent-collaboration.eval.yaml	Add eval fixture for agent-collaboration.
.squad/skills/economy-mode/SKILL.md	Normalize frontmatter + add triggers/roles.
.squad/skills/cross-squad/SKILL.md	Normalize frontmatter + introduce `allowed-tools`.
.squad/skills/CONTRIBUTING.md	Add skill authoring/reviewing guide and validation commands.
.squad/skill.md	Update root skill template to new schema.
.squad/routing.md	Add SPAN ownership entry for skill quality & eval.
.squad/casting/registry.json	Register SPAN in casting registry.
.squad/agents/span/history.md	Seed SPAN agent history.
.squad/agents/span/charter.md	Add SPAN agent charter.
.squad-templates/workflow-wiring-appendix-b-documenter.md	Add wiring walkthrough for a documenter follow-up role.
.squad-templates/workflow-wiring-appendix-a-code-reviewer.md	Add wiring walkthrough for an enforced code reviewer gate.
.squad-templates/squad.agent.md	Update coordinator template to emphasize enforcement wiring for gating roles.
.copilot/skills/squad-conventions/SKILL.md	Normalize frontmatter + add triggers/roles.
.copilot/skills/reviewer-protocol/SKILL.md	Normalize frontmatter + add triggers/roles.
.copilot/skills/reskill/SKILL.md	Normalize frontmatter + add triggers/roles.
.copilot/skills/init-mode/SKILL.md	Normalize frontmatter + introduce `allowed-tools`.
.copilot/skills/history-hygiene/SKILL.md	Normalize frontmatter + add triggers/roles.
.copilot/skills/github-multi-account/SKILL.md	Normalize frontmatter + add triggers/roles.
.copilot/skills/client-compatibility/SKILL.md	Normalize frontmatter + add triggers/roles.
.copilot/skills/cli-wiring/SKILL.md	Add YAML frontmatter to previously markdown-only skill.
.copilot/skills/ci-validation-gates/SKILL.md	Normalize frontmatter + add triggers/roles.
.copilot/skills/architectural-proposals/SKILL.md	Normalize frontmatter + introduce `allowed-tools`.
.copilot/skills/agent-conduct/SKILL.md	Normalize frontmatter + add triggers/roles.
.copilot/skills/agent-collaboration/SKILL.md	Normalize frontmatter + add triggers/roles.

Copilot · 2026-04-03T18:29:55Z

+        const nestedObj = parseSimpleYaml(nested);
+        const hasKeys = Object.keys(nestedObj).length > 0;
+        if (hasKeys) {
+          Object.assign(result, nestedObj);
+          result[key] = nestedObj;
+        } else {


parseSimpleYaml() flattens nested maps into the parent via Object.assign(result, nestedObj) while also keeping result[key] = nestedObj. That guarantees fields like domain/confidence/source appear both at top-level and inside metadata whenever they’re written under metadata, so the validator’s warning in validateSkill() can’t be satisfied (it will warn even when the author follows the template). Consider either (1) not flattening nested maps (keep metadata truly nested), or (2) if flattening is intentional for the scoring engine, update the disallowed-top-level check to inspect the raw frontmatter lines/indentation instead of the flattened object.

Fixed in d231d6b — removed Object.assign(result, nestedObj) so nested maps stay nested. Updated validator to check top-level independently of metadata. Fields inside metadata: no longer trigger false warnings.
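To make the difference concrete, here is a miniature parser in the spirit of parseSimpleYaml(), written from scratch for illustration. It is not the code from the PR: `parseNested`, `parseScalar`, and the `flatten` option are hypothetical, and it only handles `key: value` pairs, inline `[a, b]` lists, and one level of indented children.

```javascript
// Hypothetical miniature of a "simple YAML" parser. Not a real YAML parser.
function parseScalar(raw) {
  const value = raw.trim();
  if (value.startsWith("[") && value.endsWith("]")) {
    const inner = value.slice(1, -1).trim();
    return inner === "" ? [] : inner.split(",").map((item) => parseScalar(item));
  }
  return value.replace(/^["']|["']$/g, ""); // strip one pair of surrounding quotes
}

function parseNested(text, { flatten = false } = {}) {
  const result = {};
  let currentMap = null; // the nested map currently being filled, if any
  for (const line of text.split("\n")) {
    if (line.trim() === "" || line.trim().startsWith("#")) continue;
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const indented = /^\s+/.test(line);
    const key = line.slice(0, sep).trim();
    const rest = line.slice(sep + 1).trim();
    if (indented && currentMap) {
      currentMap[key] = parseScalar(rest);
      if (flatten) result[key] = currentMap[key]; // old behaviour: also copy into the parent
    } else if (rest === "") {
      currentMap = {}; // `key:` with no value opens a nested map
      result[key] = currentMap;
    } else {
      currentMap = null;
      result[key] = parseScalar(rest);
    }
  }
  return result;
}

const sample = [
  "name: example-skill",
  "metadata:",
  "  domain: testing",
  "  triggers: [keyword1, keyword2]",
].join("\n");

const nested = parseNested(sample);
const flattened = parseNested(sample, { flatten: true });
console.log("domain" in nested); // false
console.log("domain" in flattened); // true
console.log(nested.metadata.triggers); // [ 'keyword1', 'keyword2' ]
```

With flattening on, `domain` shows up at top level even though the author wrote it under `metadata:`, so a check for disallowed top-level fields cannot tell a compliant skill from a non-compliant one.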

Copilot · 2026-04-03T18:29:55Z

+- [ ] **Frontmatter fields** are at top-level (not nested in `metadata:`)
+  - `triggers` and `roles` must not be nested
+  - Non-standard fields go in `metadata:` block only


The checklist says triggers and roles must not be nested in metadata: (lines 14–16), but the updated skill templates and many skills in this PR place triggers/roles under metadata:. Please align the checklist with the actual schema you want to enforce (either require top-level triggers/roles everywhere, or update templates + validator accordingly).

Fixed in d231d6b — checklist now states: name, description, license, allowed-tools are top-level. domain, confidence, source, triggers, roles, compatibility go inside metadata:.

Copilot · 2026-04-03T18:29:55Z

+metadata:
+  domain: "{e.g., testing, api-design, error-handling}"
+  confidence: "low|medium|high"
+  source: "{how this was learned: manual, observed, earned}"
+  compatibility: "GitHub Copilot CLI, VS Code Copilot Chat"
+  triggers: [keyword1, keyword2, keyword3]
+  roles: [developer, tester]


This skill template nests triggers and roles under metadata:. The SDK skill loader currently reads triggers and roles as top-level frontmatter keys (see packages/squad-sdk/src/skills/skill-loader.ts:129-134), and the contributing guide in this PR also documents them as top-level SDK extensions. To avoid breaking skill discovery when a real YAML parser is introduced, move triggers and roles out of metadata: and keep metadata: for non-standard fields only.

Fixed in d231d6b — kept triggers/roles in metadata: per project decision (user explicitly requested this placement). Added a comment explaining the SDK's simple parser flattens metadata, so these are accessible at runtime. If a full YAML parser is adopted later, they should move to top-level.
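One way a loader could stay compatible with either placement is to read a list field from top level first and fall back to `metadata:`. The sketch below is purely illustrative and is not the SDK's skill-loader.ts; `readListField` and the sample objects are hypothetical.

```javascript
// Hypothetical accessor: prefer a top-level list, fall back to metadata, else empty.
function readListField(frontmatter, field) {
  const topLevel = frontmatter[field];
  if (Array.isArray(topLevel)) return topLevel;
  const nested = frontmatter.metadata?.[field];
  return Array.isArray(nested) ? nested : [];
}

// Invented sample objects covering both placements.
const nestedPlacement = { name: "a", metadata: { triggers: ["keyword1"], roles: ["developer"] } };
const topLevelPlacement = { name: "b", triggers: ["keyword2"] };

console.log(readListField(nestedPlacement, "triggers")); // [ 'keyword1' ]
console.log(readListField(topLevelPlacement, "triggers")); // [ 'keyword2' ]
console.log(readListField(topLevelPlacement, "roles")); // []
```

A fallback like this would keep skill discovery working whether or not the parser flattens `metadata:`, at the cost of tolerating two layouts instead of enforcing one.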

Copilot · 2026-04-03T18:29:56Z

+  triggers: [keyword1, keyword2, keyword3]
+  roles: [developer, tester]


Same issue as .squad/templates/skill.md: this template nests triggers and roles under metadata: even though the runtime loader expects triggers/roles as top-level keys. Consider moving triggers and roles out of metadata: to keep the frontmatter forward-compatible with proper YAML parsing.

Suggested change

-  triggers: [keyword1, keyword2, keyword3]
-  roles: [developer, tester]
+triggers: [keyword1, keyword2, keyword3]
+roles: [developer, tester]

Fixed alongside comment #3 — same SDK-flattening comment added to both template files.

Copilot · 2026-04-03T18:29:56Z

+| `.copilot/skills/` | Coordinator playbook — skills loaded by the Copilot CLI coordinator |
+| `templates/skills/` | Product templates — reusable skill scaffolds for new projects |
+
+The eval runner and schema validator scan `.squad/skills/` and `.copilot/skills/`. Template skills are not evaluated automatically unless promoted to one of those directories.


CONTRIBUTING.md says the eval runner and schema validator scan only .squad/skills/ and .copilot/skills/, and that template skills aren’t evaluated automatically (lines 17–18). However, both run-evals.mjs and validate-schema.mjs currently include templates/skills in SKILL_DIRS, so templates are being evaluated/validated. Please update this section to match the actual behavior (or adjust the scripts if templates truly shouldn’t be in the scan set).

Suggested change

The eval runner and schema validator scan `.squad/skills/` and `.copilot/skills/`. Template skills are not evaluated automatically unless promoted to one of those directories.

The eval runner and schema validator scan `.squad/skills/`, `.copilot/skills/`, and `templates/skills/`. Template skills are therefore evaluated and schema-validated automatically alongside the other skill directories.

Fixed in d231d6b — updated to: 'The eval runner and schema validator scan .squad/skills/, .copilot/skills/, and templates/skills/.' Matches actual script behavior.
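The scan set described here can be illustrated with a small path filter. `SKILL_DIRS` is the name the review comment mentions, but its contents below, the `isScannedSkill` helper, and the assumption that each skill lives at `<dir>/<skill-name>/SKILL.md` are inferred from the file list on this page rather than taken from the scripts.

```javascript
// Assumed contents of SKILL_DIRS, based on the three directories named in the PR.
const SKILL_DIRS = [".squad/skills", ".copilot/skills", "templates/skills"];

// True when a repo-relative path looks like <dir>/<skill-name>/SKILL.md
// for one of the scanned directories.
function isScannedSkill(repoPath) {
  const normalized = repoPath.replace(/\\/g, "/");
  return SKILL_DIRS.some((dir) => {
    if (!normalized.startsWith(`${dir}/`)) return false;
    const rest = normalized.slice(dir.length + 1).split("/");
    return rest.length === 2 && rest[1] === "SKILL.md";
  });
}

console.log(isScannedSkill("templates/skills/nap/SKILL.md")); // true
console.log(isScannedSkill(".squad/skills/evals/README.md")); // false
console.log(isScannedSkill(".squad/templates/skill.md")); // false
```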

Copilot · 2026-04-03T18:29:56Z

+  last_validated: "2026-01-01"
+```
+
+**Do not place `domain`, `confidence`, `source`, or `tools` at the top level when using `metadata:`** — the SDK flattens the block and field collisions cause silent errors.


This section documents domain, confidence, and source as optional standard top-level fields (lines 34–40), but later the same doc says not to place domain/confidence/source/tools at the top level when using metadata: (line 64). Given the current parsers flatten nested metadata anyway, this guidance is internally inconsistent and will confuse authors. Please clarify the single intended rule (e.g., always top-level, or always inside metadata with a parser that preserves nesting).

Suggested change

**Do not place `domain`, `confidence`, `source`, or `tools` at the top level when using `metadata:`** — the SDK flattens the block and field collisions cause silent errors.

**Keep standard fields and SDK extensions at the top level** — `domain`, `confidence`, `source`, `triggers`, and `roles` should not be nested under `metadata:`. Use `metadata:` only for custom fields like `author`, `version`, or `last_validated`, and do not repeat top-level keys inside `metadata:` because the SDK may flatten the block and cause silent collisions.

Fixed in d231d6b — consolidated schema section to a single consistent rule. All non-spec fields (domain, confidence, source, triggers, roles, compatibility) go in metadata:. Only name, description, license, allowed-tools at top-level.

Copilot · 2026-04-03T18:29:56Z

 5. Add to team.md roster.
 6. Add routing entries to routing.md.
-7. Say: *"✅ {CastName} joined the team as {Role}."*
+7. **Wire enforcement (if applicable).** If the new member's role involves gating other agents' work (reviewer, design approver, quality gate), add a numbered enforcement rule to `routing.md` → Rules section. A routing table entry (step 6) only handles explicit requests — enforcement rules are required for automatic gates. Read `.squad/templates/workflow-wiring-guide.md` for the full wiring process, including walkthroughs for common role types (code reviewer, documenter). Check `.squad/templates/issue-lifecycle.md` for lifecycle integration if the project uses PR-gated workflows.


This template now points readers to .squad/templates/workflow-wiring-guide.md, but that file doesn’t exist in the repo (current .squad/templates contains the two appendix files under .squad-templates/ instead). Either add the referenced guide under .squad/templates/, or update this reference to the actual wiring docs you added (e.g., the new workflow-wiring-appendix-*.md files).

Suggested change

7. **Wire enforcement (if applicable).** If the new member's role involves gating other agents' work (reviewer, design approver, quality gate), add a numbered enforcement rule to `routing.md` → Rules section. A routing table entry (step 6) only handles explicit requests — enforcement rules are required for automatic gates. Read `.squad/templates/workflow-wiring-guide.md` for the full wiring process, including walkthroughs for common role types (code reviewer, documenter). Check `.squad/templates/issue-lifecycle.md` for lifecycle integration if the project uses PR-gated workflows.

7. **Wire enforcement (if applicable).** If the new member's role involves gating other agents' work (reviewer, design approver, quality gate), add a numbered enforcement rule to `routing.md` → Rules section. A routing table entry (step 6) only handles explicit requests — enforcement rules are required for automatic gates. Read the workflow wiring appendix docs under `.squad-templates/` for the full wiring process, including walkthroughs for common role types (code reviewer, documenter). Check `.squad/templates/issue-lifecycle.md` for lifecycle integration if the project uses PR-gated workflows.

This file (.squad-templates/squad.agent.md) came from the upstream merge — not changed by this PR. The missing reference is a pre-existing issue. Should be addressed in a separate PR.

sytone

Addressed all 6 actionable review comments in commit d231d6b. Comment #7 (squad.agent.md line 918) is from upstream merge, not this PR's changes.

…l framework

Migrates all 34 skills across .squad/skills/, .copilot/skills/, and templates/skills/ to the agentskills.io specification schema:

- name, description, license as top-level fields (spec required)
- domain, confidence, source, triggers, roles, compatibility in metadata map
- tools arrays converted to allowed-tools strings
- Skills without frontmatter get complete --- blocks added
- Body content of all SKILL.md files unchanged (frontmatter only)

Adds three-phase eval framework:

- Phase 1 (run-evals.mjs): keyword matching, 88.9% baseline, CI-ready
- Phase 2 (run-llm-evals.mjs): LLM trigger + execution evals via Copilot CLI
- Phase 3 (optimize-description.mjs): iterative description optimization loop
- Schema validator (validate-schema.mjs): frontmatter compliance checker
- 31 trigger eval fixtures (342 cases) + 10 execution eval fixtures
- CONTRIBUTING.md, skill-review-checklist.md, eval README

Adds SPAN (Skill Curator) team member for skill quality gating.

Spec: https://agentskills.io/specification

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

sytone · 2026-04-03T19:39:02Z

Closing to rebuild with clean commit history — upstream merge artifacts leaked into the diff. Will reopen with clean branch.

Copilot AI review requested due to automatic review settings April 3, 2026 18:24

Copilot started reviewing on behalf of sytone April 3, 2026 18:24

Copilot AI reviewed Apr 3, 2026

sytone force-pushed the squad/skills-schema-alignment branch from a90d701 to 25af615 Compare April 3, 2026 18:37

sytone commented Apr 3, 2026

sytone force-pushed the squad/skills-schema-alignment branch from d231d6b to dc62621 Compare April 3, 2026 19:24

sytone closed this Apr 3, 2026

sytone deleted the squad/skills-schema-alignment branch April 3, 2026 19:39

sytone mentioned this pull request Apr 3, 2026

feat(skills): align YAML frontmatter to agentskills.io spec + add eval framework #798

Open

7 tasks


2 participants
