fix: skills schema alignment and eval - Part one of skills normalization #787

Closed
sytone wants to merge 1 commit into bradygaster:dev from sytone:squad/skills-schema-alignment

Conversation

@sytone sytone commented Apr 3, 2026

What

Aligns all skill YAML frontmatter across 3 canonical locations to the agentskills.io specification and adds a comprehensive eval framework for testing skill trigger quality.

Part one of a multi-PR skills normalization effort — this PR touches ONLY frontmatter (no skill body content changes) and adds the eval tooling.

Why

  • Skills used an inconsistent internal schema: domain, confidence, and source as top-level fields, tools arrays, and in some cases no frontmatter at all
  • No mechanism existed to validate that skill descriptions trigger correctly on user prompts
  • No contribution guide or review checklist existed for skill quality

How

Schema alignment (34 skills across 3 directories):

  • .squad/skills/ (14 skills) — team-level patterns
  • .copilot/skills/ (17 skills) — coordinator playbook
  • templates/skills/ (3 skills) — product templates

All skills now have:

  • name, description, license as top-level (agentskills.io required)
  • domain, confidence, source, triggers, roles, compatibility inside metadata: map
  • allowed-tools as top-level string where applicable (agentskills.io optional)
  • Skills without frontmatter got complete --- blocks added
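
The placement rules above can be sketched as a small check. This is only an illustration, not the PR's validate-schema.mjs: the function name `checkPlacement`, the message wording, and the sample values (including the `allowed-tools` string) are hypothetical, and it assumes the frontmatter has already been parsed into a plain object with `metadata` kept nested.

```javascript
// Hypothetical sketch of the placement rules; not the PR's validate-schema.mjs.
const REQUIRED_TOP_LEVEL = ["name", "description", "license"];
const METADATA_ONLY = ["domain", "confidence", "source", "triggers", "roles", "compatibility"];

function checkPlacement(frontmatter) {
  const problems = [];
  // Required fields must be present at the top level.
  for (const key of REQUIRED_TOP_LEVEL) {
    if (!(key in frontmatter)) problems.push(`missing top-level field: ${key}`);
  }
  // Non-spec fields must live inside the metadata map, not beside it.
  for (const key of METADATA_ONLY) {
    if (key in frontmatter) problems.push(`${key} belongs inside metadata:`);
  }
  // allowed-tools is optional, but when present it is a string.
  if ("allowed-tools" in frontmatter && typeof frontmatter["allowed-tools"] !== "string") {
    problems.push("allowed-tools must be a string");
  }
  return problems;
}

// Sample values are made up for illustration.
const sample = {
  name: "demo-skill",
  description: "Example skill used only for this sketch.",
  license: "MIT",
  "allowed-tools": "Read Grep",
  metadata: { domain: "testing", confidence: "low", source: "manual", triggers: ["demo"], roles: ["developer"] },
};
console.log(checkPlacement(sample)); // prints []
```

An empty result means every field sits where the rules above say it should.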

Eval framework (new):

  • Phase 1 (run-evals.mjs): keyword-based trigger matching (fast, CI-ready). 88.9% baseline.
  • Phase 2 (run-llm-evals.mjs): LLM-based trigger + execution evals via Copilot CLI models. Supports --type trigger|exec|all, --runs N for nondeterminism, --split for train/validation.
  • Phase 3 (optimize-description.mjs): iterative description optimization loop with train/validation split.
  • Schema validator (validate-schema.mjs): checks frontmatter compliance across all directories.
  • 31 trigger eval fixtures (342 test cases) + 10 execution eval fixtures with LLM-as-judge assertion grading.
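
As a rough illustration of what Phase 1's keyword matching and pass-rate reporting could look like, here is a minimal sketch. The names `matchesTrigger` and `runCases`, the case shape (`prompt`, `shouldTrigger`), and the sample triggers are assumptions made for this example; they are not taken from run-evals.mjs or the actual fixture format.

```javascript
// Hypothetical sketch: keyword-based trigger matching with a pass-rate summary.
function matchesTrigger(prompt, triggers) {
  const text = prompt.toLowerCase();
  // A skill "triggers" when any of its keywords appears in the prompt.
  return triggers.some((keyword) => text.includes(keyword.toLowerCase()));
}

function runCases(triggers, cases) {
  let passed = 0;
  for (const { prompt, shouldTrigger } of cases) {
    // A case passes when the match result agrees with the expectation.
    if (matchesTrigger(prompt, triggers) === shouldTrigger) passed += 1;
  }
  const rate = ((passed / cases.length) * 100).toFixed(1);
  return { passed, total: cases.length, rate };
}

// Made-up triggers and cases for illustration only.
const triggers = ["semver", "version bump"];
const cases = [
  { prompt: "How do I do a version bump for the CLI?", shouldTrigger: true },
  { prompt: "Which semver range should we use?", shouldTrigger: true },
  { prompt: "Write a haiku about coffee", shouldTrigger: false },
];
const summary = runCases(triggers, cases);
console.log(`${summary.passed}/${summary.total} (${summary.rate}%)`);
```

With the same rounding, the reported baseline of 304 passing cases out of 342 comes out as 88.9%.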

New team member:

  • 🔍 SPAN (Skill Curator) — owns skill quality, schema compliance, eval coverage, and trigger testing. Gates skill PRs on eval pass rates.

Docs:

  • CONTRIBUTING.md — skill creation/modification workflow, schema reference, eval guide
  • skill-review-checklist.md — 40+ checkpoint review gate
  • README.md — eval framework documentation

Testing

  • node .squad/skills/evals/validate-schema.mjs passes (34/34 skills valid)
  • node .squad/skills/evals/run-evals.mjs passes (88.9%, 304/342)
  • node .squad/skills/evals/run-llm-evals.mjs --dry-run works for all modes
  • No source code changes — npm run build and npm test not applicable

Docs

  • CONTRIBUTING.md for skills
  • Eval framework README
  • Skill review checklist template

Breaking Changes

None — frontmatter-only changes. The SDK's simple YAML parser flattens nested metadata, so triggers/roles inside metadata: are still accessible as top-level fields at runtime.

Waivers

  • No CHANGELOG entry needed (no packages/squad-sdk/src/ or packages/squad-cli/src/ changes)
  • File count is high (89 files) because this touches 34 skill files + 31 eval fixtures + scripts + docs — all intentional

Copilot AI review requested due to automatic review settings April 3, 2026 18:24
Contributor

github-actions Bot commented Apr 3, 2026

🛫 PR Readiness Check

⚠️ 3 item(s) to address before review

| Status | Check | Details |
| --- | --- | --- |
| | Single commit | 1 commit, clean history |
| | Not in draft | Ready for review |
| | Branch up to date | Up to date with dev |
| | Copilot review | No Copilot review yet; it may still be processing |
| | Changeset present | No source files changed; changeset not required |
| | Scope clean | ⚠️ PR includes 70 .squad/ file(s); ensure these are intentional |
| | No merge conflicts | No merge conflicts |
| | Copilot threads resolved | 7 unresolved Copilot thread(s); fix and resolve before merging |
| | CI passing | No CI checks have run yet |

This check runs automatically on every push. Fix any ❌ items and push again.
See CONTRIBUTING.md and PR Requirements for details.

Author

sytone commented Apr 3, 2026

FYI, this is still in progress. Just getting the initial PR out. This is a tidy-up before looking at skill content and making the skills agnostic of the squad repo.

Contributor

Copilot AI left a comment

Pull request overview

This PR begins “skills normalization” by updating many SKILL.md files to a new frontmatter shape (license/metadata/triggers/roles) and introducing a local skill schema validator + keyword-based eval fixtures to verify skill discoverability.

Changes:

  • Normalize skill frontmatter across .copilot/skills/, .squad/skills/, and templates/skills/ (add license, metadata, triggers, roles, allowed-tools).
  • Add/expand the skill eval framework: new fixtures, docs, and a new validate-schema.mjs validator.
  • Add SPAN (“Skill Curator”) agent + supporting templates/docs for skill quality and workflow wiring.

Reviewed changes

Copilot reviewed 83 out of 83 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
templates/skills/squad-conventions/SKILL.md Normalize skill frontmatter for template skill.
templates/skills/rework-rate/SKILL.md Normalize frontmatter; replace tools: with allowed-tools.
templates/skills/nap/SKILL.md Convert markdown-only skill to YAML-frontmatter skill.
.squad/templates/skill.md Update skill authoring template for the new schema.
.squad/templates/skill-review-checklist.md Add a reviewer checklist for skill PRs.
.squad/templates/agents/challenger.md Add challenger agent template.
.squad/team.md Add SPAN to team roster.
.squad/skills/versioning-policy/SKILL.md Normalize frontmatter + add triggers/roles.
.squad/skills/session-recovery/SKILL.md Normalize frontmatter + introduce allowed-tools.
.squad/skills/release-process/SKILL.md Normalize frontmatter + add triggers/roles.
.squad/skills/ralph-two-pass-scan/SKILL.md Normalize frontmatter + add triggers/roles.
.squad/skills/pr-screenshots/SKILL.md Normalize frontmatter + add triggers/roles.
.squad/skills/personal-squad/SKILL.md Normalize frontmatter + add triggers/roles.
.squad/skills/model-selection/SKILL.md Normalize frontmatter + add triggers/roles.
.squad/skills/humanizer/SKILL.md Normalize frontmatter + add triggers/roles.
.squad/skills/gh-auth-isolation/SKILL.md Normalize frontmatter + introduce allowed-tools.
.squad/skills/fact-checking/SKILL.md Add new fact-checking skill.
.squad/skills/external-comms/SKILL.md Normalize frontmatter + introduce allowed-tools.
.squad/skills/evals/versioning-policy.eval.yaml Add eval fixture for versioning-policy.
.squad/skills/evals/validate-schema.mjs Add schema validator for SKILL.md frontmatter + eval coverage reporting.
.squad/skills/evals/squad-conventions.eval.yaml Add eval fixture for squad-conventions.
.squad/skills/evals/session-recovery.eval.yaml Add eval fixture for session-recovery.
.squad/skills/evals/secret-handling.eval.yaml Add eval fixture for secret-handling.
.squad/skills/evals/rework-rate.eval.yaml Add eval fixture for rework-rate.
.squad/skills/evals/reviewer-protocol.eval.yaml Add eval fixture for reviewer-protocol.
.squad/skills/evals/reskill.eval.yaml Add eval fixture for reskill.
.squad/skills/evals/release-process.eval.yaml Add eval fixture for release-process.
.squad/skills/evals/README.md Document how to run skill evals and what fixtures mean.
.squad/skills/evals/ralph-two-pass-scan.eval.yaml Add eval fixture for ralph-two-pass-scan.
.squad/skills/evals/pr-screenshots.eval.yaml Add eval fixture for pr-screenshots.
.squad/skills/evals/personal-squad.eval.yaml Add eval fixture for personal-squad.
.squad/skills/evals/nap.eval.yaml Add eval fixture for nap.
.squad/skills/evals/model-selection.eval.yaml Add eval fixture for model-selection.
.squad/skills/evals/init-mode.eval.yaml Add eval fixture for init-mode.
.squad/skills/evals/humanizer.eval.yaml Add eval fixture for humanizer.
.squad/skills/evals/history-hygiene.eval.yaml Add eval fixture for history-hygiene.
.squad/skills/evals/github-multi-account.eval.yaml Add eval fixture for github-multi-account.
.squad/skills/evals/git-workflow.eval.yaml Add eval fixture for git-workflow.
.squad/skills/evals/gh-auth-isolation.eval.yaml Add eval fixture for gh-auth-isolation.
.squad/skills/evals/fact-checking.eval.yaml Add eval fixture for fact-checking.
.squad/skills/evals/external-comms.eval.yaml Add eval fixture for external-comms.
.squad/skills/evals/economy-mode.eval.yaml Add eval fixture for economy-mode.
.squad/skills/evals/distributed-mesh.eval.yaml Add eval fixture for distributed-mesh.
.squad/skills/evals/cross-squad.eval.yaml Add eval fixture for cross-squad.
.squad/skills/evals/cross-machine-coordination.eval.yaml Add eval fixture for cross-machine-coordination.
.squad/skills/evals/client-compatibility.eval.yaml Add eval fixture for client-compatibility.
.squad/skills/evals/cli-wiring.eval.yaml Add eval fixture for cli-wiring.
.squad/skills/evals/ci-validation-gates.eval.yaml Add eval fixture for ci-validation-gates.
.squad/skills/evals/architectural-proposals.eval.yaml Add eval fixture for architectural-proposals.
.squad/skills/evals/agent-conduct.eval.yaml Add eval fixture for agent-conduct.
.squad/skills/evals/agent-collaboration.eval.yaml Add eval fixture for agent-collaboration.
.squad/skills/economy-mode/SKILL.md Normalize frontmatter + add triggers/roles.
.squad/skills/cross-squad/SKILL.md Normalize frontmatter + introduce allowed-tools.
.squad/skills/CONTRIBUTING.md Add skill authoring/reviewing guide and validation commands.
.squad/skill.md Update root skill template to new schema.
.squad/routing.md Add SPAN ownership entry for skill quality & eval.
.squad/casting/registry.json Register SPAN in casting registry.
.squad/agents/span/history.md Seed SPAN agent history.
.squad/agents/span/charter.md Add SPAN agent charter.
.squad-templates/workflow-wiring-appendix-b-documenter.md Add wiring walkthrough for a documenter follow-up role.
.squad-templates/workflow-wiring-appendix-a-code-reviewer.md Add wiring walkthrough for an enforced code reviewer gate.
.squad-templates/squad.agent.md Update coordinator template to emphasize enforcement wiring for gating roles.
.copilot/skills/squad-conventions/SKILL.md Normalize frontmatter + add triggers/roles.
.copilot/skills/reviewer-protocol/SKILL.md Normalize frontmatter + add triggers/roles.
.copilot/skills/reskill/SKILL.md Normalize frontmatter + add triggers/roles.
.copilot/skills/init-mode/SKILL.md Normalize frontmatter + introduce allowed-tools.
.copilot/skills/history-hygiene/SKILL.md Normalize frontmatter + add triggers/roles.
.copilot/skills/github-multi-account/SKILL.md Normalize frontmatter + add triggers/roles.
.copilot/skills/client-compatibility/SKILL.md Normalize frontmatter + add triggers/roles.
.copilot/skills/cli-wiring/SKILL.md Add YAML frontmatter to previously markdown-only skill.
.copilot/skills/ci-validation-gates/SKILL.md Normalize frontmatter + add triggers/roles.
.copilot/skills/architectural-proposals/SKILL.md Normalize frontmatter + introduce allowed-tools.
.copilot/skills/agent-conduct/SKILL.md Normalize frontmatter + add triggers/roles.
.copilot/skills/agent-collaboration/SKILL.md Normalize frontmatter + add triggers/roles.

Comment on lines +59 to +64
const nestedObj = parseSimpleYaml(nested);
const hasKeys = Object.keys(nestedObj).length > 0;
if (hasKeys) {
  Object.assign(result, nestedObj);
  result[key] = nestedObj;
} else {

Copilot AI Apr 3, 2026

parseSimpleYaml() flattens nested maps into the parent via Object.assign(result, nestedObj) while also keeping result[key] = nestedObj. That guarantees fields like domain/confidence/source appear both at top-level and inside metadata whenever they’re written under metadata, so the validator’s warning in validateSkill() can’t be satisfied (it will warn even when the author follows the template). Consider either (1) not flattening nested maps (keep metadata truly nested), or (2) if flattening is intentional for the scoring engine, update the disallowed-top-level check to inspect the raw frontmatter lines/indentation instead of the flattened object.

Author

Fixed in d231d6b — removed Object.assign(result, nestedObj) so nested maps stay nested. Updated validator to check top-level independently of metadata. Fields inside metadata: no longer trigger false warnings.
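
To make the issue in this thread concrete, here is a reduced sketch. The two helper functions are hypothetical stand-ins that only mimic the merge step quoted above; they are not the real parseSimpleYaml(), and the sample metadata values are made up.

```javascript
// Hypothetical reduction of the merge step discussed in this thread.
function mergeFlattened(result, key, nestedObj) {
  Object.assign(result, nestedObj); // copies nested keys onto the parent
  result[key] = nestedObj;          // and also keeps the nested map
  return result;
}

function mergeNested(result, key, nestedObj) {
  result[key] = nestedObj;          // nested map stays nested
  return result;
}

const metadata = { domain: "testing", confidence: "high" };
const flattened = mergeFlattened({ name: "demo" }, "metadata", metadata);
const nested = mergeNested({ name: "demo" }, "metadata", metadata);

console.log("domain" in flattened); // true: a top-level check would always fire
console.log("domain" in nested);    // false: only metadata.domain exists
```

With the flattening variant, a validator that warns on top-level `domain` can never be satisfied, which is the false warning described in the review comment.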

Comment on lines +14 to +16
- [ ] **Frontmatter fields** are at top-level (not nested in `metadata:`)
  - `triggers` and `roles` must not be nested
  - Non-standard fields go in `metadata:` block only

Copilot AI Apr 3, 2026

The checklist says triggers and roles must not be nested in metadata: (lines 14–16), but the updated skill templates and many skills in this PR place triggers/roles under metadata:. Please align the checklist with the actual schema you want to enforce (either require top-level triggers/roles everywhere, or update templates + validator accordingly).

Author

Fixed in d231d6b — checklist now states: name, description, license, allowed-tools are top-level. domain, confidence, source, triggers, roles, compatibility go inside metadata:.

Comment thread .squad/templates/skill.md Outdated
Comment on lines +5 to +11
metadata:
  domain: "{e.g., testing, api-design, error-handling}"
  confidence: "low|medium|high"
  source: "{how this was learned: manual, observed, earned}"
  compatibility: "GitHub Copilot CLI, VS Code Copilot Chat"
  triggers: [keyword1, keyword2, keyword3]
  roles: [developer, tester]

Copilot AI Apr 3, 2026

This skill template nests triggers and roles under metadata:. The SDK skill loader currently reads triggers and roles as top-level frontmatter keys (see packages/squad-sdk/src/skills/skill-loader.ts:129-134), and the contributing guide in this PR also documents them as top-level SDK extensions. To avoid breaking skill discovery when a real YAML parser is introduced, move triggers and roles out of metadata: and keep metadata: for non-standard fields only.

Author

Fixed in d231d6b — kept triggers/roles in metadata: per project decision (user explicitly requested this placement). Added a comment explaining the SDK's simple parser flattens metadata, so these are accessible at runtime. If a full YAML parser is adopted later, they should move to top-level.

Comment thread .squad/skill.md Outdated
Comment on lines +10 to +11
  triggers: [keyword1, keyword2, keyword3]
  roles: [developer, tester]

Copilot AI Apr 3, 2026

Same issue as .squad/templates/skill.md: this template nests triggers and roles under metadata: even though the runtime loader expects triggers/roles as top-level keys. Consider moving triggers and roles out of metadata: to keep the frontmatter forward-compatible with proper YAML parsing.

Suggested change
  triggers: [keyword1, keyword2, keyword3]
  roles: [developer, tester]
triggers: [keyword1, keyword2, keyword3]
roles: [developer, tester]

Author

Fixed alongside comment #3 — same SDK-flattening comment added to both template files.

Comment thread .squad/skills/CONTRIBUTING.md Outdated
| `.copilot/skills/` | Coordinator playbook — skills loaded by the Copilot CLI coordinator |
| `templates/skills/` | Product templates — reusable skill scaffolds for new projects |

The eval runner and schema validator scan `.squad/skills/` and `.copilot/skills/`. Template skills are not evaluated automatically unless promoted to one of those directories.

Copilot AI Apr 3, 2026

CONTRIBUTING.md says the eval runner and schema validator scan only .squad/skills/ and .copilot/skills/, and that template skills aren’t evaluated automatically (lines 17–18). However, both run-evals.mjs and validate-schema.mjs currently include templates/skills in SKILL_DIRS, so templates are being evaluated/validated. Please update this section to match the actual behavior (or adjust the scripts if templates truly shouldn’t be in the scan set).

Suggested change
The eval runner and schema validator scan `.squad/skills/` and `.copilot/skills/`. Template skills are not evaluated automatically unless promoted to one of those directories.
The eval runner and schema validator scan `.squad/skills/`, `.copilot/skills/`, and `templates/skills/`. Template skills are therefore evaluated and schema-validated automatically alongside the other skill directories.

Author

Fixed in d231d6b — updated to: 'The eval runner and schema validator scan .squad/skills/, .copilot/skills/, and templates/skills/.' Matches actual script behavior.
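
For reference, the scan-set behavior described in this thread could be expressed roughly as follows. `SKILL_DIRS` is the name mentioned in the review comment, but its exact contents and format here are assumed, and `isScanned` is a hypothetical helper rather than code from run-evals.mjs or validate-schema.mjs.

```javascript
// Hypothetical sketch: deciding whether a SKILL.md path falls in the scan set.
// The three directories match the ones listed in this PR; the array format is assumed.
const SKILL_DIRS = [".squad/skills", ".copilot/skills", "templates/skills"];

function isScanned(filePath) {
  // Normalize Windows separators so the prefix check works on any platform.
  const normalized = filePath.replace(/\\/g, "/");
  const inScanDir = SKILL_DIRS.some((dir) => normalized.startsWith(`${dir}/`));
  return inScanDir && normalized.endsWith("/SKILL.md");
}

console.log(isScanned("templates/skills/nap/SKILL.md"));  // true
console.log(isScanned(".squad/skills/evals/README.md"));  // false
```

Under this reading, template skills are part of the scan set, which is what the updated CONTRIBUTING.md wording now states.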

Comment thread .squad/skills/CONTRIBUTING.md Outdated
last_validated: "2026-01-01"
```

**Do not place `domain`, `confidence`, `source`, or `tools` at the top level when using `metadata:`** — the SDK flattens the block and field collisions cause silent errors.

Copilot AI Apr 3, 2026

This section documents domain, confidence, and source as optional standard top-level fields (lines 34–40), but later the same doc says not to place domain/confidence/source/tools at the top level when using metadata: (line 64). Given the current parsers flatten nested metadata anyway, this guidance is internally inconsistent and will confuse authors. Please clarify the single intended rule (e.g., always top-level, or always inside metadata with a parser that preserves nesting).

Suggested change
**Do not place `domain`, `confidence`, `source`, or `tools` at the top level when using `metadata:`** - the SDK flattens the block and field collisions cause silent errors.
**Keep standard fields and SDK extensions at the top level** - `domain`, `confidence`, `source`, `triggers`, and `roles` should not be nested under `metadata:`. Use `metadata:` only for custom fields like `author`, `version`, or `last_validated`, and do not repeat top-level keys inside `metadata:` because the SDK may flatten the block and cause silent collisions.

Author

Fixed in d231d6b — consolidated schema section to a single consistent rule. All non-spec fields (domain, confidence, source, triggers, roles, compatibility) go in metadata:. Only name, description, license, allowed-tools at top-level.

5. Add to team.md roster.
6. Add routing entries to routing.md.
7. Say: *"✅ {CastName} joined the team as {Role}."*
7. **Wire enforcement (if applicable).** If the new member's role involves gating other agents' work (reviewer, design approver, quality gate), add a numbered enforcement rule to `routing.md` → Rules section. A routing table entry (step 6) only handles explicit requests — enforcement rules are required for automatic gates. Read `.squad/templates/workflow-wiring-guide.md` for the full wiring process, including walkthroughs for common role types (code reviewer, documenter). Check `.squad/templates/issue-lifecycle.md` for lifecycle integration if the project uses PR-gated workflows.

Copilot AI Apr 3, 2026

This template now points readers to .squad/templates/workflow-wiring-guide.md, but that file doesn’t exist in the repo (current .squad/templates contains the two appendix files under .squad-templates/ instead). Either add the referenced guide under .squad/templates/, or update this reference to the actual wiring docs you added (e.g., the new workflow-wiring-appendix-*.md files).

Suggested change
7. **Wire enforcement (if applicable).** If the new member's role involves gating other agents' work (reviewer, design approver, quality gate), add a numbered enforcement rule to `routing.md` → Rules section. A routing table entry (step 6) only handles explicit requests — enforcement rules are required for automatic gates. Read `.squad/templates/workflow-wiring-guide.md` for the full wiring process, including walkthroughs for common role types (code reviewer, documenter). Check `.squad/templates/issue-lifecycle.md` for lifecycle integration if the project uses PR-gated workflows.
7. **Wire enforcement (if applicable).** If the new member's role involves gating other agents' work (reviewer, design approver, quality gate), add a numbered enforcement rule to `routing.md` → Rules section. A routing table entry (step 6) only handles explicit requests — enforcement rules are required for automatic gates. Read the workflow wiring appendix docs under `.squad-templates/` for the full wiring process, including walkthroughs for common role types (code reviewer, documenter). Check `.squad/templates/issue-lifecycle.md` for lifecycle integration if the project uses PR-gated workflows.

Author

This file (.squad-templates/squad.agent.md) came from the upstream merge — not changed by this PR. The missing reference is a pre-existing issue. Should be addressed in a separate PR.

@sytone sytone force-pushed the squad/skills-schema-alignment branch from a90d701 to 25af615 April 3, 2026 18:37
Author

@sytone sytone left a comment

Addressed all 6 actionable review comments in commit d231d6b. Comment #7 (squad.agent.md line 918) is from upstream merge, not this PR's changes.

…l framework

Migrates all 34 skills across .squad/skills/, .copilot/skills/, and
templates/skills/ to the agentskills.io specification schema:
- name, description, license as top-level fields (spec required)
- domain, confidence, source, triggers, roles, compatibility in metadata map
- tools arrays converted to allowed-tools strings
- Skills without frontmatter get complete --- blocks added
- Body content of all SKILL.md files unchanged (frontmatter only)

Adds three-phase eval framework:
- Phase 1 (run-evals.mjs): keyword matching, 88.9% baseline, CI-ready
- Phase 2 (run-llm-evals.mjs): LLM trigger + execution evals via Copilot CLI
- Phase 3 (optimize-description.mjs): iterative description optimization loop
- Schema validator (validate-schema.mjs): frontmatter compliance checker
- 31 trigger eval fixtures (342 cases) + 10 execution eval fixtures
- CONTRIBUTING.md, skill-review-checklist.md, eval README

Adds SPAN (Skill Curator) team member for skill quality gating.

Spec: https://agentskills.io/specification

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sytone sytone force-pushed the squad/skills-schema-alignment branch from d231d6b to dc62621 April 3, 2026 19:24
Author

sytone commented Apr 3, 2026

Closing to rebuild with clean commit history — upstream merge artifacts leaked into the diff. Will reopen with clean branch.

@sytone sytone closed this Apr 3, 2026
@sytone sytone deleted the squad/skills-schema-alignment branch April 3, 2026 19:39