fix: skills schema alignment and eval - Part one of skills normalization #787
sytone wants to merge 1 commit into bradygaster:dev
🛫 PR Readiness Check
| Status | Check | Details |
|---|---|---|
| ✅ | Single commit | 1 commit — clean history |
| ✅ | Not in draft | Ready for review |
| ✅ | Branch up to date | Up to date with dev |
| ❌ | Copilot review | No Copilot review yet — it may still be processing |
| ✅ | Changeset present | No source files changed — changeset not required |
| ✅ | Scope clean | |
| ✅ | No merge conflicts | No merge conflicts |
| ❌ | Copilot threads resolved | 7 unresolved Copilot thread(s) — fix and resolve before merging |
| ❌ | CI passing | No CI checks have run yet |
This check runs automatically on every push. Fix any ❌ items and push again.
See CONTRIBUTING.md and PR Requirements for details.
FYI, this is still in progress. Just getting the initial PR out. This is a house-tidying pass before looking at skill content and updating it to be agnostic of the squad repo.
Pull request overview
This PR begins “skills normalization” by updating many SKILL.md files to a new frontmatter shape (license/metadata/triggers/roles) and introducing a local skill schema validator + keyword-based eval fixtures to verify skill discoverability.
Changes:
- Normalize skill frontmatter across `.copilot/skills/`, `.squad/skills/`, and `templates/skills/` (add `license`, `metadata`, `triggers`, `roles`, `allowed-tools`).
- Add/expand the skill eval framework: new fixtures, docs, and a new `validate-schema.mjs` validator.
- Add SPAN (“Skill Curator”) agent + supporting templates/docs for skill quality and workflow wiring.
Reviewed changes
Copilot reviewed 83 out of 83 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| templates/skills/squad-conventions/SKILL.md | Normalize skill frontmatter for template skill. |
| templates/skills/rework-rate/SKILL.md | Normalize frontmatter; replace tools: with allowed-tools. |
| templates/skills/nap/SKILL.md | Convert markdown-only skill to YAML-frontmatter skill. |
| .squad/templates/skill.md | Update skill authoring template for the new schema. |
| .squad/templates/skill-review-checklist.md | Add a reviewer checklist for skill PRs. |
| .squad/templates/agents/challenger.md | Add challenger agent template. |
| .squad/team.md | Add SPAN to team roster. |
| .squad/skills/versioning-policy/SKILL.md | Normalize frontmatter + add triggers/roles. |
| .squad/skills/session-recovery/SKILL.md | Normalize frontmatter + introduce allowed-tools. |
| .squad/skills/release-process/SKILL.md | Normalize frontmatter + add triggers/roles. |
| .squad/skills/ralph-two-pass-scan/SKILL.md | Normalize frontmatter + add triggers/roles. |
| .squad/skills/pr-screenshots/SKILL.md | Normalize frontmatter + add triggers/roles. |
| .squad/skills/personal-squad/SKILL.md | Normalize frontmatter + add triggers/roles. |
| .squad/skills/model-selection/SKILL.md | Normalize frontmatter + add triggers/roles. |
| .squad/skills/humanizer/SKILL.md | Normalize frontmatter + add triggers/roles. |
| .squad/skills/gh-auth-isolation/SKILL.md | Normalize frontmatter + introduce allowed-tools. |
| .squad/skills/fact-checking/SKILL.md | Add new fact-checking skill. |
| .squad/skills/external-comms/SKILL.md | Normalize frontmatter + introduce allowed-tools. |
| .squad/skills/evals/versioning-policy.eval.yaml | Add eval fixture for versioning-policy. |
| .squad/skills/evals/validate-schema.mjs | Add schema validator for SKILL.md frontmatter + eval coverage reporting. |
| .squad/skills/evals/squad-conventions.eval.yaml | Add eval fixture for squad-conventions. |
| .squad/skills/evals/session-recovery.eval.yaml | Add eval fixture for session-recovery. |
| .squad/skills/evals/secret-handling.eval.yaml | Add eval fixture for secret-handling. |
| .squad/skills/evals/rework-rate.eval.yaml | Add eval fixture for rework-rate. |
| .squad/skills/evals/reviewer-protocol.eval.yaml | Add eval fixture for reviewer-protocol. |
| .squad/skills/evals/reskill.eval.yaml | Add eval fixture for reskill. |
| .squad/skills/evals/release-process.eval.yaml | Add eval fixture for release-process. |
| .squad/skills/evals/README.md | Document how to run skill evals and what fixtures mean. |
| .squad/skills/evals/ralph-two-pass-scan.eval.yaml | Add eval fixture for ralph-two-pass-scan. |
| .squad/skills/evals/pr-screenshots.eval.yaml | Add eval fixture for pr-screenshots. |
| .squad/skills/evals/personal-squad.eval.yaml | Add eval fixture for personal-squad. |
| .squad/skills/evals/nap.eval.yaml | Add eval fixture for nap. |
| .squad/skills/evals/model-selection.eval.yaml | Add eval fixture for model-selection. |
| .squad/skills/evals/init-mode.eval.yaml | Add eval fixture for init-mode. |
| .squad/skills/evals/humanizer.eval.yaml | Add eval fixture for humanizer. |
| .squad/skills/evals/history-hygiene.eval.yaml | Add eval fixture for history-hygiene. |
| .squad/skills/evals/github-multi-account.eval.yaml | Add eval fixture for github-multi-account. |
| .squad/skills/evals/git-workflow.eval.yaml | Add eval fixture for git-workflow. |
| .squad/skills/evals/gh-auth-isolation.eval.yaml | Add eval fixture for gh-auth-isolation. |
| .squad/skills/evals/fact-checking.eval.yaml | Add eval fixture for fact-checking. |
| .squad/skills/evals/external-comms.eval.yaml | Add eval fixture for external-comms. |
| .squad/skills/evals/economy-mode.eval.yaml | Add eval fixture for economy-mode. |
| .squad/skills/evals/distributed-mesh.eval.yaml | Add eval fixture for distributed-mesh. |
| .squad/skills/evals/cross-squad.eval.yaml | Add eval fixture for cross-squad. |
| .squad/skills/evals/cross-machine-coordination.eval.yaml | Add eval fixture for cross-machine-coordination. |
| .squad/skills/evals/client-compatibility.eval.yaml | Add eval fixture for client-compatibility. |
| .squad/skills/evals/cli-wiring.eval.yaml | Add eval fixture for cli-wiring. |
| .squad/skills/evals/ci-validation-gates.eval.yaml | Add eval fixture for ci-validation-gates. |
| .squad/skills/evals/architectural-proposals.eval.yaml | Add eval fixture for architectural-proposals. |
| .squad/skills/evals/agent-conduct.eval.yaml | Add eval fixture for agent-conduct. |
| .squad/skills/evals/agent-collaboration.eval.yaml | Add eval fixture for agent-collaboration. |
| .squad/skills/economy-mode/SKILL.md | Normalize frontmatter + add triggers/roles. |
| .squad/skills/cross-squad/SKILL.md | Normalize frontmatter + introduce allowed-tools. |
| .squad/skills/CONTRIBUTING.md | Add skill authoring/reviewing guide and validation commands. |
| .squad/skill.md | Update root skill template to new schema. |
| .squad/routing.md | Add SPAN ownership entry for skill quality & eval. |
| .squad/casting/registry.json | Register SPAN in casting registry. |
| .squad/agents/span/history.md | Seed SPAN agent history. |
| .squad/agents/span/charter.md | Add SPAN agent charter. |
| .squad-templates/workflow-wiring-appendix-b-documenter.md | Add wiring walkthrough for a documenter follow-up role. |
| .squad-templates/workflow-wiring-appendix-a-code-reviewer.md | Add wiring walkthrough for an enforced code reviewer gate. |
| .squad-templates/squad.agent.md | Update coordinator template to emphasize enforcement wiring for gating roles. |
| .copilot/skills/squad-conventions/SKILL.md | Normalize frontmatter + add triggers/roles. |
| .copilot/skills/reviewer-protocol/SKILL.md | Normalize frontmatter + add triggers/roles. |
| .copilot/skills/reskill/SKILL.md | Normalize frontmatter + add triggers/roles. |
| .copilot/skills/init-mode/SKILL.md | Normalize frontmatter + introduce allowed-tools. |
| .copilot/skills/history-hygiene/SKILL.md | Normalize frontmatter + add triggers/roles. |
| .copilot/skills/github-multi-account/SKILL.md | Normalize frontmatter + add triggers/roles. |
| .copilot/skills/client-compatibility/SKILL.md | Normalize frontmatter + add triggers/roles. |
| .copilot/skills/cli-wiring/SKILL.md | Add YAML frontmatter to previously markdown-only skill. |
| .copilot/skills/ci-validation-gates/SKILL.md | Normalize frontmatter + add triggers/roles. |
| .copilot/skills/architectural-proposals/SKILL.md | Normalize frontmatter + introduce allowed-tools. |
| .copilot/skills/agent-conduct/SKILL.md | Normalize frontmatter + add triggers/roles. |
| .copilot/skills/agent-collaboration/SKILL.md | Normalize frontmatter + add triggers/roles. |
| const nestedObj = parseSimpleYaml(nested); | ||
| const hasKeys = Object.keys(nestedObj).length > 0; | ||
| if (hasKeys) { | ||
| Object.assign(result, nestedObj); | ||
| result[key] = nestedObj; | ||
| } else { |
parseSimpleYaml() flattens nested maps into the parent via Object.assign(result, nestedObj) while also keeping result[key] = nestedObj. That guarantees fields like domain/confidence/source appear both at top-level and inside metadata whenever they’re written under metadata, so the validator’s warning in validateSkill() can’t be satisfied (it will warn even when the author follows the template). Consider either (1) not flattening nested maps (keep metadata truly nested), or (2) if flattening is intentional for the scoring engine, update the disallowed-top-level check to inspect the raw frontmatter lines/indentation instead of the flattened object.
Fixed in d231d6b — removed Object.assign(result, nestedObj) so nested maps stay nested. Updated validator to check top-level independently of metadata. Fields inside metadata: no longer trigger false warnings.
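For readers following this thread, here is a minimal sketch of the difference between the two behaviors. `storeNested` and its `flatten` option are hypothetical names used only for illustration; this is not the actual code in `validate-schema.mjs`.

```javascript
// Hypothetical helper, not the actual code in validate-schema.mjs.
// Stores a parsed nested map under its key. `flatten: true` reproduces
// the behavior the review comment describes.
function storeNested(result, key, nestedObj, { flatten = false } = {}) {
  if (flatten) {
    // Old behavior: nested keys are also copied onto the parent object.
    Object.assign(result, nestedObj);
  }
  result[key] = nestedObj;
  return result;
}

const metadata = { domain: "testing", confidence: "high" };

const flattened = storeNested({ name: "demo" }, "metadata", metadata, { flatten: true });
const nested = storeNested({ name: "demo" }, "metadata", metadata);

console.log(Object.keys(flattened)); // [ 'name', 'domain', 'confidence', 'metadata' ]
console.log(Object.keys(nested));    // [ 'name', 'metadata' ]
```

With flattening, `domain` and `confidence` always appear at top level, so a "disallowed at top level" check fires even when the author nested them correctly.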
| - [ ] **Frontmatter fields** are at top-level (not nested in `metadata:`) | ||
| - `triggers` and `roles` must not be nested | ||
| - Non-standard fields go in `metadata:` block only |
The checklist says triggers and roles must not be nested in metadata: (lines 14–16), but the updated skill templates and many skills in this PR place triggers/roles under metadata:. Please align the checklist with the actual schema you want to enforce (either require top-level triggers/roles everywhere, or update templates + validator accordingly).
Fixed in d231d6b — checklist now states: name, description, license, allowed-tools are top-level. domain, confidence, source, triggers, roles, compatibility go inside metadata:.
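The placement rule stated in the reply above could be checked roughly as follows. The two field lists come from that reply; `checkPlacement` and its message wording are assumptions for illustration, and the sketch assumes the frontmatter was parsed with nesting preserved.

```javascript
// Field placement as described in the reply above. The function name and
// the message wording are illustrative assumptions.
const TOP_LEVEL_FIELDS = ["name", "description", "license", "allowed-tools"];
const METADATA_FIELDS = ["domain", "confidence", "source", "triggers", "roles", "compatibility"];

function checkPlacement(frontmatter) {
  const problems = [];
  const rawMetadata = frontmatter.metadata;
  const metadata = rawMetadata && typeof rawMetadata === "object" ? rawMetadata : {};

  // Fields that belong under metadata: must not appear at top level.
  for (const field of METADATA_FIELDS) {
    if (field in frontmatter) problems.push(`${field}: move under metadata:`);
  }
  // Spec fields must not be hidden inside metadata:.
  for (const field of TOP_LEVEL_FIELDS) {
    if (field in metadata) problems.push(`${field}: move to top level`);
  }
  return problems;
}

console.log(checkPlacement({ name: "demo", triggers: ["a"], metadata: { license: "MIT" } }));
// [ 'triggers: move under metadata:', 'license: move to top level' ]
```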
| metadata: | ||
| domain: "{e.g., testing, api-design, error-handling}" | ||
| confidence: "low|medium|high" | ||
| source: "{how this was learned: manual, observed, earned}" | ||
| compatibility: "GitHub Copilot CLI, VS Code Copilot Chat" | ||
| triggers: [keyword1, keyword2, keyword3] | ||
| roles: [developer, tester] |
This skill template nests triggers and roles under metadata:. The SDK skill loader currently reads triggers and roles as top-level frontmatter keys (see packages/squad-sdk/src/skills/skill-loader.ts:129-134), and the contributing guide in this PR also documents them as top-level SDK extensions. To avoid breaking skill discovery when a real YAML parser is introduced, move triggers and roles out of metadata: and keep metadata: for non-standard fields only.
Fixed in d231d6b — kept triggers/roles in metadata: per project decision (user explicitly requested this placement). Added a comment explaining the SDK's simple parser flattens metadata, so these are accessible at runtime. If a full YAML parser is adopted later, they should move to top-level.
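One way to stay compatible with both placements, sketched under the assumption that the frontmatter has already been parsed into an object with nesting preserved. `readListField` is a hypothetical helper and is not taken from `skill-loader.ts`.

```javascript
// Hypothetical helper, not the SDK loader's code. Reads a list field from
// the top level first, then falls back to the metadata: map.
function readListField(frontmatter, field) {
  const nested = frontmatter.metadata ? frontmatter.metadata[field] : undefined;
  const value = frontmatter[field] ?? nested ?? [];
  // Tolerate a single scalar by wrapping it in an array.
  return Array.isArray(value) ? value : [value];
}

console.log(readListField({ metadata: { triggers: ["release", "changelog"] } }, "triggers"));
// [ 'release', 'changelog' ]
console.log(readListField({}, "roles"));
// []
```

A reader like this would keep skills discoverable whether or not the parser flattens `metadata:`, at the cost of allowing two spellings of the same schema.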
| triggers: [keyword1, keyword2, keyword3] | ||
| roles: [developer, tester] |
Same issue as .squad/templates/skill.md: this template nests triggers and roles under metadata: even though the runtime loader expects triggers/roles as top-level keys. Consider moving triggers and roles out of metadata: to keep the frontmatter forward-compatible with proper YAML parsing.
| triggers: [keyword1, keyword2, keyword3] | |
| roles: [developer, tester] | |
| triggers: [keyword1, keyword2, keyword3] | |
| roles: [developer, tester] |
Fixed alongside comment #3 — same SDK-flattening comment added to both template files.
| | `.copilot/skills/` | Coordinator playbook — skills loaded by the Copilot CLI coordinator | | ||
| | `templates/skills/` | Product templates — reusable skill scaffolds for new projects | | ||
|
|
||
| The eval runner and schema validator scan `.squad/skills/` and `.copilot/skills/`. Template skills are not evaluated automatically unless promoted to one of those directories. |
CONTRIBUTING.md says the eval runner and schema validator scan only .squad/skills/ and .copilot/skills/, and that template skills aren’t evaluated automatically (lines 17–18). However, both run-evals.mjs and validate-schema.mjs currently include templates/skills in SKILL_DIRS, so templates are being evaluated/validated. Please update this section to match the actual behavior (or adjust the scripts if templates truly shouldn’t be in the scan set).
| The eval runner and schema validator scan `.squad/skills/` and `.copilot/skills/`. Template skills are not evaluated automatically unless promoted to one of those directories. | |
| The eval runner and schema validator scan `.squad/skills/`, `.copilot/skills/`, and `templates/skills/`. Template skills are therefore evaluated and schema-validated automatically alongside the other skill directories. |
Fixed in d231d6b — updated to: 'The eval runner and schema validator scan .squad/skills/, .copilot/skills/, and templates/skills/.' Matches actual script behavior.
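A rough sketch of what the three-directory scan set means in practice. `SKILL_DIRS` is the name used in the review comment; its exact contents and the `isScannedSkill` helper are assumptions, not code copied from `run-evals.mjs` or `validate-schema.mjs`.

```javascript
// Assumed contents of SKILL_DIRS, based on the directories named above.
const SKILL_DIRS = [".squad/skills", ".copilot/skills", "templates/skills"];

// Hypothetical helper: true when a repo-relative path is a SKILL.md file
// inside one of the scanned directories.
function isScannedSkill(filePath, dirs = SKILL_DIRS) {
  const normalized = filePath.replace(/\\/g, "/"); // tolerate Windows separators
  const inScanSet = dirs.some((dir) => normalized.startsWith(`${dir}/`));
  return inScanSet && normalized.endsWith("/SKILL.md");
}

console.log(isScannedSkill("templates/skills/nap/SKILL.md")); // true
console.log(isScannedSkill(".squad/skills/evals/README.md")); // false
console.log(isScannedSkill("docs/SKILL.md"));                 // false
```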
| last_validated: "2026-01-01" | ||
| ``` | ||
|
|
||
| **Do not place `domain`, `confidence`, `source`, or `tools` at the top level when using `metadata:`** — the SDK flattens the block and field collisions cause silent errors. |
This section documents domain, confidence, and source as optional standard top-level fields (lines 34–40), but later the same doc says not to place domain/confidence/source/tools at the top level when using metadata: (line 64). Given the current parsers flatten nested metadata anyway, this guidance is internally inconsistent and will confuse authors. Please clarify the single intended rule (e.g., always top-level, or always inside metadata with a parser that preserves nesting).
| **Do not place `domain`, `confidence`, `source`, or `tools` at the top level when using `metadata:`** — the SDK flattens the block and field collisions cause silent errors. | |
| **Keep standard fields and SDK extensions at the top level** — `domain`, `confidence`, `source`, `triggers`, and `roles` should not be nested under `metadata:`. Use `metadata:` only for custom fields like `author`, `version`, or `last_validated`, and do not repeat top-level keys inside `metadata:` because the SDK may flatten the block and cause silent collisions. |
Fixed in d231d6b — consolidated schema section to a single consistent rule. All non-spec fields (domain, confidence, source, triggers, roles, compatibility) go in metadata:. Only name, description, license, allowed-tools at top-level.
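To make the consolidated rule concrete, here is a sketch of how a legacy frontmatter object could be reshaped. `migrateFrontmatter` is a hypothetical function, not part of this PR. It assumes legacy `tools` is an array of strings and that `allowed-tools` is a space-delimited string; both assumptions should be checked against the agentskills.io specification.

```javascript
// Fields that the consolidated rule places under metadata:.
const METADATA_KEYS = ["domain", "confidence", "source", "triggers", "roles", "compatibility"];

// Hypothetical migration helper for illustration only.
function migrateFrontmatter(legacy) {
  const migrated = {};
  const metadata = { ...(legacy.metadata ?? {}) };

  for (const [key, value] of Object.entries(legacy)) {
    if (key === "metadata") continue; // merged above
    if (key === "tools") {
      // Assumption: allowed-tools is a space-delimited string.
      migrated["allowed-tools"] = Array.isArray(value) ? value.join(" ") : String(value);
    } else if (METADATA_KEYS.includes(key)) {
      metadata[key] = value; // non-spec field moves under metadata:
    } else {
      migrated[key] = value; // name, description, license, ... stay top-level
    }
  }

  if (Object.keys(metadata).length > 0) migrated.metadata = metadata;
  return migrated;
}

console.log(JSON.stringify(migrateFrontmatter({ name: "x", domain: "testing", tools: ["bash", "view"] })));
// {"name":"x","allowed-tools":"bash view","metadata":{"domain":"testing"}}
```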
| 5. Add to team.md roster. | ||
| 6. Add routing entries to routing.md. | ||
| 7. Say: *"✅ {CastName} joined the team as {Role}."* | ||
| 7. **Wire enforcement (if applicable).** If the new member's role involves gating other agents' work (reviewer, design approver, quality gate), add a numbered enforcement rule to `routing.md` → Rules section. A routing table entry (step 6) only handles explicit requests — enforcement rules are required for automatic gates. Read `.squad/templates/workflow-wiring-guide.md` for the full wiring process, including walkthroughs for common role types (code reviewer, documenter). Check `.squad/templates/issue-lifecycle.md` for lifecycle integration if the project uses PR-gated workflows. |
This template now points readers to .squad/templates/workflow-wiring-guide.md, but that file doesn’t exist in the repo (current .squad/templates contains the two appendix files under .squad-templates/ instead). Either add the referenced guide under .squad/templates/, or update this reference to the actual wiring docs you added (e.g., the new workflow-wiring-appendix-*.md files).
| 7. **Wire enforcement (if applicable).** If the new member's role involves gating other agents' work (reviewer, design approver, quality gate), add a numbered enforcement rule to `routing.md` → Rules section. A routing table entry (step 6) only handles explicit requests — enforcement rules are required for automatic gates. Read `.squad/templates/workflow-wiring-guide.md` for the full wiring process, including walkthroughs for common role types (code reviewer, documenter). Check `.squad/templates/issue-lifecycle.md` for lifecycle integration if the project uses PR-gated workflows. | |
| 7. **Wire enforcement (if applicable).** If the new member's role involves gating other agents' work (reviewer, design approver, quality gate), add a numbered enforcement rule to `routing.md` → Rules section. A routing table entry (step 6) only handles explicit requests — enforcement rules are required for automatic gates. Read the workflow wiring appendix docs under `.squad-templates/` for the full wiring process, including walkthroughs for common role types (code reviewer, documenter). Check `.squad/templates/issue-lifecycle.md` for lifecycle integration if the project uses PR-gated workflows. |
This file (.squad-templates/squad.agent.md) came from the upstream merge — not changed by this PR. The missing reference is a pre-existing issue. Should be addressed in a separate PR.
a90d701 to 25af615
…l framework

Migrates all 34 skills across .squad/skills/, .copilot/skills/, and templates/skills/ to the agentskills.io specification schema:
- name, description, license as top-level fields (spec required)
- domain, confidence, source, triggers, roles, compatibility in metadata map
- tools arrays converted to allowed-tools strings
- Skills without frontmatter get complete --- blocks added
- Body content of all SKILL.md files unchanged (frontmatter only)

Adds three-phase eval framework:
- Phase 1 (run-evals.mjs): keyword matching, 88.9% baseline, CI-ready
- Phase 2 (run-llm-evals.mjs): LLM trigger + execution evals via Copilot CLI
- Phase 3 (optimize-description.mjs): iterative description optimization loop
- Schema validator (validate-schema.mjs): frontmatter compliance checker
- 31 trigger eval fixtures (342 cases) + 10 execution eval fixtures
- CONTRIBUTING.md, skill-review-checklist.md, eval README

Adds SPAN (Skill Curator) team member for skill quality gating.

Spec: https://agentskills.io/specification

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
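As an illustration of the Phase 1 idea, the sketch below does keyword-based trigger matching and computes a pass rate. The fixture shape (`prompt`, `shouldTrigger`) and all function names are assumptions; the actual `run-evals.mjs` logic and `.eval.yaml` format are not shown in this PR page.

```javascript
// Hypothetical keyword matcher: a prompt triggers a skill when it contains
// at least one trigger keyword (case-insensitive substring match).
function keywordTriggers(prompt, triggers) {
  const text = prompt.toLowerCase();
  return triggers.some((trigger) => text.includes(trigger.toLowerCase()));
}

// Formats a pass rate with one decimal place.
function formatRate(passed, total) {
  return `${((passed / total) * 100).toFixed(1)}%`;
}

// Assumed case shape: { prompt: string, shouldTrigger: boolean }.
function runCases(triggers, cases) {
  const passed = cases.filter(
    (c) => keywordTriggers(c.prompt, triggers) === c.shouldTrigger
  ).length;
  return { passed, total: cases.length, rate: formatRate(passed, cases.length) };
}

// The baseline quoted in this PR: 304 of 342 cases.
console.log(formatRate(304, 342)); // 88.9%
```

Substring matching is fast and deterministic, which suits CI, but it cannot tell a relevant mention of a keyword from an incidental one. That gap is presumably what the LLM-based phase is meant to cover.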
d231d6b to dc62621
Closing to rebuild with clean commit history: upstream merge artifacts leaked into the diff. Will reopen with clean branch.
What
Aligns all skill YAML frontmatter across 3 canonical locations to the agentskills.io specification and adds a comprehensive eval framework for testing skill trigger quality.
Part one of a multi-PR skills normalization effort — this PR touches ONLY frontmatter (no skill body content changes) and adds the eval tooling.
Why
`domain`, `confidence`, `source` as top-level fields, `tools` arrays, some skills had no frontmatter at all)
How
Schema alignment (34 skills across 3 directories):
- `.squad/skills/` (14 skills): team-level patterns
- `.copilot/skills/` (17 skills): coordinator playbook
- `templates/skills/` (3 skills): product templates

All skills now have:
- `name`, `description`, `license` as top-level (agentskills.io required)
- `domain`, `confidence`, `source`, `triggers`, `roles`, `compatibility` inside `metadata:` map
- `allowed-tools` as top-level string where applicable (agentskills.io optional)
- `---` blocks added

Eval framework (new):
- `run-evals.mjs`: keyword-based trigger matching (fast, CI-ready). 88.9% baseline.
- `run-llm-evals.mjs`: LLM-based trigger + execution evals via Copilot CLI models. Supports `--type trigger|exec|all`, `--runs N` for nondeterminism, `--split` for train/validation.
- `optimize-description.mjs`: iterative description optimization loop with train/validation split.
- `validate-schema.mjs`: checks frontmatter compliance across all directories.

New team member:
Docs:
- `CONTRIBUTING.md`: skill creation/modification workflow, schema reference, eval guide
- `skill-review-checklist.md`: 40+ checkpoint review gate
- `README.md`: eval framework documentation

Testing
- `node .squad/skills/evals/validate-schema.mjs` passes (34/34 skills valid)
- `node .squad/skills/evals/run-evals.mjs` passes (88.9%, 304/342)
- `node .squad/skills/evals/run-llm-evals.mjs --dry-run` works for all modes
- `npm run build` and `npm test` not applicable

Docs
Breaking Changes
None. These are frontmatter-only changes. The SDK's simple YAML parser flattens nested metadata, so `triggers`/`roles` inside `metadata:` are still accessible as top-level fields at runtime.

Waivers
`packages/squad-sdk/src/` or `packages/squad-cli/src/` changes)
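On the `--runs N` option listed under the eval framework: the description does not say how repeated runs are combined, so the following is only one plausible approach, a majority vote over boolean outcomes. `majorityVote` and `agreement` are hypothetical names, and treating a tie as a failure is an assumption.

```javascript
// Hypothetical aggregation for repeated LLM eval runs. Each outcome is
// true (the skill triggered as expected) or false.
function majorityVote(outcomes) {
  const passes = outcomes.filter(Boolean).length;
  // Strict majority required. A tie counts as a failure (assumption).
  return passes * 2 > outcomes.length;
}

// Fraction of runs that agree with the majority verdict. Low agreement
// suggests the case is sensitive to model nondeterminism.
function agreement(outcomes) {
  const passes = outcomes.filter(Boolean).length;
  return Math.max(passes, outcomes.length - passes) / outcomes.length;
}

console.log(majorityVote([true, true, false])); // true
console.log(majorityVote([true, false]));       // false
console.log(agreement([true, true, false, true])); // 0.75
```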