feat(ce-compound): add frontmatter parser-safety validator#697
Conversation
…validator Add a stdlib-only Python script that ce-compound and ce-compound-refresh run after writing a doc, to catch the YAML parser-safety bug class that prose rules miss. PR #695 had a Codex review catch a real instance of this on safe-auto-rubric-calibration-2026-04-25.md: `related_pr: PR #685 ...` silently truncated to `'PR'` because YAML treats unquoted ' #' as a comment delimiter. The prose rules in references/yaml-schema.md cover this for array items but not for top-level scalars; the validator closes that gap. ## What changed - `plugins/compound-engineering/skills/ce-compound/scripts/validate-frontmatter` — bash wrapper, executable - `plugins/compound-engineering/skills/ce-compound/scripts/validate-frontmatter.py` — Python implementation, no third-party deps - Same pair duplicated under `ce-compound-refresh/scripts/` (per AGENTS.md no-cross-skill-shared-files rule). Tests assert content identity. - SKILL.md updates wire `bash scripts/validate-frontmatter <doc-path>` into the post-write step in both skills, with retry-on-fail loop. - `tests/frontmatter-validator.test.ts` — 24 cases covering valid frontmatter, each failure mode, usage errors, and identity across the two skill copies. ## Why stdlib-only (no PyYAML) Original draft required PyYAML for parser-fidelity validation. Dropped after considering distribution: PyYAML is not Python stdlib; macOS system Python doesn't include it, mise/asdf-installed Pythons don't, and many user environments would need `pip install pyyaml` before /ce-compound works end-to-end. The truncation/quoting bug class is purely regex-detectable — ` #` substring, `: ` substring, leading `-`/`*`/`&`/`!`/etc. — so dropping the YAML parser dependency costs nothing for the bug class we care about. ## Scope: parser-safety only, not full schema validation The validator catches *parser-safety* issues (silent data loss from YAML's quoting rules). It does NOT validate against schema.yaml's required-field or enum-value rules. A repo-wide sweep of existing docs/solutions/ found that 17+ docs have stale schema mismatches (component values like `subagent-template` or `installer` that aren't in the enum, missing required fields, etc.). Fixing those is a separate concern that depends on schema.yaml being brought current relative to ce-compound's actual usage. Filed as a follow-up; this PR ships only what catches the Codex bug class. ## Catch radius Would have caught the safe-auto-rubric-calibration-2026-04-25.md and the plan doc's title issue from PR #695 #696 — both top-level scalars with unquoted ' #' or ': '. Verified by running on synthetic broken docs in the test suite and on every existing doc in docs/solutions/ (30/30 pass on docs that have any frontmatter; 1 legit fail on a doc that has none, which is correct behavior). Tests: 934/934 pass (+24 from the new test file). release:validate clean.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 816b251907
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…python3 directly The wrapper script (validate-frontmatter that exec'd python3 validate-frontmatter.py) didn't earn its keep. It added 4 files to maintain (2 skills × 2 wrappers), a byte-identity test, an extra subprocess per call, and an indirection layer where bash quoting/path issues would surface as Python failures — all to save typing 'python3 ' and '.py' in the SKILL.md invocation. The existing repo idiom for Python skill scripts is direct invocation: ce-session-extract calls 'python3 scripts/extract-skeleton.py' in its own SKILL.md; ce-session-inventory and ce-release-notes do the same. There's no wrapper convention to consistency-match against. Drop the wrappers. Update SKILL.md in both skills to invoke 'python3 scripts/validate-frontmatter.py' directly. Drop the bash-wrapper-identity test (no wrappers left to compare). Tests: 933/933 pass (-1, the dropped wrapper-identity test). release:validate clean.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1136c4e7df
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…dicator checks Codex on PR #697 caught two false-positive vectors in the leading-reserved- indicator check: 1. `-` is a list/sequence marker in YAML only when followed by whitespace. Bare `-foo` (no space after `-`) is a valid plain scalar that strict parsers accept without complaint. The unconditional check would reject `title: -foo` and force /ce-compound to retry on frontmatter that's actually fine. 2. `?` is a complex-mapping-key marker only when followed by whitespace. Bare `?foo` is a valid plain scalar with the same story. Drop `-` from the leading-indicator check entirely. Drop `?` from the reserved tuple. The cases where they ARE problematic (`- foo` / `? foo` with whitespace) are loud parser failures, not silent misreads, so they're out of scope for this script (which targets silent-corruption bugs). The remaining reserved indicators kept in the check (`` ` ``, `*`, `&`, `!`, `|`, `>`, `%`, `@`) are special at first character without needing trailing whitespace, so the unconditional check is correct for them. Tests: - Removed the `-starts-with-dash` rejection test (was asserting wrong behavior all along — Codex's catch). - Added two positive tests that lock in the fix: valid plain scalars starting with `-` and `?` must now pass validation. Verified the YAML semantics with PyYAML: `title: -foo` parses as `{'title': '-foo'}`; `title: - foo` raises `ScannerError`. The validator's job is to catch the silent-misread bugs, not the loud-parser-error ones. Tests: 935/935 pass (+1 net from new positive tests). release:validate clean.
… full lines Codex on PR #697 caught that `text.find("\n---", 4)` accepts any line starting with `---` as a valid closing delimiter — including `----` or `---extra`. The validator returns OK on those, but downstream frontmatter readers in this repo require a delimiter line whose stripped content is exactly `---`, so malformed docs slip through and parse as body-only later. Replaced the substring search with line-based matching: split the file into lines, require both delimiters to be lines whose `rstrip()` is exactly `---`. This rejects `----` and `---extra` while still permitting trailing whitespace on a valid `---` line. The same fix applies to the opening delimiter check (which was less vulnerable but inconsistently strict — `text.startswith("---\\n")` rejected `--- \\n`). Same change duplicated to ce-compound-refresh's copy per AGENTS.md no-cross-skill-shared-files; the byte-identity test catches drift. Tests: - New: rejects '----' as closing delimiter - New: rejects '---extra' as closing delimiter - New: accepts '--- ' (with trailing whitespace) on either delimiter - Existing tests still pass Tests: 941/941 pass (+6 from new delimiter cases). release:validate clean.
…eck (out of stated scope) Per the validator's own docstring its scope is silent corruption — values that strict YAML parsers misread without raising — but the leading reserved-indicator check (`*`, `&`, `!`, `|`, `>`, `%`, `@`, `` ` ``) was catching the loud-error class instead. Those characters either parse with a different intent than the author meant (anchor declaration, alias reference, tag) or trigger explicit parser failures, which downstream parsers already surface. The validator's job is the bug class downstream parsers DON'T surface (silent truncation, silent restructuring), not lint. Background: this check was added speculatively, by extrapolating from the YAML 1.2 reserved-indicator list rather than from observed bugs. Subsequent review rounds had to walk back the `-` and `?` cases (Codex flagged them as YAML-spec-incorrect — both only act as markers when followed by whitespace). The exercise shows the check was reaching beyond the evidence-based scope. What's gone: - The unconditional first-character-against-tuple check - Its dedicated test case - Mention of reserved indicators in the docstring What stays (the actual evidence-based bug class): - Frontmatter delimiter line matching (catches `----` / `---extra` / missing closer / missing opener) - ` #` substring → silent comment truncation (the original Codex bug) - `: ` substring → silent mapping confusion (the plan-doc title bug) Tests: 939/939 pass. release:validate clean.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f848177b8f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…tor's actual scope Codex on PR #697 caught that the SKILL.md instruction for invoking validate-frontmatter.py still claimed it catches "leading reserved indicators" — but the previous commit (f848177) deliberately removed that check as out-of-stated-scope (loud parser errors, not silent corruption). The doc-vs-code drift creates false confidence. Fix: reword the validator-invocation step in both skills to describe what the script actually checks (delimiter lines + ' #' truncation + ': ' mapping confusion) and explicitly call out what it does NOT check (schema rules, reserved indicators) so the boundary is clear. Tests: 939/939 pass. release:validate clean.
Summary
Frontmatter that strict YAML parsers would silently misread is now caught at write-time. After
/ce-compoundor/ce-compound-refreshwrites a doc, a stdlib-only Python script scans the frontmatter for the bug class that prose rules miss; bad output blocks success with a concrete diagnostic ("related_prvalue contains ' #' — quote it") so the agent can fix and retry until the doc is parser-safe.The original surfacing was Codex on PR #695:
related_pr: PR #685 ...parsed as just'PR'because YAML treats unquoted#as a comment delimiter. A 2026-04-16 plan doc had the same family of bug with unquoted:in atitle:field. Both are silent — parsers accept them, downstream code reads garbage, no error fires. The validator targets exactly that bug class.What the validator catches
---matched as full lines, not substrings)----,---extra, missing closer) silently parses as body-only#substring in unquoted scalar:substring in unquoted scalartitle:fieldThe validator does not flag values starting with leading reserved indicators (
`/*/&/!/\|/>/%/@) because those produce loud parser errors downstream rather than silent corruption — out of stated scope. The two YAML 1.2 indicators that need-trailing-whitespace (-,?) are also not flagged because bare-fooand?fooare valid plain scalars.Design decisions
Stdlib only, no PyYAML dependency. First draft required PyYAML for parser-fidelity validation. Dropped after considering distribution: PyYAML is not Python stdlib, macOS system Python doesn't ship it, mise/asdf-installed Pythons don't ship it. Many user environments would need
pip install pyyamlbefore/ce-compoundworks end-to-end. The bug class the validator targets is purely regex-detectable, so dropping the YAML parser dependency costs nothing.Scope: silent corruption only, not lint or schema validation. The validator catches issues YAML parsers won't surface as errors. It does NOT enforce
schema.yaml's required-field or enum rules, and does NOT flag YAML reserved indicators that produce loud failures (those are caught by the next tool in the pipeline). A repo-wide sweep found 17+ existing docs with stale schema mismatches; tightening here would block valid new docs because the schema is out of date relative to actual/ce-compoundusage. Schema alignment is a separate concern.Two skill copies, one script. Per AGENTS.md, skills are self-contained — no cross-skill file references. Both
ce-compound/scripts/andce-compound-refresh/scripts/get identical copies ofvalidate-frontmatter.py. A test asserts byte-identity to keep them from drifting. SKILL.md in each skill invokespython3 scripts/validate-frontmatter.py <doc-path>directly, matching the existing repo idiom for Python skill scripts (ce-session-extract,ce-session-inventory,ce-release-notes).Catch radius
Verified against three surfaces:
tests/frontmatter-validator.test.tscovering valid frontmatter, each silent-corruption failure mode, malformed-delimiter cases (----,---extra, trailing whitespace on---), valid plain scalars starting with-and?, and identity across the two skill copiesdocs/solutions/files: 30 pass on docs that have frontmatter, 1 legit FAIL on a doc that has no frontmatter (correct behavior —/ce-compoundwould never have written it that way)related_prtruncation + plan doctitle:confusion): both would have been caught before mergeTest plan
bun test(939/939) andbun run release:validateclean.References
ce-session-historianinefficiency notes from the same session