diff --git a/CHANGELOG.md b/CHANGELOG.md index df35b0f..d669ca5 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,31 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +## [0.23.0] - 2026-04-20 + +> **Version note:** P1.3.4 was scoped as 0.22.0 per the handoff, but two envelope-conformance fixes (PR #124 telemetry, PR #125 catalog) landed on main in parallel and were released as 0.22.0 via PR #128 while this branch was in Sonnet 4.6 validator dispatch. Per `klappy://canon/constraints/release-validation-gate` Rule 3 (canon outranks session artifacts) and SemVer discipline, this refactor is re-versioned to 0.23.0. The handoff's "ship as 0.22.0" recommendation was session-scoped; main-reality is the canon. + +### Changed + +- **`oddkit_encode` trigger-word classifier migrated from regex alternation to stemmed phrase-subset matching** (per PRD D5 from P1.3.4 — split-by-fit, same matcher family shipped for challenge in 0.21.0 and gate in 0.20.0, adapted for encode's phrasal vocabulary). `EncodingTypeDef.triggerRegex: RegExp | null` is replaced with `stemmedPhrases: string[][]` — each inner array is the ordered stem sequence of a single canon trigger word or phrase, parsed once per canon fetch. The runtime matcher `matchesStemmedPhrases(phrases, inputStems)` declares a match when ALL stems of at least one phrase appear in the input stem set. Single-stem phrases degenerate to set membership (identical to the old behavior for inflection matching like `deciding` → `decid`); multi-stem phrases like `committed to` → `[committ, to]` require both stems to co-occur, so ubiquitous function words like `to`, `with`, `by`, `up`, `out`, `not` cannot fire as standalone match triggers just because they appear inside a canon phrase. This preserves the pre-refactor regex semantic where `\b(committed to)\b` matched only when both words were present. 
Canon trigger vocabulary reads unchanged from `odd/encoding-types/*.md` (`## Trigger Words` fenced block); the matcher tokenizes each vocabulary entry with stop-words disabled (`tokenize(word, new Set())`) and stores the ordered stem array at parse time, and intersects against a stop-word-disabled stemmed input set at runtime. Inflected forms (`deciding`, `realizing`, `discovering`) now match their canonical stems (`decid`, `realiz`, `discover`) without canon having to enumerate each inflection. **Strictly additive** over the pre-refactor regex: every input that matched still matches (both phrase conjunction and word-boundary semantics preserved), plus stemmed variations of single-word vocab now match additionally. Stop-words disabled on both parse-time and runtime `tokenize()` calls — canon vocab survival is mandatory for the strictly-additive invariant to hold, per the P1.3.3 C-04 precedent. Both classifier call sites preserve their existing semantics: `parsePrefixedBatchInput` untagged-paragraph path picks first match via `break` (one artifact per paragraph); `parseUnstructuredInput` emits one artifact per matching type (no `break` — the load-bearing design comment at L1161–1164 preserved verbatim). `tokenize(para, new Set())` is hoisted once per paragraph into an `inputStems` Set reused across the per-type loop. The phrase-subset match (all stems co-occurring, any order) was adopted mid-PR in response to a high-severity Cursor Bugbot finding on commit `259170a` — the first version's flat `stemmedTokens: Set` would have fired Decision on virtually every English paragraph because the ubiquitous function-word constituents of phrasal canon vocab (`to`, `with`) were being added as standalone singletons. Per `klappy://canon/principles/vodka-architecture`: fit the matcher to the problem shape. 
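The phrase-subset semantics above can be sketched as follows. This is an illustrative sketch only, not the shipped code: `toyStem` and `toyTokenize` are crude stand-ins for the real `tokenize()` stemmer, and the two-phrase vocabulary is hypothetical — only the match rule (ALL stems of at least one phrase present in the input stem set) mirrors the actual matcher.

```typescript
// Illustrative stand-ins for the real tokenize() + stemmer (NOT the
// project's implementation): lowercase, then strip -ing/-ed so that
// "deciding"/"decided" -> "decid" and "committed" -> "committ".
function toyStem(word: string): string {
  return word.toLowerCase().replace(/(ing|ed)$/, "");
}

function toyTokenize(text: string): string[] {
  return (text.match(/[a-z']+/gi) ?? []).map(toyStem).filter((s) => s.length > 0);
}

// Parse-time: each canon trigger word/phrase becomes an ordered stem array.
// Vocabulary here is hypothetical: [["decid"], ["committ", "to"]].
const stemmedPhrases: string[][] = ["decided", "committed to"].map(toyTokenize);

// Runtime rule: a type matches when ALL stems of at least one phrase are
// present in the input's stem set (single-stem phrases degenerate to set
// membership; multi-stem phrases require conjunction).
function matchesStemmedPhrases(phrases: string[][], input: Set<string>): boolean {
  return phrases.some((phrase) => phrase.every((stem) => input.has(stem)));
}

// "deciding" stems to the same token as "decided", so the singleton fires:
const hit = matchesStemmedPhrases(stemmedPhrases, new Set(toyTokenize("I'm deciding to ship it"))); // true

// "to" alone cannot fire: the ["committ", "to"] phrase needs both stems:
const miss = matchesStemmedPhrases(stemmedPhrases, new Set(toyTokenize("I need to wait until tomorrow"))); // false
```

The `some`/`every` pair is the whole design: a flat union of stems would make function-word constituents like `to` standalone triggers, which is exactly the Bugbot finding the phrase-level conjunction closes.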
+ +### Removed + +- **Module-level `cachedEncodingTypes` in-process cache** (per PRD D9 from P1.3.4 — don't cache microsecond derivations; same pattern challenge shipped in 0.21.0 and gate shipped in 0.20.0). `cachedEncodingTypes`, `cachedEncodingTypesKnowledgeBaseUrl`, `cachedEncodingTypesSource` module-level fields deleted; cache-check short-circuit at the top of `discoverEncodingTypes` deleted; `cleanup_storage` resets for the three fields deleted. Per `klappy://canon/principles/cache-fetches-and-parses`: the fetch layer (Module Memory → Cache API → R2, 5-minute TTL) already caches the canon file content; caching the parse product for microsecond re-derivation savings is the anti-pattern the principle names. Parse runs fresh per call; overhead is sub-millisecond on hot fetches. + +### Added + +- **New smoke regression assertions in `workers/test/canon-tool-envelope.smoke.mjs`** anchoring the D5 migration and the Bugbot phrase-subset fix: (12) stemmed inflection match — `"I'm deciding to ship the two-tier cascade"` classifies as Decision (`decid` stem degenerate-singleton matches `decided` in canon vocab); (13) stop-word phrase survival — `"we're going with option B after the review"` matches Decision via the `[go, with]` phrase having both stems present in the input set; (14) multi-type preservation — `"We must never deploy without tests because we decided this last week"` emits both `C` and `D` artifacts via the no-break path (`must`/`never` singletons for Constraint; `decid` singleton for Decision); (15) first-match preservation — untagged paragraph in a mixed batch emits exactly one artifact via the batch classifier's `break` semantic; (16) phrase-subset regression anchor — `"I need to wait until tomorrow for the review"` does NOT classify as Decision or Handoff (the pre-Bugbot-fix flat-Set implementation would have fired Decision via standalone `to` and Handoff via standalone `to`/`for`; post-fix, no phrase of either type has all its stems present in the 
input). Assertion (16) is the Bugbot PR #126 regression anchor and will fail against any revision where multi-word vocab is flattened back into standalone-singleton triggers. + +### Refs + +- Handoff: `klappy://odd/handoffs/2026-04-20-p1-3-4-encode-canon-parity` +- Canon basis: `klappy://canon/principles/cache-fetches-and-parses`, `klappy://canon/principles/vodka-architecture` +- Precedent: oddkit 0.21.0 (challenge's D5 + D9), 0.20.0 (gate's D5 + D9) +- Shipping gate: `klappy://canon/constraints/release-validation-gate` (binding) +- Bugbot finding dispositioned: PR #126 review `cursor[bot]` 2026-04-20T12:55:03Z (high severity, multi-word vocab flattening) — fix-forward in same PR via Cursor autofix commit `113ba11` (phrase-subset match). The in-session orchestrator proposed a stricter consecutive-subsequence variant; autofix's subset-match was accepted as the simpler design better aligned with encode's multi-type tolerance philosophy. +- Closes the canon-parity sweep — all three tools now use stemmed matching and have their in-process derivation caches removed per `cache-fetches-and-parses`. + ## [0.22.0] - 2026-04-20 ### Added diff --git a/package-lock.json b/package-lock.json index 0e0dd1a..b476280 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1,12 +1,12 @@ { "name": "oddkit", - "version": "0.22.0", + "version": "0.23.0", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "oddkit", - "version": "0.22.0", + "version": "0.23.0", "license": "MIT", "dependencies": { "@modelcontextprotocol/sdk": "^1.0.0", diff --git a/package.json b/package.json index d377ab3..1116fad 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "oddkit", - "version": "0.22.0", + "version": "0.23.0", "description": "Agent-first CLI for ODD-governed repos. 
Epistemic terrain rendering with portable baseline.", "type": "module", "bin": { diff --git a/workers/package-lock.json b/workers/package-lock.json index a7994f0..6a50a31 100644 --- a/workers/package-lock.json +++ b/workers/package-lock.json @@ -1,12 +1,12 @@ { "name": "oddkit-mcp-worker", - "version": "0.22.0", + "version": "0.23.0", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "oddkit-mcp-worker", - "version": "0.22.0", + "version": "0.23.0", "dependencies": { "agents": "^0.4.1", "fflate": "^0.8.2", diff --git a/workers/package.json b/workers/package.json index ecf3988..97eb87b 100644 --- a/workers/package.json +++ b/workers/package.json @@ -1,6 +1,6 @@ { "name": "oddkit-mcp-worker", - "version": "0.22.0", + "version": "0.23.0", "private": true, "type": "module", "scripts": { diff --git a/workers/src/orchestrate.ts b/workers/src/orchestrate.ts index f2ba00b..716cae6 100644 --- a/workers/src/orchestrate.ts +++ b/workers/src/orchestrate.ts @@ -56,12 +56,23 @@ export interface OddkitEnvelope { /** Internal type — handlers return this, handleUnifiedAction stamps server_time */ type ActionResult = Omit<OddkitEnvelope, "server_time">; -// Governance-driven encoding types +// Governance-driven encoding types. Trigger-word classification is stemmed +// phrase-subset matching per klappy://canon/principles/vodka-architecture +// (fit the matcher to the problem) — same D5 shape applied to challenge +// prereqs in 0.21.0 and gate prereqs in 0.20.0. triggerWords kept for +// debugging only; stemmedPhrases is the parse product the runtime evaluates +// against. Each inner array is the ordered stem sequence of a single +// trigger word or phrase; a type matches an input when ALL stems of at +// least one phrase are present in the input's stem set. 
This preserves +// phrase-level semantics (`committed to`, `going with`, `must not`, +// `next step`, `follow up`, `blocked by`, `turns out`) so common function +// words (`to`, `with`, `by`, `up`, `out`, `not`) do not become standalone +// match triggers on every English paragraph. interface EncodingTypeDef { letter: string; name: string; triggerWords: string[]; - triggerRegex: RegExp | null; + stemmedPhrases: string[][]; qualityCriteria: Array<{ criterion: string; check: string; gapMessage: string }>; } @@ -79,9 +90,12 @@ interface ParsedArtifact { priority_band?: string; } -let cachedEncodingTypes: EncodingTypeDef[] | null = null; -let cachedEncodingTypesKnowledgeBaseUrl: string | undefined = undefined; -let cachedEncodingTypesSource: "knowledge_base" | "minimal" = "minimal"; +// D9 / klappy://canon/principles/cache-fetches-and-parses — no module-level +// cache on the parse product. fetcher.getFile / fetcher.getIndex already cache +// the canon read (Module Memory → Cache API → R2, 5-min TTL). Re-running the +// parse loop per request is sub-millisecond derivation work, not worth the +// plumbing tax of a keyed cache. Same pattern challenge (0.21.0) and gate +// (0.20.0) already applied. // Governance-driven challenge types (E0008 — mirrors encode pattern from PR #96) interface ChallengeTypeDef { @@ -409,10 +423,6 @@ async function discoverEncodingTypes( fetcher: KnowledgeBaseFetcher, knowledgeBaseUrl?: string, ): Promise<{ types: EncodingTypeDef[]; source: "knowledge_base" | "minimal" }> { - if (cachedEncodingTypes && cachedEncodingTypesKnowledgeBaseUrl === knowledgeBaseUrl) { - return { types: cachedEncodingTypes, source: cachedEncodingTypesSource }; - } - const index = await fetcher.getIndex(knowledgeBaseUrl); const typeArticles = index.entries.filter( (entry: IndexEntry) => entry.tags?.includes("encoding-type") && entry.path.includes("encoding-types/"), @@ -437,10 +447,28 @@ async function discoverEncodingTypes( const triggerWords = triggerSection ? 
triggerSection[1].split(",").map((w: string) => w.trim()).filter((w: string) => w.length > 0) : []; - const triggerRegex = - triggerWords.length > 0 - ? new RegExp("\\b(" + triggerWords.map((w: string) => w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")).join("|") + ")\\b", "i") - : null; + // D5 / klappy://canon/principles/vodka-architecture — classification is + // stemmed phrase-subset matching, not regex alternation. Each canon + // trigger word/phrase is parsed once into its ordered stem sequence; + // runtime tokenizes input once and a type matches when ALL stems of + // at least one phrase are present. Inflected forms (deciding → decid, + // realizing → realiz) match their canonical stems without canon having + // to list each inflection. Stop-word filtering is disabled (empty Set) + // on both the parse-time and runtime tokenize() calls — canon vocab + // includes stop-word-adjacent phrases (`going with`, `committed to`, + // `must not`, `turns out`, `next step`, `blocked by`, `found that`) + // and dropping them would silently break the strictly-additive + // invariant, the same failure mode P1.3.3 hit on challenge's + // `from`-in-source-named vocab. Phrase-level conjunction (all stems + // of a phrase must match) is the precision floor: without it, + // ubiquitous function words like `to`/`with`/`by`/`up`/`out`/`not` + // would become standalone triggers on every English paragraph. + // Per canon/constraints/release-validation-gate and P1.3.3 C-04. 
+ const stemmedPhrases: string[][] = []; + for (const word of triggerWords) { + const stems = tokenize(word, new Set()); + if (stems.length > 0) stemmedPhrases.push(stems); + } const criteriaSection = content.match( /## Quality Criteria[\s\S]*?\| Criterion[\s\S]*?\|[-|\s]+\|\n([\s\S]*?)(?=\n\n|\n##|$)/, @@ -459,7 +487,7 @@ async function discoverEncodingTypes( } } - types.push({ letter, name, triggerWords, triggerRegex, qualityCriteria }); + types.push({ letter, name, triggerWords, stemmedPhrases, qualityCriteria }); } catch { continue; } @@ -495,17 +523,21 @@ async function discoverEncodingTypes( ["H", "Handoff", ["next session", "next step", "todo", "follow up", "blocked by"]], ["E", "Encode", ["encoded", "captured", "crystallized", "persisted", "artifact"]], ]; - resolved = defaults.map(([letter, name, words]) => ({ - letter, name, triggerWords: words, - triggerRegex: new RegExp("\\b(" + words.join("|") + ")\\b", "i"), - qualityCriteria: [], - })); + resolved = defaults.map(([letter, name, words]) => { + const stemmedPhrases: string[][] = []; + for (const word of words) { + const stems = tokenize(word, new Set()); + if (stems.length > 0) stemmedPhrases.push(stems); + } + return { + letter, name, triggerWords: words, + stemmedPhrases, + qualityCriteria: [], + }; + }); source = "minimal"; } - cachedEncodingTypes = resolved; - cachedEncodingTypesKnowledgeBaseUrl = knowledgeBaseUrl; - cachedEncodingTypesSource = source; return { types: resolved, source }; } @@ -1084,6 +1116,25 @@ function isPrefixedBatchInput(input: string): boolean { return paragraphs.some((p) => PREFIX_TAG_REGEX.test(p)); } +// Phrase-subset match — a phrase matches when ALL of its stems appear in the +// input stem set. Short-circuits on the first phrase that matches. 
The D5 +// matcher shape for encode trigger-word classification, mirroring the shape +// used by evaluatePrerequisiteCheck in the P1.3.3 challenge evaluator: +// single-stem phrases degenerate to set membership (identical to the old +// single-token behavior), while multi-stem phrases like +// `committed to` → ["committ","to"] require both stems to co-occur, so +// ubiquitous function words cannot match on their own. +function matchesStemmedPhrases(phrases: string[][], input: Set<string>): boolean { + for (const phrase of phrases) { + let allPresent = true; + for (const stem of phrase) { + if (!input.has(stem)) { allPresent = false; break; } + } + if (allPresent) return true; + } + return false; +} + function parsePrefixedBatchInput(input: string, types: EncodingTypeDef[]): ParsedArtifact[] { const typeMap = new Map<string, string>(types.map((t) => [t.letter, t.name])); const paragraphs = input.split(/\n\n+/).map((p) => p.trim()).filter((p) => p.length > 0); @@ -1118,9 +1169,16 @@ function parsePrefixedBatchInput(input: string, types: EncodingTypeDef[]): Parse // Untagged paragraph in a batch that contains tags: classify via trigger // words like parseUnstructuredInput, but emit one artifact per paragraph // (not one-per-match) to preserve the author's paragraph boundaries. + // Stemmed phrase-subset matching mirrors parseUnstructuredInput — stop-words + // disabled on tokenize() both sides per P1.3.3 C-04 (canon vocab + // includes stop-word phrases like `going with` / `must not`). let matched: EncodingTypeDef | null = null; + const inputStems = new Set(tokenize(para, new Set())); for (const t of types) { - if (t.triggerRegex && t.triggerRegex.test(para)) { matched = t; break; } + // Break on first match: this path picks one type per paragraph by + // design (paragraph boundaries are the author's). Unlike + // parseUnstructuredInput which emits one artifact per matching type. + if (matchesStemmedPhrases(t.stemmedPhrases, inputStems)) { matched = t; break; } } const pick = matched ?? 
types[0] ?? { letter: "D", name: "Decision" }; const first = para.split(/[.!?\n]/)[0]?.trim() || para.slice(0, 60); @@ -1157,12 +1215,19 @@ function parseUnstructuredInput(input: string, types: EncodingTypeDef[]): Parsed const artifacts: ParsedArtifact[] = []; for (const para of paragraphs) { let matched = false; + // Hoist tokenize(para) out of the per-type loop — para is constant across + // the loop, stemmedTokens differ per type. Mirrors the P1.3.3 challenge + // prereq evaluator shape. Stop-words disabled (empty Set) on both parse- + // time and runtime tokenize() calls so canon vocab like `going with`, + // `must not`, `turns out`, `found that` survives on both sides. Per + // canon/constraints/release-validation-gate and P1.3.3 Bug #1 precedent. + const inputStems = new Set(tokenize(para, new Set())); for (const t of types) { // DESIGN: no break — a paragraph can match multiple types intentionally. // "We must never deploy without tests" is both Decision and Constraint. // Multi-typing at the server level mirrors what the model would do with // separate TSV rows. Do not add a break here. - if (t.triggerRegex && t.triggerRegex.test(para)) { + if (matchesStemmedPhrases(t.stemmedPhrases, inputStems)) { const first = para.split(/[.!?\n]/)[0]?.trim() || para.slice(0, 60); const title = first.split(/\s+/).length <= 12 ? first : first.split(/\s+/).slice(0, 8).join(" ") + "..."; artifacts.push({ type: t.letter, typeName: t.name, fields: [t.letter, title, para.trim()], title, body: para.trim() }); @@ -1518,9 +1583,10 @@ async function runCleanupStorage( // Also clear the in-memory BM25 index cachedBM25Index = null; cachedBM25Entries = null; - cachedEncodingTypes = null; - cachedEncodingTypesKnowledgeBaseUrl = undefined; - cachedEncodingTypesSource = "minimal"; + // cachedEncodingTypes removed in 0.23.0 per cache-fetches-and-parses — + // encode's parse product is no longer cached in-process. 
The fetch tier + // (Cache API, R2) already handles canon file caching; the derivation is + // sub-millisecond. No reset needed here. // E0008 — governance-driven challenge caches (mirror PR #96 fix) cachedChallengeTypes = null; cachedChallengeTypesKnowledgeBaseUrl = undefined; diff --git a/workers/test/canon-tool-envelope.smoke.mjs b/workers/test/canon-tool-envelope.smoke.mjs index 89ce465..f46c828 100644 --- a/workers/test/canon-tool-envelope.smoke.mjs +++ b/workers/test/canon-tool-envelope.smoke.mjs @@ -224,6 +224,98 @@ async function run() { `got: ${encodeOverride.result?.governance_source}`, ); + // P1.3.4 D5 regression anchors — stemmed set intersection replaces regex + // alternation on the encode classifier. These assertions exist because + // the pre-refactor literal regex path could not match inflections of + // canon vocab (`deciding` does not match `decided` under `\bdecided\b`), + // and the P1.3.3 Bug #1 precedent showed that tokenize()'s default + // stop-word filter silently breaks multi-word canon vocab (`going with`, + // `committed to`, `must not`). The assertions are numbered (12)–(15) to + // continue the sequence P1.3.3 established at (10)/(11). + console.log(`\n─── oddkit_encode: (12) stemmed inflection match (D5 landed) ───`); + const encodeInflection = await callTool("oddkit_encode", { + input: "I'm deciding to ship the two-tier cascade", + }); + expectFullEnvelope("oddkit_encode (inflection match)", encodeInflection); + const inflectionTypes = (encodeInflection.result?.artifacts ?? 
[]).map((a) => a.type); + ok( + "oddkit_encode: (12) `deciding` (inflection of `decided`) classifies as Decision via stem intersection", + inflectionTypes.includes("D"), + `got artifact types: ${inflectionTypes.join(",")}`, + ); + + console.log(`\n─── oddkit_encode: (13) stop-word canon vocab survives tokenize (P1.3.3 C-04 ported) ───`); + const encodeStopWord = await callTool("oddkit_encode", { + input: "we're going with option B after the review", + }); + expectFullEnvelope("oddkit_encode (stop-word survival)", encodeStopWord); + const stopWordTypes = (encodeStopWord.result?.artifacts ?? []).map((a) => a.type); + ok( + "oddkit_encode: (13) `going with` (multi-word canon vocab containing stop-word `with`) matches Decision", + stopWordTypes.includes("D"), + `got artifact types: ${stopWordTypes.join(",")}`, + ); + + console.log(`\n─── oddkit_encode: (14) multi-type no-break preservation (L1161 design comment) ───`); + const encodeMultiType = await callTool("oddkit_encode", { + input: "We must never deploy without tests because we decided this last week", + }); + expectFullEnvelope("oddkit_encode (multi-type)", encodeMultiType); + const multiTypeTypes = (encodeMultiType.result?.artifacts ?? []).map((a) => a.type); + ok( + "oddkit_encode: (14) paragraph matching both Constraint and Decision emits both artifact types (no-break path)", + multiTypeTypes.includes("C") && multiTypeTypes.includes("D"), + `got artifact types: ${multiTypeTypes.join(",")}`, + ); + + console.log(`\n─── oddkit_encode: (15) first-match preservation in batch-untagged path ───`); + const encodeBatchUntagged = await callTool("oddkit_encode", { + input: "[D] explicit decision tag on first paragraph\n\nwe must always write tests before we decided on TDD", + }); + expectFullEnvelope("oddkit_encode (batch first-match)", encodeBatchUntagged); + const batchArtifacts = encodeBatchUntagged.result?.artifacts ?? 
[]; + ok( + "oddkit_encode: (15) batch with tagged + untagged paragraphs emits exactly 2 artifacts (first-match path picks one type per untagged paragraph)", + batchArtifacts.length === 2, + `got length: ${batchArtifacts.length}; types: ${batchArtifacts.map((a) => a.type).join(",")}`, + ); + + console.log(`\n─── oddkit_encode: (16) phrase-subset regression anchor (Bugbot PR #126) ───`); + // Pre-Bugbot-fix the matcher used a flat stemmedTokens: Set where + // multi-word canon phrases like `committed to` (Decision) and `next step` + // (Handoff) were flattened into individual stems and each was added as a + // standalone singleton. Stop-word filtering is disabled by design (P1.3.3 + // C-04), so function-word stems like `to`, `with`, `by`, `up`, `out` + // became universal match triggers — virtually every English paragraph + // would fire Decision and Handoff and more. Autofix commit 113ba11 + // adopted a phrase-subset match: a phrase matches only when ALL of its + // stems appear in the input stem set. Single-stem phrases degenerate to + // set membership (inflection matching still works); multi-stem phrases + // require conjunction. The input below contains stems `need`, `to`, + // `wait`, `until`, `tomorrow`, `for`, `the`, `review` — no Decision + // phrase has ALL its stems present (`decid` / `decis` / `chose` / `choos` + // / `select` all absent; `[committ, to]` fails on `committ`; `[go, with]` + // fails on both), and no Handoff phrase has ALL its stems present + // (`[next, session]` / `[next, step]` / `[follow, up]` / `[block, by]` + // / `[wait, on]` all fail on their second stem; `todo` / `continu` / + // `remain` / `handoff` singletons all absent). A revision that + // re-flattens the matcher would spuriously fire D and H on this input. 
+ const encodePhraseSubset = await callTool("oddkit_encode", { + input: "I need to wait until tomorrow for the review", + }); + expectFullEnvelope("oddkit_encode (phrase-subset regression)", encodePhraseSubset); + const phraseSubsetTypes = (encodePhraseSubset.result?.artifacts ?? []).map((a) => a.type); + ok( + "oddkit_encode: (16) `to` inside phrasal canon vocab does NOT fire Decision as a standalone trigger", + !phraseSubsetTypes.includes("D"), + `got artifact types: ${phraseSubsetTypes.join(",")}`, + ); + ok( + "oddkit_encode: (16) `to` inside phrasal canon vocab does NOT fire Handoff as a standalone trigger", + !phraseSubsetTypes.includes("H"), + `got artifact types: ${phraseSubsetTypes.join(",")}`, + ); + // Tool 5: oddkit_challenge — canon-driven, four governance surfaces. // Full envelope + governance_source + governance_uris (plural, per PRD D4 — // shape diverges from encode by design because challenge reads four peer