Merged
18 changes: 18 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,24 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [0.21.0] - 2026-04-20

### Changed

- **`oddkit_challenge` prerequisite evaluation migrated from regex-per-check to stemmed set intersection** (per PRD D5 from P1.3.2 — split-by-fit). Each prereq now evaluates via `Array.from(prereq.stemmedTokens).some(s => inputStems.has(s))` over a Set computed once at canon-fetch time, with `tokenize(input)` hoisted out of the per-prereq loop. **Strictly additive**: every input that matched the prior regex still matches, and stemmed variations now match too — `problems identified` satisfies `evidence-cited` (stems `problem` + `identif`), `considered alternatives` satisfies `alternatives-considered` (stems `consid` + `altern`), `acknowledged the risks` satisfies `risk-acknowledged` (stems `acknowledg` + `risk`). The four structural side-tests (URL / numeric / proper-noun / citation) are preserved verbatim from the pre-refactor evaluator because they cover cases the keyword vocabulary cannot — `source-named` inputs like `"here's the URL: https://..."` have no stemmed overlap with the vocab `per / according to / from / source: / who said / where i read`, but the URL structural test catches them. The conservative no-keyword-no-flag fallback (pass on `input.trim().length >= 20`) is also preserved. The same matcher shipped for gate in 0.20.0.
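The mechanics of the bullet above can be sketched in miniature. The stemmer and stop-word list here are toy stand-ins for the production `tokenize()` (hypothetical suffix rules, not the real Porter implementation); what the sketch shows is the shape of the change: vocab is stemmed once at parse time, the input is tokenized once per request, and each prereq check is a set-intersection test.

```typescript
// Toy stand-in for the production tokenize(): lowercase, split, drop
// stop words, strip a few common suffixes. NOT the real Porter stemmer.
const STOP_WORDS = new Set(["the", "a", "an", "i", "we", "to", "of"]);

function stem(word: string): string {
  return word.replace(/(ations?|ized|ies|ing|ed|s)$/, "");
}

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0 && !STOP_WORDS.has(w))
    .map(stem);
}

// Parse time (canon fetch): stem the quoted vocab once per prereq.
const evidenceCitedVocab = new Set(tokenize("evidence observed problem identified"));

// Runtime: tokenize the input once, then each prereq is a set intersection.
const inputStems = new Set(tokenize("problems identified in last review"));
const passed = Array.from(evidenceCitedVocab).some((s) => inputStems.has(s));
// passed === true: "problems" and "identified" stem to "problem" and
// "identifi", both of which the vocab Set also contains.
```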

- **`oddkit_challenge` type-detection BM25 index cache removed** (per PRD D9 from P1.3.2 — don't cache microsecond derivations). `cachedChallengeTypeIndex` and `cachedChallengeTypeIndexKnowledgeBaseUrl` module-level fields deleted; `getOrBuildChallengeTypeIndex` function deleted; `cleanup_storage` resets deleted; the call site in `runChallengeAction` rebuilds the BM25 index inline per request via `buildBM25Index(types.map(t => ({id: t.slug, text: t.detectionText})), vocab.stopWords)`. The same pattern shipped for gate in 0.20.0. Removes module-level cache state, URL-keyed invalidation logic, cleanup_storage wiring, and drift risk when source data changes — the four hidden costs enumerated in the new canon principle. Parse-product caches (`cachedChallengeTypes`, `cachedBasePrerequisites`, `cachedNormativeVocabulary`, `cachedStakesCalibration`) remain — those are actual parse work.
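The cache-removal pattern reduces the call site to a per-request rebuild over already-cached parse products. A minimal sketch of that shape: `buildIndex` here is a toy term-frequency index, a hypothetical stand-in for the real `buildBM25Index` (real BM25 adds IDF and length normalization), and the two `types` entries are invented example data.

```typescript
// Toy stand-in for buildBM25Index: a plain term -> (docId -> tf) map.
// The call-site pattern is the point, not the scoring.
interface Doc { id: string; text: string }

function buildIndex(docs: Doc[], stopWords: Set<string>): Map<string, Map<string, number>> {
  const index = new Map<string, Map<string, number>>();
  for (const doc of docs) {
    for (const term of doc.text.toLowerCase().split(/\W+/)) {
      if (!term || stopWords.has(term)) continue;
      const postings = index.get(term) ?? new Map<string, number>();
      postings.set(doc.id, (postings.get(doc.id) ?? 0) + 1);
      index.set(term, postings);
    }
  }
  return index;
}

// Per-request call site: no module-level cache, no URL-keyed invalidation,
// no cleanup_storage wiring. `types` and the stop-word set stand in for the
// cached parse products; the index is just a reshape of them.
const types = [
  { slug: "sunk-cost", detectionText: "already invested committed too far along" },
  { slug: "false-urgency", detectionText: "deadline looming must decide right now" },
];
const typeIndex = buildIndex(
  types.map((t) => ({ id: t.slug, text: t.detectionText })),
  new Set(["too", "must"]),
);
```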

### Added

- **New canon principle:** `klappy://canon/principles/cache-fetches-and-parses` (klappy.dev#125, merged `3726073`). Graduates the "cache fetches and parses, not microsecond derivations" pattern to canon as a tier-2 principle after its third deciding-argument recurrence across the tool sweep: 0.18.0 encode parse-product caching (implicit), 0.20.0 gate D9 (first explicit), 0.21.0 challenge `cachedChallengeTypeIndex` removal (second explicit). Names the two halves of the principle, enumerates the four-cost plumbing tax, and anchors the threshold to current corpus sizes (6–9 challenge types, 4 gate transitions, 8 base prereqs).

- **New shared interface `PrereqMatchVocab`** in `workers/src/orchestrate.ts` capturing `stemmedTokens: Set<string>` plus four boolean structural-test flags (`hasURLCheck`, `hasNumericCheck`, `hasProperNounCheck`, `hasCitationCheck`). Mixed into both `BasePrerequisite` and the inline type on `ChallengeTypeDef.prerequisiteOverlays[]` to keep per-type and base-prereq structs in sync. Populated by the new `parseCheckColumn(check: string)` helper at canon-fetch time in both `discoverChallengeTypes` and `fetchBasePrerequisites`.

### Known limitations

- Same as 0.20.0 — Porter-style stemmer does not reverse consonant gemination (`shipping` → `shipp`, not `ship`); affected vocabulary is fixed at canon tier per `klappy.dev#122` precedent. `getIndex` strict-mode (`skipBaselineFallback`) still pending across encode/challenge/gate (carry-forward O-open P2).
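The gemination limitation is easy to reproduce with any suffix-stripping stemmer. A hypothetical minimal reduction (not the production stemmer) showing why `shipping` and `ships` never meet on a common stem:

```typescript
// Suffix stripping removes "-ing" but cannot know that the doubled "p"
// in "shipping" was inserted before the suffix, so it is left in place.
function stripSuffix(word: string): string {
  const w = word.toLowerCase();
  if (w.endsWith("ing")) return w.slice(0, -3); // "shipping" -> "shipp"
  if (w.endsWith("s")) return w.slice(0, -1);   // "ships"    -> "ship"
  return w;
}

const a = stripSuffix("shipping"); // "shipp"
const b = stripSuffix("ships");    // "ship"
// a !== b: the two forms land on different stems, so a vocab keyword
// stemmed to "ship" will not match an input that only says "shipping".
```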

## [0.20.0] - 2026-04-20

### Added
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "oddkit",
"version": "0.20.0",
"version": "0.21.0",
"description": "Agent-first CLI for ODD-governed repos. Epistemic terrain rendering with portable baseline.",
"type": "module",
"bin": {
4 changes: 2 additions & 2 deletions workers/package-lock.json


2 changes: 1 addition & 1 deletion workers/package.json
@@ -1,6 +1,6 @@
{
"name": "oddkit-mcp-worker",
"version": "0.20.0",
"version": "0.21.0",
"private": true,
"type": "module",
"scripts": {
194 changes: 129 additions & 65 deletions workers/src/orchestrate.ts
@@ -91,7 +91,13 @@ interface ChallengeTypeDef {
triggerWords: string[];
detectionText: string; // triggerWords + blockquote, fed to BM25 indexer
questions: Array<{ question: string; tier: string }>;
prerequisiteOverlays: Array<{ prerequisite: string; check: string; gapMessage: string }>;
prerequisiteOverlays: Array<
{
prerequisite: string;
check: string;
gapMessage: string;
} & PrereqMatchVocab
>;
reframings: string[];
fallback: boolean;
}
@@ -100,6 +106,26 @@ interface BasePrerequisite {
prerequisite: string;
check: string;
gapMessage: string;
// Per PRD D2 (P1.3.3): parse products populated at canon-fetch time.
// stemmedTokens is the stemmed form of quoted keywords in `check`;
// the four has*Check booleans flag structural-test hints detected in
// the check description. See parseCheckColumn below. These are parse
// products per klappy://canon/principles/cache-fetches-and-parses.
stemmedTokens: Set<string>;
hasURLCheck: boolean;
hasNumericCheck: boolean;
hasProperNounCheck: boolean;
hasCitationCheck: boolean;
}

/** Shared shape for the runtime match vocabulary attached to challenge
* prereqs. Keeps the per-type and base-prereq structs in sync (DRY). */
interface PrereqMatchVocab {
stemmedTokens: Set<string>;
hasURLCheck: boolean;
hasNumericCheck: boolean;
hasProperNounCheck: boolean;
hasCitationCheck: boolean;
}

// Gate governance types — P1.3.2 (0.20.0). Consumed by runGateAction via
@@ -160,8 +186,12 @@ interface StakesCalibration {
let cachedChallengeTypes: ChallengeTypeDef[] | null = null;
let cachedChallengeTypesKnowledgeBaseUrl: string | undefined = undefined;
let cachedChallengeTypesSource: "knowledge_base" | "minimal" = "minimal";
let cachedChallengeTypeIndex: BM25Index | null = null;
let cachedChallengeTypeIndexKnowledgeBaseUrl: string | undefined = undefined;
// Note: challenge's BM25 type-detection index is NOT cached — per
// klappy://canon/principles/cache-fetches-and-parses, rebuilding a BM25
// index over challenge's 6–9-type corpus is a microsecond derivation and
// the plumbing tax (URL-keyed invalidation + cleanup_storage wiring +
// drift risk) costs more than the rebuild. Inline-built at the call site
// in runChallengeAction, same pattern as gate's transition index (0.20.0).
let cachedBasePrerequisites: BasePrerequisite[] | null = null;
let cachedBasePrerequisitesKnowledgeBaseUrl: string | undefined = undefined;
let cachedBasePrerequisitesSource: "knowledge_base" | "minimal" = "minimal";
@@ -550,15 +580,19 @@ async function discoverChallengeTypes(
}
}

// Prerequisite Overlays table — rows of (Prerequisite, Check, Gap message)
// Prerequisite Overlays table — rows of (Prerequisite, Check, Gap message).
// Per P1.3.3 PRD D2: each row is enriched with PrereqMatchVocab (stemmed
// tokens + structural-test flags) at parse time; see parseCheckColumn.
const prereqSection = content.match(
/## Prerequisite Overlays[\s\S]*?\| Prerequisite[\s\S]*?\|[-|\s]+\|\n([\s\S]*?)(?=\n\n|\n##|$)/,
);
const prerequisiteOverlays: Array<{
prerequisite: string;
check: string;
gapMessage: string;
}> = [];
const prerequisiteOverlays: Array<
{
prerequisite: string;
check: string;
gapMessage: string;
} & PrereqMatchVocab
> = [];
if (prereqSection) {
for (const row of prereqSection[1].split("\n").filter((r: string) => r.includes("|"))) {
const cols = parseTableRow(row);
@@ -569,6 +603,7 @@ async function discoverChallengeTypes(
prerequisite: cols[0],
check: cols[1],
gapMessage: gap,
...parseCheckColumn(cols[1]),
});
}
}
@@ -620,37 +655,14 @@ async function discoverChallengeTypes(
// rather than inventing a built-in fallback registry — see PRD D7).
const source: "knowledge_base" | "minimal" = types.length > 0 ? "knowledge_base" : "minimal";
cachedChallengeTypesSource = source;
// Index build deferred — needs vocab.stopWords from fetchNormativeVocabulary,
// assembled lazily by getOrBuildChallengeTypeIndex below. Both types and the
// index are deterministic functions of knowledgeBaseUrl, so caching by knowledgeBaseUrl
// remains safe.
// Note: the BM25 type-detection index over per-type detection text is
// NOT cached — it's a microsecond derivation over already-cached parse
// products, rebuilt inline per request in runChallengeAction. See
// klappy://canon/principles/cache-fetches-and-parses for the principle
// and the plumbing-tax argument.
return { types, source };
}

/** Lazily build (or return cached) per-knowledgeBaseUrl BM25 index over the per-type
* detection text, using governance-sourced stop words from normative-vocabulary.md.
* The cache is keyed on knowledgeBaseUrl so different canon sources do not contaminate
* each other's indexes. */
function getOrBuildChallengeTypeIndex(
types: ChallengeTypeDef[],
vocab: NormativeVocabulary,
knowledgeBaseUrl?: string,
): BM25Index {
if (cachedChallengeTypeIndex && cachedChallengeTypeIndexKnowledgeBaseUrl === knowledgeBaseUrl) {
return cachedChallengeTypeIndex;
}
// Build BM25 index over per-type detection text (triggers + blockquote).
// Stemming handles morphology; IDF weights distinctive trigger terms above filler.
// vocab.stopWords comes from `## Detection Noise` in normative-vocabulary.md;
// it deliberately preserves modal verbs and negation as signal. An empty
// Set means no filtering (governance opted into IDF-only scoring).
const bm25Docs = types.map((t) => ({ id: t.slug, text: t.detectionText }));
const bm25Index = buildBM25Index(bm25Docs, vocab.stopWords);
cachedChallengeTypeIndex = bm25Index;
cachedChallengeTypeIndexKnowledgeBaseUrl = knowledgeBaseUrl;
return bm25Index;
}

// Gate minimal-tier vocabulary — P1.3.2 D6. Used when canon is unreachable
// or missing required sections. Vocabulary mirrors the pre-0.20.0 hardcoded
// detectTransition regexes (L306–L324 pre-refactor) and checkPatterns map
@@ -847,6 +859,7 @@ async function fetchBasePrerequisites(
prerequisite: cols[0],
check: cols[1],
gapMessage: cols[2].replace(/^"|"$/g, ""),
...parseCheckColumn(cols[1]),
});
}
}
@@ -1515,8 +1528,6 @@ async function runCleanupStorage(
cachedChallengeTypes = null;
cachedChallengeTypesKnowledgeBaseUrl = undefined;
cachedChallengeTypesSource = "minimal";
cachedChallengeTypeIndex = null;
cachedChallengeTypeIndexKnowledgeBaseUrl = undefined;
cachedBasePrerequisites = null;
cachedBasePrerequisitesKnowledgeBaseUrl = undefined;
cachedBasePrerequisitesSource = "minimal";
@@ -2023,9 +2034,15 @@ async function runChallengeAction(
// Detection runs BEFORE the voice-dump suppression check so the SUPPRESSED
// response can still expose `governance` — the model sees what would have
// fired without surfacing the pressure-test questions.
// Build BM25 type-detection index inline per request (not cached) —
// per klappy://canon/principles/cache-fetches-and-parses, a BM25 index
// over challenge's 6–9-type corpus is a microsecond derivation and the
// plumbing tax is not worth the rebuild cost. Parse products (types,
// vocab) are cached upstream; the index is just a reshape.
// Stop words come from `## Detection Noise` in normative-vocabulary.md
// (governance), not a hardcoded constant in this file.
const typeIndex = getOrBuildChallengeTypeIndex(types, vocab, knowledgeBaseUrl);
const bm25Docs = types.map((t) => ({ id: t.slug, text: t.detectionText }));
const typeIndex = buildBM25Index(bm25Docs, vocab.stopWords);
const matchedTypes: ChallengeTypeDef[] = [];
const hits = searchBM25(typeIndex, input, types.length);
const typeBySlug = new Map(types.map((t) => [t.slug, t]));
Expand Down Expand Up @@ -2124,9 +2141,14 @@ async function runChallengeAction(
}

const strictness = modeConfig?.prerequisiteStrictness?.toLowerCase() || "required";
// Hoist tokenize(input) out of the per-prereq loop — input is constant across
// the loop, stemmedTokens differ per prereq. Per PRD D3 (P1.3.3): stemmed
// set intersection at runtime, structural tests preserved, no regex compile
// per check. This is the fit-to-problem matcher per D5.
const inputStems = new Set(tokenize(input));
const missing: string[] = [];
for (const p of prereqMap.values()) {
const passed = evaluatePrerequisiteCheck(input, p.check);
const passed = evaluatePrerequisiteCheck(inputStems, input, p);
if (!passed) {
// source-named check is escalated to blocking when strictness says so
if (strictness.includes("optional") && !p.prerequisite.includes("source-named")) {
@@ -2288,36 +2310,78 @@ async function runChallengeAction(
};
}

// Governance-driven check evaluator — interprets natural-language `check` strings
// from ## Prerequisite Overlays tables. Uses cheap heuristics: substring matching
// against quoted keywords in the check description, plus a few special-case patterns.
function evaluatePrerequisiteCheck(input: string, check: string): boolean {
// Extract quoted keywords like "evidence", "observed", "alternative"
const quotedKeywords: string[] = [];
// Parse-time helper: extract quoted keywords from a `check` description and
// detect the four structural-test hints. Called at canon-fetch time from
// both discoverChallengeTypes (per-type prereqs) and fetchBasePrerequisites
// (universal prereqs). Produces a PrereqMatchVocab that the runtime consumes
// via evaluatePrerequisiteCheck. Per klappy://canon/principles/cache-fetches-
// and-parses, this is a parse product: the Set is the stemmed form of the
// canon's vocabulary and is cached alongside the rest of the prereq struct.
function parseCheckColumn(check: string): PrereqMatchVocab {
const quotedRegex = /"([^"]+)"/g;
const stemmedTokens = new Set<string>();
let m: RegExpExecArray | null;
while ((m = quotedRegex.exec(check)) !== null) {
quotedKeywords.push(m[1]);
}

if (quotedKeywords.length > 0) {
// Pass if ANY quoted keyword appears in input (case-insensitive, word-boundary where possible)
for (const kw of quotedKeywords) {
const escaped = kw.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
// Use word-boundary for single words, substring for phrases
const pattern = /^\w+$/.test(kw) ? new RegExp("\\b" + escaped + "\\b", "i") : new RegExp(escaped, "i");
if (pattern.test(input)) return true;
// Tokenize each quoted keyword or phrase — multi-word phrases like
// "according to" contribute multiple stems; stop-words are dropped
// by tokenize(). This preserves semantic coverage while normalizing
// morphology (problems → problem, considered → consid, etc.).
for (const stem of tokenize(m[1])) {
stemmedTokens.add(stem);

> **Review comment (Cursor Bugbot, commit 33ca5bf) — Medium severity: stop-word keywords silently dropped, breaking the additive invariant.**
> `parseCheckColumn` calls `tokenize(m[1])` on each quoted keyword, which applies the default STOP_WORDS filter. Quoted canon keywords that happen to be stop words — like "from" (explicitly listed in the comment at line 2351 as part of the source-named vocab) — are silently dropped from `stemmedTokens`. The same filtering on `tokenize(input)` at the call site removes them from `inputStems` too. The old regex evaluator matched "from" via `new RegExp("\\bfrom\\b", "i")` against the raw input, so an input like "I got this from my colleague" would pass source-named before but fails now. This breaks the "strictly additive" invariant claimed by the PR and CHANGELOG. (1 additional location.)

}
// Special-case check descriptions that mention URLs, citations, numeric markers
if (/\bURL\b/i.test(check) && /https?:\/\//.test(input)) return true;
if (/numeric/i.test(check) && /\d/.test(input)) return true;
if (/proper-?noun/i.test(check) && /\b[A-Z][a-z]+\s+[A-Z]/.test(input)) return true;
if (/citation/i.test(check) && /\[\d+\]|\bper\s+[A-Z]|\baccording to\b/i.test(input)) return true;
return false;
}
return {
stemmedTokens,
hasURLCheck: /\bURL\b/i.test(check),
hasNumericCheck: /\bnumeric\b/i.test(check),
hasProperNounCheck: /\bproper-?noun\b/i.test(check),
hasCitationCheck: /\bcitation\b/i.test(check),
};
}

// No quoted keywords: conservative fallback — passes if input is non-trivial
return input.trim().length >= 20;
// Governance-driven check evaluator — runtime pairing for parseCheckColumn.
// Per PRD D5 (split-by-fit): prereq evaluation is independent gap-or-not per
// prereq, not ranked. Stemmed set intersection is the fit-to-problem matcher
// and catches morphological variations that the prior regex cascade missed
// (e.g. "problems identified" now stems to `problem` + `identif` and matches
// a prereq whose vocab includes `problem`). Structural side-tests (URL,
// numeric, proper-noun, citation) preserved from the pre-refactor evaluator
// because they cover cases the keyword vocabulary can't — `source-named`
// inputs like "here's the URL: https://..." have no stemmed overlap with the
// vocab `per / according to / from / source: / who said / where i read` but
// the URL structural test catches them. Strictly additive over the prior
// regex: every input that matched pre-refactor still matches post-refactor.
function evaluatePrerequisiteCheck(
inputStems: Set<string>,
rawInput: string,
prereq: PrereqMatchVocab,
): boolean {
// Token match — stemmed set intersection.
for (const s of prereq.stemmedTokens) {
if (inputStems.has(s)) return true;
}
// Structural tests — preserved from pre-refactor evaluator. Check against
// the raw input because these patterns are inherently case- and shape-
// sensitive (URLs, proper-noun capitalization, bracketed citations).
if (prereq.hasURLCheck && /https?:\/\//.test(rawInput)) return true;
if (prereq.hasNumericCheck && /\d/.test(rawInput)) return true;
if (prereq.hasProperNounCheck && /\b[A-Z][a-z]+\s+[A-Z]/.test(rawInput)) return true;
if (prereq.hasCitationCheck && /\[\d+\]|\bper\s+[A-Z]|\baccording to\b/i.test(rawInput)) {
return true;
}
// Conservative fallback: prereqs whose check description had NO quoted
// keywords AND NO structural hints pass on any non-trivial input. This
// preserves the pre-refactor fallback behavior (`input.trim().length >= 20`).
if (
prereq.stemmedTokens.size === 0 &&
!prereq.hasURLCheck &&
!prereq.hasNumericCheck &&
!prereq.hasProperNounCheck &&
!prereq.hasCitationCheck
) {
return rawInput.trim().length >= 20;
}
return false;
}

async function runGateAction(