diff --git a/docs/oddkit/audit/governance-anti-pattern-sweep-2026-04-17.md b/docs/oddkit/audit/governance-anti-pattern-sweep-2026-04-17.md new file mode 100644 index 0000000..01f7f7b --- /dev/null +++ b/docs/oddkit/audit/governance-anti-pattern-sweep-2026-04-17.md @@ -0,0 +1,117 @@ +--- +title: Governance Anti-Pattern Sweep — All oddkit Tools +date: 2026-04-17 +audience: maintainer +tier: 2 +stability: stable +voice: technical +status: active +governs: workers/src/orchestrate.ts, workers/src/index.ts, src/core/tool-registry.js +tags: ["audit", "vodka-architecture", "refactor", "governance"] +--- + +# Governance Anti-Pattern Sweep — All oddkit Tools + +## Summary + +Following PR #100's voice-dump suppression bug — where canon-driven detection worked internally but the public MCP schema rejected 6 of 9 modes for 1h 39m of production breakage — this audit inspects every oddkit tool for the same Vodka anti-pattern: **canon defines the vocabulary, but code hardcodes the interpretation**. + +Five of eleven tools carry the anti-pattern. Three are SEVERE (`orient`, `gate`, `validate`); `validate` is also silently broken because it ignores its own canon. Two are PARTIAL (`encode`, `preflight`). One cross-cutting issue (mode enum quadruplication) is already named in PR #102's commit message as a follow-up. + +Audit method: direct inspection of `workers/src/orchestrate.ts` (2338 lines) and `workers/src/index.ts` (872 lines), cross-referenced against H2 of `odd/ledger/journal/2026-04-17-pr100-rage-quit-handoff.md`.
+ +## Findings by tool + +### SEVERE — same anti-pattern class as PR #100 + +#### `orient` + +| # | Issue | Location | What canon should define | +|---|-------|----------|--------------------------| +| 1 | `MODE_SIGNALS` — 12 hardcoded English regex defining what counts as exploration/planning/execution mode | `orchestrate.ts:279` | `odd/orient/mode-signals.md` (vocabulary per mode) | +| 2 | Per-mode questions — three hardcoded English question triplets returned to caller | `orchestrate.ts:1490` | `odd/orient/questions-by-mode.md` | +| 3 | "Proactive posture" prose — 70-word paragraph baked as string literal | `orchestrate.ts:1528` | `canon/values/proactive-posture.md` (canonical text fetched at runtime) | +| 4 | Assumption-marker regex — `is\|are\|will\|should\|must\|always\|never\|obviously\|clearly` | `orchestrate.ts:1480` | `odd/orient/assumption-markers.md` (vocabulary doc) | + +The "Proactive posture" prose is the most visible Vodka violation in the codebase. It is the exact text returned by `oddkit_orient` to every caller. Canon updates do not reach users until the worker is redeployed. 
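The `MODE_SIGNALS` finding could be addressed by parsing the per-mode vocabulary out of the canon document at runtime instead of compiling it into the worker. The sketch below is illustrative only: it assumes a hypothetical two-column `| Mode | Signal |` table in `odd/orient/mode-signals.md` (no such format exists yet), and it uses plain case-insensitive substring matching instead of the current regexes. `parseModeSignals` and `detectMode` are hypothetical names, not existing oddkit functions.

```typescript
// Hypothetical canon format (not an existing oddkit document):
//   | Mode | Signal |
//   |------|--------|
//   | exploration | what if |
// Rows must start and end with "|".
type ModeSignals = Map<string, string[]>;

function parseModeSignals(markdown: string): ModeSignals {
  const signals: ModeSignals = new Map();
  for (const line of markdown.split("\n")) {
    const row = line.trim();
    if (!row.startsWith("|")) continue;
    // Drop the empty cells produced by the leading and trailing pipes.
    const cells = row.split("|").slice(1, -1).map((c) => c.trim());
    if (cells.length < 2) continue;
    const [mode, signal] = cells;
    // Skip the header row and the |---| separator row.
    if (mode.toLowerCase() === "mode" || /^:?-+:?$/.test(mode)) continue;
    if (!mode || !signal) continue;
    const phrases = signals.get(mode) ?? [];
    phrases.push(signal.toLowerCase());
    signals.set(mode, phrases);
  }
  return signals;
}

// Pick the mode whose canon-declared phrases occur most often in the input.
// Returns null when nothing matches, so callers can fall back explicitly.
function detectMode(input: string, signals: ModeSignals): string | null {
  const text = input.toLowerCase();
  let best: string | null = null;
  let bestHits = 0;
  for (const [mode, phrases] of signals) {
    const hits = phrases.filter((p) => text.includes(p)).length;
    if (hits > bestHits) {
      best = mode;
      bestHits = hits;
    }
  }
  return best;
}
```

Because the vocabulary lives in the table, adding a phrase or a mode becomes a canon edit rather than a worker redeploy, which is the property the "Proactive posture" and question-triplet findings are also missing.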
+ +#### `gate` + +| # | Issue | Location | What canon should define | +|---|-------|----------|--------------------------| +| 1 | `detectTransition` — six hardcoded English regex mapping phrases to transition pairs | `orchestrate.ts:315` | `odd/gate/transition-signals.md` | +| 2 | Per-transition prereqs — five hardcoded transition tuples with hardcoded English prereq descriptions | `orchestrate.ts:1916` | `odd/gate/prerequisites-by-transition.md` | +| 3 | `checkPatterns` — eight hardcoded regex per prereq ID; new canon prereq → silent failure unless code updated | `orchestrate.ts:1956` | Same canon doc, with `evidence_pattern` field per prereq | + +#### `validate` + +| # | Issue | Location | What canon should define | +|---|-------|----------|--------------------------| +| 1 | `isFinalization` — hardcoded English `commit\|pr\|merge\|ship\|deploy\|release\|publish\|finalize\|submit\|deliver` | `orchestrate.ts:1186` | `canon/definition-of-done.md` (finalization markers) | +| 2 | Hardcoded journal/changelog/version evidence checks | `orchestrate.ts:1188-1194` | Same canon doc (evidence requirements) | +| 3 | **Validate gates "done" but does not read `definition-of-done.md` at all.** Preflight surfaces it; validate ignores it. The two tools have inconsistent definitions of "done." | structural | Validate must read the same doc preflight surfaces | + +This is the most surprising finding. `validate`'s entire purpose is to gate completion claims, yet it never consults the canonical definition of done. Refactor priority is high not because the code is dense but because the contract is silently broken. 
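For `gate`, the table proposes an `evidence_pattern` field per prereq. A minimal sketch, assuming a hypothetical four-column table in `odd/gate/prerequisites-by-transition.md` where pipes inside a pattern are escaped as `\|` (the GitHub-flavored Markdown convention). A prereq whose pattern is missing or invalid is reported as `unverifiable` instead of silently failing, which is the failure mode the `checkPatterns` row describes. `parsePrereqs`, `checkPrereqs`, and the `planning->execution` key format are all hypothetical.

```typescript
// Hypothetical canon format for odd/gate/prerequisites-by-transition.md:
//   | Transition | Prereq ID | Description | evidence_pattern |
// Pipes inside a pattern are escaped as "\|".
interface Prereq {
  transition: string;
  id: string;
  description: string;
  evidencePattern: RegExp | null; // null when canon omits or mis-specifies the pattern
}

type PrereqStatus = "met" | "unmet" | "unverifiable";

function parsePrereqs(markdown: string): Prereq[] {
  const prereqs: Prereq[] = [];
  for (const line of markdown.split("\n")) {
    const row = line.trim();
    if (!row.startsWith("|")) continue;
    // Split on unescaped pipes only, then unescape "\|" inside each cell.
    const cells = row
      .split(/(?<!\\)\|/)
      .slice(1, -1)
      .map((c) => c.trim().replace(/\\\|/g, "|"));
    if (cells.length < 4) continue;
    const [transition, id, description, pattern] = cells;
    // Skip the header row and the |---| separator row.
    if (transition.toLowerCase() === "transition" || /^:?-+:?$/.test(transition)) continue;
    let evidencePattern: RegExp | null = null;
    if (pattern) {
      try {
        evidencePattern = new RegExp(pattern, "i");
      } catch {
        evidencePattern = null; // invalid regex in canon: report it, don't crash
      }
    }
    prereqs.push({ transition, id, description, evidencePattern });
  }
  return prereqs;
}

function checkPrereqs(
  transition: string,
  context: string,
  prereqs: Prereq[],
): Map<string, PrereqStatus> {
  const status = new Map<string, PrereqStatus>();
  for (const p of prereqs) {
    if (p.transition !== transition) continue;
    if (!p.evidencePattern) {
      status.set(p.id, "unverifiable");
      continue;
    }
    status.set(p.id, p.evidencePattern.test(context) ? "met" : "unmet");
  }
  return status;
}
```

The same shape would serve `validate`: finalization markers and evidence requirements become rows in the definition-of-done doc that the worker reads, rather than regexes it owns.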
+ +### PARTIAL — discovery is canon-driven but the interpreter is hardcoded + +#### `encode` + +| # | Issue | Location | What canon should define | +|---|-------|----------|--------------------------| +| 1 | `discoverEncodingTypes` correctly reads `encoding-type`-tagged articles, parses identity/trigger-words/quality-criteria tables | `orchestrate.ts:336` | ✓ Already canon-driven | +| 2 | `scoreArtifactQuality` treats canon-defined `check` strings as opaque text and hardcodes English keyword matching: `ck.includes("non-empty")`, `ck.includes("number")`, `/must\|must not\|never\|always\|shall/i` | `orchestrate.ts:855` | Quality criteria need a structured grammar canon can declare, not freeform strings the worker keyword-matches | +| 3 | `isStructuredInput` hardcodes the TSV format that `odd/encoding-types/serialization-format` already declares | `orchestrate.ts:757` | Format canon should be the source the parser reads | +| 4 | Default fallback OLDC+H trigger words — acceptable safety fallback, but could equally be a `baseline/encoding-types/` canon directory | `orchestrate.ts:393-405` | Optional | + +The scoring interpreter (#2) is the same bug shape as PR #100's mode enum: governance defines vocabulary, code hardcodes interpretation. New criteria added in canon are silently scored by the generic fallback. + +**Encoding this audit demonstrates the bug.** When this audit was first encoded via `oddkit_encode`, prefixed `L:`/`O:`/`D:`/`H:` markers were ignored (the input wasn't TSV), and `parseUnstructuredInput` typed almost every paragraph as "Constraint" because the audit text contains "must" and "constraint" throughout. The matching is positional and vocabulary-driven, not semantic.
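One possible shape for the "structured grammar canon can declare" in issue #2 is a small operator vocabulary in each `check` cell. This is a sketch under assumptions: the `non_empty`, `min_words:<n>`, and `matches:<regex>` operators are invented for illustration and appear in no current canon document, and `evaluateCheck` is a hypothetical name. Unknown operators come back as `unsupported`, so a new criterion surfaces loudly instead of being scored by a generic fallback.

```typescript
// Hypothetical check grammar for a canon quality-criteria `check` cell:
//   non_empty        artifact text is not blank
//   min_words:<n>    artifact has at least n whitespace-separated words
//   matches:<regex>  artifact matches the regex (case-insensitive)
type CheckResult = { ok: boolean } | { unsupported: string };

function evaluateCheck(check: string, artifact: string): CheckResult {
  const sep = check.indexOf(":");
  const op = (sep === -1 ? check : check.slice(0, sep)).trim();
  const arg = sep === -1 ? "" : check.slice(sep + 1).trim();
  switch (op) {
    case "non_empty":
      return { ok: artifact.trim().length > 0 };
    case "min_words": {
      const n = Number(arg);
      if (arg === "" || !Number.isInteger(n) || n < 0) return { unsupported: check };
      const words = artifact.trim().split(/\s+/).filter(Boolean).length;
      return { ok: words >= n };
    }
    case "matches":
      try {
        return { ok: new RegExp(arg, "i").test(artifact) };
      } catch {
        return { unsupported: check }; // invalid regex declared in canon
      }
    default:
      // Unknown operators are surfaced rather than scored by a generic fallback.
      return { unsupported: check };
  }
}
```

The design choice is that the worker owns only the operators, while canon owns which operators apply to which encoding type; an `unsupported` result is a signal to extend the grammar, not a score.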
+ +#### `preflight` + +| # | Issue | Location | What canon should define | +|---|-------|----------|--------------------------| +| 1 | Hardcoded "Before claiming done" tail — three English bullets ("Provide visual proof for UI changes", "Include test output for logic changes", "Reference any decisions made") | `orchestrate.ts:1389` | `canon/definition-of-done.md` (the same doc validate should read) | + +Small refactor, low risk, naturally bundled with the validate refactor since both should read `definition-of-done.md`. + +### CROSS-CUTTING — mode enum quadruplication + +The 9-mode vocabulary is now declared in four places: + +1. `workers/src/index.ts:170` — unified `oddkit` tool schema +2. `workers/src/index.ts:235` — dedicated `oddkit_challenge` tool schema +3. `src/core/tool-registry.js` — local registry (parallel; fixed in PR #104) +4. `MODE_SIGNALS` in `orchestrate.ts:279` — only knows the 3 epistemic modes; does not acknowledge the 6 writing-lifecycle modes + +Canon source of truth: `odd/challenge/stakes-calibration`. Klappy named this as the next refactor target in the PR #102 commit message: *"drop the enum entirely and let canon be the validator. The runtime already validates against the calibration table at fetchStakesCalibration time — having the schema also enforce vocabulary is the same Vodka anti-pattern shape that PR #100 fixed for stop words."* + +### CLEAN — no anti-pattern + +`challenge` (recently refactored, gold standard), `search`, `get`, `catalog`, `version`, `time`, `cleanup_storage`, `telemetry_public`. + +(Earlier classification of `telemetry_policy` as CLEAN was wrong — it had a hardcoded header dictionary next to the canon-fetched policy prose. Reclassified to LOW severity and selected as the canary refactor; see status below.) + +## Refactor priority + +Revised during planning after the canary was selected and the `core-governance-baseline` contract was drafted. 
The sequence puts the smallest blast radius first so each refactor's lessons carry into the next; it does not follow raw severity: + +0. **✅ CANARY: `telemetry_policy`** — smallest blast radius; proved the three-tier contract and refactor template. **Shipped to prod 2026-04-18 via oddkit#106 + oddkit#107.** Live smoke confirms `governance_source: "canon"` with 8/8 canon-sourced descriptions. Canon extension to add the Description column shipped via klappy.dev#102. +1. **`validate` + `preflight` (bundled)** — next. Requires writing `canon/constraints/definition-of-done.md` first (currently referenced by user-facing docs but does not exist in the repo). Fixes validate's silently-broken "done" contract. +2. **Mode-enum collapse** — cross-cutting; single source of truth for the 9-mode vocabulary. Already named by Klappy in the PR #102 commit message. +3. **`orient`** — four issues; "Proactive posture" prose is the headline embarrassment. +4. **`gate`** — three issues; mirrors the orient pattern. +5. **`encode` quality interpreter** — same bug class as PR #100; subtle and silent. + +## Constraints for future refactors + +- **Public-contract verification is mandatory.** Every refactor that touches a tool must include a smoke test that invokes the public MCP API with a canon change loaded but no worker redeploy. Relying on passing internal tests is the exact failure mode that caused PR #100. + +- **Vocabulary sweeps are non-optional.** Any refactor that touches mode/transition/claim-type vocabulary must verify that all four declaration sites (`workers/src/index.ts` ×2, `src/core/tool-registry.js`, `orchestrate.ts`) agree, OR collapse them to a single source of truth. + +- **`definition-of-done.md` is load-bearing.** Both `preflight` and `validate` should read it. Inconsistency between them is a contract bug. + +## Handoff + +This audit is ready to inform planning. Recommend one refactor session per tool rather than a single megabranch — the PR #100 sprawl (5 PRs for one feature) is the exact failure mode larger scope would amplify.
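The three-tier contract from `canon/constraints/core-governance-baseline` (knowledge base, then bundled snapshot, then minimal fallback) can be sketched as a generic resolver. This is an illustration, not the worker's code: `resolveGovernance` and the loader callbacks are hypothetical, and strict mode is left to the knowledge-base loader. Until the bundled tier ships, the bundled loader would simply return `null`.

```typescript
type GovernanceSource = "knowledge_base" | "bundled" | "minimal";

interface Resolved<T> {
  value: T;
  governance_source: GovernanceSource;
}

// A loader returns null when its tier cannot supply a usable document.
type Loader<T> = () => Promise<T | null>;

async function resolveGovernance<T>(
  fromKnowledgeBase: Loader<T>,
  fromBundled: Loader<T>,
  minimal: T,
): Promise<Resolved<T>> {
  try {
    const kb = await fromKnowledgeBase();
    if (kb !== null) return { value: kb, governance_source: "knowledge_base" };
  } catch {
    // A failed fetch falls through to the next tier instead of failing the tool call.
  }
  try {
    const bundled = await fromBundled();
    if (bundled !== null) return { value: bundled, governance_source: "bundled" };
  } catch {
    // Same fall-through for the bundled snapshot.
  }
  // The minimal tier ships in code and cannot fail.
  return { value: minimal, governance_source: "minimal" };
}
```

With a `knowledge_base_url` override in strict mode, the knowledge-base loader would wrap `getFile(path, url, { skipBaselineFallback: true })`, parse the result, and return `null` on any miss, so a file absent from the override never resolves from the default knowledge base.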
diff --git a/workers/src/index.ts b/workers/src/index.ts index 999e3da..b2a19ea 100644 --- a/workers/src/index.ts +++ b/workers/src/index.ts @@ -217,7 +217,7 @@ Use when: "voice-dump", "drafting", "peer-review-ready", "canon-tier-2", "canon-tier-1", "published-essay", ]).optional().describe("Optional mode hint. Epistemic modes (exploration/planning/execution) or writing-lifecycle modes (voice-dump/drafting/peer-review-ready/canon-tier-2/canon-tier-1/published-essay). Sourced from odd/challenge/stakes-calibration."), - canon_url: z.string().optional().describe("Optional GitHub repo URL for canon override."), + knowledge_base_url: z.string().optional().describe("Optional GitHub repo URL for your knowledge base. When set, strict mode is automatic: missing files fall through to the bundled governance tier rather than silently substituting from the default knowledge base."), include_metadata: z.boolean().optional().describe("When true, search/get responses include a metadata object with full parsed frontmatter. Default: false."), section: z.string().optional().describe("For action='get': extract only the named ## section from the document. Returns section content or available sections if not found."), sort_by: z.enum(["date", "path"]).optional().describe("For action='catalog': sort articles. 'date' returns newest first (requires frontmatter). 
'path' returns all docs alphabetically, including undated."), @@ -238,7 +238,7 @@ Use when: input: args.input, context: args.context, mode: args.mode, - canon_url: args.canon_url, + canon_url: args.knowledge_base_url, include_metadata: args.include_metadata, section: args.section, sort_by: args.sort_by, @@ -271,7 +271,7 @@ Use when: action: "orient", schema: { input: z.string().describe("A goal, idea, or situation description to orient against."), - canon_url: z.string().optional().describe("Optional: GitHub repo URL for canon override."), + knowledge_base_url: z.string().optional().describe("Optional: GitHub repo URL for your knowledge base. When set, strict mode is automatic: missing files fall through to the bundled governance tier."), }, annotations: { readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true }, }, @@ -286,7 +286,7 @@ Use when: "voice-dump", "drafting", "peer-review-ready", "canon-tier-2", "canon-tier-1", "published-essay", ]).optional().describe("Mode for proportional challenge. Epistemic (exploration/planning/execution) or writing-lifecycle (voice-dump/drafting/peer-review-ready/canon-tier-2/canon-tier-1/published-essay). voice-dump suppresses all challenge output."), - canon_url: z.string().optional().describe("Optional: GitHub repo URL for canon override."), + knowledge_base_url: z.string().optional().describe("Optional: GitHub repo URL for your knowledge base. 
When set, strict mode is automatic: missing files fall through to the bundled governance tier."), }, annotations: { readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true }, }, @@ -297,7 +297,7 @@ Use when: schema: { input: z.string().describe("The proposed transition (e.g., 'ready to build', 'moving to planning')."), context: z.string().optional().describe("Optional context about what's been decided so far."), - canon_url: z.string().optional().describe("Optional: GitHub repo URL for canon override."), + knowledge_base_url: z.string().optional().describe("Optional: GitHub repo URL for your knowledge base. When set, strict mode is automatic: missing files fall through to the bundled governance tier."), }, annotations: { readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true }, }, @@ -308,7 +308,7 @@ Use when: schema: { input: z.string().describe("A decision, insight, or boundary to capture."), context: z.string().optional().describe("Optional supporting context."), - canon_url: z.string().optional().describe("Optional: GitHub repo URL for canon override."), + knowledge_base_url: z.string().optional().describe("Optional: GitHub repo URL for your knowledge base. When set, strict mode is automatic: missing files fall through to the bundled governance tier."), }, annotations: { readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: false }, }, @@ -318,7 +318,7 @@ Use when: action: "search", schema: { input: z.string().describe("Natural language query or tags to search for."), - canon_url: z.string().optional().describe("Optional: GitHub repo URL for canon override."), + knowledge_base_url: z.string().optional().describe("Optional: GitHub repo URL for your knowledge base. 
When set, strict mode is automatic: missing files fall through to the bundled governance tier."), include_metadata: z.boolean().optional().describe("When true, each hit includes a metadata object with full parsed frontmatter. Default: false."), }, annotations: { readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true }, @@ -329,7 +329,7 @@ Use when: action: "get", schema: { input: z.string().describe("Canonical URI (e.g., klappy://canon/values/orientation)."), - canon_url: z.string().optional().describe("Optional: GitHub repo URL for canon override."), + knowledge_base_url: z.string().optional().describe("Optional: GitHub repo URL for your knowledge base. When set, strict mode is automatic: missing files fall through to the bundled governance tier."), include_metadata: z.boolean().optional().describe("When true, response includes a metadata object with full parsed frontmatter. Default: false."), section: z.string().optional().describe("Extract only the named ## section from the document. Returns available sections if not found."), }, @@ -340,7 +340,7 @@ Use when: description: "Lists available documentation with categories, counts, and start-here suggestions. Supports temporal discovery: use sort_by='date' to get recent articles with full frontmatter metadata.", action: "catalog", schema: { - canon_url: z.string().optional().describe("Optional: GitHub repo URL for canon override."), + knowledge_base_url: z.string().optional().describe("Optional: GitHub repo URL for your knowledge base. When set, strict mode is automatic: missing files fall through to the bundled governance tier."), sort_by: z.enum(["date", "path"]).optional().describe("Sort articles. 'date' returns newest first (requires frontmatter). 'path' returns all docs alphabetically, including undated."), limit: z.number().min(1).max(500).optional().describe("Max articles to return when sort_by is provided. 
Default: 10, max: 500."), offset: z.number().min(0).optional().describe("Skip this many articles before returning results. Use with limit for pagination. Default: 0."), @@ -363,7 +363,7 @@ Use when: action: "preflight", schema: { input: z.string().describe("Description of what you're about to implement."), - canon_url: z.string().optional().describe("Optional: GitHub repo URL for canon override."), + knowledge_base_url: z.string().optional().describe("Optional: GitHub repo URL for your knowledge base. When set, strict mode is automatic: missing files fall through to the bundled governance tier."), }, annotations: { readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true }, }, @@ -372,7 +372,7 @@ Use when: description: "Returns oddkit version and the authoritative canon target (commit/mode).", action: "version", schema: { - canon_url: z.string().optional().describe("Optional: GitHub repo URL for canon override."), + knowledge_base_url: z.string().optional().describe("Optional: GitHub repo URL for your knowledge base. When set, strict mode is automatic: missing files fall through to the bundled governance tier."), }, annotations: { readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: false }, }, @@ -381,7 +381,7 @@ Use when: description: "Storage hygiene: clears orphaned cached data. NOT required for correctness — content-addressed caching ensures fresh content is served automatically when the baseline changes.", action: "cleanup_storage", schema: { - canon_url: z.string().optional().describe("Optional: GitHub repo URL for canon override."), + knowledge_base_url: z.string().optional().describe("Optional: GitHub repo URL for your knowledge base. 
When set, strict mode is automatic: missing files fall through to the bundled governance tier."), }, annotations: { readOnlyHint: false, destructiveHint: false, idempotentHint: true, openWorldHint: false }, }, @@ -399,7 +399,7 @@ Use when: input: (args.input as string) || "", context: args.context as string | undefined, mode: args.mode as string | undefined, - canon_url: args.canon_url as string | undefined, + canon_url: args.knowledge_base_url as string | undefined, include_metadata: args.include_metadata as boolean | undefined, section: args.section as string | undefined, sort_by: args.sort_by as string | undefined, @@ -430,7 +430,7 @@ Schema: blob3 — tool_name oddkit action (e.g. "orient", "search") blob4 — consumer_label best-effort caller identity blob5 — consumer_source how label was resolved (e.g. "user-agent") - blob6 — canon_url which repo baseline is being served + blob6 — knowledge_base_url which knowledge base is being served blob7 — document_uri for get calls, the klappy:// URI requested blob8 — worker_version oddkit version string double1 — count always 1 @@ -496,35 +496,46 @@ Time filter example: WHERE timestamp > NOW() - INTERVAL '30' DAY`, server.tool( "telemetry_policy", - "Return oddkit telemetry and sharing policy guidance. What is tracked, what is excluded, and why. Fetched from canonical governance document at runtime. Response envelope declares governance_source (canon|baseline|minimal) per canon/constraints/core-governance-baseline.", - {}, + "Return oddkit telemetry and sharing policy guidance. What is tracked, what is excluded, and why. Fetched from canonical governance document at runtime. Response envelope declares governance_source (knowledge_base|bundled|minimal) per canon/constraints/core-governance-baseline. Accepts knowledge_base_url to read from an alternate knowledge base.", + { + knowledge_base_url: z.string().optional().describe("Optional GitHub repo URL for your knowledge base. 
When set, strict mode is automatic: missing files fall through to the bundled governance tier rather than silently substituting from the default knowledge base. When provided, fetches canon/constraints/telemetry-governance.md from this repo instead of the oddkit-hosted default. Falls back to the minimal baseline if the file is missing."), + }, { readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true, }, - async () => { + async ({ knowledge_base_url }) => { // Governance resolution per canon/constraints/core-governance-baseline: - // 1. Live canon fetch (preferred) → governance_source: "canon" - // 2. Minimal baseline (shipped in code) → governance_source: "minimal" + // 1. Live knowledge base fetch (preferred) → governance_source: "knowledge_base" + // 2. Bundled governance (oddkit Worker snapshot) → governance_source: "bundled" + // 3. Minimal hardcoded fallback → governance_source: "minimal" // // This canary refactor implements tiers 1 and 3 only. The bundled // baseline tier (2) and the build-time schema check arrive in follow-up // work; the manifest + baseline directory are not yet in place. + const startTime = Date.now(); const fetcher = new ZipBaselineFetcher(env); let policyContent: string | null = null; let selfReportHeaders: Record | null = null; - let governanceSource: "canon" | "baseline" | "minimal" = "minimal"; + let governanceSource: "knowledge_base" | "bundled" | "minimal" = "minimal"; try { - const content = await fetcher.getFile("canon/constraints/telemetry-governance.md"); + // When knowledge_base_url is set, strict mode is automatic: skip the fetcher's fallback to the + // default knowledge base (its baseline repo) so a file missing from the override surfaces + // as "minimal" instead of being silently served from the default knowledge base. + const content = await fetcher.getFile( + "canon/constraints/telemetry-governance.md", + knowledge_base_url, + knowledge_base_url ?
{ skipBaselineFallback: true } : undefined, + ); if (content) { policyContent = content; const parsed = parseSelfReportHeadersTable(content); if (parsed && Object.keys(parsed).length > 0) { selfReportHeaders = parsed; - governanceSource = "canon"; + governanceSource = "knowledge_base"; } } } catch { @@ -551,6 +562,9 @@ Time filter example: WHERE timestamp > NOW() - INTERVAL '30' DAY`, } } + const headerCount = selfReportHeaders ? Object.keys(selfReportHeaders).length : 0; + const assistantText = `Telemetry policy loaded from ${governanceSource}. ${headerCount} self-report headers available.${knowledge_base_url ? ` (knowledge_base_url override: ${knowledge_base_url})` : ""}`; + return { content: [{ type: "text" as const, @@ -563,6 +577,9 @@ Time filter example: WHERE timestamp > NOW() - INTERVAL '30' DAY`, self_report_headers: selfReportHeaders, generated_at: new Date().toISOString(), }, + server_time: new Date().toISOString(), + assistant_text: assistantText, + debug: { duration_ms: Date.now() - startTime, knowledge_base_url: knowledge_base_url ?? 
null }, }, null, 2), }], }; diff --git a/workers/src/telemetry.ts b/workers/src/telemetry.ts index f3b6192..b360070 100644 --- a/workers/src/telemetry.ts +++ b/workers/src/telemetry.ts @@ -160,8 +160,10 @@ export function parseToolCall(payload: unknown): { if (typeof a.input === "string" && a.input.includes("://")) { documentUri = a.input; } - // Extract canon_url from tool arguments - if (typeof a.canon_url === "string" && a.canon_url) { + // Extract knowledge base URL from tool arguments (accept legacy canon_url alias) + if (typeof a.knowledge_base_url === "string" && a.knowledge_base_url) { + canonUrl = a.knowledge_base_url; + } else if (typeof a.canon_url === "string" && a.canon_url) { canonUrl = a.canon_url; } } diff --git a/workers/src/zip-baseline-fetcher.ts b/workers/src/zip-baseline-fetcher.ts index a718c54..a39341e 100644 --- a/workers/src/zip-baseline-fetcher.ts +++ b/workers/src/zip-baseline-fetcher.ts @@ -978,12 +978,25 @@ export class ZipBaselineFetcher { * Get a specific file from the baseline or canon. * Content-addressed: file cache is keyed to each repo's own commit SHA. * Three-tier: module memory → R2 → ZIP extraction. + * + * When `options.skipBaselineFallback` is true and `canonUrl` is set, the + * baseline repo is not appended to the search sources (without `canonUrl` + * the flag has no effect). Callers can pass it to distinguish "file found in + * the canon_url override" from "file found in the baseline fallback": a null + * return then unambiguously means the override canon lacks the file. */ - async getFile(path: string, canonUrl?: string): Promise<string | null> { + async getFile( + path: string, + canonUrl?: string, + options?: { skipBaselineFallback?: boolean }, + ): Promise<string | null> { const baselineRepoUrl = "https://github.com/klappy/klappy.dev"; + const skipBaselineFallback = options?.skipBaselineFallback === true; - // Resolve SHA for each repo independently - const baselineSha = await this.getLatestCommitSha(baselineRepoUrl); + // Resolve SHA for the baseline only when it will actually be searched.
+ const baselineSha = skipBaselineFallback && canonUrl + ? null + : await this.getLatestCommitSha(baselineRepoUrl); // Build the list of repos to search, each with its own SHA const sources: Array<{ url: string; repoKey: string; sha: string }> = []; @@ -998,13 +1011,15 @@ export class ZipBaselineFetcher { }); } - sources.push({ - url: this.env.BASELINE_URL.includes("raw.githubusercontent.com") - ? this.env.BASELINE_URL.replace("/main", "").replace("raw.githubusercontent.com", "github.com") - : baselineRepoUrl, - repoKey: getCacheKey("baseline"), - sha: baselineSha || "unknown", - }); + if (!(skipBaselineFallback && canonUrl)) { + sources.push({ + url: this.env.BASELINE_URL.includes("raw.githubusercontent.com") + ? this.env.BASELINE_URL.replace("/main", "").replace("raw.githubusercontent.com", "github.com") + : baselineRepoUrl, + repoKey: getCacheKey("baseline"), + sha: baselineSha || "unknown", + }); + } for (const source of sources) { // Content-addressed cache key: repo identity + repo SHA + file path diff --git a/workers/test/canon-tool-envelope.smoke.mjs b/workers/test/canon-tool-envelope.smoke.mjs new file mode 100644 index 0000000..07bc8f7 --- /dev/null +++ b/workers/test/canon-tool-envelope.smoke.mjs @@ -0,0 +1,139 @@ +#!/usr/bin/env node +/** + * Live smoke test for knowledge-base-driven MCP tool envelope contracts. + * + * Exercises the actual MCP endpoint (preview or prod) and verifies that + * every canon-driven tool returns the full envelope shape: + * + * { action, result, server_time, assistant_text, debug, ... } + * + * AND that knowledge-base-driven tools surface `governance_source` inside `result` with one of: knowledge_base | bundled | minimal. + * + * Why this exists: parser tests (workers/test/governance-parser.test.mjs) + * exercise parser logic in isolation. They passed for the telemetry_policy + * canary, but the canary shipped with a broken envelope and silent + * knowledge_base_url fallback because no test invoked the MCP tool end-to-end. 
+ * Parser tests cannot catch the tool's response contract — only live smoke + * against the MCP endpoint can. This test also verifies the strict-override + * contract: when knowledge_base_url points at a repo lacking the file, the + * response must surface governance_source: 'minimal', not silently substitute + * from the default knowledge base. + * + * Usage: + * node workers/test/canon-tool-envelope.smoke.mjs + * ODDKIT_URL=https://preview-xxx.oddkit.klappy.dev/mcp node ... + * + * Exit 0 on all pass, 1 on any failure. + */ + +const ODDKIT_URL = process.env.ODDKIT_URL || "https://oddkit.klappy.dev/mcp"; + +let passed = 0; +let failed = 0; + +function ok(label, cond, hint = "") { + if (cond) { + console.log(` ✓ ${label}`); + passed++; + } else { + console.log(` ✗ ${label}${hint ? ` — ${hint}` : ""}`); + failed++; + } +} + +async function callTool(name, args = {}) { + const body = JSON.stringify({ + jsonrpc: "2.0", + id: 1, + method: "tools/call", + params: { name, arguments: args }, + }); + const res = await fetch(ODDKIT_URL, { + method: "POST", + headers: { + "Content-Type": "application/json", + "Accept": "application/json, text/event-stream", + "x-oddkit-client": "envelope-smoke-test", + }, + body, + }); + const text = await res.text(); + // SSE format: `event: message\ndata: {...}\n\n` + const match = text.match(/data: (\{[\s\S]*\})/); + if (!match) throw new Error(`No data payload from ${name}: ${text.slice(0, 300)}`); + const envelope = JSON.parse(match[1]); + const inner = JSON.parse(envelope.result.content[0].text); + return inner; +} + +function expectFullEnvelope(toolName, inner) { + console.log(`\n─── Envelope shape: ${toolName} ───`); + ok(`${toolName}: has 'action'`, typeof inner.action === "string"); + ok(`${toolName}: has 'result'`, typeof inner.result === "object" && inner.result !== null); + ok(`${toolName}: has 'server_time' (ISO 8601)`, + typeof inner.server_time === "string" && /^\d{4}-\d{2}-\d{2}T/.test(inner.server_time), + `got: 
${inner.server_time}`); + ok(`${toolName}: has 'assistant_text'`, typeof inner.assistant_text === "string" && inner.assistant_text.length > 0); + ok(`${toolName}: has 'debug'`, typeof inner.debug === "object" && inner.debug !== null); + ok(`${toolName}: debug.duration_ms is a number`, typeof inner.debug?.duration_ms === "number"); +} + +function expectGovernanceSource(toolName, inner, expectedTier) { + console.log(`\n─── Governance source: ${toolName} ───`); + const source = inner.result?.governance_source; + ok(`${toolName}: result.governance_source present`, typeof source === "string", `got: ${source}`); + ok(`${toolName}: result.governance_source is one of knowledge_base|bundled|minimal`, + ["knowledge_base", "bundled", "minimal"].includes(source), + `got: ${source}`); + if (expectedTier) { + ok(`${toolName}: result.governance_source == "${expectedTier}"`, + source === expectedTier, + `got: ${source}`); + } +} + +async function run() { + console.log(`Target: ${ODDKIT_URL}\n`); + + // Tool 1: oddkit_time — non-canon-driven baseline for envelope convention + const timeResult = await callTool("oddkit_time"); + expectFullEnvelope("oddkit_time", timeResult); + + // Tool 2: telemetry_policy — canon-driven, should have full envelope + governance_source + const policyDefault = await callTool("telemetry_policy"); + expectFullEnvelope("telemetry_policy (default knowledge_base)", policyDefault); + expectGovernanceSource("telemetry_policy (default knowledge_base)", policyDefault, "knowledge_base"); + + // Tool 3: telemetry_policy with knowledge_base_url override pointing at a repo + // that doesn't have the governance file — should fall back to minimal. + // This verifies the strict-override contract: when knowledge_base_url is set, + // the bundled fallback is suppressed so a missing file surfaces as "minimal". 
+ console.log(`\n─── knowledge_base_url override: telemetry_policy ───`); + const policyOverride = await callTool("telemetry_policy", { + knowledge_base_url: "https://github.com/torvalds/linux", + }); + expectFullEnvelope("telemetry_policy (knowledge_base_url override)", policyOverride); + ok( + "telemetry_policy: knowledge_base_url override falls back to minimal when file missing (strict mode)", + policyOverride.result?.governance_source === "minimal", + `got: ${policyOverride.result?.governance_source}`, + ); + ok( + "telemetry_policy: minimal fallback still returns 8 headers", + Object.keys(policyOverride.result?.self_report_headers ?? {}).length === 8, + `got: ${Object.keys(policyOverride.result?.self_report_headers ?? {}).length}`, + ); + ok( + "telemetry_policy: debug.knowledge_base_url echoes the override", + policyOverride.debug?.knowledge_base_url === "https://github.com/torvalds/linux", + `got: ${policyOverride.debug?.knowledge_base_url}`, + ); + + console.log(`\n${passed} passed, ${failed} failed`); + process.exit(failed === 0 ? 0 : 1); +} + +run().catch((e) => { + console.error(e); + process.exit(1); +});