Mirrors the PR #96 encode pattern. Extracts challenge behavior from live governance articles (landed in klappy.dev canon via PR #99) rather than from hardcoded source logic.
New functions in workers/src/orchestrate.ts:
- discoverChallengeTypes: per-canonUrl cached type discovery
- fetchBasePrerequisites: universal prerequisite checks
- fetchNormativeVocabulary: RFC 2119 + architectural load-bearing terms
- fetchStakesCalibration: mode-to-depth filter
- extractPrereqTable / extractKeywordsFromCheck: shared helpers
Refactored:
- runChallengeAction: replaces hardcoded detectClaimType / generateChallenges / findTensions / findMissingPrerequisites with governance extraction. Supports multi-match. Filters output by stakes calibration based on the mode parameter.
- runCleanupStorage: clears all four new caches on invalidation
Invariant: voice-dump mode suppresses all challenge output regardless of matched types. This is load-bearing per stakes-calibration governance: some modes exist for raw capture, and pressure-testing at that stage damages the mode.
Graceful degradation: missing governance articles fall back to minimal built-in behavior with warnings rather than failing.
Co-authored-by: Claude <noreply@anthropic.com>
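The per-canonUrl caching and graceful degradation described above can be sketched as a small generic cache. This is a hypothetical illustration, not the code in orchestrate.ts: the GovernanceCache class, the synchronous load callback, and the warnings array are assumptions, and the real fetchers read canon over the network asynchronously.

```typescript
// Hypothetical sketch: one cache per kind of governance article, keyed by canonUrl.
// `load` returns null when the article is missing or unreachable.
type Loader<T> = (canonUrl: string) => T | null;

class GovernanceCache<T> {
  private readonly entries = new Map<string, T>();
  private readonly fallback: T;

  constructor(fallback: T) {
    this.fallback = fallback;
  }

  // Returns the cached value for canonUrl, loading it on first use.
  // A missing article degrades to the built-in fallback plus a warning.
  get(canonUrl: string, load: Loader<T>, warnings: string[]): T {
    const cached = this.entries.get(canonUrl);
    if (cached !== undefined) return cached;

    const loaded = load(canonUrl);
    if (loaded === null) {
      warnings.push(`governance article missing for ${canonUrl}; using built-in fallback`);
      return this.fallback; // not cached: a later call retries the load
    }
    this.entries.set(canonUrl, loaded);
    return loaded;
  }

  // Invalidation hook, e.g. called from a runCleanupStorage-style cleanup.
  clear(): void {
    this.entries.clear();
  }
}

// Fallback mirrors the minimal RFC 2119 set mentioned in this PR's later commits.
const vocabularyCache = new GovernanceCache<string[]>(["MUST", "MUST NOT", "SHOULD", "SHOULD NOT"]);
```

Not caching the fallback is a design choice of this sketch, not something the PR specifies: it lets a later call pick up an article that was briefly unreachable, at the cost of a retry per request while it stays missing.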
Refactor runChallengeAction in workers/src/orchestrate.ts to extract challenge-type behavior from canon governance articles at runtime rather than hardcoding claim-type detection, questions, prerequisites, and tension rules in source. Structural mirror of PR #96 (encode).
Detection was upgraded mid-implementation from regex-OR to BM25 + stemming after the gauntlet revealed that regex-based matching was morphologically brittle ("coin" does not match the trigger "coining"). The pivot removed an entire class of bug and seeded a reusable pattern for future governance-driven tools.
Changes in workers/src/orchestrate.ts:
- New: ChallengeTypeDef, BasePrerequisite, NormativeVocabulary, StakesModeConfig, StakesCalibration
- New: discoverChallengeTypes (builds a per-canonUrl BM25 index over detection text), fetchBasePrerequisites, fetchNormativeVocabulary, fetchStakesCalibration, each with a per-canonUrl cache and graceful degradation on missing articles
- New: evaluatePrerequisiteCheck, which interprets natural-language check strings from prerequisite overlay tables
- Refactored runChallengeAction: multi-match via BM25 score > 0, base + overlay prerequisite aggregation, stakes calibration filtering, voice-dump suppression invariant, governance-driven tension detection
- Extended runCleanupStorage with five new cache clears (types, type-index, base prerequisites, vocabulary, calibration)
- Removed dead detectClaimType (legacy src/tasks/challenge.js retains its copy for CLI backward compatibility)
- Added a CHALLENGE_STOP_WORDS set that preserves modal verbs as signal
Changes in workers/src/bm25.ts (backward-compatible extension):
- tokenize() and buildBM25Index() accept an optional stopWords: Set<string>
- BM25Index gains an optional stopWords field so searchBM25 tokenizes queries consistently with the index
- Default behavior unchanged; existing callers are unaffected
- Motivation: the default STOP_WORDS filters modals (must, should, shall, may, not), which are signal for challenge-type detection
New tests:
- workers/test/governance-parser.test.mjs: 94 assertions against live governance articles fetched from klappy.dev raw. Covers type parsing, fallback resolution, BM25 detection, stemming regression cases (coin/coining, propose/proposed, principle/principles), multi-match, and the voice-dump suppression invariant. 94/94 pass.
Bugs the gauntlet caught on this PR:
1. The voice-dump suppression invariant would have shipped broken. The calibration cell reads "none (suppress all challenge)", not bare "none". A strict-equality parser would have produced a single-element tier array, so voice-dump mode would have surfaced all challenges in prod.
2. Morphological brittleness in regex detection (coin vs coining) triggered the pivot to BM25 + stemming.
3. The default BM25 STOP_WORDS silently breaks strong-claim and proposal detection by filtering modal verbs. Fixed via a custom stop word set.
Verification:
- npm run typecheck: clean
- tests/smoke.sh: 6/6 pass (legacy CLI path; backward compat preserved)
- workers/test/governance-parser.test.mjs: 94/94 pass
- AI voice clichés audit on new comments: clean
- oddkit_preflight, challenge, gate, validate: all run; gate returns NOT_READY due to the same hardcoded-logic gap as challenge pre-refactor (flagged as follow-up)
Response shape change: adds mode, matched_types, type_definitions, block_until_addressed; removes claim_type. Consumed programmatically, not rendered.
Follow-ups flagged:
- Encode parity PR: same regex-OR brittleness in runEncodeAction; the pattern is proven here, so the port will be near-mechanical
- klappy.dev meta governance PR: "compiles into a case-insensitive word-boundary regex" is now stale language
- Gate refactor candidate: same hardcoded-logic shape as challenge pre-refactor
Refs:
- Depends on: klappy/klappy.dev#99 (governance articles this code reads)
- Structural mirror: #96 (governance-driven encode)
- Evidence: docs/oddkit/evidence/challenge-governance-code-refactor.md
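To see why stemming removes the coin/coining class of bug, and why modal verbs have to survive stop-word filtering, here is a self-contained sketch of BM25 detection with a score > 0 multi-match rule. It is not the code in workers/src/bm25.ts: the toy suffix-stripping stem function stands in for a real stemmer, and the function names, k1/b defaults, detection texts, and stop-word lists are illustrative assumptions.

```typescript
// Toy suffix stripper standing in for a real stemmer (e.g. Porter). Illustrative only.
function stem(word: string): string {
  let w = word;
  for (const suffix of ["ing", "ed", "es", "s"]) {
    if (w.length > suffix.length + 2 && w.endsWith(suffix)) {
      w = w.slice(0, -suffix.length);
      break;
    }
  }
  // Drop a trailing "e" so "propose" and "proposed" meet at "propos".
  if (w.length > 4 && w.endsWith("e")) w = w.slice(0, -1);
  return w;
}

// Stop words are filtered before stemming; callers choose the set.
function tokenize(text: string, stopWords: Set<string>): string[] {
  const words: string[] = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return words.filter((w) => !stopWords.has(w)).map(stem);
}

interface IndexedDoc { id: string; terms: string[] }
interface BM25Index { docs: IndexedDoc[]; df: Map<string, number>; avgLen: number; stopWords: Set<string> }

function buildIndex(docs: { id: string; text: string }[], stopWords: Set<string>): BM25Index {
  const indexed = docs.map((d) => ({ id: d.id, terms: tokenize(d.text, stopWords) }));
  const df = new Map<string, number>();
  for (const d of indexed) {
    for (const t of new Set(d.terms)) df.set(t, (df.get(t) ?? 0) + 1);
  }
  const totalLen = indexed.reduce((sum, d) => sum + d.terms.length, 0);
  // Keep the stop words on the index so queries are tokenized the same way.
  return { docs: indexed, df, avgLen: totalLen / Math.max(indexed.length, 1), stopWords };
}

// Returns every document with score > 0 (multi-match), best first.
function search(index: BM25Index, query: string, k1 = 1.2, b = 0.75): { id: string; score: number }[] {
  const qTerms = tokenize(query, index.stopWords);
  const n = index.docs.length;
  return index.docs
    .map((doc) => {
      let score = 0;
      for (const q of qTerms) {
        const tf = doc.terms.filter((t) => t === q).length;
        if (tf === 0) continue;
        const df = index.df.get(q) ?? 0;
        const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5)); // always > 0
        const norm = 1 - b + (b * doc.terms.length) / (index.avgLen || 1);
        score += (idf * tf * (k1 + 1)) / (tf + k1 * norm);
      }
      return { id: doc.id, score };
    })
    .filter((r) => r.score > 0)
    .sort((x, y) => y.score - x.score);
}
```

With made-up detection texts such as "must always never guaranteed" for a strong-claim type and "we should propose coining a new term" for a proposal type, the query "I propose we coin this" matches only the proposal. If "must" is added to the stop words, a strong claim built on it can no longer score above zero, which is the failure mode of gauntlet bug 3.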
Re-applies the four review fixes from sibling commits (31f8134, e9ef2f9, 84932f0) and the dead-code removal that the bugbot review also flagged, on top of the BM25 + stemming detection swap.
- Vocabulary regex sorted by length descending so 'MUST NOT' matches before 'MUST' (closes bugbot 'Regex alternation order')
- Stakes calibration mode column lowercased at parse time AND mode normalized to lowercase at lookup time (closes bugbot 'Mode column not lowercased breaks voice-dump suppression')
- first_1 reframings policy now surfaces a single reframing in total across all matched types, not one per type (closes bugbot 'first_1 reframings surfaces multiple instead of one')
- Detection runs BEFORE the voice-dump suppression check, and the SUPPRESSED response includes the governance field for shape parity with CHALLENGED (closes bugbot 'SUPPRESSED response missing governance')
- Renames type_definitions to governance in the CHALLENGED response so both statuses return the same shape under the same key
- Dead detectClaimType already removed by the BM25 commit (closes bugbot 'Dead code: detectClaimType has zero callers')
Verification:
- npm run typecheck: clean
- workers/test/governance-parser.test.mjs: 94/94 pass
- tests/smoke.sh: 6/6 pass
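A hypothetical sketch of the control flow these fixes converge on: normalize the mode to lowercase, run detection before the voice-dump check, and return the same envelope (including governance) for both SUPPRESSED and CHALLENGED, with first_1 yielding one reframing in total. The field names mode, matched_types, and governance come from the PR text; ChallengeResponse, StakesRow, the detect callback, the reframings field, and the unknown-mode default are assumptions for illustration.

```typescript
type ReframingPolicy = "all" | "first_1" | "none";

interface StakesRow {
  suppress: boolean; // e.g. true for voice-dump
  reframings: ReframingPolicy;
}

interface TypeDef {
  name: string;
  reframings: string[];
}

interface ChallengeResponse {
  status: "SUPPRESSED" | "CHALLENGED";
  mode: string;
  matched_types: string[];
  governance: TypeDef[];
  reframings: string[];
}

// Hypothetical control flow; calibration keys are assumed lowercased at parse time.
function runChallengeSketch(
  input: string,
  rawMode: string,
  detect: (input: string) => TypeDef[],
  calibration: Map<string, StakesRow>,
): ChallengeResponse {
  const mode = rawMode.trim().toLowerCase(); // lookup-time normalization
  const matched = detect(input); // detection runs before the suppression check
  const shared = { mode, matched_types: matched.map((t) => t.name), governance: matched };

  // Assumption: an unknown mode is treated as unsuppressed with all reframings.
  const row: StakesRow = calibration.get(mode) ?? { suppress: false, reframings: "all" };
  if (row.suppress) {
    // Same envelope as CHALLENGED, including governance, but nothing surfaced.
    return { status: "SUPPRESSED", ...shared, reframings: [] };
  }

  const all = matched.flatMap((t) => t.reframings);
  // first_1 means one reframing in total, not one per matched type.
  const reframings = row.reframings === "none" ? [] : row.reframings === "first_1" ? all.slice(0, 1) : all;
  return { status: "CHALLENGED", ...shared, reframings };
}
```

Because both branches spread the same shared object, a consumer can read matched_types and governance without first switching on status.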
…ctor evidence
Captures the fork-resolution and bugbot-review-driven fixes as a sixth layer of catch alongside the gauntlet bugs. Records the lesson: read PR review comments before treating a divergent remote as unknown work.
…e cross-contamination
Caught in PR #100 review by Klappy: the CHALLENGE_STOP_WORDS Set added mid-PR to fix a BM25 over-match was itself a Vodka Architecture violation in a refactor explicitly about removing such violations. The constant carried a domain opinion ('modals are signal, articles are filler in challenge detection') that belonged in canon, not in worker source.
Anti-pattern fixed:
- Drop the hardcoded CHALLENGE_STOP_WORDS Set from workers/src/orchestrate.ts
- Drop the duplicate hardcoded copy from workers/test/governance-parser.test.mjs
- Extend the NormativeVocabulary interface with stopWords: Set<string>
- Extend fetchNormativeVocabulary to extract the '## Detection Noise' code block from odd/challenge/normative-vocabulary.md (lands in klappy.dev#100)
- Move the BM25 index build out of discoverChallengeTypes into a new lazy builder, getOrBuildChallengeTypeIndex(types, vocab, canonUrl), so the index can use governance-sourced stop words rather than a constant
- Update the parser test to fetch Detection Noise the same way the worker does: no hardcoded duplicate, no drift risk. The test gains 3 new assertions: Detection Noise parses non-empty, excludes modal verbs, and includes common filler
Net hardcoded-constants delta: this PR removes ~6 classes of hardcoded domain opinion (claim type detection, questions, prereqs, tension regex, reframings, stop words) and adds zero. The remaining minimal RFC 2119 fallback ('MUST', 'MUST NOT', 'SHOULD', 'SHOULD NOT') and the 'planning' default mode are server-availability fallbacks for when canon is unreachable, not domain governance.
The test currently runs against the feature branch via the KLAPPYDEV_RAW env override. After klappy.dev#100 merges, the override comes off and the test reads from main with no further changes.
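The Detection Noise extraction can be sketched as: find the '## Detection Noise' heading, take the first fenced code block under it, and split its contents into a lowercase Set<string>. The function name, the assumption that entries are whitespace- or comma-separated, and the empty-set result for a missing section are illustrative; the actual layout of odd/challenge/normative-vocabulary.md is defined in canon.

```typescript
// Hypothetical sketch: read stop words from the first fenced code block under
// a "## Detection Noise" heading. Returns an empty set if the section is absent,
// leaving the caller to decide how to degrade.
function extractDetectionNoise(markdown: string): Set<string> {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((l) => /^##\s+Detection Noise\s*$/i.test(l));
  const words = new Set<string>();
  if (start === -1) return words;

  let inFence = false;
  for (const line of lines.slice(start + 1)) {
    if (!inFence && /^#{1,2}\s/.test(line)) break; // reached the next section
    if (/^\s*`{3}/.test(line)) {
      if (inFence) break; // end of the first code block
      inFence = true;
      continue;
    }
    if (!inFence) continue;
    // Assumption: entries are separated by whitespace and/or commas.
    for (const w of line.split(/[\s,]+/)) {
      if (w.length > 0) words.add(w.toLowerCase());
    }
  }
  return words;
}
```

The resulting set is what a lazy index builder would pass as stop words, so changing the noise list becomes a canon edit rather than a worker redeploy.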
Verification:
- npm run typecheck: clean
- workers/test/governance-parser.test.mjs (vs feature branch): 97/97 pass
- tests/smoke.sh: 6/6 pass
- grep CHALLENGE_STOP_WORDS in workers/ and src/: zero matches
Refs:
- Caught in: this PR review by Klappy
- Depends on: klappy/klappy.dev#100 (Detection Noise section)
- Lesson: 'is this the right architectural shape' is a category the current gauntlet does not catch. The tools verify governance content, not whether new code is creating new ungoverned content. Possible future tool: a vodka-audit that flags non-trivial Sets/Maps/lists in worker source and asks 'should this be in canon?'
…arser
- Use matchAll and prefer the prohibition directive type over the leftmost requirement match, so excerpts like 'You MUST X and MUST NOT Y' surface the prohibition. The regex switched to the global flag because matchAll requires it.
- Port the fetchStakesCalibration toLowerCase fix to the fidelity test parser so byMode keys stay lowercase even if governance introduces capitalized mode names.
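A sketch of the directive scan these fixes describe: build one alternation from the vocabulary sorted longest-first so 'MUST NOT' wins over 'MUST' at the same position, compile it with the g flag (String.prototype.matchAll throws a TypeError on a non-global regex), and prefer a prohibition anywhere in the excerpt over the leftmost requirement. The signature of pickStrongestDirective here, the Directive type, and the rule that a term ending in NOT is a prohibition are assumptions, not the worker's actual code.

```typescript
type DirectiveKind = "prohibition" | "requirement";

interface Directive {
  term: string;
  kind: DirectiveKind;
}

// Escape regex metacharacters so vocabulary terms match literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Longest terms first: at a given position the alternation takes the first
// branch that matches, so "MUST NOT" must be tried before "MUST".
function buildDirectiveRegex(terms: string[]): RegExp {
  const alternation = [...terms]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");
  return new RegExp(`\\b(?:${alternation})\\b`, "g"); // matchAll requires "g"
}

// Scans every match; any prohibition beats a requirement, even a leftmost one.
function pickStrongestDirective(excerpt: string, re: RegExp): Directive | null {
  let strongest: Directive | null = null;
  for (const m of excerpt.matchAll(re)) {
    // Assumption: a term ending in "NOT" is a prohibition.
    const kind: DirectiveKind = /\bNOT$/.test(m[0]) ? "prohibition" : "requirement";
    if (strongest === null || (kind === "prohibition" && strongest.kind === "requirement")) {
      strongest = { term: m[0], kind };
    }
  }
  return strongest;
}
```

With an unsorted alternation such as MUST|MUST NOT, the excerpt "You MUST NOT Y" matches only "MUST", because the shorter branch succeeds first and the word boundary after it is satisfied.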
…st pickStrongest
Two fixes:
1. Table row parser (6 call sites) was using
.split('|').map(trim).filter(c => c.length > 0) which also drops
legitimately-empty interior cells, silently collapsing the column
count. In fetchStakesCalibration this would silently drop a
voice-dump row with an empty tiers cell, breaking the suppression
invariant with no error signal. Introduce parseTableRow helper that
only strips the leading/trailing empties produced by surrounding
pipes, preserving empty middle cells.
2. Hoist pickStrongest (now pickStrongestDirective) out of the
per-entry loop in runChallengeAction. It captures no loop-scoped
state, so defining it inside the loop needlessly re-allocated the
closure on each iteration and misled readers about its scope.
Matches the placement of evaluatePrerequisiteCheck.
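A minimal version of the helper described in fix 1, assuming rows arrive as single lines in pipe-table syntax and that escaped pipes (\|) inside cells do not occur. It trims each cell and strips only the empties created by the leading and trailing pipes. The example rows below are made up.

```typescript
// Hypothetical sketch of parseTableRow: keep empty interior cells so column
// positions stay aligned with the header row.
function parseTableRow(line: string): string[] {
  const cells = line.split("|").map((c) => c.trim());
  // "| a | | c |" splits to ["", "a", "", "c", ""]: drop only the edge empties.
  if (cells.length > 0 && cells[0] === "") cells.shift();
  if (cells.length > 0 && cells[cells.length - 1] === "") cells.pop();
  return cells;
}
```

For "| voice-dump |  | none |", the old split/trim/filter chain returns two cells, shifting "none" into the second column; parseTableRow returns three, with an empty string in the middle.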
…and dead branch
Two issues from bugbot's 14:29 review:
1. The reframing 'none' check now applies the same defensive pattern as the tiersRaw fix in fetchStakesCalibration. The cell may be 'none' or 'none (parenthetical reason)'; strict equality would silently surface all reframings via the 'all' fallback when authors include explanatory text. Same defect class as bug #3 in the evidence note; sweep applied.
2. Remove the unreachable questionTiers.length === 0 branch in the question-surfacing condition. The SUPPRESSED early return at line 1635 already handles that case, so the branch was dead code that misleadingly suggested 'surface all questions for empty tiers' semantics; the actual semantic is full suppression.
Verified: typecheck clean, parser test 97/97 against main, smoke 6/6. Defect-class sweep on governance cell strict-equality checks: only two sites (tiersRaw, surfacing), both now defensive.
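The defensive cell check that both sites now share can be sketched as one predicate plus a consumer. The names isNoneCell and parseTiersCell, the exact matching rule (the word none, optionally followed by a parenthetical), comma-separated tiers, and the tier names in the example are assumptions for illustration.

```typescript
// Treats "none" and "none (any explanatory text)" as the same sentinel,
// case-insensitively. A strict `cell === "none"` check misses the second form.
function isNoneCell(cell: string): boolean {
  return /^none(\s*\(.*\))?$/i.test(cell.trim());
}

// Example consumer: a tiers cell either suppresses everything or lists tiers.
function parseTiersCell(cell: string): { suppress: boolean; tiers: string[] } {
  if (isNoneCell(cell)) return { suppress: true, tiers: [] };
  // Assumption: tiers are comma-separated.
  const tiers = cell.split(",").map((t) => t.trim()).filter((t) => t.length > 0);
  return { suppress: false, tiers };
}
```

Because suppression is decided once, up front, a downstream "empty tiers" branch can never be reached with suppress set to false and an empty list, which is why the dead branch could be removed.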
Deploying with
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | oddkit | 6e01a00 | Commit Preview URL, Branch Preview URL | Apr 17 2026, 03:17 PM |
⚠ HOLD: schema bug found by preview test
After opening this promotion PR, I actually tested the production preview at https://main-oddkit.klappy.workers.dev, which I should have done before opening the PR. The MCP tool's …
Net effect: the voice-dump suppression invariant, the load-bearing feature this PR's evidence note specifically calls out, cannot be invoked through the public MCP tool. The schema rejects the call before the runtime ever sees it. PR #102 fixes this with a 10-line schema expansion.
Hold this promotion until #102 lands and merges to main. Then this PR fast-forwards, and the prod deploy ships a feature whose headline invariant actually works.
What the production preview confirmed works
Lesson
Three verification layers passed (typecheck, parser fidelity 97/97, smoke 6/6) and the deploy succeeded. None exercised the public API contract end-to-end with the load-bearing mode value. Testing the running preview is not optional.
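One way to back this lesson with an offline check, sketched under assumptions: list the modes the public tool schema accepts, list the modes the stakes calibration defines, and fail when governance defines a mode the schema would reject. TOOL_MODE_ENUM, its contents, and findUnreachableModes are hypothetical; the check complements, and does not replace, calling the running preview with mode=voice-dump.

```typescript
// Hypothetical contract check: every mode the governance calibration defines
// must be accepted by the public tool schema, or its behavior is unreachable.
function findUnreachableModes(schemaModes: readonly string[], calibrationModes: readonly string[]): string[] {
  const accepted = new Set(schemaModes.map((m) => m.toLowerCase()));
  return calibrationModes.map((m) => m.toLowerCase()).filter((m) => !accepted.has(m));
}

// Illustrative values only: a schema enum that predates the writing modes.
const TOOL_MODE_ENUM = ["exploration", "planning", "execution"] as const;
const calibrationModes = ["exploration", "planning", "execution", "voice-dump"];

const unreachable = findUnreachableModes(TOOL_MODE_ENUM, calibrationModes);
if (unreachable.length > 0) {
  console.error(`modes defined in governance but rejected by the tool schema: ${unreachable.join(", ")}`);
}
```

In a real setup the calibration modes would come from the same parser the worker uses, so the check fails as soon as canon adds a mode the schema does not yet accept.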
Captures DOLCHE for the session that delivered PR #100 (governance-driven challenge refactor with BM25 + stemming) and the unresolved schema bug that made the voice-dump suppression invariant unreachable from the public API.
Critical for the next model picking up:
- PR #101 (prod promotion) is BLOCKED; the schema fix is not yet merged
- fix/challenge-mode-schema-includes-writing-modes has the fix
- After the fix lands, manually curl the preview with mode=voice-dump before promoting
Also records lessons for the next session: defect-class sweep discipline, public API contract verification, parser test flakiness, and three follow-up refactors carrying the same anti-pattern as challenge pre-refactor (encode, gate, orient).
Promote E0008 challenge governance refactor (PR #100) to production
Promotes 18 commits from main to prod, all originating from PR #100 (governance-driven oddkit_challenge refactor with BM25 + stemming).
Scope verification
Diff confirmed clean: only challenge-related code, the evidence note, the ledger entry, the parser test, and the small backward-compatible bm25.ts extension. The runOrientAction, runGateAction, and runEncodeAction bodies are byte-identical between main and prod. The two near-orient diff hunks are line-number shifts only (dead detectClaimType removed above orient; new pickStrongestDirective added after it).
What ships
- oddkit_challenge: claim type detection, questions, prerequisites, tension vocabulary, and stakes calibration all read from canon at runtime, mirroring the encode pattern from PR #96
- … coining and coin now map to the same stem
- bm25.ts extension: backward-compatible stopWords: Set<string> parameter on tokenize, buildBM25Index, and searchBM25. Default behavior unchanged for all existing callers, including oddkit_search
- … odd/challenge/normative-vocabulary.md (klappy.dev#100, already merged), not as a hardcoded constant in worker source
- … "none (parenthetical)" cell content
- parseTableRow helper: preserves empty interior cells across all six governance table parsers
- … governance field
- claim_type alias retained in response envelope
Verification on main
- … workers/test/governance-parser.test.mjs: 97/97 against live klappy.dev main
Bugs caught and fixed during PR #100
15+ across three review surfaces: oddkit gauntlet (3 governance/architectural), bugbot (12+ code-correctness across multiple defect classes), human review (1 Vodka Architecture violation). All resolved before merge to main.
Follow-ups (not blocking this promotion) — same anti-pattern in three remaining tools
The challenge refactor proved out a reusable pattern: governance-driven extraction with per-canonUrl caches, BM25 + stemming for detection, response-shape parity, parseTableRow safety. Three other tools still carry the pre-refactor shape and should be ported next:
- oddkit_encode parity: runEncodeAction still uses regex-OR matching for encoding-type detection, with the same morphological brittleness as challenge pre-refactor. The pattern is proven, so the port will be near-mechanical
- oddkit_gate refactor: runGateAction has hardcoded exploration→planning and planning→execution prereq lists; same hardcoded-logic gap as challenge pre-refactor (NOT_READY false negatives demonstrated twice during PR #100 work)
- oddkit_orient refactor: runOrientAction has three hardcoded instances of governance-in-source: per-mode question lists (lines 1489–1508), an assumption-detection regex on modal verbs (line 1482), and the "Proactive posture" governance prose baked in as a string literal (line 1528). All three should move to canon articles parallel to odd/challenge/. The proactive-posture string is especially load-bearing: evolving the posture currently requires a worker redeploy
Refs