From 5d982aeeb9fc7ed51620e07a62622039fe50d99f Mon Sep 17 00:00:00 2001
From: Klappy
Date: Fri, 17 Apr 2026 04:59:18 +0000
Subject: [PATCH] feat(challenge): governance articles for E0008 challenge refactor
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Eleven governance articles plus an evidence note, mirroring the PR #96 encode pattern. Governance lands in canon first; the future workers/src/orchestrate.ts refactor extracts against live articles. No runtime behavior change in this PR.

Meta governance
- odd/challenge-types/how-to-write-challenge-types.md — extraction contract, Domain Adaptation with three worked patterns (software engineering, thought leadership from books, comparative architectural writing), six-step procedure for KB stewards

Software-engineering default challenge types (shipped defaults, labeled as such on purpose)
- odd/challenge-types/strong-claim.md
- odd/challenge-types/proposal.md
- odd/challenge-types/assumption.md
- odd/challenge-types/observation.md (fallback: true)

Architectural-writing overlay challenge types (klappy.dev additions for comparative-analysis and principle-extraction writing work)
- odd/challenge-types/pattern-coinage.md
- odd/challenge-types/comparative-positioning.md
- odd/challenge-types/principle-extraction.md

Supporting articles (both domains coexist in klappy.dev canon)
- odd/challenge/base-prerequisites.md
- odd/challenge/normative-vocabulary.md
- odd/challenge/stakes-calibration.md (includes voice-dump mode, which suppresses all challenge output as an invariant)

Evidence
- docs/oddkit/evidence/challenge-governance-articles-commit.md captures the gauntlet run: preflight, AI voice cliches audit, author-identity check, derives-from path audit, Writing Canon gate per-article, session capture reference, open risks.

Gauntlet notes
- Writing Canon gate passed per-article (title, blockquote, summary section, headers, no buried claims) after remediation of missing Summary sections caught during the validate pass
- AI voice cliches audit clean
- One broken derives_from path caught and fixed (canon/epistemic-modes.md -> canon/definitions/epistemic-modes.md in stakes-calibration.md)
- oddkit_gate returned NOT_READY due to the same hardcoded-logic problem this refactor solves — the gate's generic prereqs cannot see session state. Noted honestly; proceeding because materially met and documented.

Co-authored-by: Claude
---
 .../challenge-governance-articles-commit.md |  91 ++++
 odd/challenge-types/assumption.md           |  73 +++
 .../comparative-positioning.md              |  76 ++++
 .../how-to-write-challenge-types.md         | 416 ++++++++++++++++++
 odd/challenge-types/observation.md          |  71 +++
 odd/challenge-types/pattern-coinage.md      |  75 ++++
 odd/challenge-types/principle-extraction.md |  76 ++++
 odd/challenge-types/proposal.md             |  76 ++++
 odd/challenge-types/strong-claim.md         |  73 +++
 odd/challenge/base-prerequisites.md         |  45 ++
 odd/challenge/normative-vocabulary.md       |  80 ++++
 odd/challenge/stakes-calibration.md         | 101 +++++
 12 files changed, 1253 insertions(+)
 create mode 100644 docs/oddkit/evidence/challenge-governance-articles-commit.md
 create mode 100644 odd/challenge-types/assumption.md
 create mode 100644 odd/challenge-types/comparative-positioning.md
 create mode 100644 odd/challenge-types/how-to-write-challenge-types.md
 create mode 100644 odd/challenge-types/observation.md
 create mode 100644 odd/challenge-types/pattern-coinage.md
 create mode 100644 odd/challenge-types/principle-extraction.md
 create mode 100644 odd/challenge-types/proposal.md
 create mode 100644 odd/challenge-types/strong-claim.md
 create mode 100644 odd/challenge/base-prerequisites.md
 create mode 100644 odd/challenge/normative-vocabulary.md
 create mode 100644 odd/challenge/stakes-calibration.md
diff --git 
a/docs/oddkit/evidence/challenge-governance-articles-commit.md b/docs/oddkit/evidence/challenge-governance-articles-commit.md new file mode 100644 index 00000000..7e8ff34b --- /dev/null +++ b/docs/oddkit/evidence/challenge-governance-articles-commit.md @@ -0,0 +1,91 @@ +# Gauntlet Evidence — Challenge Governance Articles Commit + +**Branch:** `feat/challenge-governance-articles` +**Date:** 2026-04-17 +**Scope:** 11 new governance articles under `odd/challenge-types/` and `odd/challenge/` +**Deliverable type:** Canon documents (governance only — no code, no UI, no runtime behavior change) + +--- + +## Definition of Done — Evidence + +### 1. Change Description + +Eleven governance articles committed to klappy.dev canon, setting up the oddkit_challenge refactor to mirror the PR #96 encode pattern (governance-driven via extraction contract rather than hardcoded source logic). + +- 1 meta governance article: `odd/challenge-types/how-to-write-challenge-types.md` +- 4 software-engineering default challenge types: `strong-claim`, `proposal`, `assumption`, `observation` (fallback) +- 3 architectural-writing overlay challenge types: `pattern-coinage`, `comparative-positioning`, `principle-extraction` +- 3 supporting articles: `base-prerequisites`, `normative-vocabulary`, `stakes-calibration` + +### 2. 
Verification Performed + +- `oddkit_preflight` surfaced three constraint documents to check against: `canon/constraints/ai-voice-cliches.md`, `canon/constraints/author-identity-language.md`, `canon/constraints/definition-of-done.md` +- AI voice clichés audit via grep against new articles for formulaic transitions, puffing, overclarification, summary clichés, bold-then-explain, and em-dash clustering +- Em-dash density compared against precedent `odd/encoding-types/how-to-write-encoding-types.md` +- Author identity language audit (no identity claims about Klappy in any article) +- `derives_from` path audit — every referenced path checked against repo filesystem +- Writing Canon checklist (8 tests) applied per-article: title test, blockquote test, metadata test, summary test, header scan test, no buried claims, axiom space test, ghost writer test +- Header scan output reviewed for each representative article + +### 3. Observed Behavior + +- AI voice clichés audit: zero hits on formulaic transitions, puffing, overclarification, summary clichés, bold-then-explain +- Em-dash density: new articles average 0.04–0.16 per line; precedent meta averages 0.11 per line — same neighborhood, no ticced clustering +- All 11 articles contain the required `## Summary — [subtitle]` section after the blockquote +- Header scan confirmed headers tell each document's story in sequence (no "Background / Discussion / Conclusion" generic forms) +- `derives_from` paths: all verified against repo except one caught error — `canon/epistemic-modes.md` corrected to `canon/definitions/epistemic-modes.md` in `stakes-calibration.md` + +### 4. Evidence Produced + +This file. Plus the git diff (11 new files under `odd/challenge-types/` and `odd/challenge/`). Plus the session journal at `/home/claude/session-journal-challenge-refactor.md` capturing the full DOLCHE derivation: 1 Decision, 4 Learnings, 1 Constraint, 1 Handoff. 
+ +Visual proof: **N/A — no UI/interaction/layout change.** These are canon governance documents rendered by existing klappy.dev article templates (same as `odd/encoding-types/*` articles already in production). No new rendering path, no new visible state. + +### 5. Self-Audit Completed + +- **Intended outcome:** klappy.dev canon contains a complete governance set that the future `workers/src/orchestrate.ts` refactor can extract against — governance articles first, code refactor second, matching PR #96 order. +- **Constraints applied:** Writing Canon (progressive disclosure, summary sections, descriptive headers), AI voice clichés (no clustering of AI tells), author identity language (no translator claims about Klappy), definition-of-done (this file), frontmatter schema (booleans/integers/dates unquoted, strings with special chars quoted). +- **Decision rules followed:** mirror the PR #96 encode pattern; ship software-engineering defaults labeled honestly; use base-plus-overlay for prerequisites; multi-match semantics by design; governance before code. +- **Tradeoffs:** the defaults are software-flavored on purpose (alternative was "generic human-language defaults" that would serve every domain badly); both software and writing modes coexist in single supporting articles (alternative was separate canons per mode — heavier, not yet justified); no `## Summary` was originally drafted — caught by Writing Canon gate and remediated before commit. +- **Remaining risks:** detection-pattern overlap between new architectural-writing types and existing software types may produce noisier multi-matches until stakes-calibration trims by mode; `Priority` field semantics in Type Identity are under-specified for display ordering (flagged in session journal); no integration test against the live challenge tool yet — that comes with the workers/src/orchestrate.ts refactor. 
+ +--- + +## Writing Canon Gate — Per-Article Results + +| Article | Title test | Blockquote test | Summary section | Header scan | +|---|---|---|---|---| +| how-to-write-challenge-types.md | ✓ | ✓ | ✓ | ✓ | +| strong-claim.md | ✓ | ✓ | ✓ | ✓ | +| proposal.md | ✓ | ✓ | ✓ | ✓ | +| assumption.md | ✓ | ✓ | ✓ | ✓ | +| observation.md | ✓ | ✓ | ✓ | ✓ | +| pattern-coinage.md | ✓ | ✓ | ✓ | ✓ | +| comparative-positioning.md | ✓ | ✓ | ✓ | ✓ | +| principle-extraction.md | ✓ | ✓ | ✓ | ✓ | +| base-prerequisites.md | ✓ | ✓ | ✓ | ✓ | +| normative-vocabulary.md | ✓ | ✓ | ✓ | ✓ | +| stakes-calibration.md | ✓ | ✓ | ✓ | ✓ | + +--- + +## OLDC+H — Session Capture Reference + +Full DOLCHE capture at `session-journal-challenge-refactor.md`. Summary: + +- **D** — challenge-refactor-full-governance-drafted: commit 11 articles mirroring PR #96 encode pattern +- **L** — framework-agnostic-defaults-are-not: the extraction contract is domain-agnostic but default articles are not, and labeling them honestly serves every domain better than pretending they are neutral +- **L** — voice-dump-mode-is-a-feature: suppressing challenge in voice-dump mode preserves raw thought flow +- **L** — tim-complaint-surfaces-architectural-insight: software-verbiage complaint was a structural signal that the framework needed domain-adaptation as a first-class capability +- **L** — klappy-two-modes-same-kb: operator mode and architectural-writing mode coexist in klappy.dev canon via multi-match and shared supporting articles +- **C** — voice-dump-suppression-invariant: the tool must not interfere with raw thought capture +- **H** — challenge-refactor-code-next: implement extraction contract in `workers/src/orchestrate.ts` mirroring PR #96 after this commit merges + +--- + +## Version Tracking + +- Branch: `feat/challenge-governance-articles` +- Post-merge: ledger entry capturing E0008 challenge-governance milestone +- Related PRs: depends-on conceptually from PR #96 (governance-driven encode); precedes future PR 
for `workers/src/orchestrate.ts` challenge refactor diff --git a/odd/challenge-types/assumption.md b/odd/challenge-types/assumption.md new file mode 100644 index 00000000..90856d4e --- /dev/null +++ b/odd/challenge-types/assumption.md @@ -0,0 +1,73 @@ +--- +uri: klappy://odd/challenge-types/assumption +title: "Challenge Type: Assumption" +audience: docs +exposure: nav +tier: 2 +voice: neutral +stability: semi_stable +tags: ["odd", "oddkit", "challenge", "challenge-type", "assumption"] +epoch: E0008 +date: 2026-04-16 +derives_from: "canon/constraints/epistemic-challenge.md, odd/challenge-types/how-to-write-challenge-types.md" +governs: "oddkit_challenge behavior for assumption type" +status: active +--- + +# Challenge Type: Assumption + +> An implicit premise treated as established. Assumptions are dangerous precisely because they are not examined — the work built on top of them silently inherits their uncertainty. Pressure here is about making the premise explicit and naming what would validate or invalidate it. + +--- + +## Summary — Make The Implicit Explicit Before It Compounds + +This type fires when an input rests on an unexamined premise — `assume`, `since`, `because`, `given that`, `of course`, `naturally`. Assumptions are the silent structural risks: if the premise is wrong, every claim built on top inherits the error. The challenge asks whether the assumption has been validated, what breaks if it is wrong, whether it is documented or merely implicit, and when it was last verified. Prerequisites check that the assumption is marked for validation, that its breakage impact is considered, and that its source is named. The goal is to surface the assumption from the background of the argument into the foreground, where it can be tested or explicitly accepted. 
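
A minimal TypeScript sketch of how the trigger words in this summary could be turned into a detector. It is not part of this commit and not taken from `workers/src/orchestrate.ts`; the helper names `compileDetector` and `firesAssumption` are hypothetical, and the word list is only the subset quoted in the summary above. It assumes case-insensitive, word-boundary matching.

```typescript
// Illustrative sketch only: compileDetector and firesAssumption are hypothetical
// names, not functions from oddkit or workers/src/orchestrate.ts.

// Escape regex metacharacters so multi-word phrases are matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one case-insensitive, word-boundary regex from a list of trigger phrases.
function compileDetector(patterns: string[]): RegExp {
  const cleaned = patterns
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .map(escapeRegExp);
  if (cleaned.length === 0) {
    // Never matches: a type with no patterns cannot fire by detection.
    return /(?!)/;
  }
  return new RegExp("\\b(?:" + cleaned.join("|") + ")\\b", "i");
}

// Subset of trigger words quoted in the summary above.
const assumptionTriggers = ["assume", "since", "because", "given that", "of course", "naturally"];
const assumptionDetector = compileDetector(assumptionTriggers);

// True when the input contains at least one trigger as a whole word or phrase.
function firesAssumption(input: string): boolean {
  return assumptionDetector.test(input);
}
```

Under word-boundary matching, inflected forms such as "assumed" do not fire unless they are listed explicitly, which is one reason a pattern list might carry both `assume` and `assuming`.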
+ +--- + +## Type Identity + +| Field | Value | +|---|---| +| Slug | assumption | +| Name | Assumption | +| Priority | medium | + +--- + +## Detection Patterns + +``` +assume, assuming, presume, given that, since, because, if we, taking for granted, it's obvious that, naturally, of course +``` + +--- + +## Challenge Questions + +| Question | Stakes tier | +|---|---| +| Has this assumption been validated with evidence? | baseline | +| What if this assumption is wrong — what breaks? | baseline | +| Is this assumption documented or just implicit? | elevated | +| When was this last verified against reality? | elevated | +| Is this a universal assumption or context-specific? | rigorous | + +--- + +## Prerequisite Overlays + +| Prerequisite | Check | Gap message if missing | +|---|---|---| +| validation-marked | input contains "test", "verify", "validate", "check", "confirm" | "Assumption not marked for validation — assumptions without tests compound silently" | +| breakage-considered | input contains "if wrong", "break", "fail", "impact", "depends on" | "Dependency on this assumption not named — what else falls if this is false?" | +| source-of-assumption | input contains "from", "based on", "per", "according to", "historically" | "Source of the assumption not named — is this documented, observed, or inherited?" 
| + +--- + +## Suggested Reframings + +- Make explicit: "We are assuming X; we believe it because Y; we would test it by Z" +- Reframe as known-unknown: "X is assumed but unverified; impact if false is W" +- Promote to hypothesis: "Assumption X is treated as true until evidence from test T says otherwise" diff --git a/odd/challenge-types/comparative-positioning.md b/odd/challenge-types/comparative-positioning.md new file mode 100644 index 00000000..32b14294 --- /dev/null +++ b/odd/challenge-types/comparative-positioning.md @@ -0,0 +1,76 @@ +--- +uri: klappy://odd/challenge-types/comparative-positioning +title: "Challenge Type: Comparative Positioning" +audience: docs +exposure: nav +tier: 2 +voice: neutral +stability: semi_stable +tags: ["odd", "oddkit", "challenge", "challenge-type", "comparative-positioning", "writing-analysis"] +epoch: E0008 +date: 2026-04-16 +derives_from: "canon/constraints/epistemic-challenge.md, odd/challenge-types/how-to-write-challenge-types.md" +governs: "oddkit_challenge behavior for comparative-positioning claims — positioning work against a landscape of other work" +status: active +--- + +# Challenge Type: Comparative Positioning + +> A claim that locates this work against other work — existing approaches, adjacent projects, viral repositories, published papers, competing frameworks. Comparative positioning lives or dies on whether the comparison target is characterized honestly and freshly. The failure mode is strawmanning, selective citation, or comparing against a stale snapshot of something that has since evolved. + +--- + +## Summary — Honest Comparison Requires Fair Characterization And Current Evidence + +This type fires when an input positions work against a landscape — `unlike`, `similar to`, `existing approaches`, `in contrast to`, `prior work`, `state of the art`. 
The challenge asks whether the comparison target is fairly represented or strawmanned, whether the snapshot is current, whether the strongest version of the compared work was engaged, what each approach is actually trying to solve, and whether a better comparison exists that was overlooked. Prerequisites check that the target is specifically named, that it is actually described (not just alluded to), that freshness is established, and that shared ground is acknowledged before differences. The goal is to keep comparative claims honest against both the comparison target and the reader who may know that target better than the writer does. + +--- + +## Type Identity + +| Field | Value | +|---|---| +| Slug | comparative-positioning | +| Name | Comparative Positioning | +| Priority | high | + +--- + +## Detection Patterns + +``` +unlike, similar to, where x differs, existing approaches, the difference from, as opposed to, compared to, in contrast to, other frameworks, unlike other, what makes this different, prior work, state of the art, comparable to +``` + +--- + +## Challenge Questions + +| Question | Stakes tier | +|---|---| +| Is the comparison target fairly represented or characterized in its weakest form? | baseline | +| When was the comparison target last examined? Has it changed since? | baseline | +| Have you engaged the strongest version of the compared work, or the most convenient? | elevated | +| What is the compared work trying to solve that yours isn't, and vice versa? | elevated | +| If the author of the compared work read this, would they recognize their own work in your description? | rigorous | +| Are you missing an adjacent work that would be a better comparison than the one you chose? 
| rigorous | + +--- + +## Prerequisite Overlays + +| Prerequisite | Check | Gap message if missing | +|---|---|---| +| comparison-target-named | input contains a proper-noun reference to the compared work (project name, author, paper, repo) | "Comparison target not specifically named — 'existing approaches' is not a comparison" | +| target-accurately-characterized | input contains "their approach", "they do", "their design", "the x approach" with descriptive content | "Compared work not actually described — contrast requires describing what is being contrasted" | +| freshness-verified | input contains a date, version, or recency marker ("as of", "current", "recent", "latest", year) | "Freshness of the comparison not established — comparisons against stale snapshots are misleading" | +| shared-ground-acknowledged | input contains "similarly", "shared", "in common", "both", "where we agree" | "No shared ground acknowledged — honest comparison names similarities before differences" | + +--- + +## Suggested Reframings + +- Reframe with fairness: "X and my approach share A and B; they diverge on C, where X does Y and I do Z" +- Reframe with recency: "As of [date/version], X does Y; I'm aware this may have changed" +- Reframe with humility: "I may be mischaracterizing X; the claim holds against my reading of their docs as of [date]" +- Reframe to acknowledge strongest version: "The strongest defense of X is A; my claim is that even granting A, Z still holds" diff --git a/odd/challenge-types/how-to-write-challenge-types.md b/odd/challenge-types/how-to-write-challenge-types.md new file mode 100644 index 00000000..b018f1c2 --- /dev/null +++ b/odd/challenge-types/how-to-write-challenge-types.md @@ -0,0 +1,416 @@ +--- +uri: klappy://odd/challenge-types/how-to-write-challenge-types +title: "How to Write a Challenge Type Governance Article" +audience: docs +exposure: nav +tier: 1 +voice: neutral +stability: semi_stable +tags: ["odd", "oddkit", "challenge", 
"challenge-type-meta", "governance", "meta", "prompt-over-code", "template"] +epoch: E0008 +date: 2026-04-16 +derives_from: "canon/principles/prompt-over-code.md, canon/principles/vodka-architecture.md, canon/constraints/epistemic-challenge.md, odd/encoding-types/how-to-write-encoding-types.md" +complements: "odd/challenge-types/strong-claim.md, odd/challenge-types/proposal.md, odd/challenge-types/assumption.md, odd/challenge-types/observation.md, odd/challenge/base-prerequisites.md, odd/challenge/normative-vocabulary.md, odd/challenge/stakes-calibration.md" +governs: "All challenge-type governance articles, server parsing of challenge-type docs, custom challenge type creation, multi-type matching semantics, base-plus-overlay prerequisite resolution" +status: active +--- + +# How to Write a Challenge Type Governance Article + +> Every challenge type in oddkit — default or custom — is defined by a governance article, not by server code. The server discovers these articles at challenge time and extracts detectors, question banks, prerequisite overlays, and reframings from what it finds. A single input may match multiple types at once; each match contributes. Universal prerequisites live in a base article that applies to every claim; type articles add overlays. Normative vocabulary and stakes calibration are governed by separate articles, not per type. Oddkit ships with a set of software-engineering defaults; every other domain — thought leadership from books, comparative architectural writing, translation consultancy, pastoral ministry — is expected to override with its own taxonomy. Write the articles well and the server handles the rest. No deployment. No code change. Prompt over code, applied to the vocabulary of challenge itself. + +--- + +## Summary — Four Article Kinds, Machine-Extractable Structure, Multi-Match Semantics + +Challenge governance is not a single document. It is a small set of coordinated articles, each with an explicit extraction contract: + +1. 
**Challenge-type articles** — one per type, discovered by the tag `challenge-type`. Each defines detection, questions, per-type prerequisite overlays, and reframings. +2. **Base prerequisites article** — one doc at `odd/challenge/base-prerequisites.md` defining universal checks applied to every claim regardless of type. +3. **Normative vocabulary article** — one doc at `odd/challenge/normative-vocabulary.md` defining the words whose presence in a retrieved canon quote signals a tension-bearing directive. +4. **Stakes calibration article** — one doc at `odd/challenge/stakes-calibration.md` defining how epistemic mode (`exploration` / `planning` / `execution`) maps to challenge depth. + +A challenge can match multiple types. Each matched type contributes questions, overlay prerequisites, and reframings. The server dedupes across contributions. Base prerequisites always apply. Stakes calibration trims the aggregated output. + +Missing sections and missing articles degrade gracefully. A type without a `## Detection Patterns` block still works if invoked explicitly. An absent stakes calibration article falls back to surfacing everything. An absent base prerequisites article means every check must live in a type article. A fully structured governance set produces the richest challenge. + +--- + +## Recommended File Layout + +``` +odd/challenge-types/strong-claim.md +odd/challenge-types/proposal.md +odd/challenge-types/assumption.md +odd/challenge-types/observation.md +odd/challenge-types/theological-claim.md (custom example) + +odd/challenge/base-prerequisites.md +odd/challenge/normative-vocabulary.md +odd/challenge/stakes-calibration.md +``` + +Challenge-type articles live under `odd/challenge-types/`. Supporting articles live under `odd/challenge/` because they are not per-type. 
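
The aggregation rules stated in the summary (every matched type contributes, base prerequisites always apply, duplicates collapse) could be sketched in TypeScript roughly as follows. This is not part of this commit; the `TypeContribution` shape and the `aggregate` function are hypothetical, and the sample questions are borrowed from this article's example tables, so their assignment to particular types is illustrative only.

```typescript
// Illustrative sketch only: these shapes and names are assumptions, not oddkit's API.
interface TypeContribution {
  slug: string;
  questions: string[];
  overlays: string[];
  reframings: string[];
}

interface AggregatedChallenge {
  matchedTypes: string[];
  questions: string[];
  prerequisites: string[];
  reframings: string[];
}

// Dedupe by string equality, keeping first-seen order.
function dedupe(items: string[]): string[] {
  const seen: string[] = [];
  for (const item of items) {
    if (seen.indexOf(item) === -1) {
      seen.push(item);
    }
  }
  return seen;
}

// Merge every matched type's contributions on top of the base prerequisites.
// An absent base article is modeled as an empty list, so only overlays remain.
function aggregate(
  matched: TypeContribution[],
  basePrerequisites: string[] = []
): AggregatedChallenge {
  const questions: string[] = [];
  const prerequisites: string[] = basePrerequisites.slice();
  const reframings: string[] = [];
  for (const t of matched) {
    for (const q of t.questions) questions.push(q);
    for (const o of t.overlays) prerequisites.push(o);
    for (const r of t.reframings) reframings.push(r);
  }
  return {
    matchedTypes: matched.map((t) => t.slug),
    questions: dedupe(questions),
    prerequisites: dedupe(prerequisites),
    reframings: dedupe(reframings),
  };
}

// Illustrative data: two matched types that share one question.
const sampleStrongClaim: TypeContribution = {
  slug: "strong-claim",
  questions: ["What evidence would disprove this?", "Under what conditions does this NOT hold?"],
  overlays: [],
  reframings: [],
};
const sampleProposal: TypeContribution = {
  slug: "proposal",
  questions: ["What evidence would disprove this?", "Who or what would disagree with this, and why?"],
  overlays: ["alternatives-considered", "risk-acknowledged"],
  reframings: [],
};
const sampleResult = aggregate([sampleStrongClaim, sampleProposal], ["evidence-cited", "source-named"]);
```

Stakes calibration is deliberately left out of this sketch; it would trim `sampleResult` after aggregation.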
+ +--- + +## Recommended Frontmatter for Challenge-Type Articles + +Standard oddkit frontmatter with these conventions for best results: + +| Field | Convention | +|---|---| +| uri | `klappy://odd/challenge-types/{slug}` | +| tags | Must include `challenge-type` — this is the discovery tag | +| governs | Should reference `oddkit_challenge behavior for type {slug}` | +| derives_from | Should include `canon/constraints/epistemic-challenge.md` | +| fallback | Optional boolean — see Fallback Behavior | + +The `challenge-type` tag is the discovery mechanism. The server searches for articles tagged `challenge-type` at challenge time. Without this tag, the server will not discover the type — but nothing else breaks. + +--- + +## Recommended Sections for Challenge-Type Articles + +The server extracts from these section headers when present. Including them produces the best detection, question surfacing, and prerequisite checking. Missing sections degrade gracefully — the server uses what it finds. + +### Section: Blockquote (opening) + +The opening blockquote defines the type in one to three sentences. This is surfaced to the model as the type's description and appears in the challenge response alongside every other matching type's description. Keep it concise — it will appear next to every sibling's blockquote. + +### Section: `## Type Identity` + +A markdown table with exactly these rows: + +```markdown +## Type Identity + +| Field | Value | +|---|---| +| Slug | {kebab-case identifier, matches filename} | +| Name | {human-readable type name} | +| Priority | {relative priority — affects display order only, not exclusivity} | +``` + +The server extracts `Slug` and `Name`. `Slug` is the stable identifier used in responses and multi-match dedup keys. Challenge types have no single-letter code — they do not serialize into TSV. 
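
As a rough illustration of the extraction described above, the TypeScript sketch below pulls `Slug`, `Name`, and `Priority` out of a `## Type Identity` table. It is not the server's parser and not part of this commit; `parseTypeIdentity` and the `TypeIdentity` shape are hypothetical names, and the sketch assumes the table uses plain two-column pipe rows under an exact `## Type Identity` heading.

```typescript
// Illustrative sketch only: parseTypeIdentity is a hypothetical helper, not oddkit's parser.
interface TypeIdentity {
  slug?: string;
  name?: string;
  priority?: string;
}

// Read the two-column table under "## Type Identity".
// A missing section yields an empty object rather than an error.
function parseTypeIdentity(markdown: string): TypeIdentity {
  const identity: TypeIdentity = {};
  let inSection = false;
  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim();
    if (line.indexOf("## ") === 0) {
      // Any level-two heading opens or closes the section.
      inSection = line === "## Type Identity";
      continue;
    }
    if (!inSection || line.charAt(0) !== "|") {
      continue;
    }
    // "| Slug | assumption |" splits into ["", "Slug", "assumption", ""].
    const cells = line.split("|").map((c) => c.trim());
    if (cells.length < 4) {
      continue;
    }
    const field = (cells[1] || "").toLowerCase();
    const value = cells[2] || "";
    // The header row ("Field") and separator row ("---") fall through untouched.
    if (field === "slug") identity.slug = value;
    else if (field === "name") identity.name = value;
    else if (field === "priority") identity.priority = value;
  }
  return identity;
}

// Sample input modeled on the assumption type's table.
const sampleArticle = [
  "# Challenge Type: Assumption",
  "",
  "## Type Identity",
  "",
  "| Field | Value |",
  "|---|---|",
  "| Slug | assumption |",
  "| Name | Assumption |",
  "| Priority | medium |",
  "",
  "## Detection Patterns",
].join("\n");
const sampleIdentity = parseTypeIdentity(sampleArticle);
```

Returning an empty object when the section is missing mirrors the graceful-degradation stance of the article: the caller decides what a type without an identity table means.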
+ +### Section: `## Detection Patterns` + +A code block of comma-separated trigger words and phrases that identify this claim type in input text: + +````markdown +## Detection Patterns + +When an input matches any of these patterns, the claim is tagged as {Name}: + +``` +must, always, never, guaranteed, impossible, certain, definitely, obviously, clearly +``` +```` + +The server compiles these into a case-insensitive word-boundary regex. Multiple types may match a single input — this is the intended design. A claim stating "We must always use X" may match both `strong-claim` (for "must") and a hypothetical `prescription` type (if one exists). Each matched type contributes its questions and overlays. + +### Section: `## Challenge Questions` + +A markdown table of questions this type surfaces when matched. Each row: a question and the stakes tier at which it applies. + +```markdown +## Challenge Questions + +| Question | Stakes tier | +|---|---| +| What evidence would disprove this? | baseline | +| Under what conditions does this NOT hold? | baseline | +| Who or what would disagree with this, and why? | elevated | +``` + +The server extracts questions and tiers. Stakes calibration (governed separately) determines which tiers are surfaced for a given mode. Without a `Stakes tier` column, all questions surface at every mode. + +Tier names are not fixed by this article. They are defined by the stakes calibration article. A KB may use `baseline / elevated / rigorous`, or `low / medium / high`, or any scheme — as long as the tiers used in type articles match the tiers defined in the calibration article. + +### Section: `## Prerequisite Overlays` + +A markdown table of prerequisites the server checks *in addition to* the base prerequisites. Each row: a check name, the test, and an actionable gap message. 
+ +```markdown +## Prerequisite Overlays + +| Prerequisite | Check | Gap message if missing | +|---|---|---| +| alternatives-considered | input contains "alternative", "instead", "option", or "rejected" | "No alternatives mentioned — single-option {name} lacks rigor" | +| risk-acknowledged | input contains "risk", "cost", "downside", or "tradeoff" | "No risks or costs acknowledged" | +``` + +Checks should be mechanically testable (substring, keyword presence, simple regex). Gap messages should be actionable — `"Add why this was chosen"` is better than `"rationale missing"`. The `{name}` placeholder is substituted with the type's Name at runtime. + +Overlays add to base prerequisites; they do not replace them. An input missing both `evidence-cited` (a base check) and `alternatives-considered` (a proposal overlay) receives both gap messages. + +### Section: `## Suggested Reframings` + +A bulleted list of reframings the server may surface when this type matches. These are literal strings the model can offer back to the claimant. + +```markdown +## Suggested Reframings + +- Reframe as hypothesis: "We believe X because Y, and would reconsider if Z" +- Add optionality: "We're choosing X over Y because Z, reversible until W" +``` + +--- + +## Fallback Behavior — What Happens When No Type's Detection Matches + +When an input matches no type's detection patterns, the server uses a fallback type. The fallback is determined by governance, not hardcoded. + +A type declares itself the fallback via frontmatter: + +```yaml +--- +uri: klappy://odd/challenge-types/observation +fallback: true +--- +``` + +Resolution order: + +1. If exactly one type has `fallback: true`, that type is the fallback. +2. If multiple types declare `fallback: true`, the first discovered alphabetically by filename wins. +3. If no type declares `fallback: true`, the server uses the first type discovered alphabetically. + +The recommended convention is to mark **observation** as the fallback. 
An unclassified input is a statement without a strong directive — semantically, that is what an observation is. This convention aligns with the encode fallback for Observation (O). + +--- + +## Multi-Match Semantics — One Claim, Many Types + +Challenge types are **explicitly multi-match by design**. A single input can and often should match several types — this is different from encoding, where single-type matches are the common case. + +When multiple types match, the server: + +1. Runs every type's detection regex against the input. Order does not affect correctness. +2. Aggregates questions across all matched types. +3. Aggregates prerequisite overlays across all matched types on top of base prerequisites. +4. Aggregates reframings across all matched types. +5. Dedupes by string equality within each category. +6. Surfaces the matched-type names in the response so the operator can see which types contributed. + +The `Priority` field in `## Type Identity` affects display order in the response, not whether a type participates. Every matched type participates. + +If no type's detection matches, the single fallback type runs alone. + +--- + +## Base Prerequisites — Applied to Every Claim + +The base prerequisites article at `odd/challenge/base-prerequisites.md` defines universal checks that run regardless of which types matched. It uses the same `## Prerequisite Overlays` table structure as type articles, but its contents apply to every challenge call. + +Example: + +```markdown +## Prerequisite Overlays + +| Prerequisite | Check | Gap message if missing | +|---|---|---| +| evidence-cited | input contains "evidence", "data", "measured", or "observed" | "No evidence cited — claims without evidence are assumptions" | +| source-named | input contains a citation marker (URL, quoted phrase, "per X") | "No source named — verifiability is unclear" | +``` + +If the base prerequisites article is absent, only type-overlay prerequisites run. 
The server logs a warning but does not fail. + +--- + +## Normative Vocabulary — Governed Separately + +The normative vocabulary article at `odd/challenge/normative-vocabulary.md` defines the words whose presence in a retrieved canon quote signals a tension-bearing directive. This replaces the hardcoded `normativePatterns` array in the current implementation. + +Structure: + +```markdown +## Normative Vocabulary + +| Word | Directive type | +|---|---| +| MUST | requirement | +| MUST NOT | prohibition | +| SHOULD | recommendation | +| SHOULD NOT | discouragement | +| NEVER | prohibition | +| ALWAYS | requirement | +``` + +The server compiles these into tension-detection regexes applied to retrieved canon quotes. Adding domain-specific normative language (`SHALL` for a formal-contracts KB, `PROHIBITED` for compliance, `ANATHEMA` for a theological KB) becomes a canon edit, not a deploy. + +If this article is absent, the server uses a minimal built-in fallback of `MUST` / `MUST NOT` / `SHOULD` / `SHOULD NOT` only — degraded but functional. + +--- + +## Stakes Calibration — How Mode Maps to Challenge Depth + +The stakes calibration article at `odd/challenge/stakes-calibration.md` defines how the `mode` parameter maps to challenge depth. This is the single largest behavioral lever currently unavailable to the runtime — the `mode` parameter exists today but only biases retrieval scoring, not what is surfaced. + +Structure: + +```markdown +## Stakes Calibration + +| Mode | Question tiers surfaced | Prerequisite strictness | Reframings surfaced | +|---|---|---|---| +| exploration | baseline | optional (warn only) | first 1 | +| planning | baseline, elevated | required (gap messages) | all | +| execution | baseline, elevated, rigorous | required + source-named | all, plus block-until-addressed | +``` + +The server reads the row matching the caller's `mode` and filters the aggregated output accordingly. 
Without a mode parameter, the server defaults to `planning` with low confidence — matching current behavior. + +If this article is absent, every question, prerequisite, and reframing is surfaced regardless of mode — degraded but functional. Challenge without calibration is still challenge; it is just uniformly loud. + +--- + +## Domain Adaptation — The Defaults Are Software-Flavored On Purpose + +Oddkit ships with a default set of challenge-type articles (`strong-claim`, `proposal`, `assumption`, `observation`) and supporting articles (`base-prerequisites`, `normative-vocabulary`, `stakes-calibration`) calibrated for software-engineering work. This is deliberate, not neutral. A generic "universal human-language" default would serve every domain equally badly; a domain-specific default at least serves one domain well and signals to the others that adaptation is expected. + +A KB in another domain is expected to override some or all of these articles. The override mechanism is simple: write articles with the same extraction contract under the KB's own `odd/challenge-types/` and `odd/challenge/` paths. When oddkit is pointed at that KB via `canon_url`, the KB's articles are what gets discovered. + +What follows are three worked patterns. The point is not that these are the only three. The point is that the framework holds across all three, and a fourth domain's steward can build its taxonomy by identifying its own analogs. + +### Pattern A: Software Engineering (Oddkit's Defaults) + +**Claim types:** strong-claim, proposal, assumption, observation. + +**Normative vocabulary:** RFC 2119 keywords (`MUST`, `SHALL`, `SHOULD`, `MUST NOT`, `REQUIRED`) plus `PROHIBITED`, which RFC 2119 itself does not define. + +**Stakes modes:** `exploration`, `planning`, `execution` — the canon epistemic modes. + +**Reviewer discipline simulated:** an experienced engineer reviewing a design doc. "What alternatives were considered? What's the cost of being wrong? Is this reversible?"
+ +**Failure mode without this discipline:** shipping irreversible architectural decisions without examining alternatives; proposals that are actually assumptions dressed up as plans. + +### Pattern B: Thought Leadership from Books + +**Claim types:** attribution, synthesis, application, interpretation, extension, critique. A thought leader working from books rarely makes proposals in the software-engineering sense. They attribute ideas to authors, synthesize across works, apply frameworks from one domain to another, interpret passages, extend arguments, and critique. The default software taxonomy barely touches these moves. + +**Normative vocabulary:** not RFC 2119. Instead: the author's own emphasized terms, phrases like "the central point is," "the key move is," "consider carefully that" — the markers of load-bearing claims in prose. Also directive language from the source material itself, when the domain has sacred or authoritative texts: `commanded`, `forbidden`, `instructed`, `admonished`. + +**Stakes modes:** likely not `exploration / planning / execution` at all. Closer to the lifecycle of reading-and-writing — `note-taking`, `drafting`, `teaching-material`, `publication-ready`. A pressure-test appropriate for raw reading notes is not appropriate for a published teaching. + +**Reviewer discipline simulated:** a careful peer reviewer or developmental editor. "Is this your reading or the author's own claim? Does the passage actually support the paraphrase? Have you engaged the strongest version of the argument? Whose interpretive tradition does your reading fit within?" + +**Failure mode without this discipline:** quote-mining, paraphrase overreach, strawmanning, convenient interpretation. These are the failure modes that distinguish serious thought leadership from glorified note-passing. 
+ +**Example articles a thought leadership KB would write:** +- `odd/challenge-types/attribution.md` — detection on `"X argues,"`, `"according to X,"`, `"as X writes"`; prerequisites on direct-quote vs paraphrase, page-or-chapter reference, passage context +- `odd/challenge-types/application.md` — detection on `"applying X to Y,"`, `"this framework helps us see,"`, `"if we take X's approach"`; prerequisites on original-domain-named, cross-domain-fit-argued +- `odd/challenge-types/interpretation.md` — detection on `"what this means is,"`, `"the deeper point,"`, `"really saying"`; prerequisites on context-of-passage, interpretive-tradition-named + +### Pattern C: Comparative Architectural Writing + +**Claim types:** pattern-coinage, comparative-positioning, principle-extraction, retroactive-sense-making, predictive-architectural-claim. Architectural writing in the tech/systems domain makes a third distinct set of moves. It names patterns ("Vodka Architecture"), positions work against a landscape (viral repos, published papers, adjacent products), extracts principles from experience ("Use Only What Hurts"), retroactively makes sense of one's own trajectory ("what we built is E0007 realized"), and predicts where things are going ("TruthKit is where oddkit is going"). + +**Normative vocabulary:** architectural load-bearing terms — `invariant`, `forcing function`, `non-negotiable`, `always`, `never`, `the test is`, `the unlock is`, `only`, `pure`. These are exactly the terms where a writer locks a position and therefore where a careful reviewer should pressure-test. + +**Stakes modes:** not exploration/planning/execution. Closer to the writing lifecycle — `voice-dump`, `drafting`, `peer-review-ready`, `canon-tier-2`, `canon-tier-1`, `published-essay`. A voice-dump deserves almost no pressure. A canon-tier-1 claim deserves maximum pressure because it becomes load-bearing for future work. A published essay deserves hostile-reader simulation. 
+ +**Reviewer discipline simulated:** a rigorous systems-thinking peer reviewer. "Is the coinage novel, or does the literature already name this? Is the comparison target fairly represented or strawmanned? Is the principle derived from one case or several? Does the retroactive sense-making fit the actual timeline? Has this survived a hostile reader?" + +**Failure mode without this discipline:** reinventing named patterns, strawmanning the comparison, overreaching from single cases, post-hoc rationalization of happy accidents, self-citation loops where every source shares a URL root. + +**Example articles a comparative-architectural-writing KB would write:** +- `odd/challenge-types/pattern-coinage.md` — detection on introducing a novel named pattern; prerequisites on prior-art-searched, novelty-argued, term-defined-precisely +- `odd/challenge-types/comparative-positioning.md` — detection on `"like X but,"`, `"unlike X,"`, `"existing approaches"`, `"the difference is"`; prerequisites on comparison-target-accurately-characterized, freshness-of-comparison-verified, strongest-version-engaged +- `odd/challenge-types/principle-extraction.md` — detection on quoted aphorisms, principle-naming language; prerequisites on derived-from-multiple-cases, counter-examples-considered, scope-named + +### How a KB Steward Builds Their Own Taxonomy + +A domain steward does not need to invent a taxonomy from scratch with a blank template. The fastest path is: + +1. **Point oddkit at the target KB** via `canon_url`. +2. **Read the meta governance doc** (this document) via `oddkit_get`. +3. **Gather three to five examples of the KB's own best work** — essays, papers, notes, decisions, whatever the KB produces. +4. **Ask an LLM (with oddkit access) to study the examples and name the claim patterns** the work actually makes and the failure modes a good reviewer would pressure-test. +5. **Draft type articles from the named patterns**, using this document as the extraction-contract template. 
The LLM can produce first drafts; the steward reviews and corrects the vocabulary. +6. **Iterate.** A question that lands wrong is a canon edit, not a code change. A new claim pattern noticed in later work is a new article. + +Two or three working sessions produce a taxonomy calibrated to the KB's actual practice. Every session after is content addition, type refinement, or lifecycle evolution. + +--- + +## Writing a Custom Challenge Type + +Custom types follow the identical extraction contract. A Bible translation KB adding a `translation-claim` type writes: + +``` +odd/challenge-types/translation-claim.md +``` + +With: + +- frontmatter tag `challenge-type` +- a `## Type Identity` table with `Slug: translation-claim` +- detection patterns (`"the Greek says", "the Hebrew means", "this renders", "consensus translations"`) +- challenge questions (`"What do consultant translators render?"`, `"What does the community check group say?"`, `"What does the back-translation reveal?"`) +- prerequisite overlays (`original-language-cited`, `community-checked`, `consultant-reviewed`) +- reframings (`"Reframe as exegetical claim: 'The text in context Y appears to teach X, consistent with translations A and B'"`) + +The server discovers the new type on the next challenge call and includes its contributions whenever its detection patterns match. No server change. No deploy. + +**Conventions for custom types:** + +- Slug should be kebab-case, unique within the KB +- Detection patterns should be specific enough not to false-match generic types +- Prerequisite overlays should be mechanically checkable +- Custom types are scoped to the KB that defines them — they do not affect other KBs + +--- + +## The Server's Extraction Contract + +At challenge time, the server: + +1. Searches for articles tagged `challenge-type`. Caches results keyed by `canonUrl`, matching the encode pattern from PR #96. +2. 
For each challenge-type article, extracts when present: + - Slug and Name from `## Type Identity` + - Trigger patterns from `## Detection Patterns`, compiled into regex + - Question/tier pairs from `## Challenge Questions` + - Overlay prerequisites from `## Prerequisite Overlays` + - Reframings from `## Suggested Reframings` + - Fallback flag from frontmatter +3. Fetches `odd/challenge/base-prerequisites.md` (if present) and extracts base prerequisites. +4. Fetches `odd/challenge/normative-vocabulary.md` (if present) and compiles tension-detection regex. +5. Fetches `odd/challenge/stakes-calibration.md` (if present) and builds the mode-to-filter table. +6. Runs every type's detection regex against the input; collects all matches. +7. Retrieves canon quotes for the input (existing BM25 retrieval path) and scans them against the compiled normative regex for tensions. +8. Aggregates questions, prerequisites, and reframings across matches and base; dedupes. +9. Filters by stakes calibration for the caller's mode. +10. Returns the response with matched-type names, aggregated questions, prerequisites with gap messages, reframings, retrieved canon tensions, and the extracted type definitions — teaching the model what governs the behavior. + +Missing sections and missing supporting articles degrade gracefully. A type without detection patterns can still be invoked explicitly. An absent stakes calibration article surfaces everything. An absent base prerequisites article means per-type overlays stand alone. Richer governance means richer challenge. + +--- + +## Cache Invalidation + +Per-canon caches — mirroring the encode fix in PR #96 — must be cleared in `runCleanupStorage`: + +- `cachedChallengeTypes` keyed by `canonUrl` +- `cachedBasePrerequisites` keyed by `canonUrl` +- `cachedNormativeVocabulary` keyed by `canonUrl` +- `cachedStakesCalibration` keyed by `canonUrl` + +Without this, governance doc edits require a worker isolate recycle to take effect.
That failure mode was identified and fixed during the encode refactor and must not be repeated. + +--- + +## Discoverability + +This article exists so that any search for "how to write challenge type," "custom challenge type," "add new challenge type," "challenge governance template," "challenge extension format," or "extend oddkit_challenge" surfaces this guide. + +--- + +## See Also + +- [How to Write an Encoding Type Governance Article](klappy://odd/encoding-types/how-to-write-encoding-types) — the parallel structure for encode; this article mirrors it and borrows its graceful-degradation semantics +- [Epistemic Challenge](klappy://canon/epistemic-challenge) — the operating constraints this tool operationalizes +- [Prompt Over Code](klappy://canon/principles/prompt-over-code) — the principle this entire mechanism implements +- [Vodka Architecture](klappy://canon/principles/vodka-architecture) — why the server stays thin +- [oddkit_challenge tool](oddkit://tools/challenge) — the tool this governance shapes diff --git a/odd/challenge-types/observation.md b/odd/challenge-types/observation.md new file mode 100644 index 00000000..93a57e4f --- /dev/null +++ b/odd/challenge-types/observation.md @@ -0,0 +1,71 @@ +--- +uri: klappy://odd/challenge-types/observation +title: "Challenge Type: Observation" +audience: docs +exposure: nav +tier: 2 +voice: neutral +stability: semi_stable +tags: ["odd", "oddkit", "challenge", "challenge-type", "observation"] +epoch: E0008 +date: 2026-04-16 +derives_from: "canon/constraints/epistemic-challenge.md, odd/challenge-types/how-to-write-challenge-types.md" +governs: "oddkit_challenge behavior for observation type and fallback routing" +fallback: true +status: active +--- + +# Challenge Type: Observation + +> A report of something seen or noticed, offered without strong claim or directive. Observations are the lightest-pressure claim type. They deserve sample-representativeness and context questions, not adversarial scrutiny. 
This is the fallback type — inputs that match no other type route here. + +--- + +## Summary — The Lightest-Pressure Type And The Fallback Destination + +This type fires when an input reports what was seen, noticed, or experienced — `noticed`, `observed`, `seems`, `appears`, `found that` — without making a strong claim or directive. Observations deserve light pressure, not adversarial scrutiny. The challenge asks about sample size, context, and whether the observation reflects a single data point or a pattern. Prerequisites check that sample size and context are noted. This is also the declared fallback type: inputs that match no other type's detection patterns route here by convention, because an unclassified input is effectively a statement without a strong directive. + +--- + +## Type Identity + +| Field | Value | +|---|---| +| Slug | observation | +| Name | Observation | +| Priority | low | + +--- + +## Detection Patterns + +``` +noticed, saw, observed, seems, appears, looks like, found that, turns out, happened, occurred +``` + +--- + +## Challenge Questions + +| Question | Stakes tier | +|---|---| +| Is this observation based on a representative sample? | baseline | +| What context might change this observation? | baseline | +| Is this one data point, or a pattern? | elevated | +| Who else has observed this, and do they agree? 
| rigorous | + +--- + +## Prerequisite Overlays + +| Prerequisite | Check | Gap message if missing | +|---|---|---| +| sample-size-noted | input contains "once", "twice", "every", "several", "many", "all", "few", numeric quantifier | "Sample size or frequency not noted — single observations and patterns deserve different weight" | +| context-noted | input contains "when", "where", "under", "during", "after", "in" | "Context of the observation not named — observations are context-bound until proven otherwise" | + +--- + +## Suggested Reframings + +- Reframe with context: "In situation X, I observed Y; outside X, I have not checked" +- Distinguish instance from pattern: "This happened once" vs "This happens consistently under condition C" diff --git a/odd/challenge-types/pattern-coinage.md b/odd/challenge-types/pattern-coinage.md new file mode 100644 index 00000000..8bd63045 --- /dev/null +++ b/odd/challenge-types/pattern-coinage.md @@ -0,0 +1,75 @@ +--- +uri: klappy://odd/challenge-types/pattern-coinage +title: "Challenge Type: Pattern Coinage" +audience: docs +exposure: nav +tier: 2 +voice: neutral +stability: semi_stable +tags: ["odd", "oddkit", "challenge", "challenge-type", "pattern-coinage", "writing-analysis"] +epoch: E0008 +date: 2026-04-16 +derives_from: "canon/constraints/epistemic-challenge.md, odd/challenge-types/how-to-write-challenge-types.md" +governs: "oddkit_challenge behavior for pattern-coinage claims — naming novel patterns or concepts" +status: active +--- + +# Challenge Type: Pattern Coinage + +> A move that names a novel pattern, concept, or architecture and asks the reader to adopt the name. Coinage is high-stakes because a successful coinage becomes vocabulary others inherit; a failed coinage reinvents what another domain already named or promotes noise into canon. Pressure here is about novelty, precision, and survival against hostile readers. 
+ +--- + +## Summary — Naming Moves Are High-Stakes And Deserve Prior-Art Discipline + +This type fires when an input introduces a novel named pattern — `coining`, `introduce the term`, `i call this`, `let's call this`, `the pattern is called`. Coinage is load-bearing: a successful term becomes vocabulary others inherit, and a failed term either reinvents an existing name or promotes noise into canon. The challenge asks whether the term already exists in another field, whether it is precise enough that two careful readers would apply it the same way, what the term excludes, whether a hostile reader would accept the need for a new name, and whether the coinage will hold up over time. Prerequisites check that prior art was searched, that the term is defined, that novelty is argued, and that scope is named. The goal is to force coinage to earn its place in vocabulary rather than default into it. + +--- + +## Type Identity + +| Field | Value | +|---|---| +| Slug | pattern-coinage | +| Name | Pattern Coinage | +| Priority | high | + +--- + +## Detection Patterns + +``` +coining, coined, introduce the term, i call this, what we're seeing here is, the pattern is called, name this, what i mean by, the term i use, let's call this, this is what i mean by +``` + +--- + +## Challenge Questions + +| Question | Stakes tier | +|---|---| +| Does this already have a name in another field or literature? | baseline | +| Is the term precise enough that two careful readers would apply it the same way? | baseline | +| What does this term exclude? A term that means everything means nothing. | elevated | +| Would a hostile reader understand why this needs its own name rather than an existing one? | elevated | +| Would this coinage still make sense five years from now, or is it tied to a transient context? | rigorous | +| What happens if the coinage spreads and is misapplied? What's the containment? 
| rigorous | + +--- + +## Prerequisite Overlays + +| Prerequisite | Check | Gap message if missing | +|---|---|---| +| prior-art-searched | input contains "not the same as", "distinct from", "unlike", "exists as" or equivalent prior-art acknowledgment | "No prior art acknowledged — is this genuinely new, or renaming something that already exists?" | +| term-defined-precisely | input contains "means", "refers to", "defined as", "by this i mean" | "Term not precisely defined — coinages without definitions become vibes" | +| novelty-argued | input contains "new", "novel", "not previously", "first", "no existing", or equivalent | "Novelty not argued — why is a new term needed rather than an existing one?" | +| scope-named | input contains "applies to", "in the context of", "when", "for cases where" | "Scope of the coinage not named — what does this term cover and what does it not?" | + +--- + +## Suggested Reframings + +- Reframe with prior-art check: "I'm calling this X; the closest prior art I found is Y, and the distinction is Z" +- Reframe with narrow scope: "In the context of A, I name the pattern X; I'm not claiming X holds outside A" +- Reframe as working vocabulary: "I'll use X as shorthand here; if a better name exists or emerges, I'll adopt it" diff --git a/odd/challenge-types/principle-extraction.md b/odd/challenge-types/principle-extraction.md new file mode 100644 index 00000000..bd046fbf --- /dev/null +++ b/odd/challenge-types/principle-extraction.md @@ -0,0 +1,76 @@ +--- +uri: klappy://odd/challenge-types/principle-extraction +title: "Challenge Type: Principle Extraction" +audience: docs +exposure: nav +tier: 2 +voice: neutral +stability: semi_stable +tags: ["odd", "oddkit", "challenge", "challenge-type", "principle-extraction", "writing-analysis"] +epoch: E0008 +date: 2026-04-16 +derives_from: "canon/constraints/epistemic-challenge.md, odd/challenge-types/how-to-write-challenge-types.md" +governs: "oddkit_challenge behavior for principle-extraction 
claims — elevating a heuristic from experience into canon" +status: active +--- + +# Challenge Type: Principle Extraction + +> A move that lifts a heuristic out of specific experience and elevates it to a principle — a named rule, an aphorism, an invariant the writer intends others to adopt. The failure mode is over-generalization: treating a pattern that held in one or two cases as universal. Principles are load-bearing once canonized; pressure here is about sample size, counter-examples, and scope. + +--- + +## Summary — Principles Are Load-Bearing And Demand Proportional Sample Size + +This type fires when an input elevates a heuristic to a principle — `the principle is`, `the rule is`, `the test is`, `the lesson is`, `invariant`, `forcing function`, `non-negotiable`. Principles become canon once stated, so the challenge asks how many cases the principle rests on, what counter-example was considered and rejected, what the scope is, and under what conditions the principle would be retracted. Prerequisites check that the principle is anchored in multiple cases, that counter-examples were considered, that scope is named, and that a retraction condition exists. The governing question is whether the principle is surviving because it is true or because its failure mode has not yet been encountered. The goal is to prevent single-case patterns from masquerading as universal rules. + +--- + +## Type Identity + +| Field | Value | +|---|---| +| Slug | principle-extraction | +| Name | Principle Extraction | +| Priority | high | + +--- + +## Detection Patterns + +``` +the principle is, the rule is, the test is, the lesson is, what this teaches us, the takeaway is, always, never, the real cost, the whole point, the key insight, invariant, forcing function, non-negotiable, only what, pure +``` + +--- + +## Challenge Questions + +| Question | Stakes tier | +|---|---| +| How many distinct cases does this principle rest on? One, two, many?
| baseline | +| What's a counter-example you considered and rejected? | baseline | +| What's the scope? Does this hold for a specific domain, or are you making a universal claim? | elevated | +| Under what conditions would you retract this principle? | elevated | +| Is this principle surviving because it's true, or because you haven't encountered its failure yet? | rigorous | +| If a respected peer challenged this principle, what's the strongest defense — and where does the defense crack? | rigorous | + +--- + +## Prerequisite Overlays + +| Prerequisite | Check | Gap message if missing | +|---|---|---| +| derived-from-multiple-cases | input contains "case", "example", "instance", "time", or plural enumerations | "Principle not anchored to multiple cases — one observation is a pattern; one pattern is a principle only with more cases" | +| counter-examples-considered | input contains "except", "unless", "fails when", "not when", "counter-example", "doesn't hold" | "No counter-examples considered — a principle without acknowledged limits is a bias in formalwear" | +| scope-named | input contains "in", "for", "when", "applies to", "within", "in the context of" | "Scope not named — universal claims in particular domains are the classic overreach" | +| retraction-condition | input contains "would retract", "would revise", "if X then", "change my view", "reconsider" | "No retraction condition named — a principle that cannot be falsified is a preference, not a principle" | + +--- + +## Suggested Reframings + +- Reframe as scoped rule: "In the context of C, the pattern I've seen is X; outside C, I don't know" +- Reframe with sample: "Across cases A, B, and D, I observe X; I treat this as a principle within that class" +- Reframe as working heuristic: "My current working rule is X; I'd revise it if I encountered W" +- Reframe as hypothesis under test: "I'm treating X as a principle; the cases that would challenge it are A, B, C, and I'll watch for them" diff --git 
a/odd/challenge-types/proposal.md b/odd/challenge-types/proposal.md new file mode 100644 index 00000000..8cff0398 --- /dev/null +++ b/odd/challenge-types/proposal.md @@ -0,0 +1,76 @@ +--- +uri: klappy://odd/challenge-types/proposal +title: "Challenge Type: Proposal" +audience: docs +exposure: nav +tier: 2 +voice: neutral +stability: semi_stable +tags: ["odd", "oddkit", "challenge", "challenge-type", "proposal"] +epoch: E0008 +date: 2026-04-16 +derives_from: "canon/constraints/epistemic-challenge.md, canon/principles/irreversibility-is-the-real-cost.md, odd/challenge-types/how-to-write-challenge-types.md" +governs: "oddkit_challenge behavior for proposal type" +status: active +--- + +# Challenge Type: Proposal + +> A future-oriented plan or recommendation. Proposals carry their weight in the options they close. Pressure here is about alternatives considered, reversibility, and what would need to be true for the proposal to fail. + +--- + +## Summary — Pressure Scales With Irreversibility + +This type fires when an input projects a plan or recommendation — `should`, `plan to`, `propose`, `recommend`, `will`, `let's`. Proposals close options, so the challenge asks about the options that were considered and rejected, the cost of being wrong, the reversibility of the commitment, and what would need to be true for the proposal to fail. Prerequisites check that alternatives were explored, risks were acknowledged, reversibility was addressed, and success criteria were named. Irreversible proposals get the most scrutiny; experimental proposals get the least. The goal is to move proposals from unexamined intentions to defensible choices. 
+ +--- + +## Type Identity + +| Field | Value | +|---|---| +| Slug | proposal | +| Name | Proposal | +| Priority | high | + +--- + +## Detection Patterns + +``` +should, plan to, going to, will, propose, suggest, recommend, let's, want to, intend to, aim to, we'll, i'll +``` + +--- + +## Challenge Questions + +| Question | Stakes tier | +|---|---| +| What's the cost of being wrong here? | baseline | +| What alternatives were considered and rejected, and why? | baseline | +| What would need to be true for this to fail? | elevated | +| Is this reversible? If not, what's the point of no return? | elevated | +| Who benefits if this succeeds? Who bears the cost if it fails? | rigorous | +| What is the smallest version of this we could test first? | rigorous | + +--- + +## Prerequisite Overlays + +| Prerequisite | Check | Gap message if missing | +|---|---|---| +| alternatives-considered | input contains "alternative", "instead", "option", "considered", "rejected" | "No alternatives mentioned — single-option proposals lack rigor" | +| risk-acknowledged | input contains "risk", "cost", "downside", "tradeoff", "expense" | "No risks or costs acknowledged" | +| reversibility-named | input contains "reversible", "reversal", "undo", "rollback", "temporary", "trial" | "Reversibility not addressed — irreversible proposals deserve proportional scrutiny" | +| success-criteria | input contains "success", "works", "done when", "measured by" | "No success criteria named — how will you know if this worked?" 
| + +--- + +## Suggested Reframings + +- Add optionality: "We're choosing X over Y because Z, reversible until W" +- Reframe with stakes: "This is a one-way door; we are committing because A, B, C" +- Reframe as experiment: "We'll try X for duration D; success looks like M; we'll abandon if N" +- Address canon tensions directly before proceeding diff --git a/odd/challenge-types/strong-claim.md b/odd/challenge-types/strong-claim.md new file mode 100644 index 00000000..d2cf5f2f --- /dev/null +++ b/odd/challenge-types/strong-claim.md @@ -0,0 +1,73 @@ +--- +uri: klappy://odd/challenge-types/strong-claim +title: "Challenge Type: Strong Claim" +audience: docs +exposure: nav +tier: 2 +voice: neutral +stability: semi_stable +tags: ["odd", "oddkit", "challenge", "challenge-type", "strong-claim"] +epoch: E0008 +date: 2026-04-16 +derives_from: "canon/constraints/epistemic-challenge.md, odd/challenge-types/how-to-write-challenge-types.md" +governs: "oddkit_challenge behavior for strong-claim type" +status: active +--- + +# Challenge Type: Strong Claim + +> A definitive statement offered without qualification. Strong claims foreclose doubt and deserve the most pressure, because their cost of being wrong is proportional to the confidence they project. + +--- + +## Summary — Maximum Pressure For Claims Without Hedges + +This type fires when an input uses definitive vocabulary — `must`, `always`, `never`, `guaranteed`, `impossible`, `certain` — that forecloses doubt and invites the reader to adopt the conclusion without qualification. The pressure this type applies scales with the absoluteness of the language: questions ask what evidence would disprove the claim, what conditions break it, and who would disagree. Prerequisites check that strength of claim matches strength of evidence, that scope is named, and that some disconfirmer is acknowledged. The goal is not to weaken confident claims but to ensure their confidence is load-bearing rather than rhetorical. 
+ +--- + +## Type Identity + +| Field | Value | +|---|---| +| Slug | strong-claim | +| Name | Strong Claim | +| Priority | high | + +--- + +## Detection Patterns + +``` +must, always, never, guaranteed, impossible, certain, definitely, obviously, clearly, undeniably, without question, proven, fact +``` + +--- + +## Challenge Questions + +| Question | Stakes tier | +|---|---| +| What evidence would disprove this? | baseline | +| Under what conditions does this NOT hold? | baseline | +| Who or what would disagree with this, and why? | elevated | +| What is the strongest version of the opposing view? | elevated | +| If this were wrong, how would you know? | rigorous | + +--- + +## Prerequisite Overlays + +| Prerequisite | Check | Gap message if missing | +|---|---|---| +| evidence-for-strength | input contains "because", "data", "measured", "studies", "evidence" | "Strong claim lacks explicit evidence — strength of claim should match strength of evidence" | +| scope-named | input contains "in", "when", "for", "under" scoping language | "Strong claim has no scope — universal claims are almost always false at the edges" | +| disconfirmer-acknowledged | input contains "unless", "except", "assuming", "given" | "No disconfirmer acknowledged — what would change this claim?" 
| + +--- + +## Suggested Reframings + +- Reframe as hypothesis: "We believe X because Y, and would reconsider if Z" +- Reframe with scope: "In contexts A and B, X holds; outside those, we have not tested" +- Reframe as falsifiable: "If we observed W, we would retract this claim" diff --git a/odd/challenge/base-prerequisites.md b/odd/challenge/base-prerequisites.md new file mode 100644 index 00000000..e93daece --- /dev/null +++ b/odd/challenge/base-prerequisites.md @@ -0,0 +1,45 @@ +--- +uri: klappy://odd/challenge/base-prerequisites +title: "Challenge Base Prerequisites" +audience: docs +exposure: nav +tier: 2 +voice: neutral +stability: semi_stable +tags: ["odd", "oddkit", "challenge", "base-prerequisites", "universal"] +epoch: E0008 +date: 2026-04-16 +derives_from: "canon/constraints/epistemic-challenge.md, odd/challenge-types/how-to-write-challenge-types.md" +governs: "oddkit_challenge universal prerequisite checks applied regardless of matched type" +status: active +--- + +# Challenge Base Prerequisites + +> Universal prerequisites applied to every claim, regardless of which challenge types matched. Base prerequisites are checks so fundamental that their absence is a gap in any claim — evidence, sources, and signaled confidence. Type overlays add domain-specific checks on top of these; they do not replace them. + +--- + +## Summary — Three Universal Checks That Run On Every Challenge + +This article governs three prerequisite checks that run on every oddkit_challenge invocation regardless of which types matched: evidence cited, source named, and confidence signaled. Type overlays add domain-specific checks in addition to these; they do not replace them. The check vocabularies are deliberately broad (`saw`, `heard`, `read`, `observed`, `example`, `case`, `data`) because evidence has different surface forms across domains — measurements in engineering, quotes in writing, testimony in ministry. 
Missing base prerequisites produce explicit gap messages in the response. If this article is absent, the server proceeds with type-overlay prerequisites only and logs a warning. + +--- + +## Prerequisite Overlays + +| Prerequisite | Check | Gap message if missing | +|---|---|---| +| evidence-cited | input contains "evidence", "saw", "observed", "noticed", "heard", "read", "example", "case", "instance", "data", "measured" | "No evidence cited — a claim without grounding in something observed, read, or experienced is just a belief" | +| source-named | input contains "per", "according to", "from", "source:", a URL, a quoted phrase, a proper-noun attribution, "who said", "where i read" | "No source named — verifiability is unclear; where does this come from?" | +| confidence-signaled | input contains "believe", "think", "know", "suspect", "certain", "tentative", "confident", "unsure", or an explicit confidence marker | "Confidence level not signaled — is this a guess, a working belief, or an established fact?" | + +--- + +## Notes + +These prerequisites run on every challenge invocation, added to any overlays contributed by matched types. The server dedupes gap messages across base and overlays before surfacing. + +The check vocabularies here are deliberately broad — "saw," "observed," "noticed," "heard," "read," "example," "case" — because evidence has different surface forms in different domains. In software engineering, evidence looks like measurements, tests, and studies. In thought leadership, evidence looks like quotes, passages, and case studies. In ministry, evidence looks like what was witnessed or heard in the community. The base article accepts all of these; domain-specific articles narrow further if needed. + +If this article is absent, the server proceeds with type overlays only and logs a warning. 
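The three base checks above can be sketched roughly as follows. This is a minimal TypeScript illustration under stated assumptions, not the oddkit server's implementation: the names `BASE_PREREQUISITES`, `containsTerm`, and `findBaseGaps` are hypothetical, whole-word case-insensitive matching is an assumed reading of "input contains", the URL and quoted-phrase patterns are guesses at how those signals might be detected, and proper-noun attribution and "explicit confidence marker" detection are left out. Gap messages are abbreviated to their first clause.

```typescript
// Hypothetical sketch: names and matching strategy are assumptions, not oddkit's code.
interface Prerequisite {
  id: string;
  terms: string[];     // vocabulary from the Check column
  patterns: RegExp[];  // extra non-vocabulary signals (assumed detection rules)
  gap: string;         // abbreviated gap message
}

const BASE_PREREQUISITES: Prerequisite[] = [
  {
    id: "evidence-cited",
    terms: ["evidence", "saw", "observed", "noticed", "heard", "read",
            "example", "case", "instance", "data", "measured"],
    patterns: [],
    gap: "No evidence cited",
  },
  {
    id: "source-named",
    terms: ["per", "according to", "from", "source:", "who said", "where i read"],
    patterns: [/https?:\/\/\S+/i, /"[^"]+"/], // a URL, or a quoted phrase
    gap: "No source named",
  },
  {
    id: "confidence-signaled",
    terms: ["believe", "think", "know", "suspect", "certain",
            "tentative", "confident", "unsure"],
    patterns: [],
    gap: "Confidence level not signaled",
  },
];

// Case-insensitive match that refuses to fire inside a longer word
// (so "read" does not match "already").
function containsTerm(input: string, term: string): boolean {
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`(?<![A-Za-z])${escaped}(?![A-Za-z])`, "i").test(input);
}

// Returns the deduplicated gap messages for every unmet base prerequisite.
function findBaseGaps(input: string): string[] {
  const gaps = BASE_PREREQUISITES
    .filter((p) =>
      !p.terms.some((t) => containsTerm(input, t)) &&
      !p.patterns.some((r) => r.test(input)))
    .map((p) => p.gap);
  return Array.from(new Set(gaps));
}
```

In this sketch, `findBaseGaps("I think this works")` reports the evidence and source gaps but not the confidence gap, because "think" signals confidence. Gaps contributed by matched type overlays would be merged into the same set before surfacing.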
diff --git a/odd/challenge/normative-vocabulary.md b/odd/challenge/normative-vocabulary.md new file mode 100644 index 00000000..bff50feb --- /dev/null +++ b/odd/challenge/normative-vocabulary.md @@ -0,0 +1,80 @@ +--- +uri: klappy://odd/challenge/normative-vocabulary +title: "Challenge Normative Vocabulary" +audience: docs +exposure: nav +tier: 2 +voice: neutral +stability: semi_stable +tags: ["odd", "oddkit", "challenge", "normative-vocabulary", "tensions"] +epoch: E0008 +date: 2026-04-16 +derives_from: "canon/constraints/epistemic-challenge.md, odd/challenge-types/how-to-write-challenge-types.md" +governs: "oddkit_challenge tension detection — the vocabulary that signals a directive or load-bearing claim in retrieved canon quotes" +status: active +--- + +# Challenge Normative Vocabulary + +> The words whose presence in a retrieved canon quote signals a tension-bearing directive or a load-bearing architectural claim. When canon says MUST, SHOULD, NEVER, or names something as an invariant, forcing function, or the real cost, the tool should surface that quote as a potential tension with the input being challenged. This vocabulary is governed here, not hardcoded in the server. + +--- + +## Summary — Two Vocabularies, One Table, Case-Sensitive RFC Plus Case-Insensitive Architectural + +This article governs the vocabulary oddkit_challenge uses to detect tension-bearing language in retrieved canon quotes. Two sets coexist in one table: RFC 2119 directive language matched case-sensitively (UPPERCASE `MUST`, `SHALL`, `NEVER`, `REQUIRED`, `PROHIBITED`) because capitalization signals intentional directive weight, and architectural-writing load-bearing phrases matched case-insensitively (`invariant`, `forcing function`, `non-negotiable`, `the test is`, `the unlock is`, `pure`) because these phrases carry weight regardless of case. The server compiles these into regex applied to retrieved canon quote previews. 
Adding domain-specific vocabulary is a canon edit, not a code change — a compliance KB adds `VIOLATES`, a theological KB adds `ANATHEMA`, a contracts KB adds `SHALL CAUSE`. If this article is absent, the server falls back to a minimal built-in set. + +--- + +## Normative Vocabulary + +### Directive Language (RFC 2119 and Related) + +| Word | Directive type | +|---|---| +| MUST | requirement | +| MUST NOT | prohibition | +| SHOULD | recommendation | +| SHOULD NOT | discouragement | +| SHALL | requirement | +| SHALL NOT | prohibition | +| REQUIRED | requirement | +| PROHIBITED | prohibition | +| NEVER | prohibition | +| ALWAYS | requirement | +| CANNOT | prohibition | +| DO NOT | prohibition | + +### Architectural Writing Load-Bearing Terms + +| Phrase | Directive type | +|---|---| +| invariant | invariant-claim | +| forcing function | forcing-function-claim | +| non-negotiable | non-negotiable-claim | +| the test is | test-claim | +| the unlock is | unlock-claim | +| the real cost | cost-claim | +| the whole point | purpose-claim | +| the key insight | insight-claim | +| only what hurts | scope-limit-claim | +| pure | purity-claim | + +--- + +## Notes + +The server compiles these tables into word-boundary regexes applied to retrieved canon quote previews: case-sensitive for the directive-language table, case-insensitive for the architectural-writing table. A match produces a tension entry with the directive type, citation, and quote. + +Case sensitivity is intentional for the RFC 2119 table: UPPERCASE use signals intentional directive language, while lowercase `must` in prose rarely carries the same weight. The architectural-writing phrases can appear in any case in prose and are matched case-insensitively; `"the test is"` in a draft essay carries load-bearing weight regardless of capitalization. + +The two sections coexist in one table because klappy.dev canon contains both software-engineering governance (where RFC 2119 dominates) and architectural writing (where the load-bearing phrases dominate).
A KB focused on a single domain can prune to one section, and a KB in a third domain adds its own. + +Adding domain-specific vocabulary is a canon edit: + +- A formal-contracts KB might add `SHALL CAUSE`, `WARRANTS`, `COVENANTS` +- A compliance KB might add `VIOLATES`, `BREACHES`, `NON-COMPLIANT` +- A theological KB might add `ANATHEMA`, `HERETICAL`, `commanded`, `forbidden` +- A thought-leadership-from-books KB might add `the central point`, `the key move`, `load-bearing` + +If this article is absent, the server falls back to a minimal built-in set of `MUST`, `MUST NOT`, `SHOULD`, `SHOULD NOT` only. diff --git a/odd/challenge/stakes-calibration.md b/odd/challenge/stakes-calibration.md new file mode 100644 index 00000000..db2bbce3 --- /dev/null +++ b/odd/challenge/stakes-calibration.md @@ -0,0 +1,101 @@ +--- +uri: klappy://odd/challenge/stakes-calibration +title: "Challenge Stakes Calibration" +audience: docs +exposure: nav +tier: 2 +voice: neutral +stability: semi_stable +tags: ["odd", "oddkit", "challenge", "stakes-calibration", "proportional-pressure", "epistemic-modes"] +epoch: E0008 +date: 2026-04-16 +derives_from: "canon/definitions/epistemic-modes.md, canon/constraints/epistemic-challenge.md, canon/principles/irreversibility-is-the-real-cost.md, odd/challenge-types/how-to-write-challenge-types.md" +governs: "oddkit_challenge mode-to-depth mapping — which question tiers, prerequisite strictness, and reframings surface for a given caller mode" +status: active +--- + +# Challenge Stakes Calibration + +> Proportional pressure is a canon commitment. A voice-dump deserves different treatment than a published essay. A one-line experiment deserves different treatment than an irreversible production deploy. This article defines the mapping from epistemic mode to challenge depth — which question tiers surface, how strict prerequisite checking is, and how many reframings are offered. The `mode` parameter on oddkit_challenge reads this table to filter output. 
+ +--- + +## Summary — Nine Modes Across Two Lifecycles, With Voice-Dump As Suppression Invariant + +This article governs how the `mode` parameter on oddkit_challenge filters output. Nine modes coexist: software-engineering modes (`exploration`, `planning`, `execution`) from canon epistemic-modes, and architectural-writing modes (`voice-dump`, `drafting`, `peer-review-ready`, `canon-tier-2`, `canon-tier-1`, `published-essay`) from the klappy.dev writing lifecycle. Each mode names which question tiers to surface (baseline, elevated, rigorous), how strict prerequisite checking is (optional, required, required-plus-source-named), and whether reframings surface (none, first one, all, all-plus-block-until-addressed). Voice-dump mode suppresses all challenge output as a structural invariant — some modes exist for getting thoughts out of the head, and pressure-testing at that stage damages the mode. Default is `planning` when no mode is supplied. If this article is absent, the server surfaces everything at every mode — challenge without calibration is still challenge, just uniformly loud. 
+ +--- + +## Stakes Calibration + +| Mode | Question tiers surfaced | Prerequisite strictness | Reframings surfaced | +|---|---|---|---| +| exploration | baseline | optional (warn only) | first 1 | +| planning | baseline, elevated | required (gap messages) | all | +| execution | baseline, elevated, rigorous | required plus source-named | all, plus block-until-addressed | +| voice-dump | none (suppress all challenge) | optional (warn only) | none | +| drafting | baseline | optional (warn only) | first 1 | +| peer-review-ready | baseline, elevated | required (gap messages) | all | +| canon-tier-2 | baseline, elevated | required (gap messages) | all | +| canon-tier-1 | baseline, elevated, rigorous | required plus source-named | all | +| published-essay | baseline, elevated, rigorous | required plus source-named | all, plus block-until-addressed | + +--- + +## Mode Families + +### Software Engineering Modes (canon epistemic-modes) + +- **exploration** — widening the space of ideas. Heavy challenge here kills good ideas before they mature. +- **planning** — narrowing to a path. This is where tradeoffs get examined. +- **execution** — committing. The cost of being wrong is highest; scrutiny is highest. + +### Architectural Writing Modes (klappy.dev writing lifecycle) + +- **voice-dump** — raw thinking, transcribed. Challenge here is counterproductive; suppress and let thought flow. +- **drafting** — shaping a voice-dump into an argument. Light challenge only. +- **peer-review-ready** — the draft is leaving the author's head and entering review. Baseline and elevated checks apply. +- **canon-tier-2** — content is being added to canon but not as a load-bearing foundation. Same treatment as peer-review-ready. +- **canon-tier-1** — content will be load-bearing for future work. Rigorous treatment including source-named strictness. +- **published-essay** — the work is going to an external audience, including hostile readers. 
Maximum scrutiny, including reframings as block-until-addressed so the author explicitly chooses to publish despite any open gaps. + +--- + +## Tier Vocabulary + +This table uses three tiers: `baseline`, `elevated`, `rigorous`. Challenge-type articles assign each question to a tier in their `## Challenge Questions` table. Tier names are coordinated across this article and every type article — if this article changes the tier names, type articles must follow. + +The mode `voice-dump` surfaces `none` — the tool suppresses all challenge questions regardless of which types matched. This is a feature: some modes are for getting thoughts out of the head, and pressure-testing at that stage damages the very function of the mode. + +--- + +## Strictness Modes + +- **optional (warn only):** Missing prerequisites produce advisory notes; the challenge does not flag them as blocking. +- **required (gap messages):** Missing prerequisites produce explicit gap messages in the response. +- **required plus source-named:** All required gaps surface, plus the `source-named` check is escalated from advisory to blocking. When stakes are high, sources matter. + +--- + +## Reframing Surfacing + +- **none:** No reframings surface. Used for voice-dump mode. +- **first 1:** Only the first reframing from each matched type is surfaced. Keeps exploration and drafting unhurried. +- **all:** All reframings from all matched types surface, deduped. +- **all, plus block-until-addressed:** All reframings surface, and the response includes a posture marker indicating the claim should not proceed until reframings are addressed or explicitly declined. + +--- + +## Default Behavior + +When the `mode` parameter is not provided, the server defaults to `planning` with a low-confidence marker. This matches existing behavior and keeps calibration safe-by-default: neither too loose (exploration) nor too strict (execution) when the caller hasn't said which mode they're in. 
+ +If this article is absent entirely, the server surfaces every question at every mode, runs prerequisites as advisory, and surfaces all reframings. Challenge without calibration is still challenge — it is just uniformly loud. + +--- + +## Notes + +The software-engineering modes and the architectural-writing modes coexist in one table because klappy.dev canon supports both kinds of work. A KB with a single-domain workflow can prune to its own set. + +A KB whose workflow uses different lifecycle stages (`ideation / design / build / ship` for product development; `note-taking / drafting / teaching-material` for thought leadership from books; `drafting / community-check / consultant-check / publication` for Bible translation) can rename modes here — the server treats mode names as strings matched against the `mode` parameter.
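To make the mode-to-depth mapping concrete, here is a minimal TypeScript sketch of how a caller's `mode` could filter a type's questions by tier. It is an illustration under stated assumptions, not the server's code: `CALIBRATION`, `resolveMode`, and `filterQuestions` are hypothetical names, the enum-like strings for strictness and reframings are invented shorthand for the table cells, and falling back to `planning` for an unrecognized mode name is an assumption (the article specifies the default only for a missing `mode`). The sample questions come from the strong-claim type.

```typescript
// Hypothetical sketch of the Stakes Calibration table; identifiers are assumptions.
type Tier = "baseline" | "elevated" | "rigorous";
type Strictness = "optional" | "required" | "required-plus-source-named";
type Reframings = "none" | "first-1" | "all" | "all-plus-block";

interface Calibration {
  tiers: Tier[];
  strictness: Strictness;
  reframings: Reframings;
}

const ALL: Tier[] = ["baseline", "elevated", "rigorous"];

const CALIBRATION = new Map<string, Calibration>([
  ["exploration", { tiers: ["baseline"], strictness: "optional", reframings: "first-1" }],
  ["planning", { tiers: ["baseline", "elevated"], strictness: "required", reframings: "all" }],
  ["execution", { tiers: ALL, strictness: "required-plus-source-named", reframings: "all-plus-block" }],
  ["voice-dump", { tiers: [], strictness: "optional", reframings: "none" }],
  ["drafting", { tiers: ["baseline"], strictness: "optional", reframings: "first-1" }],
  ["peer-review-ready", { tiers: ["baseline", "elevated"], strictness: "required", reframings: "all" }],
  ["canon-tier-2", { tiers: ["baseline", "elevated"], strictness: "required", reframings: "all" }],
  ["canon-tier-1", { tiers: ALL, strictness: "required-plus-source-named", reframings: "all" }],
  ["published-essay", { tiers: ALL, strictness: "required-plus-source-named", reframings: "all-plus-block" }],
]);

interface Question {
  text: string;
  tier: Tier;
}

// A missing mode defaults to "planning" with a low-confidence marker.
// Treating an unknown mode name the same way is an assumption of this sketch.
function resolveMode(mode?: string): { mode: string; lowConfidence: boolean } {
  if (mode !== undefined && CALIBRATION.has(mode)) {
    return { mode, lowConfidence: false };
  }
  return { mode: "planning", lowConfidence: true };
}

// Keeps only the questions whose tier the resolved mode surfaces.
function filterQuestions(questions: Question[], mode?: string): Question[] {
  const row = CALIBRATION.get(resolveMode(mode).mode)!;
  return questions.filter((q) => row.tiers.includes(q.tier));
}

const STRONG_CLAIM_QUESTIONS: Question[] = [
  { text: "What evidence would disprove this?", tier: "baseline" },
  { text: "Under what conditions does this NOT hold?", tier: "baseline" },
  { text: "Who or what would disagree with this, and why?", tier: "elevated" },
  { text: "What is the strongest version of the opposing view?", tier: "elevated" },
  { text: "If this were wrong, how would you know?", tier: "rigorous" },
];
```

With these inputs, `voice-dump` yields an empty list, `exploration` keeps the two baseline questions, and an omitted mode resolves to `planning` and keeps the four baseline and elevated questions. Because mode names are plain strings, a KB that renames its lifecycle stages would only change the keys of the table.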