From 7d1e844a8622d30797db2e2bf93edd0ab6a8316b Mon Sep 17 00:00:00 2001
From: "Claude (Klappy session)"
Date: Wed, 29 Apr 2026 13:08:45 +0000
Subject: [PATCH] canon: add Search-Corpus Boundary section to core-governance-baseline (E0008.5)

Establishes the contract for what the search corpus indexes when knowledge_base_url is set. Defaults to overlay + required-baseline; opt-in to the merged corpus via include_full_baseline: true.

Operationalizes klappy://canon/principles/scoped-truth and closes the gap named in klappy/ptxprint-mcp's oddkit-kb-isolation-feature-request handoff: project KBs had their own canon outranked in BM25 by the unrelated baseline content of co-located repos.

Companion code change ships in klappy/oddkit as a separate PR that cites this section.

CHANGELOG bumps to 0.36.0 (Epoch 8.5).
---
 canon/CHANGELOG.md                            | 12 +++++
 canon/constraints/core-governance-baseline.md | 45 +++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/canon/CHANGELOG.md b/canon/CHANGELOG.md
index 7ff6f8fd..14fbb3fe 100644
--- a/canon/CHANGELOG.md
+++ b/canon/CHANGELOG.md
@@ -18,6 +18,18 @@ This changelog tracks changes to the **Canon pack** as a whole.
 
 The Canon uses **pack-level versioning** (one version number) rather than per-file versioning. Per-file versions are intentionally omitted to reduce ceremony and prevent metadata rot.
 
+## 0.36.0 — 2026-04-29
+
+**Search-Corpus Boundary — Project-KB Visibility (Epoch 8.5)**
+
+One canon section ships ahead of the corresponding `klappy/oddkit` code change. It establishes E0008.5 (Search-Corpus Boundary) as a sub-epoch of E0008, defines what the search corpus indexes when `knowledge_base_url` is set, restricts the default to overlay + required-baseline, and names `include_full_baseline: true` as the explicit opt-in to the merged corpus. The companion code change in `klappy/oddkit` follows in a separate PR that cites this section.
+
+The behavior the next code release targets is project-KB visibility: a project's own canon stops being outranked in BM25 by hundreds of unrelated baseline documents from a co-located repo. The measurements driving this change are recorded in `klappy/ptxprint-mcp` at `canon/handoffs/oddkit-kb-isolation-feature-request.md`. Reproduction in the present session against `oddkit` v0.26.0 confirmed those numbers (catalog total 586 with 21 overlay vs 566 baseline; klappy.dev-only queries surfaced klappy.dev hits ranked above the project's own canon).
+
+### Changed — Canon
+
+- **Core Governance Baseline — added §"Search-Corpus Boundary — Scoped Retrieval When `knowledge_base_url` Is Set"** (`canon/constraints/core-governance-baseline.md`) — Tier 1, neutral, semi-stable. New section between §"What Ships in the Baseline" and §"Build-Time Invariants" with six sub-sections: why scoping defaults to on, opt-in to merged, scope-vs-resolution boundary, affected-tools table, cache-key rule, telemetry fields. Cross-references `klappy://canon/principles/scoped-truth` (the principle this operationalizes), `klappy://canon/principles/dry-canon-says-it-once` (which the prior unscoped behavior violated), and Runtime Invariant #5 of this same document (which scoping must preserve).
+
 ## 0.35.0 — 2026-04-20
 
 **Post-4.7 Adaptation Suite — Operator-Attention Calibration (Epoch 8.4)**
diff --git a/canon/constraints/core-governance-baseline.md b/canon/constraints/core-governance-baseline.md
index 05c725a7..b43a1f3f 100644
--- a/canon/constraints/core-governance-baseline.md
+++ b/canon/constraints/core-governance-baseline.md
@@ -166,6 +166,51 @@ The split test: if a tool cannot return a coherent response without the file, it
 
 ---
 
+## Search-Corpus Boundary — Scoped Retrieval When `knowledge_base_url` Is Set
+
+The split between *required-baseline* (§"Required in Baseline") and *canon-only* (§"Canon-Only (Never Bundled)") classifies what the worker bundles. It also classifies what the **search corpus** indexes when a project KB is set.
+
+When `knowledge_base_url` is set, the search corpus default is **overlay + required-baseline only** — not overlay + the entire baseline repo. The required-baseline files are the floor every tool needs; the canon-only files (`writings/`, `apocrypha/`, `odd/ledger/`, encoding-types, challenge-types, gate variants) belong only to the project that authored them. Indexing them into a third-party project KB's search corpus drowns the project's own canon in unrelated noise — the failure shape `klappy://canon/principles/scoped-truth` names as the anti-pattern of unscoped governance.
+
+### Why Scoping Defaults to On
+
+A project KB exists because the project has its own canon. The agent searches it because the project's canon is the right answer to the project's questions. Merging the entire baseline into that search corpus inverts the design: in measured probes against `klappy/ptxprint-mcp`, the 566 baseline docs outranked the project's 21 canon docs in BM25 for queries the project's canon was authored to answer. The project KB's content was present and correct; it was simply outvoted.
+
+Scoping is a default, not a hard wall. Required-baseline still travels with every project KB — `axioms.md`, `orientation.md`, `definition-of-done.md`, `writing-canon.md`, `telemetry-governance.md`, and `stakes-calibration.md` are present in the search index for every consumer. Tools that depend on those files (orient, challenge, validate, preflight, telemetry_policy) keep working unchanged. What stops surfacing in scoped mode is the broader baseline: another project's writings, another project's session ledgers, another project's apocrypha. Those were never required for tool function (this document already classifies them as canon-only); they sat inside the same baseline repo and landed in the search index only by accident of co-location.
+
+### Opt-In to Merged
+
+Callers who genuinely want the merged corpus pass `include_full_baseline: true` on the relevant action. This restores the prior behavior: overlay + full baseline indexed together, with arbitration favoring overlay on path/URI conflicts. Use cases include a project that intentionally wants to surface cross-domain hits and a debugging session reproducing a prior result. For the default-KB consumer, where there is no overlay to scope to, the parameter is a no-op.
+
+When `knowledge_base_url` is unset, default behavior is unchanged: the baseline is the canon, `include_full_baseline` defaults to `true`, and there is nothing to scope.
+
+### Scope Applies Only to the Search Index — Not to Per-File Resolution
+
+The boundary defined here applies to the **search corpus** — the set of documents the worker BM25-indexes for ranked retrieval. It does **not** override the per-required-file resolution stack defined in §"The Resolution Stack." `oddkit_get` for a required-baseline URI still walks live-canon → bundled-baseline → fail-loud as documented. `oddkit_orient`, `oddkit_challenge`, and `oddkit_validate` still read their required governance files via the resolver, not the search index. Excluding `writings/the-intern.md` from a scoped search corpus does not remove it from `oddkit_get`'s resolution surface; it only stops that file from competing for ranking slots against the project KB's own canon.
+
+This separation is what keeps Runtime Invariant #5 (`baseline path is never user-configurable`) intact. The baseline floor — the files that must always be available regardless of caller — is unchanged. Only the search corpus, which is a derived surface assembled per call from canon and baseline together, becomes scope-sensitive.
+
+### Affected Tools
+
+| Tool | Default when `knowledge_base_url` is set | Accepts `include_full_baseline`? |
+|---|---|---|
+| `oddkit_search` | Overlay + required-baseline | Yes |
+| `oddkit_catalog` | Overlay + required-baseline (counts and category aggregations scoped accordingly) | Yes |
+| `oddkit_preflight` | Overlay + required-baseline | Yes |
+| `oddkit_orient` | Unchanged — governance reads, not search index | No |
+| `oddkit_get` | Unchanged — per-file resolution stack | No |
+| `oddkit_challenge`, `oddkit_validate`, `oddkit_gate`, `oddkit_encode` | Unchanged — governance reads only | No |
+
+### Cache Key Includes Scope
+
+The compiled search index is content-addressed by `(baselineSha, knowledgeBaseSha, scope)`. A scoped index and a merged index against the same KB have distinct cache keys; neither poisons the other.
+
+### Telemetry
+
+On ranked actions, the telemetry envelope adds `search_scope`, `overlay_doc_count`, and `baseline_doc_count`. With these fields the maintainer can detect (a) whether scoped mode is the dominant default in the wild, (b) whether `include_full_baseline=true` is being adopted intentionally, and (c) whether any consumer is silently capturing baseline content into their search corpus — the failure shape §"Failure Modes OF This Contract" already names.
+
+---
+
 ## Build-Time Invariants
 
 The worker build process must enforce these invariants. A build that violates any of them fails before produce.
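As an illustration of the Search-Corpus Boundary rule in the patch above, the Python sketch below shows one way a worker could choose which documents to index for a ranked action. It is not oddkit's implementation: the `Doc` type, its `origin` labels, the `"scoped"`/`"merged"` scope strings, and the function name `select_search_corpus` are hypothetical, and applying overlay-wins arbitration in scoped mode (the section states it only for merged mode) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Doc:
    path: str    # path/URI used for arbitration (hypothetical field)
    origin: str  # "overlay" or "baseline" (hypothetical labels)


def _overlay_wins(overlay: list[Doc], candidates: list[Doc]) -> list[Doc]:
    """Overlay docs first, then candidates whose path the overlay does not claim."""
    claimed = {d.path for d in overlay}
    return list(overlay) + [d for d in candidates if d.path not in claimed]


def select_search_corpus(
    overlay: list[Doc],
    baseline: list[Doc],
    required_paths: set[str],
    knowledge_base_url: Optional[str],
    include_full_baseline: Optional[bool] = None,
) -> tuple[str, list[Doc]]:
    """Return (scope, docs_to_index) for one ranked action."""
    if knowledge_base_url is None:
        # No project KB: the baseline is the canon and there is nothing to
        # scope, so the flag is a no-op and the whole baseline is indexed.
        return "merged", list(baseline)
    if include_full_baseline:
        # Explicit opt-in: overlay + full baseline, overlay wins on path conflicts.
        return "merged", _overlay_wins(overlay, baseline)
    # Default with a KB set: overlay + the required-baseline floor only.
    floor = [d for d in baseline if d.path in required_paths]
    return "scoped", _overlay_wins(overlay, floor)
```

With a KB set and the flag omitted, a canon-only file such as `writings/the-intern.md` drops out of the indexed set while a required file stays, provided the caller lists it in `required_paths`. The section names the required files only by bare filename, so any paths passed here are placeholders.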
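The scope-versus-resolution boundary in the patch can be sketched the same way. Here per-file resolution walks live canon, then the bundled baseline, then fails loudly, and it deliberately takes no scope argument, so scoping the search corpus cannot change what it returns. The names `resolve` and `ResolutionError`, and modelling each layer as a lookup function, are hypothetical stand-ins for oddkit's resolver.

```python
from __future__ import annotations

from typing import Callable, Optional

# A layer maps a URI to document text, or None when the layer lacks it.
Lookup = Callable[[str], Optional[str]]


class ResolutionError(LookupError):
    """Raised when no layer can supply the requested URI (the fail-loud step)."""


def resolve(uri: str, live_canon: Lookup, bundled_baseline: Lookup) -> str:
    """Walk live-canon -> bundled-baseline -> fail-loud for a single URI.

    There is intentionally no `scope` parameter: excluding a file from a
    scoped search corpus does not remove it from this resolution path.
    """
    for layer in (live_canon, bundled_baseline):
        body = layer(uri)
        if body is not None:
            return body
    raise ResolutionError(f"cannot resolve {uri}")
```

Keeping `scope` out of the signature is the point of the design: the only way a search-corpus decision could leak into `oddkit_get` would be through an argument this function never accepts.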
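The cache-key rule in the patch can also be made concrete. This sketch assumes the `(baselineSha, knowledgeBaseSha, scope)` tuple is serialized with a separator and hashed with SHA-256; the hashing scheme, the function `index_cache_key`, and the in-memory `IndexCache` are illustrative assumptions, not oddkit's storage layer.

```python
from __future__ import annotations

import hashlib
from typing import Callable, Optional


def index_cache_key(baseline_sha: str, knowledge_base_sha: Optional[str], scope: str) -> str:
    """Hash (baselineSha, knowledgeBaseSha, scope) into one cache key."""
    # An unset KB becomes an empty field; the 0x1F unit separator keeps
    # field boundaries unambiguous for hex SHAs and short scope labels.
    fields = (baseline_sha, knowledge_base_sha or "", scope)
    return hashlib.sha256("\x1f".join(fields).encode("utf-8")).hexdigest()


class IndexCache:
    """In-memory stand-in for whatever store holds compiled indexes."""

    def __init__(self) -> None:
        self._entries: dict[str, object] = {}
        self.builds = 0  # how many times a build was actually run

    def get_or_build(
        self,
        baseline_sha: str,
        knowledge_base_sha: Optional[str],
        scope: str,
        build: Callable[[], object],
    ) -> object:
        key = index_cache_key(baseline_sha, knowledge_base_sha, scope)
        if key not in self._entries:
            self._entries[key] = build()
            self.builds += 1
        return self._entries[key]
```

Keying on both SHAs means a new baseline or KB commit naturally misses the cache; keying on scope is what prevents a scoped build from being served to a merged request, or the reverse.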
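Finally, a sketch of the telemetry fields the patch adds for ranked actions. The set of ranked actions follows the Affected Tools table (`oddkit_search`, `oddkit_catalog`, `oddkit_preflight`); the function names, the per-document origin labels, and the `scoped_share` summary for question (a) of the Telemetry sub-section are hypothetical.

```python
from __future__ import annotations

from typing import Optional

# Actions whose default corpus changes when knowledge_base_url is set.
RANKED_ACTIONS = frozenset({"oddkit_search", "oddkit_catalog", "oddkit_preflight"})


def scope_fields(action: str, scope: str, origins: list[str]) -> Optional[dict[str, object]]:
    """Build the three scope fields for a ranked action, or None for other actions.

    `origins` holds one "overlay" or "baseline" label per indexed document.
    """
    if action not in RANKED_ACTIONS:
        return None  # governance reads and per-file resolution have no search scope
    return {
        "search_scope": scope,
        "overlay_doc_count": origins.count("overlay"),
        "baseline_doc_count": origins.count("baseline"),
    }


def scoped_share(events: list[dict[str, object]]) -> float:
    """Fraction of ranked-action events that ran in scoped mode."""
    if not events:
        return 0.0
    return sum(1 for e in events if e["search_scope"] == "scoped") / len(events)
```

A KB-bound call reporting `search_scope: "merged"` with a `baseline_doc_count` far above its `overlay_doc_count` is the kind of signal question (c) would look for, though how the maintainer thresholds it is left open here.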