canon: Search-Corpus Boundary in core-governance-baseline (E0008.5)#155

Merged
klappy merged 1 commit into main from canon/search-corpus-boundary-e0008-5
Apr 29, 2026

@klappy klappy commented Apr 29, 2026

Summary

Adds a Search-Corpus Boundary section to canon/constraints/core-governance-baseline.md (tier 1) defining what the worker BM25-indexes when knowledge_base_url is set. Default becomes overlay + required-baseline; the prior merged behavior is opt-in via include_full_baseline: true.
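A minimal sketch of the default and the opt-in described above, for illustration only. Only the parameter names `knowledge_base_url` and `include_full_baseline` come from this PR; the `CorpusRequest` and `CorpusPlan` types and the `planCorpus` function are hypothetical, and the behavior when no `knowledge_base_url` is set is an assumption, since the new section only covers the case where it is set.

```typescript
// Hypothetical request shape: only the two parameter names come from the PR text.
interface CorpusRequest {
  knowledge_base_url?: string;
  include_full_baseline?: boolean;
}

// Hypothetical description of what the BM25 index would contain.
interface CorpusPlan {
  overlay: boolean;
  baseline: "required" | "full";
}

function planCorpus(req: CorpusRequest): CorpusPlan {
  // Assumption: with no project KB there is no overlay and the full baseline is indexed.
  if (!req.knowledge_base_url) {
    return { overlay: false, baseline: "full" };
  }
  // Explicit opt-in restores the prior merged behavior.
  if (req.include_full_baseline === true) {
    return { overlay: true, baseline: "full" };
  }
  // Default when knowledge_base_url is set: overlay + required-baseline.
  return { overlay: true, baseline: "required" };
}
```

Under this sketch, a caller that passes only `knowledge_base_url` gets the scoped corpus, and has to set `include_full_baseline: true` to get the merged one.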

This is the canon-side of E0008.5 (Search-Corpus Boundary, Project-KB Visibility). The companion code change in klappy/oddkit follows in a separate PR that cites this section. Per canon/constraints/governance-change-discipline.md, canon precedes code.

Why

Reproduced today against oddkit v0.26.0 with knowledge_base_url=https://github.com/klappy/ptxprint-mcp:

  • oddkit_catalog: total 586 (overlay 21, baseline 566) — i.e., 96.4% of the indexed corpus came from klappy.dev, not the project KB
  • oddkit_search for a klappy.dev-only query (release validation gate Bugbot Sonnet validator): top hit was canon/constraints/release-validation-gate.md from klappy.dev ranked above every project doc, labeled source: "baseline"
  • Trace showed the worker fetching https://github.com/klappy/klappy.dev/archive/main.zip (19 MB) during a search keyed to ptxprint-mcp

These numbers match the prior measurement session captured in klappy/ptxprint-mcp at canon/handoffs/oddkit-kb-isolation-feature-request.md (584/19/566 then; 586/21/566 now — overlay grew by two docs, contamination unchanged). That handoff is the design doc; this PR codifies the contract its Option C names.

The failure shape is the one klappy://canon/principles/scoped-truth already names: a single knowledge base serving every context, where unrelated domains compete for ranking slots and the project KB's own canon gets outvoted.

What changed

  • canon/constraints/core-governance-baseline.md — new H2 section between §"What Ships in the Baseline" and §"Build-Time Invariants" with six sub-sections: why scoping defaults to on, opt-in to merged, scope-vs-resolution boundary, affected-tools table, cache-key rule, telemetry fields.
  • canon/CHANGELOG.md — 0.36.0 entry tagged Epoch 8.5.

No frontmatter changes to existing docs. Runtime Invariant #5 (baseline path is never user-configurable) is preserved — scoping is a search-index property, not a baseline-floor change.

Affected tools (per the new table)

oddkit_search, oddkit_catalog, oddkit_preflight get the new default + opt-in. oddkit_orient, oddkit_get, oddkit_challenge, oddkit_validate, oddkit_gate, oddkit_encode are unchanged — they read governance via the per-file resolver, not the search index.
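The split above could be captured as a small lookup, sketched here under assumptions. The tool names are the ones listed in this PR; the constant names and the `acceptsIncludeFullBaseline` helper are hypothetical and are not part of oddkit.

```typescript
// Tools that read the search index and therefore get the new default + opt-in.
const SEARCH_INDEX_TOOLS: ReadonlySet<string> = new Set([
  "oddkit_search",
  "oddkit_catalog",
  "oddkit_preflight",
]);

// Tools that read governance via the per-file resolver and are unchanged.
const PER_FILE_RESOLVER_TOOLS: ReadonlySet<string> = new Set([
  "oddkit_orient",
  "oddkit_get",
  "oddkit_challenge",
  "oddkit_validate",
  "oddkit_gate",
  "oddkit_encode",
]);

// Hypothetical helper: should this tool's schema expose include_full_baseline?
function acceptsIncludeFullBaseline(tool: string): boolean {
  return SEARCH_INDEX_TOOLS.has(tool);
}
```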

Code PR sequencing

Per the operator's directive: canon merges first, then the klappy/oddkit PR cites this. The code PR will:

  1. Add workers/baseline/MANIFEST.json enumerating the six required-baseline files this section names (closes Build-Time Invariant #4).
  2. Add include_full_baseline?: boolean to oddkit_search, oddkit_catalog, oddkit_preflight schemas.
  3. Update arbitrateEntries / index build in workers/src/zip-baseline-fetcher.ts to filter baseline against the manifest when scoped.
  4. Update cache key to (baselineSha, knowledgeBaseSha, scope).
  5. Add the three telemetry fields named in §"Telemetry."
  6. Re-validate against klappy/ptxprint-mcp (expect catalog total to drop from ~586 to ~27).
  7. Ship under release-validation-gate: Bugbot must reach completed; orchestrate.ts/governance-read changes get an independent Sonnet 4.6 validator before promotion.
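Steps 3 and 4 above can be sketched roughly as follows. This is an illustration under assumptions, not the code in workers/src/zip-baseline-fetcher.ts: the `IndexEntry` shape, the representation of the manifest as a set of paths, the function names, and the `:` separator in the cache key are all hypothetical. Only the `overlay`/`baseline` source labels and the `(baselineSha, knowledgeBaseSha, scope)` tuple come from this PR.

```typescript
type Scope = "scoped" | "merged";

// Hypothetical index entry; the source labels match those reported by oddkit.
interface IndexEntry {
  path: string;
  source: "overlay" | "baseline";
}

// When scoped, keep every overlay entry but only manifest-listed baseline entries.
// When merged, keep everything (the prior behavior).
function filterCorpus(
  entries: readonly IndexEntry[],
  requiredBaselinePaths: ReadonlySet<string>, // assumed to be loaded from MANIFEST.json
  scope: Scope,
): IndexEntry[] {
  if (scope === "merged") return [...entries];
  return entries.filter(
    (e) => e.source === "overlay" || requiredBaselinePaths.has(e.path),
  );
}

// Scope is part of the key so a scoped index is never served for a merged request.
function indexCacheKey(baselineSha: string, knowledgeBaseSha: string, scope: Scope): string {
  return [baselineSha, knowledgeBaseSha, scope].join(":");
}
```

Including `scope` in the key means the same pair of SHAs yields two distinct cache entries, one per corpus shape.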

Note

Medium Risk
Documentation-only change, but it declares a new default retrieval contract for oddkit_search/oddkit_catalog/oddkit_preflight; downstream implementations may change behavior and ranking expectations.

Overview
Introduces a new Tier-1 Search-Corpus Boundary section in canon/constraints/core-governance-baseline.md that defines scoped retrieval when knowledge_base_url is set: default search corpus becomes overlay + required-baseline, with include_full_baseline: true as the explicit opt-in to the prior merged behavior.

The new contract clarifies scope vs per-file resolution, lists affected tools, and specifies cache-key/telemetry fields to distinguish scoped vs merged indexing. Updates canon/CHANGELOG.md with a 0.36.0 (Epoch 8.5) entry documenting the behavioral intent and upcoming companion oddkit code change.

Reviewed by Cursor Bugbot for commit 7d1e844. Bugbot is set up for automated code reviews on this repo.

… (E0008.5)

Establishes the contract for what the search corpus indexes when
knowledge_base_url is set. Defaults to overlay + required-baseline;
opt-in to merged via include_full_baseline: true.

Operationalizes klappy://canon/principles/scoped-truth and closes the
gap named in klappy/ptxprint-mcp's oddkit-kb-isolation-feature-request
handoff: project KBs were having their own canon outranked in BM25 by
the unrelated baseline content of co-located repos.

Companion code change ships in klappy/oddkit as a separate PR that
cites this section.

CHANGELOG bumps to 0.36.0 (Epoch 8.5).

@github-actions

Canon Quality — oddkit_audit

No dead klappy:// references or legacy link patterns found in writings/. 39 files scanned.

Spec: klappy://docs/oddkit/specs/oddkit-audit · Workflow: .github/workflows/canon-quality.yml · Run: #11

@klappy klappy merged commit 766e1ab into main Apr 29, 2026
2 checks passed
@klappy klappy deleted the canon/search-corpus-boundary-e0008-5 branch April 29, 2026 13:16