cimt-kernel

cimt-kernel is an Apache-2.0 reference kernel for:

Takahashi, K. (2026). Affordance-Compiled Intelligence: Observable-Only Cognitive Impedance Matching for No-Meta LLM-Integrated Systems. Zenodo. https://doi.org/10.5281/zenodo.20116149

The repository is for engineers who want to run and inspect the paper's core formal machinery. It is intentionally small, deterministic, schema-first, and portable. It is not a production AI agent and it is not a full implementation of the paper.

Intuition

CIMT asks a narrow question:

For a fixed model-policy, did a world-side compiler improve a declared target-channel claim without hiding protected failures, leakage, or missing evidence?

An LLM-integrated system is more than a model call. It also contains tools, retrievers, wrappers, validators, parsers, authority rules, rollback paths, context summaries, gates, logs, and reducers. CIMT treats this world-facing surface as an executable affordance field.

An affordance compiler changes that surface. It can turn ambient tools and informal logs into typed handles, trusted manifests, observable preconditions, authority scopes, target-firewall evidence, receipt obligations, conformance envelopes, deterministic reducers, and debt certificates.

The key discipline is no-meta and observable-only:

  • no hidden ground truth;
  • no privileged human evaluator;
  • no privileged LLM judge;
  • no benchmark treated as absolute truth;
  • no candidate-defined success field used to promote that candidate.

Every evaluator is a named measurement channel with declared scope, interface, receipts, and debt or bridge treatment. A target-channel win is evidence for the declared target-channel claim, not proof of a broader truth.

What This Repository Implements

The stable src/cimt kernel implements the conservative scalar reference path used by the paper's miniature examples:

paired_lower_bound - conservative_additive_bounded_debt > threshold

It includes:

  • portable claim and evidence objects;
  • strict JSON schemas for core wire objects;
  • deterministic canonical JSON commitments;
  • observable ledger and receipt-obligation checks;
  • target, reference, and declared-validator channel separation;
  • typed target-firewall certificates;
  • fixed-sample paired Hoeffding lower certificates;
  • certificate events and validity-budget rows;
  • bounded debt with observable certificate provenance;
  • exact-zero certificates for forbidden coordinates;
  • rare-event upper certificates for zero observed failures;
  • evidence dependency graph coverage checks for strict profiles;
  • model-policy and runtime conformance envelope checks;
  • diagnostic cognitive impedance objects;
  • deterministic RAG, code-agent, and macro-handle examples.
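The scalar path above can be sketched in a few lines. This is a minimal illustration, not the kernel's code: it assumes paired differences bounded in [-1, 1] (interval width 2), the standard one-sided Hoeffding radius, and a threshold of 0. The function names are hypothetical, and the kernel's exact constants may differ. The inputs are the RAG miniature figures quoted in the Examples section.

```python
import math


def paired_hoeffding_lower_bound(
    delta_hat: float, n: int, alpha: float, width: float = 2.0
) -> float:
    """One-sided Hoeffding lower bound on the mean paired difference.

    Assumption: each paired difference lies in an interval of length
    `width` (2.0 when both arms report 0/1 outcomes).
    """
    if n <= 0 or not 0.0 < alpha < 1.0:
        raise ValueError("need n > 0 and 0 < alpha < 1")
    radius = width * math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    return delta_hat - radius


def scalar_predicate(
    lower_bound: float, bounded_debt: float, threshold: float = 0.0
) -> bool:
    """paired_lower_bound - conservative_additive_bounded_debt > threshold."""
    return lower_bound - bounded_debt > threshold


# RAG miniature inputs from this README; threshold 0.0 is an assumption.
lower = paired_hoeffding_lower_bound(delta_hat=0.061, n=4000, alpha=0.005)
margin = lower - 0.006
print(f"lower bound {lower:.4f}, margin after debt {margin:.4f}")
print("predicate holds:", scalar_predicate(lower, 0.006))
```

Under these assumptions the lower bound is roughly 0.0095 and the margin after debt is roughly 0.0035, so the predicate holds with little room to spare. A larger debt or a smaller sample would flip it.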

The contribution is a readable kernel and portable boundary, not a deployment system. Other languages should treat schemas/ and the example JSON files as the interoperability target.
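For readers porting the boundary to another language, one common way to get a deterministic commitment is to serialize with sorted keys and fixed separators, then hash the bytes. The sketch below assumes that convention and SHA-256; the kernel's actual canonicalization rule is not stated here and may differ, so treat `canonical_json` and `commitment` as hypothetical helpers.

```python
import hashlib
import json
from typing import Any


def canonical_json(obj: Any) -> str:
    """Serialize with sorted keys and no insignificant whitespace.

    Assumption: this mirrors a typical canonical form, not necessarily
    the exact rule used by the kernel.
    """
    return json.dumps(
        obj,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,  # NaN/Infinity are not valid JSON; refuse them
    )


def commitment(obj: Any) -> str:
    """Hex SHA-256 digest of the canonical UTF-8 encoding."""
    return hashlib.sha256(canonical_json(obj).encode("utf-8")).hexdigest()


# Key order in the source object does not change the commitment.
a = {"alpha": 0.005, "claim_id": "demo"}
b = {"claim_id": "demo", "alpha": 0.005}
print(canonical_json(a))
print(commitment(a) == commitment(b))
```

Number formatting is the usual portability hazard: two languages can print the same float differently, so a cross-language port should check its output against the checked-in example JSON rather than trust this sketch.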

Enhanced Research Kernel

The repository also contains an optional enhanced layer:

src/cimt/            stable scalar reference kernel
src/cimt_enhanced/   vector/statistical research kernel
schemas/             stable scalar schema boundary
schemas_enhanced/    enhanced schema index and exported schemas
experiment/          deterministic offline experiment fixtures

Use cimt when you want the minimal reference path. Use cimt-enhanced when you want to inspect the broader machinery that a fuller CIMT implementation would need: declarative vector debt, monotone vector profile predicates, certificate-event routing, validity-budget ledgers, model-policy catalogs, receipt sufficiency, dependency graph coverage, bridge/reference obligations, candidate-search correction, simultaneous-validity hooks, typed scope closure, runtime conformance, receipt availability, and overlap/disjointness proof objects. It also includes formal-core proof objects for observable ablation attribution, macro reliability, repair drift, summary loss, handle capacity, compiler obligation extraction, effect-safe composition, gate conformance, metadata non-interference, authority, and taint.

The enhanced statistical layer now also replays blocked and clustered paired Hoeffding certificates, exact small sign-flip randomization certificates, observable shift certificates, IPM/MMD transfer certificates, domain-classifier transfer certificates, conformal risk-control certificates, and claim-owned holdout/no-leakage requirements.

The evidence-substrate layer now adds claim-owned probability-space requirements, source-bound certificate events and budget rows, concrete finite e-process product replay, holdout-ledger correction certificates, replay-coupled and distributional-service model-policy envelope checks, runtime access receipts, monitor-totality certificates, and deterministic dynamic graph evidence.

The enhanced profile language is portable JSON, not Python code: it uses positive DNF (any_of branches containing all_of monotone atoms) over certified improvement bounds, projected bounded debt, protected floors, and exact-zero certificates. Debt projection is explicit and conservative; non-additive overlap still needs claim-owned proof certificates.
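To make the positive-DNF shape concrete, here is a small evaluator over a hypothetical profile. Only the any_of/all_of structure comes from the description above. The atom fields (`coordinate`, `op`, `value`) and the coordinate names are assumptions for illustration and are not the enhanced schema.

```python
from typing import Any, Mapping

# Hypothetical profile: accept if ANY branch has ALL of its atoms satisfied.
PROFILE: dict[str, Any] = {
    "any_of": [
        {"all_of": [
            {"coordinate": "improvement_lower_bound", "op": ">=", "value": 0.005},
            {"coordinate": "projected_debt", "op": "<=", "value": 0.01},
        ]},
        {"all_of": [
            {"coordinate": "improvement_lower_bound", "op": ">=", "value": 0.02},
        ]},
    ]
}


def atom_holds(atom: Mapping[str, Any], certified: Mapping[str, float]) -> bool:
    """Evaluate one monotone atom; a missing coordinate fails closed."""
    name = atom["coordinate"]
    if name not in certified:
        return False
    if atom["op"] == ">=":
        return certified[name] >= atom["value"]
    if atom["op"] == "<=":
        return certified[name] <= atom["value"]
    raise ValueError(f"unsupported operator: {atom['op']!r}")


def profile_accepts(profile: Mapping[str, Any], certified: Mapping[str, float]) -> bool:
    """Positive DNF: no negation, so more certified evidence never hurts."""
    return any(
        bool(branch["all_of"])  # an empty branch must not accept vacuously
        and all(atom_holds(atom, certified) for atom in branch["all_of"])
        for branch in profile["any_of"]
    )


print(profile_accepts(PROFILE, {"improvement_lower_bound": 0.0095,
                                "projected_debt": 0.006}))
print(profile_accepts(PROFILE, {"improvement_lower_bound": 0.0095}))
```

Because the profile is data rather than code, a candidate cannot smuggle in an arbitrary predicate, and the same JSON can be evaluated by an implementation in any language.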

The enhanced path recomputes graph, receipt, target-firewall, runtime-conformance, scope-closure, zero-certificate, model-policy, and validity-budget checks from replayable evidence objects. Claim-owned obligations cannot be weakened by evidence: required graph nodes, receipt rules, coordinate registry, target-firewall reads, zero-certificate reads, runtime artifacts, and overlap certificates are declared by the claim.

The enhanced layer is still not a full implementation of the paper. It does not add hidden evaluators or live model judging. It makes more of the paper's formal interfaces executable while keeping production infrastructure claims out of scope.

What is now closer to the paper:

  • bridge claims can require symmetric candidate and baseline disagreement certificates against a named reference channel;
  • fixed-family candidate search applies Bonferroni correction, and declared alpha-spending search checks committed stopping and cumulative alpha;
  • simultaneous-validity requirements can name event roles and require a joint-validity certificate over claim-required events;
  • protected floors can remain boolean evidence for compatibility or require certificate-backed scope/horizon/read evidence;
  • strict scope closure, runtime conformance, time-indexed receipt availability, and non-additive overlap policies use typed claim-owned requirements and evidence-side proof objects;
  • probability-space sources for samplers, randomizers, calibration splits, holdouts, stopping rules, filtrations, and model-policy episodes can be bound to certificate events and validity-budget rows;
  • concrete finite e-process product certificates can be replayed from committed nonnegative e-values, filtration, stopping rule, threshold, covered event IDs, and budget rows;
  • holdout-ledger candidate-search correction can be replayed from a declared holdout ID, query-ledger commitment, no-leakage event, filtration, reuse budget, and covered comparisons;
  • replay-coupled and distributional-service model-policy envelope claims can require archived prompt/tool/seed/response commitments or drift-monitor debt;
  • runtime access receipts and monitor-totality certificates can support deterministic dependency-graph construction and reject undeclared target/protected reads;
  • blocked, clustered, and small exact randomization paired certificates are recomputed from typed committed inputs;
  • observable shift and transfer claims require claim-owned requirements, deployment or calibration artifacts, bounded transfer debt, and budgeted certificate events;
  • holdout/no-leakage requirements for target-firewall and candidate-search contexts are checked from claim-owned reads and artifact manifests;
  • component-attribution claims require predeclared ablation schemes, budgeted comparison events, and residual interaction debt when needed;
  • macro reliability is represented through typed failure-mode covers and explicit union-bound or conditional rules;
  • repair, summary, capacity, compiler-obligation, gate, metadata, authority, taint, and effect-composition claims are checked as formal evidence objects before the vector profile predicate can accept.
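As one illustration of the list above, a finite e-process product can be replayed by multiplying committed nonnegative e-values up to a committed stopping index and comparing the result with a threshold. The sketch assumes the threshold is 1/alpha (the usual choice under Ville's inequality) and that the stopping rule reduces to a fixed committed index. The function name and signature are hypothetical, and the statistical validity of the e-values themselves is outside what this code checks.

```python
import math
from typing import Sequence


def replay_product_eprocess(
    e_values: Sequence[float], alpha: float, stop_at: int
) -> tuple[float, bool]:
    """Recompute the running product at the committed stopping index.

    Returns (wealth, crossed), where crossed means wealth >= 1 / alpha.
    Malformed input is rejected rather than silently repaired.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    if not 1 <= stop_at <= len(e_values):
        raise ValueError("stop_at must index into the committed e-values")
    if any((not math.isfinite(e)) or e < 0.0 for e in e_values):
        raise ValueError("e-values must be finite and nonnegative")
    wealth = math.prod(e_values[:stop_at])
    return wealth, wealth >= 1.0 / alpha


# Illustrative values only; these are not taken from the repository.
committed = [2.0, 1.5, 0.8, 3.0, 4.0, 5.0]
print(replay_product_eprocess(committed, alpha=0.05, stop_at=4))
print(replay_product_eprocess(committed, alpha=0.05, stop_at=5))
```

The stopping index is an input rather than something the replay searches for. Letting the replay pick the most favourable index after the fact would be exactly the kind of candidate-defined success the no-meta discipline rules out.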

Still external to this repository:

  • durable ledger and receipt storage;
  • real holdout custody;
  • provider drift guarantees;
  • production sandbox and wrapper assurance;
  • domain validator correctness;
  • proof assistant integration;
  • operational incident response.

For practical deployment work, read docs/productization_gap_checklist.md. That checklist separates what this repository can replay from the operational systems implementers still have to build: ledger retention, holdout custody, model-policy drift monitoring, sandbox conformance, dynamic dependency graph extraction, exact-zero proof systems, scope closure, and domain calibrators.

What This Repository Does Not Implement

This repository does not provide:

  • live LLM calls;
  • live benchmark execution;
  • real sandboxing or policy enforcement;
  • production receipt retention or escrow;
  • model-provider drift monitoring;
  • arbitrary Python or candidate-supplied promotion predicates;
  • arbitrary vector monoids and non-additive overlap calculi without proof certificates;
  • confidence-sequence and martingale certificates beyond explicit fail-closed interfaces;
  • e-process methods beyond the concrete finite product replay rule;
  • holdout-ledger operations beyond deterministic replay of committed query ledgers, no-leakage events, filtrations, and reuse budgets;
  • production shift monitoring, deployment-sample custody, or provider drift guarantees;
  • automatic proof of ablation causality, macro reliability, repair contraction, summary equivalence, handle capacity, or compiler correctness;
  • production affordance compiler synthesis;
  • a proof assistant;
  • a real-world safety certificate.

Those are extension targets. The reference kernel keeps them visible as interfaces, objects, tests, and documentation boundaries.

Do Not Use This To Claim

Do not use an accepted example decision to claim real-world safety, semantic truth, deployment readiness, superiority of a model, or correctness outside the declared claim object. Do not treat a target-channel win as a substitute for missing receipts, protected floors, conformance, model-policy evidence, scope closure, or exact-zero certificates. Common anti-patterns and expected fail-closed outcomes are listed in docs/misuse_cases.md.

Try It

Install dependencies and run the test suite:

uv sync --extra dev
uv run pytest

Run the accepted RAG miniature:

uv run cimt example rag --pretty

Replay the checked-in JSON evidence:

uv run cimt promote examples/rag/claim.json examples/rag/evidence.json --pretty

Try the code-agent and macro-handle examples:

uv run cimt example code-agent --decision-only --pretty
uv run cimt example macro-handle --n 12 --p 0.08 --r 0.02 --q-n 0.03 --bounded-debt 0.01 --pretty

Validate objects and inspect schemas:

uv run cimt claim validate examples/rag/claim.json --pretty
uv run cimt evidence validate examples/rag/evidence.json --pretty
uv run cimt schema export --pretty

Try the enhanced vector RAG fixture:

uv run cimt-enhanced example vector-rag --decision-only --pretty
uv run cimt-enhanced catalog validate experiment/ollama_gemma4_e4b_strong_baseline/policy_catalog.json --pretty
uv run cimt-enhanced schema export --pretty

Run the portability checks:

uv run pytest tests/test_portability.py

Reader Path

For a quick technical review:

  1. Run uv run cimt example rag --pretty.
  2. Read examples/rag/claim.json and examples/rag/evidence.json.
  3. Read docs/theory_guardrails.md.
  4. Read docs/paper_mapping.md.
  5. Read docs/implementation_audit.md.
  6. Read docs/misuse_cases.md.
  7. Read docs/enhanced_kernel.md.
  8. Read docs/productization_gap_checklist.md.
  9. Read docs/extension_guide.md before adding a new domain.

The most important function is cimt.promotion.evaluate_promotion. It returns accepted, rejected, or non_promotable for exactly the declared claim object.

Decision Flow

ClaimObject
  scope, target channel, workload, fixed model-policy, budget,
  protected boundary, receipt tier, estimand, profile, alpha
        |
        v
EvidenceObject
  ledgers, receipts, manifests, reducers, certificate events,
  dependency graph, validity budget, conformance envelopes,
  target-firewall certificate, declared debt
        |
        v
Certificates and debt
  paired lower bound, bounded debt, protected floors,
  exact-zero certificates for forbidden coordinates
        |
        v
Profile artifact
  scalar predicate in `cimt`, declarative monotone vector predicate in
  `cimt_enhanced`
        |
        v
accepted | rejected | non_promotable

Strict profiles fail closed. If any required evidence is missing (ledger indexes, receipts, reducer commitments, validity-budget rows, debt source certificates, dependency graph coverage, model-policy envelopes, runtime conformance, scope closure, target-firewall evidence, paired target evidence, or exact-zero certificates), the decision for the original claim is non_promotable.
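The fail-closed ordering can be sketched as a three-way decision: missing evidence is checked first and yields non_promotable, and only complete evidence reaches the accept/reject predicate. The evidence keys and the `decide` function below are hypothetical; they do not reproduce the signature of cimt.promotion.evaluate_promotion.

```python
from enum import Enum
from typing import Any, Mapping


class Decision(str, Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    NON_PROMOTABLE = "non_promotable"


# Hypothetical required evidence keys, for illustration only.
REQUIRED_EVIDENCE = (
    "ledger_index",
    "receipts",
    "reducer_commitment",
    "validity_budget",
    "target_firewall_certificate",
)


def decide(
    evidence: Mapping[str, Any],
    lower_bound: float,
    bounded_debt: float,
    threshold: float = 0.0,
) -> tuple[Decision, list[str]]:
    """Return the decision plus the list of missing evidence keys."""
    missing = [key for key in REQUIRED_EVIDENCE if not evidence.get(key)]
    if missing:
        # Fail closed: a strong target-channel number cannot compensate.
        return Decision.NON_PROMOTABLE, missing
    if lower_bound - bounded_debt > threshold:
        return Decision.ACCEPTED, []
    return Decision.REJECTED, []


complete = {key: {"present": True} for key in REQUIRED_EVIDENCE}
partial = {k: v for k, v in complete.items() if k != "receipts"}
print(decide(complete, 0.0095, 0.006))
print(decide(partial, 0.5, 0.0))
```

The second call shows the point of the ordering: a large apparent improvement still yields non_promotable when receipts are absent.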

In the enhanced layer, the claim owns promotion obligations: required graph nodes, receipt rules, debt registry, target-firewall reads, zero-certificate reads, runtime artifacts, and overlap certificate requirements. Evidence supplies proof material through manifests, receipts, certificate events, graph nodes, and catalogs; it cannot reduce its own obligation surface.
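A rough sketch of claim-owned obligations for dependency graph nodes: the required set is read only from the claim, and anything the evidence says about its own requirements is ignored. The field names (`required_graph_nodes`, `graph_nodes`) are assumptions for illustration.

```python
from typing import Any, Mapping


def missing_graph_nodes(
    claim: Mapping[str, Any], evidence: Mapping[str, Any]
) -> set[str]:
    """Nodes the claim requires that the evidence does not supply.

    Only claim["required_graph_nodes"] defines the obligation; a
    "required_graph_nodes" key inside the evidence is deliberately unread.
    """
    required = set(claim["required_graph_nodes"])
    supplied = set(evidence.get("graph_nodes", ()))
    return required - supplied


claim = {"required_graph_nodes": ["retriever", "reducer", "target_firewall"]}
evidence = {
    "graph_nodes": ["retriever", "reducer"],
    # An attempt to narrow the obligation from the evidence side:
    "required_graph_nodes": ["retriever", "reducer"],
}
print(missing_graph_nodes(claim, evidence))
```

The evidence-side list has no effect, so the target_firewall node is still reported as missing.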

Core Distinctions

Target channel vs truth. A target channel is an instrument. It may be a test suite, verifier, simulator, review process, proof checker, or deployment signal. It is never absolute truth. Stronger reference or deployment claims need bridge certificates or declared debt.

Exact zero vs zero observed. Zero observed failures can produce bounded rare-event debt. Exact zero requires design exclusion, deterministic invariant, exhaustive finite cover, or fail-closed monitor evidence.
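To see why zero observed failures is not exact zero, consider the exact one-sided upper confidence bound on a failure rate after n trials with no failures. The sketch assumes independent, identically distributed Bernoulli trials, which a real certificate would have to justify. The function name is hypothetical, and the kernel's rare-event certificate may use a different bound.

```python
import math


def zero_failure_upper_bound(n: int, alpha: float) -> float:
    """Largest failure rate p still consistent with 0 failures in n trials.

    Solves (1 - p)^n = alpha for p, so the bound holds with confidence
    1 - alpha under the i.i.d. Bernoulli assumption.
    """
    if n <= 0 or not 0.0 < alpha < 1.0:
        raise ValueError("need n > 0 and 0 < alpha < 1")
    return 1.0 - alpha ** (1.0 / n)


upper = zero_failure_upper_bound(n=4000, alpha=0.005)
print(f"upper bound on failure rate: {upper:.6f}")
print(f"log(1/alpha)/n approximation: {math.log(1 / 0.005) / 4000:.6f}")
```

Even 4000 clean trials leave an upper bound of about 0.0013 at alpha = 0.005. That residual is debt to be accounted for, whereas an exact-zero certificate has to come from structural evidence.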

Debt provenance vs assertion. In strict profiles, positive bounded debt must cite an observable certificate event and a validity-budget row. A number without provenance is not promotion evidence.

Impedance vs promotion. Cognitive impedance can explain why a compiler helps, but impedance reduction alone cannot promote a claim. Promotion still requires target-channel evidence and protected checks.

Missing evidence vs success. Missing required evidence must become bounded debt, inadmissible debt, unsupported obligation, or non-promotability. It must not silently narrow or strengthen the claim.

Examples

The examples are deterministic miniature certificates from the paper.

  • examples/rag: target success is AnswerCheck AND CitationCheck AND NoOverlap. The accepted case has N=4000, delta_hat=0.061, paired alpha 0.005, scalar debt 0.006, and exact-zero leakage/firewall evidence.
  • examples/code_agent: target success is FullTests AND Lint AND Build. Selected tests may guide repair, but they cannot become target evidence unless precommitted, symmetric, and holdout-independent.
  • examples/macro_handle: computes (1 - p)^n, 1 - r - q_n, and (1 - r)(1 - q_n). The union-bound form does not claim independence.

These examples teach the calculus. They do not certify any real RAG system, code-editing agent, repository, model, or deployment.
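The three macro-handle quantities are simple enough to recompute by hand. The sketch below uses the parameter values from the macro-handle command in Try It. Reading p as a per-step failure probability over n steps, and r and q_n as two failure bounds for the macro handle, is an assumption; the README gives only the formulas.

```python
def stepwise_success(p: float, n: int) -> float:
    """(1 - p)^n: success of n steps, each failing with probability p."""
    return (1.0 - p) ** n


def macro_union_bound(r: float, q_n: float) -> float:
    """1 - r - q_n: lower bound on success with no independence claim."""
    return 1.0 - r - q_n


def macro_product(r: float, q_n: float) -> float:
    """(1 - r)(1 - q_n): product form, which does rely on independence."""
    return (1.0 - r) * (1.0 - q_n)


n, p, r, q_n = 12, 0.08, 0.02, 0.03
print(f"(1 - p)^n          = {stepwise_success(p, n):.4f}")
print(f"1 - r - q_n        = {macro_union_bound(r, q_n):.4f}")
print(f"(1 - r)(1 - q_n)   = {macro_product(r, q_n):.4f}")
```

The union-bound form is never larger than the product form, since the two differ by r * q_n. That makes it the conservative choice when independence cannot be certified.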

Repository Layout

src/cimt/           scalar reference Python bindings and deterministic reducers
src/cimt_enhanced/  declarative vector/statistical research kernel
schemas/            portable scalar JSON Schema boundary
schemas_enhanced/   enhanced schema index and exported schemas
examples/           portable scalar claim/evidence/profile/decision JSON
tests/              scalar semantic guardrail tests
tests_enhanced/     enhanced semantic guardrail tests
docs/               architecture, mapping, audit, extension, and examples

Extending The Kernel

Start with evidence, not agent behavior:

  1. declare a narrow target channel;
  2. declare a claim object;
  3. define reducer-read artifact I/O manifests;
  4. define receipt obligations;
  5. construct the dependency graph;
  6. emit bounded, inadmissible, or unsupported debt coordinates;
  7. attach observable certificate events and validity-budget rows;
  8. add exact-zero checkers for forbidden coordinates;
  9. write non-promotable tests for every missing-evidence path.

Live model calls and live benchmarks should come only after the evidence boundary is stable and replayable.

Release Checks

Before publishing:

uv run ruff check .
uv run ruff format --check .
uv run mypy src
uv run pytest --cov=cimt --cov=cimt_enhanced
uv run cimt example rag --pretty
uv run cimt promote examples/rag/claim.json examples/rag/evidence.json --pretty
uv run cimt schema export
uv run cimt-enhanced example vector-rag --decision-only --pretty
uv run cimt-enhanced schema export

Also run the public-surface checks in docs/security_quality_audit.md.