canon/principles: agent-self-report-under-stress (tier 2) by klappy · Pull Request #110 · klappy/klappy.dev

klappy · 2026-04-19T15:24:10Z

New canon principle (tier 2)

Extends canon/principles/verification-requires-fresh-context from "creator cannot be their own critic" to "agent cannot be their own historian under mid-session pressure." The filesystem, git state, and API side-effects are the source of truth for completion claims; the agent's terminal narrative is a belief synthesized from current context, which can drift under injected pressure.
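A minimal sketch of the comparison this principle asks for, assuming both the agent's claim and the externally observed state can be reduced to sets of file paths. The names `Reconciliation` and `reconcile` are hypothetical illustrations; they are not taken from the canon doc or from oddkit.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Reconciliation:
    """Difference between what an agent said it changed and what actually changed."""

    unreported: frozenset[str]   # present in external state, absent from the self-report
    unsupported: frozenset[str]  # claimed in the self-report, absent from external state

    @property
    def consistent(self) -> bool:
        # The narrative is only trusted when it matches the artifacts exactly.
        return not self.unreported and not self.unsupported


def reconcile(claimed: Iterable[str], observed: Iterable[str]) -> Reconciliation:
    """Compare a self-reported file list against the observed file list."""
    claimed_set = frozenset(claimed)
    observed_set = frozenset(observed)
    return Reconciliation(
        unreported=observed_set - claimed_set,
        unsupported=claimed_set - observed_set,
    )
```

In the run 1 case described below, the claim would be empty and the observed set non-empty, so `unreported` would carry every file in the pushed commit and `consistent` would be false.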

What prompted it

klappy/oddkit#110 on 2026-04-19. Three fresh Opus 4.7 managed-agent sessions were dispatched with independent context to complete an internal rename. All three hit a categorical safety-layer reminder mid-session and terminated with self-reports claiming zero edits made. Run 1's self-report was wrong: the commit, the push, and the PR itself were already on the remote at the time of the report, created by the same session six minutes earlier.

Runs 2 and 3 independently replicated the false self-report pattern. Only a push conflict from a separate orchestrator attempt revealed run 1's completed work.

Distinctions already in the doc

vs. docs/incidents/agent-fault-assertion-without-verification — that doc covers pre-observation fault (agent asserts state without looking). This principle covers post-observation drift (agent narrates completed work incorrectly under pressure). Both violate Axiom 1 at different moments; complementary fixes live on the agent side (self-audit before claiming) and orchestrator side (verify external state after receiving claim).

vs. canon/methods/self-audit — that method defines the agent's pre-claim checklist. This principle defines the orchestrator's post-claim posture. Self-audit is produced by the same narrative synthesis that can drift under pressure, so the two disciplines are not substitutable.

vs. docs/agents/validation — that agent is a claims-to-evidence compiler already built around this discipline. The principle provides the stated rationale for why the validation-agent-with-fresh-context pattern is load-bearing, not a convenience.

Scope stated honestly

Three cases from one session day. Principle scoped to managed-agent workflows with mid-session external pressure (safety-layer reminders, rate-limit interruptions, injected contradictory guidance). Retraction condition named. Strongest opposing view engaged.

Gauntlet

Preflight run against tier-2 expectations
oddkit_challenge in canon-tier-2 mode (block_until_addressed: false — all soft challenges addressed in the Scope and Prior Art sections)
Frontmatter matches sibling principle verification-requires-fresh-context.md exactly (native YAML types per canon/meta/frontmatter-schema)
AI-voice-clichés pass: no negation parallelism, no puffing, no formulaic transitions, varied pacing, specific evidence

Refs

Evidence PR: klappy/oddkit#110
Session ledger: klappy://odd/ledger/2026-04-19-agent-team-pilot (PR #109, "odd/ledger: 2026-04-19 agent-team pilot session ledger", open)
Sibling principle: klappy://canon/principles/verification-requires-fresh-context
Prior-art doc: klappy://docs/incidents/agent-fault-assertion-without-verification

Note

Low Risk
Low risk: adds a new tier-2 canon markdown document only, with no code or behavioral changes; main risk is editorial/consistency within canon cross-references.

Overview
Introduces a new tier-2 canon principle, canon/principles/agent-self-report-under-stress.md, which argues that an agent’s terminal “what I did” summary can drift under injected mid-session pressure and therefore must not be treated as authoritative evidence.

The doc formalizes an orchestrator-side verification posture—corroborating completion claims against external artifact state (e.g., git diff, remote PR/CI status, deployments)—and positions this as an extension/complement to verification-requires-fresh-context and self-audit, with a concrete evidence write-up (PR #110) and scoped applicability/retraction conditions.
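One way an orchestrator might gather the external artifact state is to ask git directly rather than the agent. The sketch below assumes a local clone with the relevant remote configured; the function names and the injectable `run` parameter are hypothetical, and only the standard `git diff --name-only` and `git ls-remote --heads` subcommands are used.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes git arguments and returns stdout; injectable so it can be faked.
Runner = Callable[[Sequence[str]], str]


def run_git(args: Sequence[str]) -> str:
    """Run a git subcommand in the current directory and return its stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def changed_files(base: str, head: str, run: Runner = run_git) -> List[str]:
    """Paths that differ between the merge base of base/head and head."""
    out = run(["diff", "--name-only", f"{base}...{head}"])
    return [line for line in out.splitlines() if line]


def remote_branch_exists(remote: str, branch: str, run: Runner = run_git) -> bool:
    """True if the remote advertises exactly refs/heads/<branch>."""
    out = run(["ls-remote", "--heads", remote, branch])
    wanted = f"refs/heads/{branch}"
    # Each ls-remote line is "<sha>\t<ref>"; compare the ref exactly.
    return any(line.split("\t")[-1] == wanted for line in out.splitlines())
```

Checking the remote before accepting a "zero edits" report would also have surfaced run 1's branch without waiting for a push conflict. PR and CI status need a hosting-provider API and are left out of this sketch.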

Reviewed by Cursor Bugbot for commit 7ddf78e. Bugbot is set up for automated code reviews on this repo.

Extends canon/principles/verification-requires-fresh-context with the adjacent observation: an agent's terminal self-report of what it did can diverge from its actual tool-use history when mid-session pressure (safety layer, rate limit, injected contradictory guidance) changes the agent's belief mid-stream. The filesystem/git/API state is source of truth; the narrative is a belief.

Evidence: PR klappy/oddkit#110 on 2026-04-19. Three fresh Opus 4.7 managed-agent sessions were dispatched to complete an internal rename. Run 1 committed, pushed, and opened the PR at 14:07:56Z, then at 14:14Z terminated with a self-report claiming "FILES_TOUCHED: (none — no source files modified)." Runs 2 and 3 independently replicated the false self-report pattern. Only a push conflict on the same branch revealed run 1's completed work.

Distinguished from docs/incidents/agent-fault-assertion-without-verification:
- Agent-fault: pre-observation (agent asserts state without looking)
- This principle: post-observation (agent narrates completed work incorrectly under pressure)

Both violate Axiom 1, at different moments. Complementary disciplines.

Sample is three cases from one session; principle is scoped to managed-agent workflows with mid-session pressure, stated as working hypothesis with explicit retraction condition. Engages the strongest opposing view (intent vs effect). Cites and integrates existing validation-agent README, verification-requires-fresh-context, self-audit.

Gauntlet: preflight run, oddkit_challenge in canon-tier-2 mode, frontmatter matches sibling principle exactly (native YAML types).

Ref: klappy://odd/ledger/2026-04-19-agent-team-pilot (open item P4)
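To compare a terminal self-report against observed state, the claim first has to be extracted from the report text. The sketch below assumes the report carries a single `FILES_TOUCHED:` line, as in the quoted run 1 output, and that a non-empty value is a comma-separated path list; the comma-separated form and the function name are assumptions, since the source shows only the "none" case.

```python
import re
from typing import List

# Matches values such as "(none ...)" or "none", as in the quoted run 1 report.
_NONE_MARKER = re.compile(r"^\(?\s*none\b", re.IGNORECASE)


def parse_files_touched(report: str) -> List[str]:
    """Return the paths an agent claims to have touched.

    An empty list means the agent claimed no edits. A missing FILES_TOUCHED
    line raises ValueError so that "no claim" is never confused with
    "claimed nothing".
    """
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "FILES_TOUCHED":
            value = value.strip()
            if not value or _NONE_MARKER.match(value):
                return []
            # Assumed format: comma-separated paths on one line.
            return [part.strip() for part in value.split(",") if part.strip()]
    raise ValueError("report has no FILES_TOUCHED line")
```

The parsed list is still only the agent's belief. It becomes useful once it is compared with what git or the remote actually shows.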


…pilot) (#111)

Fresh session can boot into this without reading the transcript. Covers:
- What shipped to prod (klappy/oddkit#110 + #111, prod smoke green)
- What's in open PRs (klappy.dev#109 ledger, #110 agent-self-report-under-stress canon principle)
- The pilot's three headline findings that change how the next sessions should run:
  1. Agent-team pattern works with a load-bearing caveat (safety-layer halt on oddkit edits; orchestrator-applies pattern is the working path going forward; operator has decided not to report upstream)
  2. Agent terminal self-reports diverge from tool-use history under mid-session pressure (see canon/principles/agent-self-report-under-stress)
  3. The orchestrator-edits + Sonnet-validates + CF-auto-deploy pattern is proven end-to-end
- P1.1: DOLCHEO canon doc (tier 2, full gauntlet)
- P1.2: oddkit_encode batch-mode + prompt-over-code canary refactor combined PR
- P1.3: remaining 8-9 tool canaries queued in priority order
- P2.1: 0.17.0 version bump + CHANGELOG [Unreleased] backfill (16 days stale)
- P2.2: render CHANGELOG on klappy.dev + surface version_notes_url in initialize envelope
- Known foot-guns learned tonight: python heredoc + shell redirect collision; force-push without remote check; wrangler manual deploy; trusting agent terminal self-reports

Frontmatter: native YAML types per canon/meta/frontmatter-schema. Mirrors structure of the prior handoff (2026-04-19-fresh-session-continuation).

Co-authored-by: klappy (orchestrator) <klappy+orchestrator@klappy.dev>

* odd/ledger: add agent-team pilot session ledger (2026-04-19)

  Full DOLCHEO record of tonight's pilot run testing role-differentiated managed-agent teams with context break + cross-model validation.

  Thesis result: better output than solo, with a significant caveat: the managed-agent execution leg hit a categorical safety-layer signal that halted all three Opus 4.7 dispatches. Path 3 (orchestrator applies edits locally, Sonnet 4.6 validates with fresh context) produced the shipped PR #110.

  Major finding: run 1's execution agent COMPLETED the entire rename (commit, push, PR #110 opened) BEFORE the safety reminder fired, then halted and reported zero edits made. The self-report was wrong; the filesystem knew. Canon-worthy observation about agent self-report reliability under safety-layer stress.

  Also ships: two principle candidates (agent-self-report-under-stress, safety-layer-fires-on-verb-not-scope), one skill correction (managed-agents path assumptions), one upstream report candidate (AGENTS.md misclassified as prompt injection).

  Ships alongside klappy/oddkit#110 (internal rename) and #111 (prod promotion), both merged and smoke-green in prod.

* ledger: remove duplicate stale Open items and halt sections

* ledger: rename session-open queue to distinguish from session-close open items

  Bugbot flagged the L61 '### Open items (forward-pointing)' header as a duplicate against L174. They are actually different time slices (the planned queue at session open vs. the forward-pointing state at session close), but identical headers and overlapping P-band numbers made the duplication look real to a reader. Rename L61 to '### Session-open queue (planned)' to preserve the historical snapshot while removing the duplicate-section smell.

---------

Co-authored-by: klappy (orchestrator) <klappy+orchestrator@klappy.dev>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>


…r fresh session) (#114)

Handoff for the fresh session that picks up P1.2. Covers:
- Scope, feature half: batch-mode prefixes + per-artifact array
- Scope, refactor half: read DOLCHEO vocab from canon at runtime via KnowledgeBaseFetcher with three-tier fallback, governance_source in envelope, Zod knowledge_base_url override
- The path-3 orchestrator-applies workflow (Sonnet 4.6 validates; do not dispatch Opus 4.7 exec agents to klappy/oddkit per the prior session's safety-layer finding)
- Smoke test extensions for canon-tool-envelope.smoke.mjs
- The priority-ordered reading list

Carries forward the standing rules from the 2026-04-20 fresh-session handoff unchanged. Adds the Bugbot-is-informational-on-klappy.dev note confirmed by #109/#110/#111/#113 merges.

klappy mentioned this pull request Apr 19, 2026

odd/handoffs: 2026-04-20 fresh-session continuation (post agent-team pilot) #111

Merged

cursor Bot mentioned this pull request Apr 19, 2026

odd/ledger: 2026-04-19 agent-team pilot session ledger #109

Merged

klappy merged commit 6258160 into main Apr 19, 2026
1 check passed

klappy mentioned this pull request Apr 19, 2026

odd/handoffs: 2026-04-20 P1.2 encode batch-mode + canary refactor (for fresh session) #114

Merged

klappy merged 1 commit into main from canon/principles-agent-self-report-under-stress
