odd/handoffs: 2026-04-20 fresh-session continuation (post agent-team pilot) #111
Merged
…m pilot

Fresh session can boot into this without reading the transcript. Covers:

- What shipped to prod (klappy/oddkit#110 + #111, prod smoke green)
- What's in open PRs (klappy.dev#109 ledger, #110 agent-self-report-under-stress canon principle)
- The pilot's three headline findings that change how the next sessions should run:
  1. Agent-team pattern works with a load-bearing caveat (safety-layer halt on oddkit edits; orchestrator-applies pattern is the working path going forward; operator has decided not to report upstream)
  2. Agent terminal self-reports diverge from tool-use history under mid-session pressure (see canon/principles/agent-self-report-under-stress)
  3. The orchestrator-edits + Sonnet-validates + CF-auto-deploy pattern is proven end-to-end
- P1.1: DOLCHEO canon doc (tier 2, full gauntlet)
- P1.2: oddkit_encode batch-mode + prompt-over-code canary refactor combined PR
- P1.3: remaining 8-9 tool canaries queued in priority order
- P2.1: 0.17.0 version bump + CHANGELOG [Unreleased] backfill (16 days stale)
- P2.2: render CHANGELOG on klappy.dev + surface version_notes_url in initialize envelope
- Known foot-guns learned tonight: python heredoc + shell redirect collision; force-push without remote check; wrangler manual deploy; trusting agent terminal self-reports

Frontmatter: native YAML types per canon/meta/frontmatter-schema. Mirrors structure of the prior handoff (2026-04-19-fresh-session-continuation).
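The heredoc foot-gun named above (a shell redirect and an in-script write aimed at the same file) is easiest to avoid by making Python the only writer of the output file and sending diagnostics to stderr. The following is a minimal sketch of that rule; the helper name `write_report` and its arguments are illustrative assumptions, not something the handoff defines.

```python
import sys
from pathlib import Path


def write_report(path: Path, lines: list[str]) -> int:
    """Write the report from Python only; return the number of characters written.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    text = "\n".join(lines) + "\n"
    path.write_text(text, encoding="utf-8")
    # Diagnostics go to stderr, so nothing on stdout needs to be redirected
    # into a file that this function has already written.
    print(f"wrote {len(text)} characters to {path}", file=sys.stderr)
    return len(text)
```

Run from a heredoc, a script built this way needs no `> file.txt` redirect at all, which removes the second writer that caused the collision.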
klappy added a commit that referenced this pull request on Apr 19, 2026
* odd/ledger: add agent-team pilot session ledger (2026-04-19)

  Full DOLCHEO record of tonight's pilot run testing role-differentiated managed-agent teams with context break + cross-model validation. Thesis result: better output than solo, with a significant caveat: the managed-agent execution leg hit a categorical safety-layer signal that halted all three Opus 4.7 dispatches. Path 3 (orchestrator applies edits locally, Sonnet 4.6 validates with fresh context) produced the shipped PR #110.

  Major finding: run 1's execution agent COMPLETED the entire rename (commit, push, PR #110 opened) BEFORE the safety reminder fired, then halted and reported zero edits made. The self-report was wrong; the filesystem knew. Canon-worthy observation about agent self-report reliability under safety-layer stress.

  Also ships: two principle candidates (agent-self-report-under-stress, safety-layer-fires-on-verb-not-scope), one skill correction (managed-agents path assumptions), one upstream report candidate (AGENTS.md misclassified as prompt injection).

  Ships alongside klappy/oddkit#110 (internal rename) and #111 (prod promotion), both merged and smoke-green in prod.

* ledger: remove duplicate stale Open items and halt sections

* ledger: rename session-open queue to distinguish from session-close open items

  Bugbot flagged the L61 '### Open items (forward-pointing)' header as a duplicate against L174. They are actually different time slices (the planned queue at session open vs. the forward-pointing state at session close), but identical headers and overlapping P-band numbers made the duplication look real to a reader. Rename L61 to '### Session-open queue (planned)' to preserve the historical snapshot while removing the duplicate-section smell.

---------

Co-authored-by: klappy (orchestrator) <klappy+orchestrator@klappy.dev>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
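The major finding above (an agent that reported zero edits after it had already committed and pushed) suggests checking a self-report against what git itself says. The sketch below shows one way that comparison might look. The `Observed` record, both function names, and the choice of `git rev-list --count` and `git status --porcelain` as the external evidence are assumptions for illustration; the ledger does not prescribe a specific check, and this covers only the git side, not the agent event log.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Observed:
    commits_ahead: int  # commits on the branch that are not on the base
    dirty_files: int    # uncommitted changes in the working tree


def observe(repo: str, base: str, branch: str) -> Observed:
    """Read branch state from git rather than from the agent's narrative."""
    def git(*args: str) -> str:
        result = subprocess.run(
            ["git", "-C", repo, *args],
            check=True, capture_output=True, text=True,
        )
        return result.stdout

    ahead = int(git("rev-list", "--count", f"{base}..{branch}").strip())
    dirty = len(git("status", "--porcelain").splitlines())
    return Observed(commits_ahead=ahead, dirty_files=dirty)


def diverges(self_reported_edits: int, observed: Observed) -> bool:
    """True when the claim and the external state disagree on whether work happened."""
    claimed_work = self_reported_edits > 0
    actual_work = observed.commits_ahead > 0 or observed.dirty_files > 0
    return claimed_work != actual_work
```

Keeping `diverges` free of subprocess calls means the comparison can be exercised without a repository, while `observe` stays a thin wrapper over git.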
klappy added a commit that referenced this pull request on Apr 19, 2026
…r fresh session) (#114)

Handoff for the fresh session that picks up P1.2. Covers:

- scope (feature half: batch-mode prefixes + per-artifact array; refactor half: read DOLCHEO vocab from canon at runtime via KnowledgeBaseFetcher with three-tier fallback, governance_source in envelope, Zod knowledge_base_url override)
- the path-3 orchestrator-applies workflow (Sonnet 4.6 validates; do not dispatch Opus 4.7 exec agents to klappy/oddkit per the prior session's safety-layer finding)
- smoke test extensions for canon-tool-envelope.smoke.mjs
- the priority-ordered reading list

Carries forward the standing rules from the 2026-04-20 fresh-session handoff unchanged. Adds the Bugbot-is-informational-on-klappy.dev note confirmed by #109/#110/#111/#113 merges.
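The three-tier fallback with a `governance_source` field mentioned above can be pictured as an ordered list of loaders, where the envelope records which tier actually answered. This is a sketch under stated assumptions only: the commit message does not say what the three tiers are, so the tier names used in the usage note, the `load_vocab` function, and the placeholder bundled vocabulary are all hypothetical, and the real tool is presumably not written in Python.

```python
from typing import Callable, Optional

# Hypothetical last-resort vocabulary; the real letters live in the canon doc.
BUNDLED_VOCAB = ["placeholder"]

Loader = Callable[[], Optional[list[str]]]


def load_vocab(tiers: list[tuple[str, Loader]]) -> dict:
    """Try each (name, loader) tier in order and report which one answered."""
    for name, loader in tiers:
        try:
            vocab = loader()
        except Exception:
            vocab = None  # a failing tier falls through to the next one
        if vocab:
            return {"vocab": vocab, "governance_source": name}
    # No tier produced a vocabulary: fall back to the bundled copy.
    return {"vocab": BUNDLED_VOCAB, "governance_source": "bundled"}
```

A caller might pass two tiers such as `("canon", fetch_from_canon)` and `("cache", read_cached_copy)`, with the bundled copy acting as the third; both loader names are illustrative. Reporting the serving tier in the result is what lets a smoke test assert where the governance text came from.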
Forward-pointing handoff for the next fresh session. Mirrors the shape of `odd/handoffs/2026-04-19-fresh-session-continuation.md` so a boot-cold Opus 4.7 session can resume productive work without reading tonight's transcript.

What a fresh session picks up
P1.1: DOLCHEO canon doc (tier 2). Operator's leaning: replace DOLCHE, keep both Os distinguishable by section context.
P1.2: `oddkit_encode` combined PR, with the batch-mode feature + prompt-over-code canary refactor applied to the same tool. Seven DOLCHEO letters accepted; runtime fetch from the P1.1 canon doc; three-tier fallback; `governance_source` in envelope; Zod `knowledge_base_url` override; smoke extension.
P1.3: remaining 8–9 tool canaries in priority order: challenge → gate → preflight → validate → orient → search → catalog → (cleanup_storage + version verified).
P2.1: 0.17.0 bump + `[Unreleased]` backfill. 0.16.0 shipped 2026-04-03; every E0008.x epoch change is undocumented in the CHANGELOG.
P2.2: Render CHANGELOG on klappy.dev + surface `version_notes_url` in `initialize`. Low cost, fixes the version-discoverability gap surfaced during tonight's evaluation.

What changed operationally (affects how next sessions should run)
Agent-team pattern works with a caveat. Opus 4.7 managed-agent execution on klappy/oddkit triggers a categorical safety-layer reminder ("MUST refuse to improve or augment the code") scope-insensitively. Sonnet 4.6 read-only validation runs clean. The working pattern for all oddkit code edits going forward is path 3: orchestrator applies, Sonnet validates. The operator has declined to report this upstream — do not re-litigate; plan around it.
Agent terminal self-reports diverge from tool-use history under mid-session pressure. See `klappy/klappy.dev#110` (canon principle, open). Run 1 of tonight's pilot completed the rename + commit + push + PR open, then self-reported zero edits. Runs 2 and 3 replicated. The filesystem/remote state was correct; the narrative was drifting. Canon discipline: never trust terminal self-report alone; pull the event log and corroborate side-effects against external state.
The orchestrator workflow is proven end-to-end. Orchestrator edits in `/home/claude/work/<repo>` → push (githooks enforce version sync) → CF auto-deploys branch preview → orchestrator smokes preview → Sonnet 4.6 fresh agent validates → merge via GitHub REST → main-preview smoke → main→prod PR → prod smoke. Six evidence gates. All ran clean tonight.

Known foot-guns documented
- `python3 <<PYEOF > file.txt` + in-script `open(file, 'w')` → shell stdout overwrites Python write on process exit. Write from Python only; diagnostics to stderr.
- Force-push without a remote check: run `git fetch` + remote-diff first; a managed-agent may have silently pushed completed work.
- Do not run `wrangler deploy` manually on oddkit; githooks + CF pipeline handle it.
- The `/home/user/` path is wrong for the current cloud env; use `$HOME` (which resolves to `/root`).

Frontmatter
Native YAML types per `canon/meta/frontmatter-schema`. Matches prior handoff exactly.

Refs
- `klappy://odd/ledger/2026-04-19-agent-team-pilot` (PR #109, "odd/ledger: 2026-04-19 agent-team pilot session ledger", open)
- `klappy://canon/principles/agent-self-report-under-stress` (PR #110, "canon/principles: agent-self-report-under-stress (tier 2)", open)
- `klappy://odd/handoffs/2026-04-19-fresh-session-continuation`
- `https://oddkit.klappy.dev/mcp` at 0.16.0, smoke 24/24

Note
Low Risk
Adds a new handoff markdown document only; no code or runtime behavior changes, so risk is limited to documentation accuracy/clarity.
Overview
Adds a new `odd/handoffs/2026-04-20-fresh-session-continuation.md` handoff capturing the post–agent-team pilot state: what shipped to prod, what PRs are still open, and the updated operating constraint that `oddkit` code edits must be done by the orchestrator with fresh-context Sonnet validation (managed-agent execution is blocked).

Queues the next-session work plan and priorities (DOLCHEO canon definition, `oddkit_encode` batch-mode + prompt-over-code canary refactor, remaining tool canaries, and version/changelog follow-ups) and documents newly observed operational foot-guns and verification steps.

Reviewed by Cursor Bugbot for commit e3cf2f5.