
odd/handoffs: 2026-04-20 fresh-session continuation (post agent-team pilot)#111

Merged
klappy merged 1 commit into main from odd/handoff-2026-04-20-post-agent-team-pilot
Apr 19, 2026

@klappy
Owner

@klappy klappy commented Apr 19, 2026

Forward-pointing handoff for the next fresh session. Mirrors the shape of odd/handoffs/2026-04-19-fresh-session-continuation.md so a boot-cold Opus 4.7 session can resume productive work without reading tonight's transcript.

What a fresh session picks up

P1.1 — DOLCHEO canon doc (tier 2). The operator leans toward having it replace DOLCHE, keeping the two Os distinguishable by section context.

P1.2 — oddkit_encode combined PR: the batch-mode feature and the prompt-over-code canary refactor, applied to the same tool. Scope: accept all seven DOLCHEO letters; fetch the vocabulary at runtime from the P1.1 canon doc with a three-tier fallback; report governance_source in the envelope; allow a Zod-validated knowledge_base_url override; extend the smoke test.
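The runtime-fetch-with-fallback shape in P1.2 can be sketched in Python for illustration. This is not oddkit's KnowledgeBaseFetcher: the tier order (caller override, then default canon URL, then a bundled copy), the placeholder URL and vocabulary, the `load_vocab` name, and the governance_source string format are all assumptions, and the Zod validation of the override is omitted.

```python
from typing import Callable, Optional

# Hypothetical bundled copy; the real letters are defined by the P1.1 canon doc.
BUNDLED_VOCAB: dict[str, str] = {"D": "bundled placeholder"}

# Hypothetical default location of the canon doc.
DEFAULT_CANON_URL = "https://example.invalid/canon/dolcheo.md"


def load_vocab(
    fetch: Callable[[str], dict[str, str]],
    override_url: Optional[str] = None,
) -> tuple[dict[str, str], str]:
    """Return (vocabulary, governance_source), trying each tier in order.

    Tier 1: caller-supplied override URL (assumed to map to knowledge_base_url).
    Tier 2: default canon URL.
    Tier 3: bundled copy, which cannot fail.
    """
    tiers: list[tuple[str, str]] = []
    if override_url:
        tiers.append(("override", override_url))
    tiers.append(("canon", DEFAULT_CANON_URL))

    for source, url in tiers:
        try:
            vocab = fetch(url)
        except Exception:
            continue  # network or parse failure: fall through to the next tier
        if vocab:  # an empty result is treated as a miss
            return vocab, f"{source}:{url}"
    return BUNDLED_VOCAB, "bundled"
```

Reporting which tier answered (rather than only the vocabulary) is what lets a caller see from the envelope whether it got live canon or the bundled fallback.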

P1.3 — remaining 8–9 tool canaries in priority order: challenge → gate → preflight → validate → orient → search → catalog → (cleanup_storage + version verified).

P2.1 — 0.17.0 bump + [Unreleased] backfill. 0.16.0 shipped 2026-04-03, and none of the E0008.x epoch changes since then are documented in the CHANGELOG.
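Assuming the CHANGELOG follows the Keep a Changelog layout that its [Unreleased] heading suggests, the mechanical half of P2.1 (promoting [Unreleased] to a dated 0.17.0 section once the backfill is written) might look like this Python sketch. `cut_release` is a hypothetical helper; writing the backfilled entries remains manual.

```python
import re
from datetime import date

# Matches the "## [Unreleased]" heading line (anything after it on that line is replaced too).
UNRELEASED = re.compile(r"^## \[Unreleased\][^\n]*$", re.MULTILINE)


def cut_release(changelog: str, version: str, day: date) -> str:
    """Promote the [Unreleased] section to a dated release heading.

    A fresh, empty [Unreleased] heading is left above the new release so
    later changes have somewhere to land.
    """
    match = UNRELEASED.search(changelog)
    if match is None:
        raise ValueError("no '## [Unreleased]' heading found")
    heading = f"## [Unreleased]\n\n## [{version}] - {day.isoformat()}"
    return changelog[: match.start()] + heading + changelog[match.end():]
```

The entries already written under [Unreleased] stay in place and simply end up under the new version heading.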

P2.2 — Render the CHANGELOG on klappy.dev and surface version_notes_url in the initialize envelope. Low cost; closes the version-discoverability gap found during tonight's evaluation.

What changed operationally (affects how next sessions should run)

Agent-team pattern works, with a caveat. Opus 4.7 managed-agent execution on klappy/oddkit triggers a categorical safety-layer reminder ("MUST refuse to improve or augment the code") regardless of the edit's scope. Sonnet 4.6 read-only validation runs clean. Going forward, all oddkit code edits use path 3: the orchestrator applies the edits and Sonnet validates them. The operator has declined to report this upstream; do not re-litigate it, plan around it.

Agent terminal self-reports diverge from tool-use history under mid-session pressure. See klappy/klappy.dev#110 (canon principle, open). Run 1 of tonight's pilot completed the rename, commit, push, and PR open, then self-reported zero edits. Runs 2 and 3 showed the same pattern. The filesystem and remote state were correct; only the narrative drifted. Canon discipline: never trust a terminal self-report alone. Pull the event log and corroborate side effects against external state.
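The corroboration step amounts to a three-way comparison between what the agent says, what its event log shows, and what the remote actually holds. A minimal Python sketch; the `RunEvidence` fields and `corroborate` helper are hypothetical, and gathering the inputs (parsing the managed-agent event log, querying the git remote) is left out.

```python
from dataclasses import dataclass


@dataclass
class RunEvidence:
    reported_changes: bool   # does the agent's final message claim it changed anything?
    logged_writes: int       # write/commit/push tool calls counted in the event log
    remote_has_commit: bool  # was the expected commit observed on the remote?


def corroborate(ev: RunEvidence) -> list[str]:
    """List disagreements between self-report, event log, and external state.

    An empty list means the three sources agree; any entry means the
    terminal self-report cannot be trusted on its own.
    """
    issues: list[str] = []
    if not ev.reported_changes and (ev.logged_writes > 0 or ev.remote_has_commit):
        issues.append("self-report claims no changes, but the log or remote shows some")
    if ev.reported_changes and not ev.remote_has_commit:
        issues.append("self-report claims changes the remote does not show")
    if ev.logged_writes > 0 and not ev.remote_has_commit:
        issues.append("writes were logged but nothing reached the remote")
    return issues
```

A run shaped like run 1 (zero edits reported, writes logged, commit on the remote) yields exactly one issue, the first one.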

The orchestrator workflow is proven end-to-end. Orchestrator edits in /home/claude/work/<repo> → push (githooks enforce version sync) → CF auto-deploys branch preview → orchestrator smokes preview → Sonnet 4.6 fresh agent validates → merge via GitHub REST → main-preview smoke → main→prod PR → prod smoke. Six evidence gates. All ran clean tonight.
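What makes each step an evidence gate is that nothing downstream runs until the step before it has produced passing evidence. A minimal Python sketch of that short-circuit; the gate names and `lambda` checks are placeholders (real gates would smoke a preview URL, read the validator's verdict, and so on), and the sketch does not attempt to encode which six steps count as gates.

```python
from typing import Callable

Gate = tuple[str, Callable[[], bool]]


def run_gates(gates: list[Gate]) -> tuple[bool, list[str]]:
    """Run evidence gates in order and stop at the first failure.

    Returns (all_passed, names_of_gates_that_ran). A later gate never runs
    unless every earlier gate produced passing evidence.
    """
    ran: list[str] = []
    for name, check in gates:
        ran.append(name)
        if not check():
            return False, ran
    return True, ran


# Example with placeholder checks; a failing validation stops the run before prod.
passed, ran = run_gates([
    ("branch preview smoke", lambda: True),
    ("fresh-context validation", lambda: False),
    ("prod smoke", lambda: True),
])
print(passed, ran)  # False ['branch preview smoke', 'fresh-context validation']
```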

Known foot-guns documented

  • Running python3 <<PYEOF > file.txt while the script itself calls open(file, 'w') on the same file: the shell's stdout redirect overwrites Python's write when stdout is flushed at process exit. Write from Python only; send diagnostics to stderr.
  • Never force-push without a git fetch and a diff against the remote; a managed agent may have silently pushed completed work.
  • Don't run wrangler deploy manually on oddkit; githooks + CF pipeline handle it.
  • The managed-agents skill's /home/user/ path is wrong for the current cloud environment; use $HOME (which resolves to /root).
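The heredoc and home-path foot-guns share a fix that can be sketched in Python: the script writes its artifact itself, routes diagnostics to stderr, and resolves paths from the home directory instead of a hard-coded /home/user/. `default_output` and `write_report` are hypothetical helpers for illustration, not part of oddkit or the managed-agents skill.

```python
import sys
from pathlib import Path


def default_output(name: str) -> Path:
    """Resolve an output path under the home directory.

    Path.home() honors $HOME on POSIX, so this follows $HOME (/root in the
    current cloud environment) instead of a hard-coded /home/user/.
    """
    return Path.home() / name


def write_report(path: Path, lines: list[str]) -> None:
    """Write the artifact from Python only and keep diagnostics on stderr.

    Invoke the script without a shell `> file` redirect onto the same path:
    if anything were also printed to stdout, the shell's descriptor would
    overwrite this write when stdout is flushed at exit.
    """
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    # stderr is untouched by a stdout redirect, so this never lands in the artifact.
    print(f"wrote {len(lines)} lines to {path}", file=sys.stderr)
```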

Frontmatter

Native YAML types per canon/meta/frontmatter-schema. Matches prior handoff exactly.

Refs


Note

Low Risk
Adds a new handoff markdown document only; no code or runtime behavior changes, so risk is limited to documentation accuracy/clarity.

Overview
Adds a new odd/handoffs/2026-04-20-fresh-session-continuation.md handoff capturing the post–agent-team pilot state: what shipped to prod, what PRs are still open, and the updated operating constraint that oddkit code edits must be done by the orchestrator with fresh-context Sonnet validation (managed-agent execution is blocked).

Queues the next-session work plan and priorities (DOLCHEO canon definition, oddkit_encode batch-mode + prompt-over-code canary refactor, remaining tool canaries, and version/changelog follow-ups) and documents newly observed operational foot-guns and verification steps.

Reviewed by Cursor Bugbot for commit e3cf2f5. Bugbot is set up for automated code reviews on this repo.

…m pilot

Fresh session can boot into this without reading the transcript. Covers:

- What shipped to prod (klappy/oddkit#110 + #111, prod smoke green)
- What's in open PRs (klappy.dev#109 ledger, #110 agent-self-report-under-stress canon principle)
- The pilot's three headline findings that change how the next sessions should run:
  1. Agent-team pattern works with a load-bearing caveat (safety-layer halt on oddkit edits; orchestrator-applies pattern is the working path going forward; operator has decided not to report upstream)
  2. Agent terminal self-reports diverge from tool-use history under mid-session pressure (see canon/principles/agent-self-report-under-stress)
  3. The orchestrator-edits + Sonnet-validates + CF-auto-deploy pattern is proven end-to-end
- P1.1: DOLCHEO canon doc (tier 2, full gauntlet)
- P1.2: oddkit_encode batch-mode + prompt-over-code canary refactor combined PR
- P1.3: remaining 8-9 tool canaries queued in priority order
- P2.1: 0.17.0 version bump + CHANGELOG [Unreleased] backfill (16 days stale)
- P2.2: render CHANGELOG on klappy.dev + surface version_notes_url in initialize envelope
- Known foot-guns learned tonight: python heredoc + shell redirect collision; force-push without remote check; wrangler manual deploy; trusting agent terminal self-reports

Frontmatter: native YAML types per canon/meta/frontmatter-schema. Mirrors structure of the prior handoff (2026-04-19-fresh-session-continuation).
@klappy klappy merged commit f5c8999 into main Apr 19, 2026
1 check passed
klappy added a commit that referenced this pull request Apr 19, 2026
* odd/ledger: add agent-team pilot session ledger (2026-04-19)

Full DOLCHEO record of tonight's pilot run testing role-differentiated managed-agent teams with context break + cross-model validation.

Thesis result: better output than solo, with a significant caveat — the managed-agent execution leg hit a categorical safety-layer signal that halted all three Opus 4.7 dispatches. Path 3 (orchestrator applies edits locally, Sonnet 4.6 validates with fresh context) produced the shipped PR #110.

Major finding: run 1's execution agent COMPLETED the entire rename (commit, push, PR #110 opened) BEFORE the safety reminder fired, then halted and reported zero edits made. The self-report was wrong; the filesystem knew. Canon-worthy observation about agent self-report reliability under safety-layer stress.

Also ships: two principle candidates (agent-self-report-under-stress, safety-layer-fires-on-verb-not-scope), one skill correction (managed-agents path assumptions), one upstream report candidate (AGENTS.md misclassified as prompt injection).

Ships alongside klappy/oddkit#110 (internal rename) and #111 (prod promotion), both merged and smoke-green in prod.

* ledger: remove duplicate stale Open items and halt sections

* ledger: rename session-open queue to distinguish from session-close open items

Bugbot flagged the L61 '### Open items (forward-pointing)' header as a
duplicate against L174. They are actually different time slices — the
planned queue at session open vs. the forward-pointing state at session
close — but identical headers and overlapping P-band numbers made the
duplication look real to a reader.

Rename L61 to '### Session-open queue (planned)' to preserve the
historical snapshot while removing the duplicate-section smell.

---------

Co-authored-by: klappy (orchestrator) <klappy+orchestrator@klappy.dev>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
klappy added a commit that referenced this pull request Apr 19, 2026
Handoff for the fresh session that picks up P1.2. Covers scope (feature
half: batch-mode prefixes + per-artifact array; refactor half: read
DOLCHEO vocab from canon at runtime via KnowledgeBaseFetcher with
three-tier fallback, governance_source in envelope, Zod knowledge_base_url
override), the path-3 orchestrator-applies workflow (Sonnet 4.6 validates;
do not dispatch Opus 4.7 exec agents to klappy/oddkit per the prior
session's safety-layer finding), smoke test extensions for
canon-tool-envelope.smoke.mjs, and the priority-ordered reading list.

Carries forward the standing rules from the 2026-04-20 fresh-session
handoff unchanged. Adds the Bugbot-is-informational-on-klappy.dev note
confirmed by #109/#110/#111/#113 merges.
klappy added a commit that referenced this pull request Apr 19, 2026
…r fresh session) (#114)
