feat(token-efficiency): register mismatch corrections, carrying cost budgeting, and docs cleanup by armstrongl · Pull Request #1 · nuggylib/compound-engineering-plugin

armstrongl · 2026-04-12T01:08:37Z

Summary

This branch delivers the first concrete execution phase of the token efficiency initiative, targeting three areas that compound to reduce token waste across plugin skills.

Register mismatch correction methodology: Introduces a systematic framework for identifying and fixing mismatches between a skill's described purpose (in its description/trigger) and its actual prompt content. Applies corrections to the top 7 highest-impact skills (ce-compound, ce-plan, ce-review, ce-work, ce-work-beta, ce-compound-refresh, orchestrating-swarms), reducing wasted tokens from skills that trigger in the wrong context or bloat their prompts with off-register content.
Carrying cost budgeting plan: Adds a structured plan for measuring and managing the ongoing token cost of documentation and artifacts that persist in the repo. Establishes a framework for deciding when stale docs should be archived or removed based on their carrying cost vs. reference value.
Stale documentation cleanup: Removes 53 completed brainstorm and plan documents (18 brainstorms + 35 plans) that were no longer serving as active references. These documents had been fully executed and their outcomes captured in implemented code, commit history, and solution docs. This directly reduces context noise and repo clutter.

What these changes mean

Token efficiency in an agent plugin isn't about micro-optimizing individual prompts. It's about ensuring the entire system, from skill descriptions that control when skills trigger, to the documents that persist in the repo and consume context, operates at the right level of precision. Register mismatches cause skills to activate unnecessarily or waste tokens on irrelevant content when they do activate. Stale docs impose a carrying cost every time an agent reads the repo structure. This PR addresses both structural sources of waste.

Changes by commit

sync(docs) - Expanded token efficiency ideation with cross-domain ideas and sequencing
docs(register-mismatch) - Conceptual explainer for register mismatch correction
feat(register-mismatch) - Correction methodology with pattern classification and savings estimates
feat(register-mismatch) - Applied correction methodology to top 7 skills
feat(carrying-cost) - Carrying cost budgeting plan and meta-plan tracking
chore(docs) - Removed 53 completed brainstorms and plans

Test plan

Verify corrected skills trigger appropriately in context (not over-triggering)
Run bun test to confirm no regressions from skill content changes
Spot-check that removed docs are not referenced by any remaining files
Review register mismatch methodology for applicability to remaining skills

Copilot

Pull request overview

Adds reference documentation to standardize “tutorial register → specification register” rewrites across compound-engineering skill files, including a stakeholder explainer and expanded token-efficiency ideation/sequencing.

Changes:

Added a detailed correction methodology with pattern taxonomy, decision framework, examples, and updated savings estimates.
Added a stakeholder-facing explainer describing the concept and expected impact.
Expanded the token-efficiency ideation doc with new ideas, rejections, and an implementation sequencing plan.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

| File | Description |
|------|-------------|
| docs/references/register-mismatch-correction-methodology.md | New methodology: pattern classes, rewrite rules, examples, and corrected savings estimates. |
| docs/references/register-mismatch-correction-explainer.md | New explainer for stakeholders describing register mismatch correction and projected impact. |
| docs/ideation/2026-04-08-token-efficiency-ideation.md | Expanded ideation with additional ideas, dependency graph, and updated sequencing content. |


Copilot · 2026-04-12T23:26:56Z

+Do not use `gh pr diff` as the review scope after checkout.
+<!-- why: gh pr diff reflects remote PR state only, missing local fix commits until pushed -->


The HTML comment placement rule says the `<!-- why: ... -->` line should be immediately before the constraint, but Worked Borderline Example A shows the comment after the constraint. Please align the example with the rule (or adjust the rule if both placements are acceptable) to avoid contributors copying an inconsistent pattern.

Suggested change

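Do not use `gh pr diff` as the review scope after checkout.
<!-- why: gh pr diff reflects remote PR state only, missing local fix commits until pushed -->

<!-- why: gh pr diff reflects remote PR state only, missing local fix commits until pushed -->
Do not use `gh pr diff` as the review scope after checkout.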

Copilot · 2026-04-12T23:26:57Z

+*Classification:* Borderline compliance-aiding. Establishes a conceptual distinction (severity vs routing) that could help correct application of routing rules. The routing rules table is explicit and self-contained.
+*Rewritten (105 bytes):*
+```
+<!-- why: severity and routing are orthogonal axes -- severity = urgency, routing = actor + mutation permission -->
+```


These before/after examples show a “rewritten” block that contains only an HTML comment and no resulting specification-register instruction. That makes the example incomplete and conflicts with the stated format (“rewritten text”)—either include the corresponding spec instruction that this comment justifies, or mark the rewrite as a deletion if no instruction should remain.

Copilot · 2026-04-12T23:26:57Z

+**Example 2 -- Borderline (severity signal)**
+
+*Original (63 bytes):*
+> A finding pointing to the wrong line is worse than no finding.
+
+*Pattern class:* Motivational framing
+*Classification:* Borderline. Functions as a severity signal that might calibrate verification thoroughness.
+*Rewritten (80 bytes):*
+```
+<!-- why: wrong line numbers are worse than no finding -- they misdirect the developer -->
+```


This before/after example’s “Rewritten” output includes only an HTML comment and omits the actual directive (e.g., the associated instruction to verify line numbers). Please include the specification-register instruction alongside the comment so the example demonstrates the intended transformation end-to-end.

Copilot · 2026-04-12T23:26:57Z

+**After (specification register):**
+> Pass the path list to `project-standards` in a `<standards-paths>` block in its review context. The persona reads files itself, targeting sections relevant to changed file types.
+
+The stripped sentences ("This keeps the orchestrator's work cheap..." and "avoids bloating...") explain *why* the design works this way. Valuable for a human maintainer, zero value for Claude executing the instruction. In the rewrite, that rationale moves to an HTML comment (`<!-- why: ... -->`) -- invisible to the model, preserved for contributors.


This explainer claims HTML comments are “invisible to the model,” but HTML comments are tokenized as plain text in the context window. Please rephrase this to reflect that comments are still seen/tokenized by the model (even if they’re intended for humans) to avoid misleading readers about token/cost impact.

Suggested change

The stripped sentences ("This keeps the orchestrator's work cheap..." and "avoids bloating...") explain *why* the design works this way. Valuable for a human maintainer, zero value for Claude executing the instruction. In the rewrite, that rationale moves to an HTML comment (`<!-- why: ... -->`) -- invisible to the model, preserved for contributors.

The stripped sentences ("This keeps the orchestrator's work cheap..." and "avoids bloating...") explain *why* the design works this way. Valuable for a human maintainer, but not needed in the main instruction Claude should follow. In the rewrite, that rationale can move to an HTML comment (`<!-- why: ... -->`) for contributors, with the caveat that comments are still part of the text seen/tokenized by the model unless a preprocessing step strips them first.
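The preprocessing caveat in the suggestion above could look roughly like the sketch below. It assumes a loader that can transform skill text before it reaches the model; `stripWhyComments` is a hypothetical helper, not something this PR adds, and the pattern assumes `<!-- why: ... -->` comments are never nested.

```typescript
// Hypothetical preprocessing helper (not part of this PR): removes
// `<!-- why: ... -->` rationale comments so they are never tokenized.
function stripWhyComments(skillText: string): string {
  return skillText
    // A comment alone on its line: drop the whole line, including the newline.
    .replace(/^[ \t]*<!--\s*why:(?:(?!-->)[\s\S])*-->[ \t]*\r?\n/gm, "")
    // Any remaining inline comment: drop just the comment.
    .replace(/<!--\s*why:[\s\S]*?-->/g, "");
}

const sample = [
  "<!-- why: gh pr diff reflects remote PR state only -->",
  "Do not use `gh pr diff` as the review scope after checkout.",
].join("\n");

// Only the instruction line remains after stripping.
console.log(stripWhyComments(sample));
```

Without a step like this, the comment text still counts toward the context window, which is the point the review comment makes.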

Copilot · 2026-04-12T23:26:57Z

+4. **Lower cost per session.** Tokens cost money. A 20-30% reduction across 55KB of ce-review content, carried across a 30+ message review session, adds up.
+
+## The estimated impact
+
+| Skill | Current size | Est. reduction | Savings |
+|-------|-------------|---------------|---------|
+| ce-review | 55KB | 20-25% | 11-14KB |
+| ce-compound-refresh | 48KB | 20-25% | 10-12KB |
+| orchestrating-swarms | 48KB | 25-30% | 12-14KB |
+| ce-plan | 42KB | 20-25% | 8-10KB |
+| ce-work-beta | 32KB | 15-20% | 5-6KB |
+| ce-compound | 31KB | 20-25% | 6-8KB |
+| ce-work | 27KB | 15-20% | 4-5KB |
+| **Top 7 total** | **283KB** | | **~56-69KB** |
+
+That 56-69KB reduction is *per session, per message*. In a 30-message ce-review session, that's 1.7-2MB of tokens you're not paying for.


The methodology document in this PR corrects the projected savings to ~8–13% tutorial-register and ~19.6–25.4KB net savings across the top 7 skills, but this explainer still presents the older 20–30% reduction and ~56–69KB savings figures. Please update the estimates here (or clearly label them as superseded/early projections) so stakeholders aren’t left with conflicting numbers.

Suggested change

4. **Lower cost per session.** Tokens cost money. A 20-30% reduction across 55KB of ce-review content, carried across a 30+ message review session, adds up.

## The estimated impact

| Skill | Current size | Est. reduction | Savings |
|-------|-------------|---------------|---------|
| ce-review | 55KB | 20-25% | 11-14KB |
| ce-compound-refresh | 48KB | 20-25% | 10-12KB |
| orchestrating-swarms | 48KB | 25-30% | 12-14KB |
| ce-plan | 42KB | 20-25% | 8-10KB |
| ce-work-beta | 32KB | 15-20% | 5-6KB |
| ce-compound | 31KB | 20-25% | 6-8KB |
| ce-work | 27KB | 15-20% | 4-5KB |
| **Top 7 total** | **283KB** | | **~56-69KB** |

That 56-69KB reduction is *per session, per message*. In a 30-message ce-review session, that's 1.7-2MB of tokens you're not paying for.

4. **Lower cost per session.** Tokens cost money. The current methodology projects roughly **8-13% tutorial-register savings**, or about **19.6-25.4KB net savings across the top 7 skills**. Carried across a 30+ message review session, that still adds up.

## The estimated impact

| Scope | Current size | Est. reduction | Savings |
|-------|--------------|----------------|---------|
| **Top 7 skills total** | **283KB** | **~8-13% tutorial-register** | **~19.6-25.4KB net** |

That **~19.6-25.4KB** reduction is the current projected **net savings across the top 7 skills**. Because loaded skill content is re-sent throughout a session, those savings compound across repeated messages even though they are smaller than the earlier exploratory estimate.
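To make the compounding claim concrete, here is a small arithmetic sketch. It assumes, as an upper bound, that all seven skills stay loaded for every message of a 30-message session; `sessionSavingsKB` is a hypothetical helper used only for illustration.

```typescript
// Hypothetical helper: per-message savings are re-realized on every message
// in which the trimmed skill content would otherwise have been re-sent.
function sessionSavingsKB(perMessageKB: number, messages: number): number {
  return perMessageKB * messages;
}

// Corrected top-7 range from the methodology document, over 30 messages.
const lowKB = sessionSavingsKB(19.6, 30);
const highKB = sessionSavingsKB(25.4, 30);
console.log(`${lowKB.toFixed(0)}-${highKB.toFixed(0)}KB over 30 messages`);
```

Under that assumption the corrected range works out to about 588-762KB per 30-message session, compared with the 1.7-2MB implied by the earlier estimate.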

Copilot · 2026-04-12T23:26:57Z

+
+1. **Universal impact**: Affects all 43 skills, not just one
+2. **Improves behavior**: Removing hedging markers ("You might want to...") INCREASES instruction compliance -- compression that improves quality
+3. **Incrementally shippable**: Start with ce-review (55KB, ~11-16KB reduction), then ce-plan, ce-work. See value after the first skill


This section still cites the older register-mismatch savings numbers (~56–85KB corpus; ~11–16KB for ce-review). In this PR, the methodology document’s corrected sampling suggests materially lower net savings (~19.6–25.4KB across the top 7). Please update these figures (or annotate them as pre-correction estimates) to keep the ideation doc internally consistent with the newer evidence.

Suggested change

3. **Incrementally shippable**: Start with ce-review (55KB, ~11-16KB reduction), then ce-plan, ce-work. See value after the first skill

3. **Incrementally shippable**: Start with ce-review (55KB) and then expand to ce-plan, ce-work, and the rest of the top 7. Corrected sampling in the methodology doc suggests lower net savings than the earlier pre-correction estimate: ~19.6-25.4KB across the top 7 overall, not the older ~11-16KB-for-ce-review / ~56-85KB-corpus figures

Copilot · 2026-04-12T23:26:58Z

+Verify each cited line number against the file content.
+<!-- why: wrong line numbers are worse than no finding -- they misdirect the developer -->


This worked example places the `<!-- why: ... -->` comment after the instruction, but later the document standardizes on putting the comment immediately before the constraint. Please make the examples consistent with the stated rule to avoid contributors copying mixed patterns.

Suggested change

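Verify each cited line number against the file content.
<!-- why: wrong line numbers are worse than no finding -- they misdirect the developer -->

<!-- why: wrong line numbers are worse than no finding -- they misdirect the developer -->
Verify each cited line number against the file content.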

armstrongl · 2026-04-14T21:34:48Z

@copilot apply changes based on the comments in this thread

Copilot · 2026-04-14T21:38:07Z


Applied all changes from the review thread in commit 7658600:

Comment placement (methodology lines 133-134, 181-182): moved `<!-- why: ... -->` to appear immediately before its constraint in both Borderline worked examples, consistent with the stated rule.
Incomplete Rewritten blocks (lines 244-248, 292-302): added the missing spec-register instruction to each block so examples demonstrate the full transformation end-to-end (comment + directive, not comment alone).
"Invisible to the model" (explainer line 46): replaced with accurate language clarifying comments are still tokenized; the value is preserving contributor intent, not hiding content from Claude.
Stale savings figures (explainer lines 60-75, ideation doc line 632): updated from the old 20-30% / 56-69KB numbers to the corrected ~8-13% tutorial-register / ~19.6-25.4KB net savings across top 7 skills, consistent with the methodology document.

armstrongl · 2026-04-20T00:03:04Z

New: document-review write-once dispatch + compact returns

Added 5 commits applying write-once dispatch and compact returns to the document-review skill (following ce-review's proven pattern):

37f9c55 Add _meta.return_tiers to findings-schema.json + "Sub-Agent Communication Patterns" to AGENTS.md
db7d17a Restructure subagent template for dispatch-context + artifact write + compact return
0f43cd6 Add run ID generation, dispatch context assembly, lean prompt dispatch to SKILL.md; replace @ inlines with backtick refs
7734ea2 Update synthesis pipeline with evidence batch-load, artifact-based dedup, headless evidence output
e2a5996 Fix suggested_fix optionality alignment across files (review finding)

Estimated savings: ~63-75KB per 5-agent document review session.

Plan: docs/plans/2026-04-19-002-refactor-compact-returns-generalization-plan.md

Extract inline content from agent-native-architecture (18.3KB -> 5.0KB), orchestrating-swarms (35.6KB -> 5.0KB), and dspy-ruby (21.5KB -> 5.0KB) into queryable reference files. Combined SKILL.md reduced from 75.4KB to 14.9KB (80% reduction). 19 new reference files created.

armstrongl · 2026-04-21T15:51:55Z

@copilot resolve the merge conflicts in this pull request

Copilot · 2026-04-21T16:39:11Z


Merge conflicts resolved in commit db2cc3e. 957/958 tests pass.

Changes made:

ce-code-review/SKILL.md: used main's version as the base (which has the complete behavioral contracts), then applied the Script-First Extraction changes from our branch — scripts/resolve-pr-base.sh reference replacing inline PR base resolution code, correct mode:report-only cannot switch messages, 7b tie-break rule
ce-work/SKILL.md: restored the Settings Resolution Chain config keys table (including work_delegate_decision); updated frontend-design → ce-frontend-design
ce-code-review/scripts/resolve-pr-base.sh: added (moved from ce-review/scripts/)
src/utils/legacy-cleanup.ts: fixed retired-skill handling — when a prompt's current skill is gone, fall back to historical aliases so stale ce-work-beta wrappers are still cleaned up
Tests: removed 3 ce-work-beta tests (skill intentionally deleted), updated path references for renamed skill directories (git-clean-gone-branches → ce-clean-gone-branches, ce-review/scripts/ → ce-code-review/scripts/), fixed bare-remote HEAD setup in the resolve-context test

The 1 remaining failure (resolve-pr-base.sh > base repo matches non-origin remote (fork)) is a pre-existing SSH timeout in this sandbox — the script attempts a real git fetch against the fake git@github.com:UpstreamOrg/repo.git URL before falling back to local refs.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

github.com (HTTP Only)
- Triggering command: /usr/bin/ssh -o SendEnv=GIT_PROTOCOL git@github.com git-upload-pack 'UpstreamOrg/repo.git' (packet block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

armstrongl · 2026-04-21T18:12:41Z

@copilot resolve the merge conflicts in this pull request

…-plan tracking

Add the implementation plan for idea EveryInc#20 (Carrying Cost Budgeting) which introduces a `bun run skill:stats` command that ranks all skills and agents by carrying cost (`file_size x estimated_tool_calls`) instead of raw file size.

The plan was refined through three parallel review passes (coherence, feasibility, scope) that applied 14 auto-fixes:

- Collapsed two-file architecture to a single self-contained script
- Broadened tool detection from verb+tool patterns (near-zero recall on real content) to capitalized tool names at word boundaries
- Replaced loop scope-based multiplication with additive weights
- Added `disable-model-invocation` frontmatter filter for tutorial skills
- Clarified sort key as single `system_cost` metric (not two competing metrics)
- Documented `@`-expansion undercount as a known limitation

The meta-plan tracking table is updated: idea EveryInc#20 brainstorm and plan columns both marked done. Next in the roadmap: S4 (brainstorm ideas EveryInc#9, EveryInc#6, EveryInc#5, EveryInc#10 in parallel).
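A minimal sketch of the ranking idea described in this commit message, assuming `system_cost = file_size x estimated_tool_calls` and tool detection by capitalized names at word boundaries. The tool list, the floor of one call, and the use of character count as a stand-in for file size are all assumptions made here for illustration; the actual `skill:stats` script is not shown in this thread, and the plan's additive loop weights are left out.

```typescript
// Assumed tool names for illustration; the plan's real list is not shown here.
const ASSUMED_TOOL_NAMES = ["Read", "Write", "Edit", "Bash", "Glob", "Grep", "Task"];

interface SkillFile { name: string; content: string }
interface SkillStat { name: string; fileSize: number; estimatedToolCalls: number; systemCost: number }

// Counts capitalized tool names at word boundaries. The floor of 1 is an
// assumption so that a skill with no detected tools still has a nonzero cost.
function estimateToolCalls(content: string): number {
  const pattern = new RegExp(`\\b(?:${ASSUMED_TOOL_NAMES.join("|")})\\b`, "g");
  const hits = content.match(pattern);
  return Math.max(1, hits === null ? 0 : hits.length);
}

// Ranks by the single system_cost metric, highest first.
function rankByCarryingCost(skills: SkillFile[]): SkillStat[] {
  return skills
    .map((skill) => {
      const fileSize = skill.content.length; // characters as a proxy for bytes
      const estimatedToolCalls = estimateToolCalls(skill.content);
      return { name: skill.name, fileSize, estimatedToolCalls, systemCost: fileSize * estimatedToolCalls };
    })
    .sort((a, b) => b.systemCost - a.systemCost);
}

const ranked = rankByCarryingCost([
  { name: "big-quiet", content: "x".repeat(100) },
  { name: "small-busy", content: "Read a. Bash b. Grep c. Edit d. Task e." },
]);
console.log(ranked.map((s) => `${s.name}: ${s.systemCost}`));
```

In this toy input the smaller file ranks first because it is expected to be carried across more tool calls, which is the reason the plan ranks by carrying cost rather than raw size.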

Deletes 18 brainstorm requirement docs and 35 implementation plan docs that have been fully executed and are no longer needed as active references. These documents served their purpose during feature development and their outcomes are captured in the implemented code, commit history, and solution docs. This reduces repo clutter and context noise, aligning with the token-improvements initiative to minimize carrying cost of stale documentation.

…complete examples, stale figures

Agent-Logs-Url: https://github.com/nuggylib/compound-engineering-plugin/sessions/b847cedf-523d-43f5-a329-ddb882388459
Co-authored-by: armstrongl <29762984+armstrongl@users.noreply.github.com>


… body Enforce the 250-char description guideline from AGENTS.md. Trigger phrases, implementation details, and feature lists move to a When to Use section in the skill body. Agent descriptions condensed similarly. Adds the 250-char limit to the AGENTS.md compliance checklist.
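The 250-character guideline lends itself to a mechanical check. The sketch below is one possible shape for it; `SkillMeta` and `findOverlongDescriptions` are hypothetical names, and only the 250-character limit itself comes from the commit message.

```typescript
// Limit taken from the commit message above; everything else is illustrative.
const MAX_DESCRIPTION_CHARS = 250;

interface SkillMeta { name: string; description: string }

// Returns one human-readable message per skill whose description is too long.
function findOverlongDescriptions(skills: SkillMeta[]): string[] {
  return skills
    .filter((skill) => skill.description.length > MAX_DESCRIPTION_CHARS)
    .map((skill) => `${skill.name}: ${skill.description.length} chars (limit ${MAX_DESCRIPTION_CHARS})`);
}

const violations = findOverlongDescriptions([
  { name: "short-skill", description: "Plans a feature." },
  { name: "long-skill", description: "d".repeat(300) },
]);
console.log(violations);
```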


Remove "current year is 2026" from 10 files (3 skills, 7 agents) -- models receive date through system context. Replace TodoWrite with TaskCreate/ TaskUpdate/TaskList in ce-work. Preserves enforcement mention in project-standards-reviewer.

…lation New staleness module checks for hardcoded year stamps (hard fail), deprecated tool refs like TodoWrite (hard fail), oversized skills without references (warn), and boilerplate density (warn). Wired into release:validate alongside existing guardrails. 22 tests.
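A rough sketch of how the hard-fail and warn checks listed in this commit message might be expressed. The rule names, the 20,000-character threshold, and the function shape are assumptions for illustration; the boilerplate-density check is omitted because the commit message does not say how density is measured.

```typescript
type Severity = "fail" | "warn";
interface StalenessIssue { rule: string; severity: Severity; detail: string }

// Assumed values for illustration only.
const DEPRECATED_TOOLS = ["TodoWrite"];
const OVERSIZED_CHARS = 20000;

function checkStaleness(content: string, hasReferences: boolean): StalenessIssue[] {
  const issues: StalenessIssue[] = [];

  // Hard fail: hardcoded year stamps such as "current year is 2026".
  const yearStamp = content.match(/current year is \d{4}/i);
  if (yearStamp !== null) {
    issues.push({ rule: "year-stamp", severity: "fail", detail: yearStamp[0] });
  }

  // Hard fail: references to deprecated tools.
  for (const tool of DEPRECATED_TOOLS) {
    if (new RegExp(`\\b${tool}\\b`).test(content)) {
      issues.push({ rule: "deprecated-tool", severity: "fail", detail: tool });
    }
  }

  // Warn: large skills that have not extracted content to reference files.
  if (content.length > OVERSIZED_CHARS && !hasReferences) {
    issues.push({ rule: "oversized-without-references", severity: "warn", detail: `${content.length} chars` });
  }

  return issues;
}

console.log(checkStaleness("The current year is 2026. Track work with TodoWrite.", false));
```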

…ills and agents

Phase B (Unit 2) of the Dead Content Elimination Audit. Condenses verbose cross-platform boilerplate to shorter canonical forms across 34 files (26 skills, 5 references, 7 agents), saving ~3,675 bytes.

Three categories of boilerplate were condensed:

**2a -- Question-tool boilerplate (26 files, 34 occurrences)**

The verbose pattern naming each platform's question tool with full context strings (e.g. "`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini") was condensed to the shorter parenthetical form: (AskUserQuestion / request_user_input / ask_user)

Standard fallback sentences ("If no question tool is available, present numbered options in chat and wait for the user's reply before proceeding") were shortened to "Fallback: present numbered options and wait for a reply." Unique fallback behaviors (e.g., frontend-design's "assume partial mode", ce-setup's multiSelect instructions) were preserved verbatim.

**2b -- Native tool hints (10 files, 15 occurrences)**

Verbose native tool descriptions like "Use the native file-search/glob tool (e.g., Glob in Claude Code)" were condensed to "Use native file-search (e.g., Glob)". Tool Selection footer paragraphs in 5 research agents were shortened from ~200 chars to ~120 chars while retaining the same instruction.

**2c -- Repo-relative warnings and other boilerplate (2 files)**

Duplicate repo-relative path warnings in ce-brainstorm and ce-plan were condensed. The primary IMPORTANT block was shortened in ce-plan to reference the detailed Planning Rules section instead of repeating the full explanation.

- Every condensed instance retains the same behavioral instruction: tool names, fallback behaviors, and cross-platform equivalents are all preserved. Only the verbose framing was removed.
- AGENTS.md (the source pattern) was not modified.
- Condensed forms survive `bun run convert --to codex` -- verified by running the converter and confirming the short forms appear in the codex output.
- All 3 test failures in `bun test` are pre-existing (confirmed by running on the clean pre-change state).
- `bun run release:validate` passes.

Part of: Dead Content Elimination Audit (plan EveryInc#4)
Prior commits: c79f06d (Unit 1), 819d3c9 (Unit 3), 1facadb (Unit 6)

Apply four compression patterns (category-name, enumeration, example, process) across 20 non-T1 review agents and 5 top skills. Graduated three-phase execution with ablation validation at each phase.

Total: 90,771 bytes saved (33% of 273,888B instruction corpus). Combined with EveryInc#19+EveryInc#27: ~127KB total agent+skill reduction.

- Phase 1 (P1 category-name): 12,977B across 17 agents.
- Phase 2 (P2+P4 enum+example): 21,565B across 10 files.
- Phase 3 (P3 process): 57,812B across 11 files.

Ablation noise floor calibrated at 0.47 composite (identical-content self-comparison), rendering the plan's 0.95 threshold unreachable with single-run evaluations. All scores within noise floor range. 762/764 tests pass (2 pre-existing resolve-base.sh failures).

Extract deterministic shell recipes from three skills into co-located Bash scripts, replacing inline shell blocks in SKILL.md with single-line script invocations. This moves procedural logic out of the LLM context window so it executes deterministically instead of being carried (and re-tokenized) across every tool call in a session.

Scripts created/modified:

- git-clean-gone-branches: added `delete` subcommand to existing clean-gone script. SKILL.md Step 3 shrinks from 12 lines of inline shell (worktree check, force-remove, branch -D loop) to 3 lines referencing `bash scripts/clean-gone delete <branches>`.
- git-commit-push-pr: new scripts/resolve-context.sh consolidates the 4-fallback default-branch cascade (origin/HEAD -> gh repo view -> common names -> hardcoded) and the 4-priority base-branch/remote detection (PR metadata -> remote default -> gh -> common names) into one script invocation with optional short-circuit flags. SKILL.md Steps 1 and 6 now parse structured KEY:value output instead of embedding multi-step shell cascades.
- ce-review: new scripts/resolve-pr-base.sh handles the PR-path fork-safe remote resolution that was previously a 17-line inline shell block with template variables. Accepts --base and --base-repo as CLI arguments, outputs BASE:<sha> or ERROR:<message>. Standalone and branch paths remain unchanged (still use references/resolve-base.sh).

All scripts follow established conventions: set -euo pipefail, structured text output (KEY:value), exit 0 with ERROR: prefix for failures, bash 3.2+ compatible (no associative arrays, no bash 4+ features).

Test infrastructure:

- Extracted shared test helpers (gitEnv, runCommand, runGit, initRepo, commitFile, writeExecutable, createStubBin) from resolve-base-script tests into tests/helpers/setup-test-repo.ts.
- Added 18 golden-output tests across 3 new test files covering discovery, deletion, flag short-circuiting, fork resolution, error paths, and edge cases (shallow clones, missing remotes, worktree cleanup).

Net SKILL.md reduction: ~1,492 bytes across 3 skills. Below the plan's 8,050B projection because replacement invocation instructions consume space, but the carrying cost improvement is the real win: deterministic shell logic now executes outside the context window rather than being re-tokenized on every message.

780/782 tests pass (2 pre-existing failures in resolve-base-script.test.ts due to git 2.53.0 --no-tags fetch refspec behavior change, unrelated).
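The output convention described above (KEY:value lines, exit 0, failures signalled by an `ERROR:` prefix) can be consumed with a small parser. The sketch below is an assumption about how a caller might read that output; `parseKeyValueOutput` and `ScriptResult` are hypothetical names, and only the `BASE:<sha>` and `ERROR:<message>` shapes come from the commit message.

```typescript
type ScriptResult =
  | { ok: true; values: Record<string, string> }
  | { ok: false; error: string };

// Parses KEY:value lines. Because the scripts exit 0 even on failure, the
// ERROR: prefix (not the exit code) is what marks a failed run.
function parseKeyValueOutput(stdout: string): ScriptResult {
  const values: Record<string, string> = {};
  for (const rawLine of stdout.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    const separator = line.indexOf(":");
    if (separator <= 0) continue; // not a KEY:value line, skip it
    const key = line.slice(0, separator);
    const value = line.slice(separator + 1); // keeps any later colons intact
    if (key === "ERROR") return { ok: false, error: value };
    values[key] = value;
  }
  return { ok: true, values };
}

console.log(parseKeyValueOutput("BASE:abc123\n"));
console.log(parseKeyValueOutput("ERROR:no matching remote"));
```

Splitting on the first colon only means values that themselves contain colons, such as URLs, survive unchanged.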

Add 4-column cartouche table (agent-name, trigger, output, focus) at the top of Phase 1 listing all 6 agents ce-plan dispatches. Replace 5 inline Task dispatch call lines with lean references to agents by name from the cartouche table. Surrounding orchestration logic stays inline unchanged.

…able Replace 4 separate pipe-delimited reviewer tables with a single 21-agent cartouche table. Refactor persona-catalog.md to retain only detailed selection criteria and 5 selection rules, removing summary tables now redundant with the cartouche. Use FQNs consistently in both the cartouche and persona-catalog criteria entries.

Replace multi-paragraph activation criteria and agent list with a single 7-agent cartouche table (2 always-on, 5 conditional). Extract detailed criteria to references/persona-routing.md with bulk lookup instruction and graceful fallback for read failures.

… tables Extract Phase 1 inline task specs to references/research-tasks.md. Consolidate Phase 3 routing table and Applicable Specialized Agents catalog into a single 10-agent cartouche table, eliminating partial overlap between the two listings.


Add cartouche table listing all dispatched agents. Extract Phase 1 context scan prompt and Phase 2 ideation dispatch blocks to references/dispatch-prompts.md. Keep conditional orchestration logic inline. Description bytes drop from ~5,124B to ~2,023B.

Add Cartouche Format subsection to the Skill Compliance Checklist documenting the 4-column table format, trigger field conventions, extraction threshold for detailed criteria, and bulk lookup pattern.

…lures Add explicit fallback instructions to ce-review, ce-compound, ce-ideate, and agent-native-audit for when their extracted reference files cannot be read. Matches the pattern already present in document-review and satisfies the plan's System-Wide Impact error propagation requirement.

…nication patterns Add _meta.return_tiers to document-review findings-schema.json documenting evidence as the only detail-tier field (all other finding fields are merge-tier). Add "Sub-Agent Communication Patterns" section to AGENTS.md documenting the write-once dispatch + compact returns pattern for multi-agent skills.

…ce dispatch

Transform subagent-template.md into dispatch-context template with:

- Artifact write instruction (.context/ path with {run_id}/{reviewer_name}.json)
- Compact return contract (merge-tier fields only, evidence omitted)
- R9 fallback (full inline returns if artifact write fails)
- Updated read-only constraint permitting artifact writes
- Removed {document_content} slot (agents Read from document path)
- Variable reference table documenting pre-resolved vs per-agent variables
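The compact return contract can be pictured as splitting each finding into a full artifact record and a smaller in-context return. The sketch below assumes a finding shape: `evidence` and `suggested_fix` are named in this thread, while `title`, `severity`, and both function names are hypothetical. The directory under `.context/` is not spelled out here, so only the `{run_id}/{reviewer_name}.json` tail is modelled.

```typescript
interface Finding {
  title: string;          // hypothetical merge-tier field
  severity: string;       // hypothetical merge-tier field
  suggested_fix?: string; // merge-tier, present only when the reviewer supplies one
  evidence: string[];     // detail-tier: written to the artifact, omitted from the return
}

type CompactFinding = Omit<Finding, "evidence">;

// Builds the in-context return: merge-tier fields only, evidence left out.
function toCompactReturn(finding: Finding): CompactFinding {
  const compact: CompactFinding = { title: finding.title, severity: finding.severity };
  if (finding.suggested_fix !== undefined) {
    compact.suggested_fix = finding.suggested_fix;
  }
  return compact;
}

// Tail of the artifact path; the base directory under .context/ is an
// unspecified detail and is deliberately not guessed here.
function artifactRelativePath(runId: string, reviewerName: string): string {
  return `${runId}/${reviewerName}.json`;
}

const full: Finding = { title: "Unclear scope", severity: "P2", evidence: ["Section 3 contradicts section 1"] };
console.log(artifactRelativePath("run-1", "coherence"), toCompactReturn(full));
```

The full `Finding`, including evidence, would be what gets serialized to the artifact file; the orchestrator's context only carries the compact form.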

… SKILL.md

Replace inline dispatch with write-once dispatch context assembly:

- Generate run ID, assemble dispatch context (template + schema), write to disk
- Lean prompt per agent (~1.5KB) instead of full inlined template (~8-25KB)
- R8 fallback to inline dispatch if context write fails
- Replace @ inlines with backtick path references (~7.5KB saved per message)
- Add cleanup instruction after Phase 5 completion
- Pass run ID to synthesis pipeline for artifact-based evidence loading

…ed evidence Add evidence batch-load step (3.3) before dedup to load evidence from per-agent artifact files. Update dedup to union evidence from artifacts using reviewer name + content fingerprint matching. Update headless output to include evidence lines per finding. Add R9 fallback detection for inline evidence when artifacts are missing. Fix step numbering and cross-references after insertion.

…rn spec Mark suggested_fix as "(when present)" in the subagent template compact return field list to match the schema's _meta.return_tiers and the synthesis validation step which correctly omits it from required fields. Also mark plan as completed with all unit checkboxes checked.

…cuted

Introduces taskmd as the project's task management system with:

- .taskmd.yaml: Project-level configuration defining scopes for all repo surfaces (cli, compound-engineering, coding-tutor, marketplace, tests, docs, ci, scripts), ULID-based ID generation, solo workflow mode, and worklog support.
- tasks/CLAUDE.md: Agent-facing instructions for working with taskmd, covering file format, CLI commands, task lifecycle (start/complete), dependency handling, phases, worklogs, and validation.
- tasks/TASKMD_SPEC.md: Trimmed specification reference documenting all frontmatter fields, status flow, verify checks, file organization conventions, phase configuration, and validation rules.

This enables structured task tracking for the token-improvements work and provides agents with the context needed to create, manage, and verify tasks within the repo's established scopes.


… justified Evaluated decision point: prior phases reduced carrying cost but always-loaded budget remains at 86%. Module unbundling would drop core-only to 57% but requires namespace changes across ~20 files, 5 marketplace entries, and converter testing for 10 targets. Deferred until budget nears capacity or platform adds native module support.

armstrongl requested a review from Copilot April 12, 2026 23:24

Copilot started reviewing on behalf of armstrongl April 12, 2026 23:25

Copilot AI reviewed Apr 12, 2026


armstrongl changed the title ~~feat(register-mismatch): add correction methodology and savings estimates~~ feat(token-efficiency): carrying cost budgeting plan, register mismatch methodology, and skill rewrites Apr 14, 2026

armstrongl changed the title ~~feat(token-efficiency): carrying cost budgeting plan, register mismatch methodology, and skill rewrites~~ feat(token-efficiency): register mismatch corrections, carrying cost budgeting, and docs cleanup Apr 14, 2026

Copilot started work on behalf of armstrongl April 14, 2026 21:34

Copilot finished work on behalf of armstrongl April 14, 2026 21:39

armstrongl force-pushed the token-improvements branch from b5db8f9 to f2fec95 April 15, 2026 22:01

Copilot started work on behalf of armstrongl April 21, 2026 15:52

Copilot finished work on behalf of armstrongl April 21, 2026 16:40

Copilot started work on behalf of armstrongl April 21, 2026 18:12

armstrongl and others added 11 commits April 21, 2026 16:56

feat(ce-work): merge ce-work-beta into ce-work

957a5a5

ce-work-beta was a parallel experiment that diverged minimally from ce-work. Absorb its unique content (Codex delegation, shipping workflow) into ce-work and remove the beta skill entirely. Removes ~816 bytes of structural duplication.

feat(validate): wire token guardrails into release validation pipeline

2b74060

Add always-loaded budget tracking to release:validate. Reports character usage against the budget limit and surfaces warnings for oversized skills. Updates meta execution plan tracking.
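
A budget check of this kind might look roughly like the sketch below. It rests on assumptions: the budget limit, the per-skill warning threshold, the sample numbers, and every function and field name are hypothetical and not taken from the release:validate implementation.

```python
from dataclasses import dataclass


@dataclass
class BudgetReport:
    total_chars: int
    budget_chars: int
    percent_used: float
    warnings: list[str]


def check_always_loaded_budget(
    skill_chars: dict[str, int],
    budget_chars: int,
    per_skill_warn_chars: int,
) -> BudgetReport:
    """Sum always-loaded character counts and flag oversized skills.

    All thresholds are caller-supplied; nothing here reflects the
    project's real limits.
    """
    total = sum(skill_chars.values())
    warnings = [
        f"{name}: {count} chars exceeds {per_skill_warn_chars}"
        for name, count in sorted(skill_chars.items())
        if count > per_skill_warn_chars
    ]
    if total > budget_chars:
        warnings.append(f"total {total} chars exceeds budget {budget_chars}")
    return BudgetReport(total, budget_chars, total / budget_chars * 100, warnings)


# Hypothetical inputs chosen only to exercise the report.
report = check_always_loaded_budget(
    {"skill-a": 500, "skill-b": 360},
    budget_chars=1000,
    per_skill_warn_chars=450,
)
print(f"{report.percent_used:.0f}% of budget used")
for warning in report.warnings:
    print("warning:", warning)
```

Returning a report object rather than raising on the first violation lets a validation step print every warning in one pass and decide separately whether an over-budget total should fail the release.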

feat(ce-work): add codex delegation workflow reference from merged beta

2bd97a1

feat(validate): add token guardrails implementation and tests

c0f9ffa

armstrongl added 28 commits April 21, 2026 17:08

sync(NEXT.md): update NEXT.md

acf1c3a

sync(NEXT.md): update NEXT.md

ac018c6

refactor(agent-native-audit): extract 8 inline prompts to reference file

4b05f90

Replace 8 inline sub-agent prompts (~4,806B) with an 8-row cartouche table. Extract all prompts with audit tasks and output format templates to references/audit-prompts.md. SKILL.md drops from 7,956B to 4,102B.

refactor(skill-compliance): codify cartouche format in AGENTS.md

56eedab

Add Cartouche Format subsection to the Skill Compliance Checklist documenting the 4-column table format, trigger field conventions, extraction threshold for detailed criteria, and bulk lookup pattern.

sync(NEXT.md): update NEXT.md and meta-plan for EveryInc#28 completion

2b13b65

sync(NEXT.md): consolidate session progress and clarify next steps

5a37153

sync(NEXT.md): mark EveryInc#11 compact returns generalization as executed

b06485d

sync(NEXT.md): update NEXT.md

1f80325

sync(docs,NEXT.md): update 2 files

da47b7d

sync(.taskmd.yaml,tasks): update 19 files

6ef7241

sync(tasks): update 21 files

626bf86

sync(tasks): update 21 files

2d368ff

armstrongl force-pushed the token-improvements branch from 7882a08 to 2d368ff April 21, 2026 22:30

sync(NEXT.md): update NEXT.md

d6ad60f

		Do not use `gh pr diff` as the review scope after checkout.
		<!-- why: gh pr diff reflects remote PR state only, missing local fix commits until pushed -->

	3. Incrementally shippable: Start with ce-review (55KB, ~11-16KB reduction), then ce-plan, ce-work. See value after the first skill
	3. Incrementally shippable: Start with ce-review (55KB) and then expand to ce-plan, ce-work, and the rest of the top 7. Corrected sampling in the methodology doc suggests lower net savings than the earlier pre-correction estimate: ~19.6-25.4KB across the top 7 overall, not the older ~11-16KB-for-ce-review / ~56-85KB-corpus figures

		Verify each cited line number against the file content.
		<!-- why: wrong line numbers are worse than no finding -- they misdirect the developer -->
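
The line-number check in the instruction above could be sketched as follows. This is illustrative only: the `Finding` structure and the `verify_citation` helper are hypothetical names, and the skill itself is a prompt instruction rather than this code.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    line: int      # 1-indexed line number cited by the reviewer
    snippet: str   # text the reviewer claims appears on that line


def verify_citation(file_text: str, finding: Finding) -> bool:
    """Return True only if the cited line exists and contains the snippet."""
    lines = file_text.splitlines()
    if not 1 <= finding.line <= len(lines):
        return False
    return finding.snippet in lines[finding.line - 1]


source = "import os\n\ndef main():\n    print(os.getcwd())\n"
good = Finding(line=4, snippet="os.getcwd()")
bad = Finding(line=3, snippet="os.getcwd()")

print(verify_citation(source, good))  # True
print(verify_citation(source, bad))   # False
```

A finding that fails this check would be dropped or re-located before being reported, in line with the stated rationale that a wrong line number is worse than no finding.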

Conversation

armstrongl commented Apr 12, 2026 • edited

Summary

What these changes mean

Changes by commit

Test plan

Copilot AI left a comment

Pull request overview

Reviewed changes

Copilot AI Apr 12, 2026

armstrongl commented Apr 14, 2026

Copilot AI commented Apr 14, 2026

armstrongl commented Apr 20, 2026

New: document-review write-once dispatch + compact returns

armstrongl commented Apr 21, 2026

Copilot AI commented Apr 21, 2026 • edited

I tried to connect to the following addresses, but was blocked by firewall rules:

armstrongl commented Apr 21, 2026

3 participants
