feat(token-efficiency): register mismatch corrections, carrying cost budgeting, and docs cleanup #1

Open
armstrongl wants to merge 52 commits into main from token-improvements
@armstrongl armstrongl commented Apr 12, 2026

Summary

This branch delivers the first concrete execution phase of the token efficiency initiative, targeting three areas that compound to reduce token waste across plugin skills.

  • Register mismatch correction methodology: Introduces a systematic framework for identifying and fixing mismatches between a skill's described purpose (in its description/trigger) and its actual prompt content. Applies corrections to the top 7 highest-impact skills (ce-compound, ce-plan, ce-review, ce-work, ce-work-beta, ce-compound-refresh, orchestrating-swarms), reducing wasted tokens from skills that trigger in the wrong context or bloat their prompts with off-register content.

  • Carrying cost budgeting plan: Adds a structured plan for measuring and managing the ongoing token cost of documentation and artifacts that persist in the repo. Establishes a framework for deciding when stale docs should be archived or removed based on their carrying cost vs. reference value.

  • Stale documentation cleanup: Removes 53 completed brainstorm and plan documents (18 brainstorms + 35 plans) that were no longer serving as active references. These documents had been fully executed and their outcomes captured in implemented code, commit history, and solution docs. This directly reduces context noise and repo clutter.

What these changes mean

Token efficiency in an agent plugin isn't about micro-optimizing individual prompts. It's about ensuring the entire system, from skill descriptions that control when skills trigger, to the documents that persist in the repo and consume context, operates at the right level of precision. Register mismatches cause skills to activate unnecessarily or waste tokens on irrelevant content when they do activate. Stale docs impose a carrying cost every time an agent reads the repo structure. This PR addresses both structural sources of waste.

Changes by commit

  1. sync(docs) - Expanded token efficiency ideation with cross-domain ideas and sequencing
  2. docs(register-mismatch) - Conceptual explainer for register mismatch correction
  3. feat(register-mismatch) - Correction methodology with pattern classification and savings estimates
  4. feat(register-mismatch) - Applied correction methodology to top 7 skills
  5. feat(carrying-cost) - Carrying cost budgeting plan and meta-plan tracking
  6. chore(docs) - Removed 53 completed brainstorms and plans

Test plan

  • Verify corrected skills trigger appropriately in context (not over-triggering)
  • Run bun test to confirm no regressions from skill content changes
  • Spot-check that removed docs are not referenced by any remaining files
  • Review register mismatch methodology for applicability to remaining skills

Copilot AI left a comment

Pull request overview

Adds reference documentation to standardize “tutorial register → specification register” rewrites across compound-engineering skill files, including a stakeholder explainer and expanded token-efficiency ideation/sequencing.

Changes:

  • Added a detailed correction methodology with pattern taxonomy, decision framework, examples, and updated savings estimates.
  • Added a stakeholder-facing explainer describing the concept and expected impact.
  • Expanded the token-efficiency ideation doc with new ideas, rejections, and an implementation sequencing plan.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

| File | Description |
|------|-------------|
| docs/references/register-mismatch-correction-methodology.md | New methodology: pattern classes, rewrite rules, examples, and corrected savings estimates. |
| docs/references/register-mismatch-correction-explainer.md | New explainer for stakeholders describing register mismatch correction and projected impact. |
| docs/ideation/2026-04-08-token-efficiency-ideation.md | Expanded ideation with additional ideas, dependency graph, and updated sequencing content. |


Comment on lines +133 to +134
Do not use `gh pr diff` as the review scope after checkout.
<!-- why: gh pr diff reflects remote PR state only, missing local fix commits until pushed -->
Copilot AI Apr 12, 2026

The HTML comment placement rule says the <!-- why: ... --> line should be immediately before the constraint, but Worked Borderline Example A shows the comment after the constraint. Please align the example with the rule (or adjust the rule if both placements are acceptable) to avoid contributors copying an inconsistent pattern.

Suggested change
- Do not use `gh pr diff` as the review scope after checkout.
- <!-- why: gh pr diff reflects remote PR state only, missing local fix commits until pushed -->
+ <!-- why: gh pr diff reflects remote PR state only, missing local fix commits until pushed -->
+ Do not use `gh pr diff` as the review scope after checkout.

Copilot uses AI. Check for mistakes.
Comment on lines +244 to +248
*Classification:* Borderline compliance-aiding. Establishes a conceptual distinction (severity vs routing) that could help correct application of routing rules. The routing rules table is explicit and self-contained.
*Rewritten (105 bytes):*
```
<!-- why: severity and routing are orthogonal axes -- severity = urgency, routing = actor + mutation permission -->
```
Copilot AI Apr 12, 2026

These before/after examples show a “rewritten” block that contains only an HTML comment and no resulting specification-register instruction. That makes the example incomplete and conflicts with the stated format (“rewritten text”)—either include the corresponding spec instruction that this comment justifies, or mark the rewrite as a deletion if no instruction should remain.

Comment on lines +292 to +302
**Example 2 -- Borderline (severity signal)**

*Original (63 bytes):*
> A finding pointing to the wrong line is worse than no finding.

*Pattern class:* Motivational framing
*Classification:* Borderline. Functions as a severity signal that might calibrate verification thoroughness.
*Rewritten (80 bytes):*
```
<!-- why: wrong line numbers are worse than no finding -- they misdirect the developer -->
```
Copilot AI Apr 12, 2026

This before/after example’s “Rewritten” output includes only an HTML comment and omits the actual directive (e.g., the associated instruction to verify line numbers). Please include the specification-register instruction alongside the comment so the example demonstrates the intended transformation end-to-end.

**After (specification register):**
> Pass the path list to `project-standards` in a `<standards-paths>` block in its review context. The persona reads files itself, targeting sections relevant to changed file types.

The stripped sentences ("This keeps the orchestrator's work cheap..." and "avoids bloating...") explain *why* the design works this way. Valuable for a human maintainer, zero value for Claude executing the instruction. In the rewrite, that rationale moves to an HTML comment (`<!-- why: ... -->`) -- invisible to the model, preserved for contributors.
Copilot AI Apr 12, 2026

This explainer claims HTML comments are “invisible to the model,” but HTML comments are tokenized as plain text in the context window. Please rephrase this to reflect that comments are still seen/tokenized by the model (even if they’re intended for humans) to avoid misleading readers about token/cost impact.

Suggested change
- The stripped sentences ("This keeps the orchestrator's work cheap..." and "avoids bloating...") explain *why* the design works this way. Valuable for a human maintainer, zero value for Claude executing the instruction. In the rewrite, that rationale moves to an HTML comment (`<!-- why: ... -->`) -- invisible to the model, preserved for contributors.
+ The stripped sentences ("This keeps the orchestrator's work cheap..." and "avoids bloating...") explain *why* the design works this way. Valuable for a human maintainer, but not needed in the main instruction Claude should follow. In the rewrite, that rationale can move to an HTML comment (`<!-- why: ... -->`) for contributors, with the caveat that comments are still part of the text seen/tokenized by the model unless a preprocessing step strips them first.
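
The suggestion above mentions a preprocessing step that strips comments before the text reaches the model. Nothing in this thread shows such a step, so the following is only a minimal sketch of what one could look like; `stripWhyComments` is a hypothetical helper, not something from this repo.

```typescript
// Hypothetical helper: removes `<!-- why: ... -->` rationale comments from
// skill text before it is sent to the model. Not part of this PR.
function stripWhyComments(markdown: string): string {
  // A line that holds only a why-comment is dropped together with its newline.
  const wholeLine = /^[ \t]*<!--\s*why:[\s\S]*?-->[ \t]*\r?\n?/gm;
  // A why-comment that shares a line with other content is cut out in place.
  const inline = /[ \t]*<!--\s*why:[\s\S]*?-->/g;
  return markdown.replace(wholeLine, "").replace(inline, "");
}

const skillText = [
  "<!-- why: gh pr diff reflects remote PR state only, missing local fix commits until pushed -->",
  "Do not use `gh pr diff` as the review scope after checkout.",
].join("\n");

const stripped = stripWhyComments(skillText);
console.log(stripped);
console.log("chars before: " + skillText.length + ", after: " + stripped.length);
```

Without a step like this, the comment text counts toward the context like any other line, which is the point of the review comment.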

Comment on lines +60 to +75
4. **Lower cost per session.** Tokens cost money. A 20-30% reduction across 55KB of ce-review content, carried across a 30+ message review session, adds up.

## The estimated impact

| Skill | Current size | Est. reduction | Savings |
|-------|-------------|---------------|---------|
| ce-review | 55KB | 20-25% | 11-14KB |
| ce-compound-refresh | 48KB | 20-25% | 10-12KB |
| orchestrating-swarms | 48KB | 25-30% | 12-14KB |
| ce-plan | 42KB | 20-25% | 8-10KB |
| ce-work-beta | 32KB | 15-20% | 5-6KB |
| ce-compound | 31KB | 20-25% | 6-8KB |
| ce-work | 27KB | 15-20% | 4-5KB |
| **Top 7 total** | **283KB** | | **~56-69KB** |

That 56-69KB reduction is *per session, per message*. In a 30-message ce-review session, that's 1.7-2MB of tokens you're not paying for.
Copilot AI Apr 12, 2026

The methodology document in this PR corrects the projected savings to ~8–13% tutorial-register and ~19.6–25.4KB net savings across the top 7 skills, but this explainer still presents the older 20–30% reduction and ~56–69KB savings figures. Please update the estimates here (or clearly label them as superseded/early projections) so stakeholders aren’t left with conflicting numbers.

Suggested change
- 4. **Lower cost per session.** Tokens cost money. A 20-30% reduction across 55KB of ce-review content, carried across a 30+ message review session, adds up.
- ## The estimated impact
- | Skill | Current size | Est. reduction | Savings |
- |-------|-------------|---------------|---------|
- | ce-review | 55KB | 20-25% | 11-14KB |
- | ce-compound-refresh | 48KB | 20-25% | 10-12KB |
- | orchestrating-swarms | 48KB | 25-30% | 12-14KB |
- | ce-plan | 42KB | 20-25% | 8-10KB |
- | ce-work-beta | 32KB | 15-20% | 5-6KB |
- | ce-compound | 31KB | 20-25% | 6-8KB |
- | ce-work | 27KB | 15-20% | 4-5KB |
- | **Top 7 total** | **283KB** | | **~56-69KB** |
- That 56-69KB reduction is *per session, per message*. In a 30-message ce-review session, that's 1.7-2MB of tokens you're not paying for.
+ 4. **Lower cost per session.** Tokens cost money. The current methodology projects roughly **8-13% tutorial-register savings**, or about **19.6-25.4KB net savings across the top 7 skills**. Carried across a 30+ message review session, that still adds up.
+ ## The estimated impact
+ | Scope | Current size | Est. reduction | Savings |
+ |-------|--------------|----------------|---------|
+ | **Top 7 skills total** | **283KB** | **~8-13% tutorial-register** | **~19.6-25.4KB net** |
+ That **~19.6-25.4KB** reduction is the current projected **net savings across the top 7 skills**. Because loaded skill content is re-sent throughout a session, those savings compound across repeated messages even though they are smaller than the earlier exploratory estimate.
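
As a rough check on how the corrected figure scales, the arithmetic can be sketched as below. It reuses the explainer's own simplification that the full saving applies on every message of a 30-message session. That is an assumption: in practice only the skills actually loaded in a session would contribute.

```typescript
// Corrected net savings range in KB across the top 7 skills (from the methodology doc).
const perMessageKB = { low: 19.6, high: 25.4 };
// Session length used in the explainer's original example.
const messageCount = 30;

// Assumption: the saving is realized once per message, as the explainer's
// "per session, per message" wording implies.
function sessionSavingsKB(perMessage: number, messages: number): number {
  return perMessage * messages;
}

const lowKB = sessionSavingsKB(perMessageKB.low, messageCount);
const highKB = sessionSavingsKB(perMessageKB.high, messageCount);
console.log(lowKB.toFixed(0) + "-" + highKB.toFixed(0) + "KB over " + messageCount + " messages");
```

Under that assumption the corrected range works out to roughly 588-762KB per session, compared with the 1.7-2MB quoted from the older 56-69KB figure.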


1. **Universal impact**: Affects all 43 skills, not just one
2. **Improves behavior**: Removing hedging markers ("You might want to...") INCREASES instruction compliance -- compression that improves quality
3. **Incrementally shippable**: Start with ce-review (55KB, ~11-16KB reduction), then ce-plan, ce-work. See value after the first skill
Copilot AI Apr 12, 2026

This section still cites the older register-mismatch savings numbers (~56–85KB corpus; ~11–16KB for ce-review). In this PR, the methodology document’s corrected sampling suggests materially lower net savings (~19.6–25.4KB across the top 7). Please update these figures (or annotate them as pre-correction estimates) to keep the ideation doc internally consistent with the newer evidence.

Suggested change
- 3. **Incrementally shippable**: Start with ce-review (55KB, ~11-16KB reduction), then ce-plan, ce-work. See value after the first skill
+ 3. **Incrementally shippable**: Start with ce-review (55KB) and then expand to ce-plan, ce-work, and the rest of the top 7. Corrected sampling in the methodology doc suggests lower net savings than the earlier pre-correction estimate: ~19.6-25.4KB across the top 7 overall, not the older ~11-16KB-for-ce-review / ~56-85KB-corpus figures

Comment on lines +181 to +182
Verify each cited line number against the file content.
<!-- why: wrong line numbers are worse than no finding -- they misdirect the developer -->
Copilot AI Apr 12, 2026

This worked example places the <!-- why: ... --> comment after the instruction, but later the document standardizes on putting the comment immediately before the constraint. Please make the examples consistent with the stated rule to avoid contributors copying mixed patterns.

Suggested change
- Verify each cited line number against the file content.
- <!-- why: wrong line numbers are worse than no finding -- they misdirect the developer -->
+ <!-- why: wrong line numbers are worse than no finding -- they misdirect the developer -->
+ Verify each cited line number against the file content.

@armstrongl armstrongl changed the title from "feat(register-mismatch): add correction methodology and savings estimates" to "feat(token-efficiency): carrying cost budgeting plan, register mismatch methodology, and skill rewrites" on Apr 14, 2026
@armstrongl armstrongl changed the title from "feat(token-efficiency): carrying cost budgeting plan, register mismatch methodology, and skill rewrites" to "feat(token-efficiency): register mismatch corrections, carrying cost budgeting, and docs cleanup" on Apr 14, 2026
@armstrongl
Author

@copilot apply changes based on the comments in this thread


Copilot AI commented Apr 14, 2026

@copilot apply changes based on the comments in this thread

Applied all changes from the review thread in commit 7658600:

  • Comment placement (methodology lines 133-134, 181-182): moved <!-- why: ... --> to appear immediately before its constraint in both Borderline worked examples, consistent with the stated rule.
  • Incomplete Rewritten blocks (lines 244-248, 292-302): added the missing spec-register instruction to each block so examples demonstrate the full transformation end-to-end (comment + directive, not comment alone).
  • "Invisible to the model" (explainer line 46): replaced with accurate language clarifying comments are still tokenized; the value is preserving contributor intent, not hiding content from Claude.
  • Stale savings figures (explainer lines 60-75, ideation doc line 632): updated from the old 20-30% / 56-69KB numbers to the corrected ~8-13% tutorial-register / ~19.6-25.4KB net savings across top 7 skills, consistent with the methodology document.

@armstrongl
Author

New: document-review write-once dispatch + compact returns

Added 5 commits applying write-once dispatch and compact returns to the document-review skill (following ce-review's proven pattern):

  • 37f9c55 Add _meta.return_tiers to findings-schema.json + "Sub-Agent Communication Patterns" to AGENTS.md
  • db7d17a Restructure subagent template for dispatch-context + artifact write + compact return
  • 0f43cd6 Add run ID generation, dispatch context assembly, lean prompt dispatch to SKILL.md; replace @ inlines with backtick refs
  • 7734ea2 Update synthesis pipeline with evidence batch-load, artifact-based dedup, headless evidence output
  • e2a5996 Fix suggested_fix optionality alignment across files (review finding)

Estimated savings: ~63-75KB per 5-agent document review session.

Plan: docs/plans/2026-04-19-002-refactor-compact-returns-generalization-plan.md

armstrongl added a commit that referenced this pull request Apr 21, 2026
Extract inline content from agent-native-architecture (18.3KB -> 5.0KB),
orchestrating-swarms (35.6KB -> 5.0KB), and dspy-ruby (21.5KB -> 5.0KB)
into queryable reference files. Combined SKILL.md reduced from 75.4KB to
14.9KB (80% reduction). 19 new reference files created.
@armstrongl
Author

@copilot resolve the merge conflicts in this pull request


Copilot AI commented Apr 21, 2026

@copilot resolve the merge conflicts in this pull request

Merge conflicts resolved in commit db2cc3e. 957/958 tests pass.

Changes made:

  • ce-code-review/SKILL.md: used main's version as the base (which has the complete behavioral contracts), then applied the Script-First Extraction changes from our branch — scripts/resolve-pr-base.sh reference replacing inline PR base resolution code, correct mode:report-only cannot switch messages, 7b tie-break rule
  • ce-work/SKILL.md: restored the Settings Resolution Chain config keys table (including work_delegate_decision); updated frontend-design → ce-frontend-design
  • ce-code-review/scripts/resolve-pr-base.sh: added (moved from ce-review/scripts/)
  • src/utils/legacy-cleanup.ts: fixed retired-skill handling — when a prompt's current skill is gone, fall back to historical aliases so stale ce-work-beta wrappers are still cleaned up
  • Tests: removed 3 ce-work-beta tests (skill intentionally deleted), updated path references for renamed skill directories (git-clean-gone-branches → ce-clean-gone-branches, ce-review/scripts/ → ce-code-review/scripts/), fixed bare-remote HEAD setup in the resolve-context test
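
The retired-skill fallback described for src/utils/legacy-cleanup.ts could look roughly like the sketch below. The actual file is not shown in this thread, so the function name, the alias list, and the ownership rule are all assumptions made for illustration.

```typescript
// Assumed list of historical aliases: skills that used to ship and were later removed.
const retiredSkillNames: string[] = ["ce-work-beta"];

// Hypothetical check: should a generated prompt wrapper be cleaned up?
function shouldCleanUp(wrapperSkill: string, currentSkills: string[]): boolean {
  // The skill still exists, so its wrapper is live and must stay.
  if (currentSkills.indexOf(wrapperSkill) !== -1) return false;
  // The skill is gone: fall back to the historical aliases so wrappers for
  // retired skills are still recognized, while unknown wrappers are left alone.
  return retiredSkillNames.indexOf(wrapperSkill) !== -1;
}

console.log(shouldCleanUp("ce-work-beta", ["ce-work"]));
console.log(shouldCleanUp("ce-work", ["ce-work"]));
```

The design choice illustrated here is that cleanup only touches wrappers it can attribute to a current or formerly shipped skill, so a user's own files are never removed by accident.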

The 1 remaining failure (resolve-pr-base.sh > base repo matches non-origin remote (fork)) is a pre-existing SSH timeout in this sandbox — the script attempts a real git fetch against the fake git@github.com:UpstreamOrg/repo.git URL before falling back to local refs.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • github.com (HTTP Only)
    • Triggering command: /usr/bin/ssh /usr/bin/ssh -o SendEnv=GIT_PROTOCOL git@github.com git-upload-pack 'UpstreamOrg/repo.git' --local credential.helper ndor/bin/bash (packet block)
    • Triggering command: /usr/bin/ssh /usr/bin/ssh -o SendEnv=GIT_PROTOCOL git@github.com git-upload-pack 'UpstreamOrg/repo.git' son git rgo/bin/bash get --global if (a[i] && a-m git kt5v add origin p/bin/bash (packet block)
    • Triggering command: /usr/bin/ssh /usr/bin/ssh -o SendEnv=GIT_PROTOCOL git@github.com git-upload-pack 'UpstreamOrg/repo.git' 6 bash de/node/bin/bash k /nonexistent/gh -context.sh e/git /hom /usr/bin/tail bash (packet block)

If you need me to access, download, or install something from one of these locations, you can either:

@armstrongl
Author

@copilot resolve the merge conflicts in this pull request

armstrongl and others added 11 commits April 21, 2026 16:56
…-plan tracking

Add the implementation plan for idea EveryInc#20 (Carrying Cost Budgeting) which
introduces a `bun run skill:stats` command that ranks all skills and agents
by carrying cost (`file_size x estimated_tool_calls`) instead of raw file size.

The plan was refined through three parallel review passes (coherence,
feasibility, scope) that applied 14 auto-fixes:

- Collapsed two-file architecture to a single self-contained script
- Broadened tool detection from verb+tool patterns (near-zero recall on
  real content) to capitalized tool names at word boundaries
- Replaced loop scope-based multiplication with additive weights
- Added `disable-model-invocation` frontmatter filter for tutorial skills
- Clarified sort key as single `system_cost` metric (not two competing
  metrics)
- Documented `@`-expansion undercount as a known limitation

The meta-plan tracking table is updated: idea EveryInc#20 brainstorm and plan
columns both marked done. Next in the roadmap: S4 (brainstorm ideas
EveryInc#9, EveryInc#6, EveryInc#5, EveryInc#10 in parallel).
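
The ranking described in this commit message can be sketched as follows. The real `skill:stats` script is not shown here, so the tool list, the floor of one tool call, and the use of character count as file size are assumptions. The additive weights mentioned above are not modeled.

```typescript
// Hypothetical input shape for illustration only.
interface SkillFile {
  path: string;
  content: string;
  disableModelInvocation?: boolean; // mirrors the `disable-model-invocation` frontmatter flag
}

// Assumed tool list. The plan only specifies "capitalized tool names at word boundaries".
const TOOL_NAMES = ["Read", "Write", "Edit", "Bash", "Glob", "Grep", "Task"];

function estimateToolCalls(content: string): number {
  const pattern = new RegExp("\\b(" + TOOL_NAMES.join("|") + ")\\b", "g");
  const matches = content.match(pattern);
  // Assumption: floor at 1 so a file with no tool mentions still has a non-zero cost.
  return Math.max(1, matches ? matches.length : 0);
}

function rankBySystemCost(files: SkillFile[]): { path: string; systemCost: number }[] {
  return files
    .filter((f) => !f.disableModelInvocation) // skip skills the model never invokes
    .map((f) => ({
      path: f.path,
      // Character count stands in for file size in this sketch.
      systemCost: f.content.length * estimateToolCalls(f.content),
    }))
    .sort((a, b) => b.systemCost - a.systemCost);
}

const ranked = rankBySystemCost([
  { path: "skills/long-prose/SKILL.md", content: "Follow these long instructions carefully and completely." },
  { path: "skills/tool-heavy/SKILL.md", content: "Use Read, then Grep, then Edit." },
  { path: "skills/tutorial/SKILL.md", content: "Read Read Read", disableModelInvocation: true },
]);
console.log(ranked);
```

In this toy input the shorter tool-heavy file ranks above the longer prose file, which is the difference between carrying cost and raw file size that the plan is after.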
Deletes 18 brainstorm requirement docs and 35 implementation plan docs
that have been fully executed and are no longer needed as active
references. These documents served their purpose during feature
development and their outcomes are captured in the implemented code,
commit history, and solution docs.

This reduces repo clutter and context noise, aligning with the
token-improvements initiative to minimize carrying cost of stale
documentation.

…complete examples, stale figures

Agent-Logs-Url: https://github.com/nuggylib/compound-engineering-plugin/sessions/b847cedf-523d-43f5-a329-ddb882388459

Co-authored-by: armstrongl <29762984+armstrongl@users.noreply.github.com>

ce-work-beta was a parallel experiment that diverged minimally from ce-work.
Absorb its unique content (Codex delegation, shipping workflow) into ce-work
and remove the beta skill entirely. Removes ~816 bytes of structural duplication.

… body

Enforce the 250-char description guideline from AGENTS.md. Trigger phrases,
implementation details, and feature lists move to a When to Use section in
the skill body. Agent descriptions condensed similarly. Adds the 250-char
limit to the AGENTS.md compliance checklist.
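
A check for the 250-char description guideline could be as small as the sketch below. The `SkillMeta` shape and function name are hypothetical; only the 250-character limit comes from the commit message.

```typescript
const MAX_DESCRIPTION_CHARS = 250; // guideline cited from AGENTS.md

// Hypothetical shape: one entry per skill or agent frontmatter.
interface SkillMeta {
  skillName: string;
  description: string;
}

function findOverlongDescriptions(skills: SkillMeta[], limit: number = MAX_DESCRIPTION_CHARS): string[] {
  return skills
    .filter((s) => s.description.length > limit)
    .map((s) => s.skillName + ": " + s.description.length + " chars (limit " + limit + ")");
}

const overlong = findOverlongDescriptions([
  { skillName: "short-skill", description: "Reviews code changes." },
  { skillName: "long-skill", description: new Array(301).join("x") }, // 300 characters
]);
console.log(overlong);
```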
Add always-loaded budget tracking to release:validate. Reports character
usage against the budget limit and surfaces warnings for oversized skills.
Updates meta execution plan tracking.
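
The budget report described here might be shaped like the following sketch. The budget size, the oversize threshold, and every field name are assumptions; the commit message only says that character usage is reported against a limit and oversized skills produce warnings.

```typescript
// Hypothetical entry: one skill or agent that contributes always-loaded text.
interface LoadedEntry {
  id: string;
  alwaysLoadedChars: number; // text that is in context on every session
  bodyChars: number;         // full body, loaded on demand
}

interface BudgetReport {
  usedChars: number;
  percentUsed: number;
  warnings: string[];
}

function buildBudgetReport(entries: LoadedEntry[], budgetChars: number, oversizedBodyChars: number): BudgetReport {
  let usedChars = 0;
  const warnings: string[] = [];
  for (const entry of entries) {
    usedChars += entry.alwaysLoadedChars;
    if (entry.bodyChars > oversizedBodyChars) {
      warnings.push(entry.id + " body is " + entry.bodyChars + " chars (threshold " + oversizedBodyChars + ")");
    }
  }
  return { usedChars: usedChars, percentUsed: Math.round((usedChars / budgetChars) * 100), warnings: warnings };
}

const budgetReport = buildBudgetReport(
  [
    { id: "skill-a", alwaysLoadedChars: 500, bodyChars: 4000 },
    { id: "skill-b", alwaysLoadedChars: 360, bodyChars: 60000 },
  ],
  1000,  // hypothetical budget in characters
  50000  // hypothetical oversize threshold
);
console.log(budgetReport);
```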
Remove "current year is 2026" from 10 files (3 skills, 7 agents) -- models
receive date through system context. Replace TodoWrite with TaskCreate/
TaskUpdate/TaskList in ce-work. Preserves enforcement mention in
project-standards-reviewer.

…lation

New staleness module checks for hardcoded year stamps (hard fail),
deprecated tool refs like TodoWrite (hard fail), oversized skills without
references (warn), and boilerplate density (warn). Wired into
release:validate alongside existing guardrails. 22 tests.
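
A minimal sketch of the hard-fail and warn split, assuming simple regular-expression checks. The rule names, the exact patterns, and the size threshold are hypothetical, and the boilerplate density check is left out because the message does not say how it is measured.

```typescript
type Severity = "fail" | "warn";

interface StalenessIssue {
  severity: Severity;
  rule: string;
}

function checkStaleness(content: string, hasReferences: boolean, maxChars: number): StalenessIssue[] {
  const issues: StalenessIssue[] = [];
  // Hardcoded year stamps such as "current year is 2026" are a hard failure.
  if (/current year is \d{4}/i.test(content)) {
    issues.push({ severity: "fail", rule: "hardcoded-year" });
  }
  // Deprecated tool references are a hard failure. TodoWrite is the example named above.
  if (/\bTodoWrite\b/.test(content)) {
    issues.push({ severity: "fail", rule: "deprecated-tool" });
  }
  // Oversized skills only warn, and only when nothing was extracted to references.
  if (content.length > maxChars && !hasReferences) {
    issues.push({ severity: "warn", rule: "oversized-without-references" });
  }
  return issues;
}

const staleIssues = checkStaleness("The current year is 2026. Track work with TodoWrite.", false, 20);
console.log(staleIssues);
```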
…ills and agents

Phase B (Unit 2) of the Dead Content Elimination Audit. Condenses
verbose cross-platform boilerplate to shorter canonical forms across
34 files (26 skills, 5 references, 7 agents), saving ~3,675 bytes.

Three categories of boilerplate were condensed:

**2a — Question-tool boilerplate (26 files, 34 occurrences)**

The verbose pattern naming each platform's question tool with full
context strings — e.g. "`AskUserQuestion` in Claude Code,
`request_user_input` in Codex, `ask_user` in Gemini" — was condensed
to the shorter parenthetical form:
  (AskUserQuestion / request_user_input / ask_user)

Standard fallback sentences ("If no question tool is available,
present numbered options in chat and wait for the user's reply before
proceeding") were shortened to "Fallback: present numbered options
and wait for a reply." Unique fallback behaviors (e.g., frontend-
design's "assume partial mode", ce-setup's multiSelect instructions)
were preserved verbatim.

**2b — Native tool hints (10 files, 15 occurrences)**

Verbose native tool descriptions like "Use the native file-search/glob
tool (e.g., Glob in Claude Code)" were condensed to "Use native
file-search (e.g., Glob)". Tool Selection footer paragraphs in 5
research agents were shortened from ~200 chars to ~120 chars while
retaining the same instruction.

**2c — Repo-relative warnings and other boilerplate (2 files)**

Duplicate repo-relative path warnings in ce-brainstorm and ce-plan
were condensed. The primary IMPORTANT block was shortened in ce-plan
to reference the detailed Planning Rules section instead of repeating
the full explanation.

- Every condensed instance retains the same behavioral instruction:
  tool names, fallback behaviors, and cross-platform equivalents are
  all preserved. Only the verbose framing was removed.
- AGENTS.md (the source pattern) was not modified.
- Condensed forms survive `bun run convert --to codex` — verified by
  running the converter and confirming the short forms appear in the
  codex output.
- All 3 test failures in `bun test` are pre-existing (confirmed by
  running on the clean pre-change state).
- `bun run release:validate` passes.

Part of: Dead Content Elimination Audit (plan EveryInc#4)
Prior commits: c79f06d (Unit 1), 819d3c9 (Unit 3), 1facadb (Unit 6)

Apply four compression patterns (category-name, enumeration, example,
process) across 20 non-T1 review agents and 5 top skills. Graduated
three-phase execution with ablation validation at each phase.

Total: 90,771 bytes saved (33% of 273,888B instruction corpus).
Combined with EveryInc#19+EveryInc#27: ~127KB total agent+skill reduction.

Phase 1 (P1 category-name): 12,977B across 17 agents.
Phase 2 (P2+P4 enum+example): 21,565B across 10 files.
Phase 3 (P3 process): 57,812B across 11 files.

Ablation noise floor calibrated at 0.47 composite (identical-content
self-comparison), rendering the plan's 0.95 threshold unreachable
with single-run evaluations. All scores within noise floor range.
762/764 tests pass (2 pre-existing resolve-base.sh failures).

Extract deterministic shell recipes from three skills into co-located
Bash scripts, replacing inline shell blocks in SKILL.md with single-line
script invocations. This moves procedural logic out of the LLM context
window so it executes deterministically instead of being carried (and
re-tokenized) across every tool call in a session.

Scripts created/modified:
- git-clean-gone-branches: added `delete` subcommand to existing
  clean-gone script. SKILL.md Step 3 shrinks from 12 lines of inline
  shell (worktree check, force-remove, branch -D loop) to 3 lines
  referencing `bash scripts/clean-gone delete <branches>`.
- git-commit-push-pr: new scripts/resolve-context.sh consolidates
  the 4-fallback default-branch cascade (origin/HEAD -> gh repo view
  -> common names -> hardcoded) and the 4-priority base-branch/remote
  detection (PR metadata -> remote default -> gh -> common names) into
  one script invocation with optional short-circuit flags. SKILL.md
  Steps 1 and 6 now parse structured KEY:value output instead of
  embedding multi-step shell cascades.
- ce-review: new scripts/resolve-pr-base.sh handles the PR-path
  fork-safe remote resolution that was previously a 17-line inline
  shell block with template variables. Accepts --base and --base-repo
  as CLI arguments, outputs BASE:<sha> or ERROR:<message>. Standalone
  and branch paths remain unchanged (still use references/resolve-base.sh).

All scripts follow established conventions: set -euo pipefail,
structured text output (KEY:value), exit 0 with ERROR: prefix for
failures, bash 3.2+ compatible (no associative arrays, no bash 4+
features).
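
Because these scripts exit 0 even on failure, a caller has to read the outcome from stdout rather than the exit code. A minimal parser for that convention might look like this. The function name and result shape are assumptions; `BASE` and `ERROR` are the keys mentioned above.

```typescript
interface ScriptResult {
  ok: boolean;
  values: { [key: string]: string };
  error?: string;
}

// Parses `KEY:value` lines. A line starting with `ERROR:` marks the run as failed.
function parseKeyValueOutput(stdout: string): ScriptResult {
  const values: { [key: string]: string } = {};
  let error: string | undefined;
  for (const line of stdout.split(/\r?\n/)) {
    const separator = line.indexOf(":");
    if (separator <= 0) continue; // skip blank lines and lines without a key
    const key = line.slice(0, separator).trim();
    const value = line.slice(separator + 1).trim();
    if (key === "ERROR") {
      error = value;
    } else {
      values[key] = value;
    }
  }
  return { ok: error === undefined, values: values, error: error };
}

console.log(parseKeyValueOutput("BASE:abc123\n"));
console.log(parseKeyValueOutput("ERROR:could not resolve base\n"));
```

Splitting on the first colon only keeps values that themselves contain colons, such as URLs, intact.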

Test infrastructure:
- Extracted shared test helpers (gitEnv, runCommand, runGit, initRepo,
  commitFile, writeExecutable, createStubBin) from resolve-base-script
  tests into tests/helpers/setup-test-repo.ts.
- Added 18 golden-output tests across 3 new test files covering
  discovery, deletion, flag short-circuiting, fork resolution,
  error paths, and edge cases (shallow clones, missing remotes,
  worktree cleanup).

Net SKILL.md reduction: ~1,492 bytes across 3 skills. Below the plan's
8,050B projection because replacement invocation instructions consume
space, but the carrying cost improvement is the real win: deterministic
shell logic now executes outside the context window rather than being
re-tokenized on every message.

780/782 tests pass (2 pre-existing failures in resolve-base-script.test.ts
due to git 2.53.0 --no-tags fetch refspec behavior change, unrelated).

Add 4-column cartouche table (agent-name, trigger, output, focus) at
the top of Phase 1 listing all 6 agents ce-plan dispatches. Replace
5 inline Task dispatch call lines with lean references to agents by
name from the cartouche table. Surrounding orchestration logic stays
inline unchanged.
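
For readers unfamiliar with the format, a 4-column cartouche table could be generated from structured data roughly as below. The column names come from the commit message; the row contents and the `renderCartouche` helper are made up for illustration.

```typescript
interface CartoucheRow {
  agentName: string;
  trigger: string;
  output: string;
  focus: string;
}

// Renders rows as a Markdown table with the four cartouche columns.
function renderCartouche(rows: CartoucheRow[]): string {
  const header = "| agent-name | trigger | output | focus |";
  const divider = "|---|---|---|---|";
  const body = rows.map((r) => "| " + [r.agentName, r.trigger, r.output, r.focus].join(" | ") + " |");
  return [header, divider].concat(body).join("\n");
}

// Hypothetical rows; the real agents dispatched by ce-plan are not listed in this thread.
const cartouche = renderCartouche([
  { agentName: "example-researcher", trigger: "always", output: "findings list", focus: "prior art" },
  { agentName: "example-analyst", trigger: "when UI files change", output: "flow notes", focus: "user flows" },
]);
console.log(cartouche);
```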
…able

Replace 4 separate pipe-delimited reviewer tables with a single
21-agent cartouche table. Refactor persona-catalog.md to retain only
detailed selection criteria and 5 selection rules, removing summary
tables now redundant with the cartouche. Use FQNs consistently in
both the cartouche and persona-catalog criteria entries.

Replace multi-paragraph activation criteria and agent list with a
single 7-agent cartouche table (2 always-on, 5 conditional). Extract
detailed criteria to references/persona-routing.md with bulk lookup
instruction and graceful fallback for read failures.

… tables

Extract Phase 1 inline task specs to references/research-tasks.md.
Consolidate Phase 3 routing table and Applicable Specialized Agents
catalog into a single 10-agent cartouche table, eliminating partial
overlap between the two listings.

Replace 8 inline sub-agent prompts (~4,806B) with an 8-row cartouche
table. Extract all prompts with audit tasks and output format templates
to references/audit-prompts.md. SKILL.md drops from 7,956B to 4,102B.

Add cartouche table listing all dispatched agents. Extract Phase 1
context scan prompt and Phase 2 ideation dispatch blocks to
references/dispatch-prompts.md. Keep conditional orchestration logic
inline. Description bytes drop from ~5,124B to ~2,023B.

Add Cartouche Format subsection to the Skill Compliance Checklist
documenting the 4-column table format, trigger field conventions,
extraction threshold for detailed criteria, and bulk lookup pattern.

…lures

Add explicit fallback instructions to ce-review, ce-compound, ce-ideate,
and agent-native-audit for when their extracted reference files cannot be
read. Matches the pattern already present in document-review and satisfies
the plan's System-Wide Impact error propagation requirement.

…nication patterns

Add _meta.return_tiers to document-review findings-schema.json documenting
evidence as the only detail-tier field (all other finding fields are merge-tier).

Add "Sub-Agent Communication Patterns" section to AGENTS.md documenting the
write-once dispatch + compact returns pattern for multi-agent skills.

…ce dispatch

Transform subagent-template.md into dispatch-context template with:
- Artifact write instruction (.context/ path with {run_id}/{reviewer_name}.json)
- Compact return contract (merge-tier fields only, evidence omitted)
- R9 fallback (full inline returns if artifact write fails)
- Updated read-only constraint permitting artifact writes
- Removed {document_content} slot (agents Read from document path)
- Variable reference table documenting pre-resolved vs per-agent variables
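
The split between the artifact write and the compact return can be sketched as follows. Only `evidence` and `suggested_fix` are field names taken from this PR; the other fields, the base directory, and both helper names are hypothetical.

```typescript
// Full finding as written to the per-agent artifact file.
interface Finding {
  title: string;          // hypothetical merge-tier field
  severity: string;       // hypothetical merge-tier field
  suggested_fix?: string; // optional, per the later optionality fix
  evidence: string[];     // the only detail-tier field
}

// Compact return: merge-tier fields only, evidence omitted.
interface CompactFinding {
  title: string;
  severity: string;
  suggested_fix?: string;
}

function toCompactReturn(finding: Finding): CompactFinding {
  const compact: CompactFinding = { title: finding.title, severity: finding.severity };
  if (finding.suggested_fix !== undefined) compact.suggested_fix = finding.suggested_fix;
  return compact;
}

// baseDir is an assumed parameter; the commit message gives only the
// {run_id}/{reviewer_name}.json tail of the path.
function artifactPath(baseDir: string, runId: string, reviewerName: string): string {
  return baseDir + "/" + runId + "/" + reviewerName + ".json";
}

const fullFinding: Finding = {
  title: "Missing rollback step",
  severity: "high",
  evidence: ["Section 4 describes deploy but not rollback."],
};
console.log(artifactPath(".context/example", "run-001", "coherence"));
console.log(JSON.stringify(toCompactReturn(fullFinding)));
```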
… SKILL.md

Replace inline dispatch with write-once dispatch context assembly:
- Generate run ID, assemble dispatch context (template + schema), write to disk
- Lean prompt per agent (~1.5KB) instead of full inlined template (~8-25KB)
- R8 fallback to inline dispatch if context write fails
- Replace @ inlines with backtick path references (~7.5KB saved per message)
- Add cleanup instruction after Phase 5 completion
- Pass run ID to synthesis pipeline for artifact-based evidence loading
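
The write-once idea and its inline fallback can be sketched like this. The file writer is injected so the example runs without touching disk, and the context path, prompt wording, and function names are assumptions rather than the skill's actual text.

```typescript
// Returns true when the write succeeded. Injected so the sketch needs no file system.
type WriteFile = (path: string, content: string) => boolean;

interface AgentPrompt {
  agent: string;
  prompt: string;
}

function buildAgentPrompts(agents: string[], dispatchContext: string, runId: string, writeFile: WriteFile): AgentPrompt[] {
  const contextPath = ".context/example/" + runId + "/dispatch-context.md"; // hypothetical path
  // Write the shared context once, not once per agent.
  const written = writeFile(contextPath, dispatchContext);
  return agents.map((agent) => ({
    agent: agent,
    prompt: written
      ? "You are " + agent + ". Read your instructions from `" + contextPath + "`."
      : "You are " + agent + ".\n\n" + dispatchContext, // fallback: inline the full context
  }));
}

let writeCount = 0;
const recordingWriter: WriteFile = () => {
  writeCount += 1;
  return true;
};

const sharedContext = new Array(2001).join("x"); // 2000 characters of stand-in template text
const leanPrompts = buildAgentPrompts(["a1", "a2", "a3", "a4", "a5"], sharedContext, "run-001", recordingWriter);
console.log("writes: " + writeCount + ", first prompt chars: " + leanPrompts[0].prompt.length);
```

With five agents the shared context is written once and each prompt carries only a path, which is where the per-agent savings come from.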
…ed evidence

Add evidence batch-load step (3.3) before dedup to load evidence from
per-agent artifact files. Update dedup to union evidence from artifacts
using reviewer name + content fingerprint matching. Update headless output
to include evidence lines per finding. Add R9 fallback detection for
inline evidence when artifacts are missing. Fix step numbering and
cross-references after insertion.
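
One way to picture the evidence union is the sketch below. It assumes each compact finding carries a reviewer name and a content fingerprint, and that artifacts are keyed the same way. The interfaces and the key format are illustrative, not the skill's actual schema.

```typescript
interface ReviewerFinding {
  reviewer: string;
  fingerprint: string; // assumed to be derived from the finding's content
  title: string;
}

interface ArtifactEvidence {
  reviewer: string;
  fingerprint: string;
  evidence: string[];
}

interface MergedFinding {
  fingerprint: string;
  title: string;
  reviewers: string[];
  evidence: string[];
}

function mergeFindings(findings: ReviewerFinding[], artifacts: ArtifactEvidence[]): MergedFinding[] {
  // Batch-load evidence into a lookup keyed by reviewer name + fingerprint.
  const evidenceByKey: { [key: string]: string[] } = {};
  for (const a of artifacts) evidenceByKey[a.reviewer + "|" + a.fingerprint] = a.evidence;

  const merged: { [fingerprint: string]: MergedFinding } = {};
  const order: string[] = [];
  for (const f of findings) {
    let target = merged[f.fingerprint];
    if (!target) {
      target = { fingerprint: f.fingerprint, title: f.title, reviewers: [], evidence: [] };
      merged[f.fingerprint] = target;
      order.push(f.fingerprint);
    }
    if (target.reviewers.indexOf(f.reviewer) === -1) target.reviewers.push(f.reviewer);
    // Missing artifacts yield no evidence here; the real pipeline has a fallback for that case.
    const lines = evidenceByKey[f.reviewer + "|" + f.fingerprint] || [];
    for (const line of lines) {
      if (target.evidence.indexOf(line) === -1) target.evidence.push(line); // union, no duplicates
    }
  }
  return order.map((fp) => merged[fp]);
}

const mergedFindings = mergeFindings(
  [
    { reviewer: "coherence", fingerprint: "fp1", title: "Missing rollback step" },
    { reviewer: "feasibility", fingerprint: "fp1", title: "Missing rollback step" },
  ],
  [
    { reviewer: "coherence", fingerprint: "fp1", evidence: ["line A", "line B"] },
    { reviewer: "feasibility", fingerprint: "fp1", evidence: ["line B", "line C"] },
  ]
);
console.log(JSON.stringify(mergedFindings));
```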
…rn spec

Mark suggested_fix as "(when present)" in the subagent template compact
return field list to match the schema's _meta.return_tiers and the
synthesis validation step which correctly omits it from required fields.

Also mark plan as completed with all unit checkboxes checked.

Introduces taskmd as the project's task management system with:

- .taskmd.yaml: Project-level configuration defining scopes for all repo
  surfaces (cli, compound-engineering, coding-tutor, marketplace, tests,
  docs, ci, scripts), ULID-based ID generation, solo workflow mode, and
  worklog support.

- tasks/CLAUDE.md: Agent-facing instructions for working with taskmd,
  covering file format, CLI commands, task lifecycle (start/complete),
  dependency handling, phases, worklogs, and validation.

- tasks/TASKMD_SPEC.md: Trimmed specification reference documenting all
  frontmatter fields, status flow, verify checks, file organization
  conventions, phase configuration, and validation rules.

This enables structured task tracking for the token-improvements work
and provides agents with the context needed to create, manage, and
verify tasks within the repo's established scopes.

Extract inline content from agent-native-architecture (18.3KB -> 5.0KB),
orchestrating-swarms (35.6KB -> 5.0KB), and dspy-ruby (21.5KB -> 5.0KB)
into queryable reference files. Combined SKILL.md reduced from 75.4KB to
14.9KB (80% reduction). 19 new reference files created.

… justified

Evaluated decision point: prior phases reduced carrying cost but always-loaded
budget remains at 86%. Module unbundling would drop core-only to 57% but requires
namespace changes across ~20 files, 5 marketplace entries, and converter testing
for 10 targets. Deferred until budget nears capacity or platform adds native
module support.

3 participants