fix(ce-brainstorm): reduce token cost by extracting late-sequence content#511

Merged
tmchow merged 1 commit into main from tmchow/optimize-brainstorm
Apr 5, 2026

Conversation


@tmchow tmchow commented Apr 5, 2026

Summary

  • Extract Phase 3 (requirements capture, 130 lines) and Phase 4 (handoff, 90 lines) from SKILL.md into references/ files loaded on demand via backtick path stubs
  • Extract visual communication guidance (26 lines) into its own conditional reference, following ce:plan's existing pattern
  • Deduplicate interaction rules that were restated verbatim in Phase 1.3
  • Update contract tests to verify stubs point to correct reference files

SKILL.md: 387 -> 173 lines (55% reduction, 24.2KB -> 11.8KB)

How the savings work

Phase 3 + Phase 4 make up 53% of the skill but are only needed after the interactive dialogue (Phases 0-2) completes. In a typical brainstorm, 8-17 turns happen before Phase 3 is relevant — each carrying that content in the system prompt for nothing.

The system prompt (where skill content lives) is carried in full on every API call and is never compressed. Extracting late-sequence content to reference files means it's only loaded via Read when actually needed, reducing the per-turn carrying cost during the interactive exploration phases.
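A back-of-envelope sketch of that carrying cost; the token figures are assumptions derived from the stated file sizes at roughly 4 characters per token, not measurements:

```python
# The system prompt is resent in full on every API call, so its cost
# scales linearly with the number of turns it is carried through.
def carrying_cost(prompt_tokens: int, turns: int) -> int:
    return prompt_tokens * turns

CHARS_PER_TOKEN = 4                    # rough heuristic, an assumption
BEFORE = 24_200 // CHARS_PER_TOKEN     # ~6,050 tokens for the 24.2KB SKILL.md
AFTER = 11_800 // CHARS_PER_TOKEN      # ~2,950 tokens after extraction

def tokens_saved(pre_phase3_turns: int) -> int:
    """Savings accrue only on turns before the references are loaded."""
    return carrying_cost(BEFORE, pre_phase3_turns) - carrying_cost(AFTER, pre_phase3_turns)
```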

Estimated savings per session

| Scenario | Pre-Phase-3 turns | Estimated savings |
| --- | --- | --- |
| Lightweight (9 turns) | ~4 | ~13K tokens (30%) |
| Standard (19 turns) | ~12 | ~35K tokens (39%) |
| Deep (26 turns) | ~17 | ~49K tokens (35%) |

After both references are loaded in the final few turns, the per-turn cost is roughly neutral — the savings are concentrated in the interactive phases where the most turns occur.
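Plugging the table's turn counts into a rough per-turn figure (~3.1K tokens saved per turn, an assumption based on the 24.2KB -> 11.8KB reduction at ~4 characters per token) lands within rounding of the estimates above:

```python
PER_TURN_SAVED = (24_200 - 11_800) // 4  # ~3,100 tokens/turn (assumed)

scenarios = {"Lightweight": 4, "Standard": 12, "Deep": 17}
for name, turns in scenarios.items():
    print(f"{name}: ~{PER_TURN_SAVED * turns / 1000:.0f}K tokens")
```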

Benchmarking note

We ran eval comparisons, but the eval methodology reads SKILL.md via the Read tool into conversation history. This is architecturally different from real skill invocation, where SKILL.md is injected as a system prompt. The evals confirmed quality parity (both versions produce equivalent brainstorm output) but cannot measure the reduction in system prompt carrying cost, which is where the savings come from. The theoretical model follows the same pattern validated in #489 (ce:plan) and #509 (document-review).

Test plan

  • bun test — 586 pass, 0 fail
  • bun run release:validate — metadata in sync
  • Contract tests verify stubs point to correct reference files
  • Contract tests verify behavioral guarantees in extracted files
  • Eval runs confirm quality parity between original and optimized versions

🤖 Generated with Claude Code

fix(ce-brainstorm): reduce token cost by extracting late-sequence content

Extract Phase 3 (requirements capture) and Phase 4 (handoff) into
reference files loaded on demand. These phases comprise 53% of the
skill but are only needed after the interactive dialogue completes.

- SKILL.md: 387 -> 173 lines (55% reduction)
- references/requirements-capture.md: document template, formatting, completeness checks
- references/visual-communication.md: conditional diagram guidance
- references/handoff.md: next-step options, dispatch logic, closing summaries
- Deduplicate interaction rules restated in Phase 1.3

Follows the proven pattern from ce:plan (#489) and document-review (#509).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tmchow tmchow merged commit bdeb793 into main Apr 5, 2026
2 checks passed
@github-actions github-actions Bot mentioned this pull request Apr 5, 2026