fix(ce-brainstorm): reduce token cost by extracting late-sequence content #511
Merged
…tent

Extract Phase 3 (requirements capture) and Phase 4 (handoff) into reference files loaded on demand. These phases comprise 53% of the skill but are only needed after the interactive dialogue completes.

- SKILL.md: 387 -> 173 lines (55% reduction)
- references/requirements-capture.md: document template, formatting, completeness checks
- references/visual-communication.md: conditional diagram guidance
- references/handoff.md: next-step options, dispatch logic, closing summaries
- Deduplicate interaction rules restated in Phase 1.3

Follows the proven pattern from ce:plan (#489) and document-review (#509).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Extract Phase 3 (requirements capture) and Phase 4 (handoff) from `SKILL.md` into `references/` files loaded on demand via backtick path stubs
- `SKILL.md`: 387 -> 173 lines (55% reduction, 24.2KB -> 11.8KB)
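As a rough illustration of what a backtick path stub is, the sketch below scans skill text for backtick-quoted `references/` paths. It does not show the plugin's actual mechanism (in practice the agent follows a stub with its `Read` tool). The sample skill text, the `find_reference_stubs` helper, and the regex are hypothetical; only the three `references/` file names come from this PR.

```python
import re

# Hypothetical excerpt of a slimmed-down SKILL.md. The wording is invented for
# illustration; only the references/ file names are taken from this PR.
SKILL_MD = """\
## Phase 3: Capture requirements
Read `references/requirements-capture.md` before writing the document.
If a diagram would help, read `references/visual-communication.md`.

## Phase 4: Handoff
Read `references/handoff.md` for next-step options.
"""

# A backtick path stub: a backtick-quoted path under references/ ending in .md.
STUB_PATTERN = re.compile(r"`(references/[\w./-]+\.md)`")


def find_reference_stubs(skill_text: str) -> list[str]:
    """Return the reference paths named in the text, in order, without duplicates."""
    seen: dict[str, None] = {}
    for path in STUB_PATTERN.findall(skill_text):
        seen.setdefault(path, None)
    return list(seen)


stubs = find_reference_stubs(SKILL_MD)
print(stubs)
```

The point of the stub is that only the short path sits in the system prompt; the referenced file's content enters the conversation only once the phase that needs it is reached.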
How the savings work
Phase 3 + Phase 4 make up 53% of the skill but are only needed after the interactive dialogue (Phases 0-2) completes. In a typical brainstorm, 8-17 turns happen before Phase 3 is relevant — each carrying that content in the system prompt for nothing.
The system prompt (where skill content lives) is carried in full on every API call and is never compressed. Extracting late-sequence content to reference files means it's only loaded via `Read` when actually needed, reducing the per-turn carrying cost during the interactive exploration phases.
Estimated savings per session
After both references are loaded in the final few turns, the per-turn cost is roughly neutral — the savings are concentrated in the interactive phases where the most turns occur.
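The carrying-cost argument can be sketched as simple arithmetic. This is a simplified model and not the PR's own estimate: it assumes every turn before Phase 3 carries the full `SKILL.md`, counts size in KB instead of tokens, and ignores the small overhead of the path stubs and any prompt caching. The function name `carrying_cost_saved_kb` is hypothetical; the sizes and turn counts are the figures quoted above.

```python
def carrying_cost_saved_kb(old_kb: float, new_kb: float, turns_before_load: int) -> float:
    """KB of system prompt no longer carried, summed over the turns before references load.

    Assumption: the whole skill file is resent on every turn, so each turn
    before the references are read saves (old_kb - new_kb).
    """
    if turns_before_load < 0:
        raise ValueError("turns_before_load must be non-negative")
    return (old_kb - new_kb) * turns_before_load


# Figures from this PR: SKILL.md shrinks from 24.2KB to 11.8KB, and a typical
# brainstorm has 8-17 turns before Phase 3 becomes relevant.
OLD_KB, NEW_KB = 24.2, 11.8

for turns in (8, 17):
    saved = carrying_cost_saved_kb(OLD_KB, NEW_KB, turns)
    print(f"{turns} turns before Phase 3: ~{saved:.1f} KB less system prompt carried")
```

Under these assumptions the saving scales linearly with the number of interactive turns, which is why it concentrates in Phases 0-2.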
Benchmarking note
We ran eval comparisons but the evals methodology reads SKILL.md via the `Read` tool into conversation history. This is architecturally different from real skill invocation where SKILL.md is injected as system prompt. The evals confirmed quality parity (both versions produce equivalent brainstorm output) but cannot measure system prompt carrying cost reduction, which is where the savings come from. The theoretical model follows the same pattern validated in #489 (ce:plan) and #509 (document-review).
Test plan
- `bun test`: 586 pass, 0 fail
- `bun run release:validate`: metadata in sync

🤖 Generated with Claude Code