fix(ce-brainstorm): reduce token cost by extracting late-sequence content#511

Merged
tmchow merged 1 commit into main from tmchow/optimize-brainstorm
Apr 5, 2026

Conversation


@tmchow tmchow commented Apr 5, 2026

Summary

  • Extract Phase 3 (requirements capture, 130 lines) and Phase 4 (handoff, 90 lines) from SKILL.md into references/ files loaded on demand via backtick path stubs
  • Extract visual communication guidance (26 lines) into its own conditional reference, following ce:plan's existing pattern
  • Deduplicate interaction rules that were restated verbatim in Phase 1.3
  • Update contract tests to verify stubs point to correct reference files

SKILL.md: 387 -> 173 lines (55% reduction, 24.2KB -> 11.8KB)

How the savings work

Phase 3 + Phase 4 make up 53% of the skill but are only needed after the interactive dialogue (Phases 0-2) completes. In a typical brainstorm, 8-17 turns happen before Phase 3 is relevant — each carrying that content in the system prompt for nothing.

The system prompt (where skill content lives) is carried in full on every API call and is never compressed. Extracting late-sequence content to reference files means it's only loaded via Read when actually needed, reducing the per-turn carrying cost during the interactive exploration phases.
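A back-of-envelope sketch of that carrying cost; the token figures are assumptions derived from the stated file sizes at roughly 4 characters per token, not measurements:

```python
# The system prompt is resent in full on every API call, so its cost
# scales linearly with the number of turns it is carried through.
def carrying_cost(prompt_tokens: int, turns: int) -> int:
    return prompt_tokens * turns

CHARS_PER_TOKEN = 4                    # rough heuristic, an assumption
BEFORE = 24_200 // CHARS_PER_TOKEN     # ~6,050 tokens for the 24.2KB SKILL.md
AFTER = 11_800 // CHARS_PER_TOKEN      # ~2,950 tokens after extraction

def tokens_saved(pre_phase3_turns: int) -> int:
    """Savings accrue only on turns before the references are loaded."""
    return carrying_cost(BEFORE, pre_phase3_turns) - carrying_cost(AFTER, pre_phase3_turns)
```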

Estimated savings per session

| Scenario | Pre-Phase-3 turns | Estimated savings |
| --- | --- | --- |
| Lightweight (9 turns) | ~4 | ~13K tokens (30%) |
| Standard (19 turns) | ~12 | ~35K tokens (39%) |
| Deep (26 turns) | ~17 | ~49K tokens (35%) |

After both references are loaded in the final few turns, the per-turn cost is roughly neutral — the savings are concentrated in the interactive phases where the most turns occur.
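Plugging the table's turn counts into a rough per-turn figure (~3.1K tokens saved per turn, an assumption based on the 24.2KB -> 11.8KB reduction at ~4 characters per token) lands within rounding of the estimates above:

```python
PER_TURN_SAVED = (24_200 - 11_800) // 4  # ~3,100 tokens/turn (assumed)

scenarios = {"Lightweight": 4, "Standard": 12, "Deep": 17}
for name, turns in scenarios.items():
    print(f"{name}: ~{PER_TURN_SAVED * turns / 1000:.0f}K tokens")
```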

Benchmarking note

We ran eval comparisons, but the eval methodology reads SKILL.md via the Read tool into conversation history. This is architecturally different from real skill invocation, where SKILL.md is injected as a system prompt. The evals confirmed quality parity (both versions produce equivalent brainstorm output) but cannot measure the reduction in system prompt carrying cost, which is where the savings come from. The theoretical model follows the same pattern validated in #489 (ce:plan) and #509 (document-review).

Test plan

  • bun test — 586 pass, 0 fail
  • bun run release:validate — metadata in sync
  • Contract tests verify stubs point to correct reference files
  • Contract tests verify behavioral guarantees in extracted files
  • Eval runs confirm quality parity between original and optimized versions

🤖 Generated with Claude Code

fix(ce-brainstorm): reduce token cost by extracting late-sequence content

Extract Phase 3 (requirements capture) and Phase 4 (handoff) into
reference files loaded on demand. These phases comprise 53% of the
skill but are only needed after the interactive dialogue completes.

- SKILL.md: 387 -> 173 lines (55% reduction)
- references/requirements-capture.md: document template, formatting, completeness checks
- references/visual-communication.md: conditional diagram guidance
- references/handoff.md: next-step options, dispatch logic, closing summaries
- Deduplicate interaction rules restated in Phase 1.3

Follows the proven pattern from ce:plan (#489) and document-review (#509).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tmchow tmchow merged commit bdeb793 into main Apr 5, 2026
2 checks passed
@github-actions github-actions Bot mentioned this pull request Apr 5, 2026