fix(document-review): reduce token cost and latency #509
Merged
Conversation
The document-review skill dispatches 2-7 parallel reviewer agents, each receiving the full document + persona + schema + template. Token costs compound because the SKILL.md content is carried through every orchestrator turn during dispatch. This commit reduces per-run token consumption ~20-25% through six independent changes:

Structural (reduces what every agent and the orchestrator carry):
- Strip _meta commentary from findings-schema.json (-2KB per agent) -- duplicated guidance already in persona files and subagent template
- Remove Variable Reference table from subagent-template.md -- documents template variables agents never see after substitution
- Extract Phases 3-5 to references/synthesis-and-presentation.md -- late-sequence content (synthesis, presentation, next-action) not needed during the dispatch phase, following ce-plan's proven extraction pattern
- Compress autofix_class rules in subagent template from ~2.7KB to ~800 bytes, preserving all decision logic

Model tiering (reduces cost per token on checklist agents):
- Set model: sonnet on security-lens, design-lens, scope-guardian -- these do structured checklist evaluation (attack surface inventory, dimensional 0-10 ratings, scope-vs-goals cross-referencing) where Sonnet performs at quality parity with Opus

Adversarial focus (reduces redundant work):
- Slim Quick/Standard depth to skip premise challenging and simplification pressure (covered by product-lens and scope-guardian), focusing on assumption surfacing and decision stress-testing -- the adversarial's unique contributions. Deep depth unchanged.

Validated by dispatching all 8 agents across 3 test documents (small requirements, medium plan, large payment migration plan): 87-93% planted issue detection, zero false positives, correct auto/present classifications, valid JSON from all models including trimmed schema.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: be6a099b46
… absent

The Quick/Standard depth slimming unconditionally skipped premise challenging and simplification pressure, assuming product-lens and scope-guardian always cover them. But those personas are conditional -- a medium technical plan without strategic claims or priority tiers wouldn't activate either one, creating a coverage gap.

Now the adversarial checks the same document signals the orchestrator uses: it includes premise/simplification work when the document lacks challengeable premise claims (the product-lens signal) or explicit priority/scope structure (the scope-guardian signal).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
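The signal gating described above could be sketched as follows (the function name, signal keys, and check-family names are hypothetical illustrations, not the skill's actual identifiers):

```python
# Illustrative sketch of the coverage-gap fix: at Quick/Standard depth the
# adversarial reviewer re-includes a check family whenever the specialist
# persona that would normally cover it won't activate. All names are
# hypothetical.

def adversarial_scope(doc_signals: dict, depth: str) -> set:
    """Return the check families the adversarial reviewer should run."""
    checks = {"assumption_surfacing", "decision_stress_testing"}
    if depth == "deep":
        # Deep depth is unchanged: always run the full set.
        return checks | {"premise_challenging", "simplification_pressure"}
    # Quick/Standard: only skip a family when another persona covers it.
    if not doc_signals.get("has_premise_claims"):      # product-lens won't activate
        checks.add("premise_challenging")
    if not doc_signals.get("has_priority_structure"):  # scope-guardian won't activate
        checks.add("simplification_pressure")
    return checks

# A medium technical plan with neither signal: adversarial keeps both families.
plan = {"has_premise_claims": False, "has_priority_structure": False}
print(sorted(adversarial_scope(plan, "standard")))
# ['assumption_surfacing', 'decision_stress_testing',
#  'premise_challenging', 'simplification_pressure']
```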
Collaborator
Author
@codex review
Codex Review: Didn't find any major issues. Hooray!
Merged
tmchow added a commit that referenced this pull request on Apr 5, 2026
…tent

Extract Phase 3 (requirements capture) and Phase 4 (handoff) into reference files loaded on demand. These phases comprise 53% of the skill but are only needed after the interactive dialogue completes.

- SKILL.md: 387 -> 173 lines (55% reduction)
- references/requirements-capture.md: document template, formatting, completeness checks
- references/visual-communication.md: conditional diagram guidance
- references/handoff.md: next-step options, dispatch logic, closing summaries
- Deduplicate interaction rules restated in Phase 1.3

Follows the proven pattern from ce:plan (#489) and document-review (#509).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
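The on-demand loading pattern might look roughly like this (the reference paths come from the commit message; the loader itself is a hypothetical sketch, not the skill's implementation):

```python
# Sketch of phase-gated reference loading: SKILL.md keeps only the
# interactive phases, and later-phase instructions live in reference files
# that are read only when the workflow reaches that phase.
from pathlib import Path

# Phase names are illustrative; paths are from the commit message.
PHASE_REFERENCES = {
    "requirements_capture": "references/requirements-capture.md",
    "visual_communication": "references/visual-communication.md",
    "handoff": "references/handoff.md",
}

def load_phase(phase: str, skill_root: Path) -> str:
    """Load a phase's instructions only when that phase begins."""
    ref = PHASE_REFERENCES.get(phase)
    if ref is None:
        raise KeyError(f"no reference file registered for phase {phase!r}")
    return (skill_root / ref).read_text()
```

The win is that a run that stops after the interactive dialogue never pays the token cost of the capture/handoff content at all.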
Summary
Reduces document-review skill token consumption ~20-25% per run and agent cost ~15-25% through model tiering, without compromising finding quality.
The skill dispatches 2-7 parallel reviewer agents, each receiving the full document + persona + schema + template. The SKILL.md (32KB with inlined references) is carried through every orchestrator turn during dispatch, making content reduction compound across the session.
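As a rough sketch of why SKILL.md size compounds across a session (all byte counts and turn counts below are illustrative assumptions, not measurements from this PR):

```python
# Back-of-envelope model of the cost structure: the orchestrator carries
# SKILL.md on every turn, while each agent pays for its own payload once.
# All numbers are illustrative assumptions.

def run_cost_bytes(skill_md: int, orchestrator_turns: int,
                   agents: int, doc: int, persona: int,
                   schema: int, template: int) -> int:
    orchestrator = skill_md * orchestrator_turns  # carried every turn
    per_agent = doc + persona + schema + template  # paid once per agent
    return orchestrator + agents * per_agent

before = run_cost_bytes(skill_md=32_000, orchestrator_turns=6, agents=5,
                        doc=20_000, persona=4_000, schema=6_000, template=5_000)
# After: smaller SKILL.md (references extracted), trimmed schema and template.
after = run_cost_bytes(skill_md=20_000, orchestrator_turns=6, agents=5,
                       doc=20_000, persona=4_000, schema=4_000, template=3_000)
print(f"reduction: {1 - after / before:.0%}")  # prints reduction: 25%
```

The orchestrator term scales with turn count, which is why shrinking SKILL.md pays off more than any single per-agent trim.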
What changed
Structural token reduction (applies to every run):
- Strip `_meta` commentary from the findings schema -- duplicated guidance already in persona files and subagent template (-2KB per agent)
- Extract Phases 3-5 to `references/synthesis-and-presentation.md` -- late-sequence content not needed during dispatch, loaded only after agents return (follows ce-plan's proven extraction pattern)

Model tiering (reduces cost per token):

- Set `model: sonnet` on security-lens, design-lens, and scope-guardian -- these do structured checklist work (attack surface inventory, dimensional ratings, scope-vs-goals matching) where Sonnet performs at quality parity with Opus

Adversarial focus (reduces redundant work):

- Slim Quick/Standard depth to focus the adversarial on assumption surfacing and decision stress-testing -- its unique contributions -- skipping premise challenging and simplification pressure, which product-lens and scope-guardian cover (Deep depth unchanged)
Measured impact
Validation
Dispatched all 8 reviewer agents across 3 test documents (small requirements doc, medium plan, large payment migration plan with auth/PCI/data migration) with known planted issues:
- 87-93% detection of planted issues, with zero false positives
- All `auto` findings had a correct `suggested_fix`; all `present` findings genuinely required judgment
- Valid JSON from every model, including with the trimmed schema after `_meta` removal

Test plan

- `bun test` -- 586 tests pass
- `bun run release:validate` -- 49 agents, 41 skills, 0 MCP servers
- `/document-review` invocation on a real plan doc

🤖 Generated with Claude Code