fix(document-review): reduce token cost and latency by tmchow · Pull Request #509 · EveryInc/compound-engineering-plugin

tmchow · 2026-04-05T05:41:59Z

Summary

Reduces document-review skill token consumption by ~20-25% per run and, through model tiering, agent cost by ~15-25%, without compromising finding quality.

The skill dispatches 2-7 parallel reviewer agents, each receiving the full document + persona + schema + template. The SKILL.md (32KB with inlined references) is carried through every orchestrator turn during dispatch, making content reduction compound across the session.
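
As a rough illustration of why the reduction compounds: if the skill text is re-sent on every orchestrator turn, the bytes carried scale linearly with the number of turns. The sketch below uses the SKILL.md sizes from the Measured impact table; the turn count and the `carried_kb` helper are assumptions for illustration, not measurements from this PR.

```python
def carried_kb(skill_kb: float, orchestrator_turns: int) -> float:
    """Total KB of skill text carried if it is re-sent on every turn."""
    return skill_kb * orchestrator_turns

# SKILL.md sizes come from the PR's table; TURNS is a hypothetical value.
TURNS = 6
before = carried_kb(16.9, TURNS)
after = carried_kb(8.0, TURNS)
saved = before - after  # grows with every extra orchestrator turn

print(f"carried before: {before:.1f}KB, after: {after:.1f}KB, saved: {saved:.1f}KB")
```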

What changed

Structural token reduction (applies to every run):

Stripped _meta commentary from findings schema — duplicated guidance already in persona files and subagent template (-2KB per agent)
Removed Variable Reference table from subagent template — documents template variables agents never see post-substitution
Extracted Phases 3-5 to references/synthesis-and-presentation.md — late-sequence content not needed during dispatch, loaded only after agents return (follows ce-plan's proven extraction pattern)
Compressed autofix rules in subagent template from ~2.7KB to ~800 bytes, preserving all decision logic
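
The `_meta` stripping in the first item can be sketched as a recursive filter over the schema. This is a minimal sketch, not the script used in the PR (the change may well have been made by hand); the sample schema and the `strip_meta` helper are hypothetical.

```python
import json
from typing import Any

def strip_meta(node: Any) -> Any:
    """Return a copy of a JSON-like structure with every `_meta` key removed."""
    if isinstance(node, dict):
        return {k: strip_meta(v) for k, v in node.items() if k != "_meta"}
    if isinstance(node, list):
        return [strip_meta(item) for item in node]
    return node

# Hypothetical fragment; the real findings-schema.json is not shown in the PR.
schema = {
    "_meta": {"note": "guidance that persona files already repeat"},
    "type": "object",
    "properties": {
        "severity": {
            "_meta": {"note": "how to calibrate severity"},
            "enum": ["P0", "P1", "P2"],
        }
    },
}

slim = strip_meta(schema)
print(len(json.dumps(schema)), "->", len(json.dumps(slim)), "bytes")
```

Only commentary keys are dropped, so the validation-relevant parts of the schema (`type`, `properties`, `enum`) are untouched.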

Model tiering (reduces cost per token):

Set model: sonnet on security-lens, design-lens, and scope-guardian — these do structured checklist work (attack surface inventory, dimensional ratings, scope-vs-goals matching) where Sonnet performs at quality parity with Opus
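
The tiering rule amounts to a per-persona model lookup. In the PR it is expressed as a `model: sonnet` setting on each agent, so the function below is only an illustrative equivalent; the `model_for` helper and the `"opus"` default are assumptions.

```python
# Personas the PR moves to Sonnet because they do structured checklist work.
SONNET_PERSONAS = {"security-lens", "design-lens", "scope-guardian"}

def model_for(persona: str, default: str = "opus") -> str:
    """Pick the model tier for a reviewer persona (default tier is assumed)."""
    return "sonnet" if persona in SONNET_PERSONAS else default

for persona in ("security-lens", "adversarial"):
    print(persona, "->", model_for(persona))
```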

Adversarial focus (reduces redundant work):

Slimmed Quick/Standard depth to skip premise challenging and simplification pressure (already covered by product-lens and scope-guardian), focusing on assumption surfacing and decision stress-testing — the adversarial's unique contributions. Deep depth unchanged.

Measured impact

File	Before	After	Reduction
SKILL.md	16.9KB	8.0KB	53%
findings-schema.json	5.5KB	3.5KB	36%
subagent-template.md	5.1KB	2.6KB	50%
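
The reduction column can be recomputed from the rounded sizes as a sanity check. Note that the rounded figures give 49% for subagent-template.md rather than 50%; the table was presumably derived from exact byte counts, which is an assumption on my part.

```python
# (before_kb, after_kb) pairs copied from the table.
SIZES_KB = {
    "SKILL.md": (16.9, 8.0),
    "findings-schema.json": (5.5, 3.5),
    "subagent-template.md": (5.1, 2.6),
}

def reduction_pct(before_kb: float, after_kb: float) -> int:
    """Percentage reduction, rounded to the nearest whole percent."""
    return round((before_kb - after_kb) / before_kb * 100)

reductions = {name: reduction_pct(*pair) for name, pair in SIZES_KB.items()}
total_saved_kb = sum(before - after for before, after in SIZES_KB.values())

print(reductions)
print(f"total saved: {total_saved_kb:.1f}KB")
```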

Validation

Dispatched all 8 reviewer agents across 3 test documents (small requirements doc, medium plan, large payment migration plan with auth/PCI/data migration) with known planted issues:

Detection rate: 87-93% of planted issues caught (13/15 full matches plus 1 partial; 87% counting full matches only, 93% counting the partial)
False positives: 0 across all agents
Autofix accuracy: All auto findings had correct suggested_fix; all present findings genuinely required judgment
Sonnet quality: Security-lens, design-lens, and scope-guardian on Sonnet produced findings indistinguishable in quality from Opus — comprehensive coverage, well-calibrated severity, zero false positives
Schema compliance: All agents returned valid JSON with all required fields despite _meta removal

Test plan

bun test — 586 tests pass
bun run release:validate — 49 agents, 41 skills, 0 MCP servers
findings-schema.json validates as clean JSON
Dispatched reviewer agents on 3 test docs and graded findings
Smoke test full /document-review invocation on a real plan doc
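
The schema-compliance check from the validation run (valid JSON with all required fields) can be sketched as below. The field names in `REQUIRED_FIELDS`, the top-level `findings` key, and the `validate_findings` helper are assumptions; only `autofix_class` and `suggested_fix` are named in this PR.

```python
import json

# Hypothetical required fields; the real list lives in findings-schema.json.
REQUIRED_FIELDS = {"title", "severity", "autofix_class"}

def validate_findings(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the payload passed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    problems = []
    for index, finding in enumerate(payload.get("findings", [])):
        missing = REQUIRED_FIELDS - finding.keys()
        if missing:
            problems.append(f"finding {index} missing {sorted(missing)}")
    return problems

good = '{"findings": [{"title": "t", "severity": "P1", "autofix_class": "auto"}]}'
bad = '{"findings": [{"title": "t", "autofix_class": "present"}]}'
print(validate_findings(good), validate_findings(bad))
```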

🤖 Generated with Claude Code

The document-review skill dispatches 2-7 parallel reviewer agents, each receiving the full document + persona + schema + template. Token costs compound because the SKILL.md content is carried through every orchestrator turn during dispatch.

This commit reduces per-run token consumption ~20-25% through six independent changes:

Structural (reduces what every agent and the orchestrator carry):
- Strip _meta commentary from findings-schema.json (-2KB per agent) -- duplicated guidance already in persona files and subagent template
- Remove Variable Reference table from subagent-template.md -- documents template variables agents never see after substitution
- Extract Phases 3-5 to references/synthesis-and-presentation.md -- late-sequence content (synthesis, presentation, next-action) not needed during the dispatch phase, following ce-plan's proven extraction pattern
- Compress autofix_class rules in subagent template from ~2.7KB to ~800 bytes, preserving all decision logic

Model tiering (reduces cost per token on checklist agents):
- Set model: sonnet on security-lens, design-lens, scope-guardian -- these do structured checklist evaluation (attack surface inventory, dimensional 0-10 ratings, scope-vs-goals cross-referencing) where Sonnet performs at quality parity with Opus

Adversarial focus (reduces redundant work):
- Slim Quick/Standard depth to skip premise challenging and simplification pressure (covered by product-lens and scope-guardian), focusing on assumption surfacing and decision stress-testing -- the adversarial's unique contributions. Deep depth unchanged.

Validated by dispatching all 8 agents across 3 test documents (small requirements, medium plan, large payment migration plan): 87-93% planted issue detection, zero false positives, correct auto/present classifications, valid JSON from all models including trimmed schema.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be6a099b46

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

… absent

The Quick/Standard depth slimming unconditionally skipped premise challenging and simplification pressure, assuming product-lens and scope-guardian always cover them. But those personas are conditional -- a medium technical plan without strategic claims or priority tiers wouldn't activate either, creating a coverage gap.

Now the adversarial checks for the same document signals the orchestrator uses: include premise/simplification when the document lacks challengeable premise claims (product-lens signal) or explicit priority/scope structure (scope-guardian signal).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
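
The conditional rule in this commit can be sketched as a small decision function. In the plugin the logic lives in the adversarial reviewer's prompt, not in code, so this is only an illustrative model; `DocSignals`, the depth strings, and `adversarial_techniques` are hypothetical names.

```python
from dataclasses import dataclass

CORE = ["assumption surfacing", "decision stress-testing"]
EXTRA = ["premise challenging", "simplification pressure"]

@dataclass
class DocSignals:
    has_premise_claims: bool      # signal that would activate product-lens
    has_priority_structure: bool  # signal that would activate scope-guardian

def adversarial_techniques(depth: str, signals: DocSignals) -> list[str]:
    """Techniques the adversarial reviewer applies at a given depth."""
    if depth == "deep":
        return CORE + EXTRA  # Deep depth is unchanged: everything runs
    # Quick/Standard: skip the extras only when both other personas would
    # be active, since they cover the same ground.
    covered = signals.has_premise_claims and signals.has_priority_structure
    return CORE if covered else CORE + EXTRA

technical_plan = DocSignals(has_premise_claims=False, has_priority_structure=False)
print(adversarial_techniques("standard", technical_plan))
```

A medium technical plan with neither signal therefore still gets premise challenging and simplification pressure, which closes the coverage gap the commit describes.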

tmchow · 2026-04-05T06:00:01Z

@codex review

chatgpt-codex-connector · 2026-04-05T06:04:11Z

Codex Review: Didn't find any major issues. Hooray!


…tent

Extract Phase 3 (requirements capture) and Phase 4 (handoff) into reference files loaded on demand. These phases comprise 53% of the skill but are only needed after the interactive dialogue completes.

- SKILL.md: 387 -> 173 lines (55% reduction)
- references/requirements-capture.md: document template, formatting, completeness checks
- references/visual-communication.md: conditional diagram guidance
- references/handoff.md: next-step options, dispatch logic, closing summaries
- Deduplicate interaction rules restated in Phase 1.3

Follows the proven pattern from ce:plan (#489) and document-review (#509).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
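
The "loaded on demand" pattern can be modeled as lazy file reads keyed by phase. In the skill itself the agent reads the reference file when it reaches that phase, so the class below is an analogy and not the plugin's implementation; `SkillContext` and the phase keys are hypothetical, while the file paths come from the commit message.

```python
from pathlib import Path

# Phase keys are illustrative; paths are the ones named in the commit.
PHASE_REFERENCES = {
    "capture": "references/requirements-capture.md",
    "handoff": "references/handoff.md",
}

class SkillContext:
    """Holds only the reference text for phases that have been entered."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.loaded: dict[str, str] = {}

    def enter_phase(self, phase: str) -> str:
        # Read the file the first time the phase is reached, then reuse it.
        if phase not in self.loaded:
            path = self.root / PHASE_REFERENCES[phase]
            self.loaded[phase] = path.read_text(encoding="utf-8")
        return self.loaded[phase]
```

Until `enter_phase` is called, none of the late-sequence text occupies the context, which is the saving the extraction is after.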

chatgpt-codex-connector Bot reviewed Apr 5, 2026


Comment thread plugins/compound-engineering/agents/document-review/adversarial-document-reviewer.md Outdated

tmchow merged commit 9da73a6 into main Apr 5, 2026
2 checks passed

github-actions Bot mentioned this pull request Apr 5, 2026

chore: release main #508

Merged

tmchow mentioned this pull request Apr 5, 2026

fix(ce-brainstorm): reduce token cost by extracting late-sequence content #511

Merged

5 tasks

tmchow merged 2 commits into main from tmchow/optimize-doc-review
