fix(document-review): reduce token cost and latency #509

Merged
tmchow merged 2 commits into main from tmchow/optimize-doc-review
Apr 5, 2026

@tmchow tmchow commented Apr 5, 2026

Summary

Reduces the document-review skill's token consumption by ~20-25% per run and, through model tiering, agent cost by ~15-25%, without compromising finding quality.

The skill dispatches 2-7 parallel reviewer agents, each receiving the full document + persona + schema + template. The SKILL.md (32KB with inlined references) is carried through every orchestrator turn during dispatch, so any reduction in its content compounds across the session.
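The fan-out effect can be sketched with simple arithmetic. This is a rough illustration only: it uses the file sizes reported under "Measured impact", treats file size as a proxy for prompt size (an assumption), and ignores the document and persona, which this PR does not change. The function names are hypothetical.

```python
# Sizes in KB, copied from the "Measured impact" table in this description.
SCHEMA_KB = {"before": 5.5, "after": 3.5}    # findings-schema.json
TEMPLATE_KB = {"before": 5.1, "after": 2.6}  # subagent-template.md


def per_agent_saving_kb() -> float:
    """KB no longer sent to each reviewer agent (schema + template only)."""
    schema = SCHEMA_KB["before"] - SCHEMA_KB["after"]
    template = TEMPLATE_KB["before"] - TEMPLATE_KB["after"]
    return round(schema + template, 1)


def fan_out_saving_kb(agent_count: int) -> float:
    """Total KB saved across one dispatch of `agent_count` reviewers."""
    if not 2 <= agent_count <= 7:
        raise ValueError("the skill dispatches between 2 and 7 reviewers")
    return round(per_agent_saving_kb() * agent_count, 1)


if __name__ == "__main__":
    for n in (2, 7):
        print(f"{n} agents: {fan_out_saving_kb(n)} KB less dispatched")
```

The orchestrator-side saving from the smaller SKILL.md is separate and scales with the number of orchestrator turns, which this description does not state.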

What changed

Structural token reduction (applies to every run):

  • Stripped _meta commentary from findings schema — duplicated guidance already in persona files and subagent template (-2KB per agent)
  • Removed Variable Reference table from subagent template — documents template variables agents never see post-substitution
  • Extracted Phases 3-5 to references/synthesis-and-presentation.md — late-sequence content not needed during dispatch, loaded only after agents return (follows ce-plan's proven extraction pattern)
  • Compressed autofix rules in subagent template from ~2.7KB to ~800 bytes, preserving all decision logic

Model tiering (reduces cost per token):

  • Set model: sonnet on security-lens, design-lens, and scope-guardian — these do structured checklist work (attack surface inventory, dimensional ratings, scope-vs-goals matching) where Sonnet performs at quality parity with Opus
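The tiering itself is declared per persona with `model: sonnet`. The resulting routing can be pictured as a lookup. This is only an illustration, not how the skill is implemented: `model_for` is a hypothetical helper, and the `"inherit"` default for the remaining personas is an assumption.

```python
# Personas this PR moves to Sonnet (named in the bullet above).
SONNET_PERSONAS = frozenset({"security-lens", "design-lens", "scope-guardian"})


def model_for(persona: str, default: str = "inherit") -> str:
    """Return the model tier for a reviewer persona.

    `default` is an assumption: this description does not say which
    setting the other personas use.
    """
    return "sonnet" if persona in SONNET_PERSONAS else default


if __name__ == "__main__":
    for name in ("security-lens", "adversarial"):
        print(name, "->", model_for(name))
```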

Adversarial focus (reduces redundant work):

  • Slimmed the adversarial reviewer's Quick/Standard depth to skip premise challenging and simplification pressure (already covered by product-lens and scope-guardian) and focus on its unique contributions: assumption surfacing and decision stress-testing. Deep depth is unchanged.

Measured impact

| File | Before | After | Reduction |
| --- | --- | --- | --- |
| SKILL.md | 16.9KB | 8.0KB | 53% |
| findings-schema.json | 5.5KB | 3.5KB | 36% |
| subagent-template.md | 5.1KB | 2.6KB | 50% |
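The Reduction column can be rechecked from the Before and After sizes. A small sketch follows; because the sizes are rounded to one decimal place, a recomputed percentage may differ from the reported one by about a point. The helper name is hypothetical.

```python
# (before_kb, after_kb, reported_reduction_pct) from the table above.
MEASURED = {
    "SKILL.md": (16.9, 8.0, 53),
    "findings-schema.json": (5.5, 3.5, 36),
    "subagent-template.md": (5.1, 2.6, 50),
}


def reduction_pct(before_kb: float, after_kb: float) -> float:
    """Percentage by which a file shrank."""
    return (before_kb - after_kb) / before_kb * 100


if __name__ == "__main__":
    for name, (before, after, reported) in MEASURED.items():
        computed = reduction_pct(before, after)
        print(f"{name}: computed {computed:.1f}%, reported {reported}%")
```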

Validation

Dispatched all 8 reviewer agents across 3 test documents (small requirements doc, medium plan, large payment migration plan with auth/PCI/data migration) with known planted issues:

  • Detection rate: 87-93% of planted issues caught (13 of 15 fully matched, or 87%; 14 of 15, or 93%, counting 1 partial match)
  • False positives: 0 across all agents
  • Autofix accuracy: All auto findings had correct suggested_fix; all present findings genuinely required judgment
  • Sonnet quality: Security-lens, design-lens, and scope-guardian on Sonnet produced findings indistinguishable in quality from Opus — comprehensive coverage, well-calibrated severity, zero false positives
  • Schema compliance: All agents returned valid JSON with all required fields despite _meta removal
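The schema-compliance and autofix checks could be automated along these lines. This is a minimal sketch, not the grading procedure actually used: `autofix_class`, `suggested_fix`, and the `auto` value appear in this PR, but the top-level `findings` list and the helper name `check_agent_output` are assumptions.

```python
import json


def check_agent_output(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the payload passed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    # Assumed shape: {"findings": [ {...}, ... ]}
    if not isinstance(payload, dict) or not isinstance(payload.get("findings"), list):
        return ["expected an object with a 'findings' list"]

    problems = []
    for i, finding in enumerate(payload["findings"]):
        if not isinstance(finding, dict):
            problems.append(f"finding {i}: not an object")
            continue
        if "autofix_class" not in finding:
            problems.append(f"finding {i}: missing autofix_class")
        elif finding["autofix_class"] == "auto" and not finding.get("suggested_fix"):
            # Auto findings are applied without judgment, so they need a fix.
            problems.append(f"finding {i}: auto finding lacks suggested_fix")
    return problems
```

A check like this only covers structure. Whether a `suggested_fix` is correct, or whether a `present` finding really needs judgment, still has to be graded by hand as described above.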

Test plan

  • bun test — 586 tests pass
  • bun run release:validate — 49 agents, 41 skills, 0 MCP servers
  • findings-schema.json validates as clean JSON
  • Dispatched reviewer agents on 3 test docs and graded findings
  • Smoke test full /document-review invocation on a real plan doc

🤖 Generated with Claude Code

The document-review skill dispatches 2-7 parallel reviewer agents, each
receiving the full document + persona + schema + template. Token costs
compound because the SKILL.md content is carried through every
orchestrator turn during dispatch.

This commit reduces per-run token consumption ~20-25% through six
independent changes:

Structural (reduces what every agent and the orchestrator carry):
- Strip _meta commentary from findings-schema.json (-2KB per agent) --
  duplicated guidance already in persona files and subagent template
- Remove Variable Reference table from subagent-template.md -- documents
  template variables agents never see after substitution
- Extract Phases 3-5 to references/synthesis-and-presentation.md --
  late-sequence content (synthesis, presentation, next-action) not needed
  during the dispatch phase, following ce-plan's proven extraction pattern
- Compress autofix_class rules in subagent template from ~2.7KB to
  ~800 bytes, preserving all decision logic

Model tiering (reduces cost per token on checklist agents):
- Set model: sonnet on security-lens, design-lens, scope-guardian --
  these do structured checklist evaluation (attack surface inventory,
  dimensional 0-10 ratings, scope-vs-goals cross-referencing) where
  Sonnet performs at quality parity with Opus

Adversarial focus (reduces redundant work):
- Slim Quick/Standard depth to skip premise challenging and
  simplification pressure (covered by product-lens and scope-guardian),
  focusing on assumption surfacing and decision stress-testing -- the
  adversarial's unique contributions. Deep depth unchanged.

Validated by dispatching all 8 agents across 3 test documents (small
requirements, medium plan, large payment migration plan): 87-93% planted
issue detection, zero false positives, correct auto/present
classifications, valid JSON from all models including trimmed schema.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be6a099b46

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

… absent

The Quick/Standard depth slimming unconditionally skipped premise
challenging and simplification pressure, assuming product-lens and
scope-guardian always cover them. But those personas are conditional --
a medium technical plan without strategic claims or priority tiers
wouldn't activate either, creating a coverage gap.

Now the adversarial checks for the same document signals the
orchestrator uses: include premise/simplification when the document
lacks challengeable premise claims (product-lens signal) or explicit
priority/scope structure (scope-guardian signal).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
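The conditional coverage described in this commit can be sketched as a small decision function. This is an illustration under stated assumptions: the real logic lives in the adversarial persona's prompt, not in code; `DocumentSignals` and `adversarial_techniques` are hypothetical names; and treating Deep depth as running all four techniques is an assumption, since the PR only says Deep is unchanged.

```python
from dataclasses import dataclass

FOCUSED = ("assumption surfacing", "decision stress-testing")
BROAD = ("premise challenging", "simplification pressure")


@dataclass(frozen=True)
class DocumentSignals:
    has_premise_claims: bool      # product-lens activation signal
    has_priority_structure: bool  # scope-guardian activation signal


def adversarial_techniques(depth: str, signals: DocumentSignals) -> tuple[str, ...]:
    """Pick the adversarial reviewer's techniques for one run."""
    depth = depth.lower()
    if depth not in ("quick", "standard", "deep"):
        raise ValueError(f"unknown depth: {depth!r}")
    if depth == "deep":
        return FOCUSED + BROAD  # assumption: Deep runs everything
    # Quick/Standard: keep the broad techniques when either signal is
    # missing, because the persona that would cover them may not run.
    if not signals.has_premise_claims or not signals.has_priority_structure:
        return FOCUSED + BROAD
    return FOCUSED
```

The commit message reads as "include both techniques when either signal is missing", which is what the sketch does. Pairing each technique with only its own signal would be a narrower reading.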

tmchow commented Apr 5, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Hooray!

@tmchow tmchow merged commit 9da73a6 into main Apr 5, 2026
2 checks passed
@github-actions github-actions Bot mentioned this pull request Apr 5, 2026
tmchow added a commit that referenced this pull request Apr 5, 2026
…tent

Extract Phase 3 (requirements capture) and Phase 4 (handoff) into
reference files loaded on demand. These phases comprise 53% of the
skill but are only needed after the interactive dialogue completes.

- SKILL.md: 387 -> 173 lines (55% reduction)
- references/requirements-capture.md: document template, formatting, completeness checks
- references/visual-communication.md: conditional diagram guidance
- references/handoff.md: next-step options, dispatch logic, closing summaries
- Deduplicate interaction rules restated in Phase 1.3

Follows the proven pattern from ce:plan (#489) and document-review (#509).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>