Problem
Agents overstate findings and produce unchallenged analysis. In our squad, agents claimed 75-90% token savings when the real number was 20-55%. Without a challenger, wrong recommendations get implemented.
Proposal
Add an optional Challenger agent template to the framework:
- Charter template: .squad/templates/agents/challenger.md — role is to verify claims, run counter-hypotheses, check math, and flag confidence levels (Verified/Unverified/Contradicted)
- Spawn integration: Coordinator can auto-spawn challenger before any architecture decision or when a claim exceeds a threshold (e.g., 'saves 75%')
- Iterative retrieval pattern: Max 3 investigation cycles, must cite evidence for every verdict
- Output format: Per-claim verdict table with evidence links
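The spawn trigger and investigation loop above can be sketched as follows. This is a minimal illustration, not framework code: the names (`needs_challenge`, `challenge`, the `investigate` callback, `OVERCLAIM_THRESHOLD`) are all hypothetical, and in the real framework the `investigate` callback would be an agent call rather than a plain function.

```python
import re
from dataclasses import dataclass, field

# Hypothetical constants -- the proposal leaves exact values to the implementer.
OVERCLAIM_THRESHOLD = 0.70  # e.g. a "saves 75%" claim exceeds this and gets challenged
MAX_CYCLES = 3              # cap on investigation cycles per claim

# Matches a leading percentage, optionally the start of a range like "20-55%".
CLAIM_RE = re.compile(r"(\d{1,3})\s*-?\s*\d{0,3}%")

def needs_challenge(claim: str) -> bool:
    """Spawn trigger: True when a claim quotes a percentage above the threshold."""
    m = CLAIM_RE.search(claim)
    return bool(m) and int(m.group(1)) / 100 >= OVERCLAIM_THRESHOLD

@dataclass
class Verdict:
    claim: str
    verdict: str                              # Verified / Unverified / Contradicted
    evidence: list[str] = field(default_factory=list)

def challenge(claim: str, investigate) -> Verdict:
    """Run up to MAX_CYCLES investigation passes; every verdict must cite evidence.

    `investigate` is a stand-in for the challenger agent: it returns None when a
    pass found nothing, or a dict with an evidence `link` and a `supports` flag.
    """
    evidence: list[str] = []
    for _ in range(MAX_CYCLES):
        finding = investigate(claim)
        if finding is not None:
            evidence.append(finding["link"])
            if finding["supports"] is not None:
                verdict = "Verified" if finding["supports"] else "Contradicted"
                return Verdict(claim, verdict, evidence)
    # Evidence exhausted or inconclusive after MAX_CYCLES: stay honest.
    return Verdict(claim, "Unverified", evidence)
```

A claim like "saves 75% of tokens" would trip the trigger, while "saves 20-55%" would not; a contradicting finding ends the loop early with its evidence link attached.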
Prior Art
Field-tested as 'Q' agent across 200+ issues. Caught: inflated metrics, fabricated config references, wrong bottleneck assumptions. False positive rate ~15% (challenged things that were actually correct).
Deliverables