Open
Labels
- `enhancement`: New feature or request
- `impact: contract`: Changes stdin/stdout contract with Claude Code. High risk.
- `phase: 4-scoring-rebuild`: Phase 4: Scoring architecture (changes match shape)
- `risk: medium`: Medium risk (new subsystem or structural change)
- `topic: scoring`: Confidence scoring, weighted aggregation
Description
Summary
Add a numeric confidence score (0-100) to each trigger match, quantifying how likely it is that the flagged text is an actual hallucination rather than legitimate usage.
Current Behavior
Trigger matches return `{ kind, evidence, offset }`: detection is binary, with no severity gradation. A hedged speculation ("I think") and a confident false claim ("this is definitely caused by X") produce identical output.
Proposed Behavior
Each match gets a confidence field (0-100):
- 90-100: Very high confidence this is a hallucination (e.g., "I think" + causal claim + no evidence)
- 70-89: High confidence (single strong pattern)
- 50-69: Moderate (pattern present but context suggests legitimate usage)
- Below 50: Suppressed (not reported unless verbose mode is enabled)
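The bands above amount to a simple threshold ladder. A minimal sketch, assuming scores are clamped to 0-100 before banding; the function name `bandFor` and the band labels are hypothetical, and the suppression cutoff is fixed at 50 here even though the acceptance criteria make it configurable.

```typescript
// Hypothetical labels for the four proposed confidence bands.
type ConfidenceBand = "very_high" | "high" | "moderate" | "suppressed";

// Map a confidence score to its band. Out-of-range scores are clamped to 0-100
// first; fractional scores fall into the band of their lower boundary.
function bandFor(confidence: number): ConfidenceBand {
  const c = Math.min(100, Math.max(0, confidence));
  if (c >= 90) return "very_high"; // 90-100
  if (c >= 70) return "high"; // 70-89
  if (c >= 50) return "moderate"; // 50-69
  return "suppressed"; // below 50: hidden unless verbose
}

console.log(bandFor(95), bandFor(75), bandFor(60), bandFor(30));
// very_high high moderate suppressed
```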
Scoring Factors
- Pattern strength: "I think" (weak) vs "this is caused by" (strong)
- Context density: Multiple patterns in the same paragraph → higher score
- Evidence proximity: Nearby tool output or file references → lower score
- Category stacking: Speculation + causality in same sentence → higher score
- Text length: Short decisive statements with speculation → higher score than long analytical text
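One way to combine these factors is a weighted average of per-factor signals, each normalized to [0, 1] and scaled to 0-100. The sketch below assumes that design; the signal names, `scoreMatch`, and the default weights are illustrative placeholders rather than tuned values, and extracting each signal from the text is out of scope.

```typescript
// Hypothetical per-match signals, each expected in [0, 1].
interface ScoringSignals {
  patternStrength: number; // 0 = weak ("I think"), 1 = strong ("this is caused by")
  contextDensity: number; // how many other patterns share the paragraph
  evidenceProximity: number; // 1 = tool output or file reference right next to the match
  categoryStacking: number; // 1 = e.g. speculation + causality in the same sentence
  brevity: number; // 1 = short decisive statement, 0 = long analytical text
}

// Per-factor weights, configurable per the acceptance criteria.
type FactorWeights = Record<keyof ScoringSignals, number>;

// Placeholder defaults (sum to 1.0); not derived from any evaluation.
const DEFAULT_WEIGHTS: FactorWeights = {
  patternStrength: 0.4,
  contextDensity: 0.15,
  evidenceProximity: 0.2,
  categoryStacking: 0.15,
  brevity: 0.1,
};

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Weighted average of the signals, scaled to an integer in 0-100.
// Nearby evidence should LOWER the score, so that signal is inverted.
function scoreMatch(s: ScoringSignals, w: FactorWeights = DEFAULT_WEIGHTS): number {
  const contributions: Record<keyof ScoringSignals, number> = {
    patternStrength: clamp01(s.patternStrength),
    contextDensity: clamp01(s.contextDensity),
    evidenceProximity: 1 - clamp01(s.evidenceProximity),
    categoryStacking: clamp01(s.categoryStacking),
    brevity: clamp01(s.brevity),
  };
  let weighted = 0;
  let totalWeight = 0;
  for (const key of Object.keys(w) as (keyof ScoringSignals)[]) {
    const weight = Math.max(0, w[key]); // negative weights are treated as 0
    weighted += weight * contributions[key];
    totalWeight += weight;
  }
  if (totalWeight === 0) return 0; // every factor disabled
  return Math.round((100 * weighted) / totalWeight);
}

const confidentClaim = scoreMatch({
  patternStrength: 1, contextDensity: 1, evidenceProximity: 0, categoryStacking: 1, brevity: 1,
});
const hedgedWithEvidence = scoreMatch({
  patternStrength: 0.2, contextDensity: 0, evidenceProximity: 1, categoryStacking: 0, brevity: 0.5,
});
console.log(confidentClaim, hedgedWithEvidence); // 100 13
```

Dividing by the total weight keeps the score inside 0-100 however the weights are configured, and setting a factor's weight to 0 disables it.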
Output Format Change
```ts
// Before
{ kind: 'speculation_language', evidence: 'I think', offset: 0 }
// After
{ kind: 'speculation_language', evidence: 'I think', offset: 0, confidence: 75 }
```
Acceptance Criteria
- All trigger matches include a `confidence` field (0-100)
- Scoring factors are configurable (weights per factor)
- Default threshold for reporting is configurable (default: 50)
- Tests validate scoring logic for representative examples
- Backward-compatible: existing consumers can ignore the new field
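A sketch of how the new field could stay backward-compatible: `confidence` is declared optional on the match type, and filtering against the configurable threshold is a separate step. `TriggerMatch`, `ReportOptions`, `reportableMatches`, and the `threshold`/`verbose` option names are hypothetical, and the sample matches are made-up data.

```typescript
// Existing match shape plus an optional `confidence`, so code written
// against { kind, evidence, offset } keeps type-checking unchanged.
interface TriggerMatch {
  kind: string;
  evidence: string;
  offset: number;
  confidence?: number; // 0-100; absent when a match was not scored
}

// Hypothetical reporting options; threshold defaults to 50.
interface ReportOptions {
  threshold?: number;
  verbose?: boolean;
}

// Drop matches below the threshold unless verbose mode is on.
function reportableMatches(matches: TriggerMatch[], opts: ReportOptions = {}): TriggerMatch[] {
  if (opts.verbose) return matches;
  const threshold = opts.threshold ?? 50;
  // Unscored matches are kept, preserving the pre-scoring behavior.
  return matches.filter((m) => m.confidence === undefined || m.confidence >= threshold);
}

const matches: TriggerMatch[] = [
  { kind: "speculation_language", evidence: "I think", offset: 0, confidence: 75 },
  { kind: "speculation_language", evidence: "maybe", offset: 40, confidence: 35 },
  { kind: "speculation_language", evidence: "I think", offset: 90 },
];

console.log(reportableMatches(matches).length); // 2
console.log(reportableMatches(matches, { verbose: true }).length); // 3
console.log(reportableMatches(matches, { threshold: 80 }).length); // 1
```

Keeping unscored matches visible means a consumer that never reads `confidence` sees the same matches as before unless a threshold actually filters them.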
Related Issues
- #10 feat: config file handling (cascading settings from multiple sources); provides the configurable thresholds and weights