feat: confidence scoring (0-100) per trigger match #12

@Jamie-BitFlight

Description

Summary

Add a numeric confidence score (0-100) to each trigger match, quantifying how likely it is that the flagged text is an actual hallucination rather than legitimate usage.

Current Behavior

Trigger matches return { kind, evidence, offset } — binary detection with no severity gradation. A hedged speculation ("I think") and a confident false claim ("this is definitely caused by X") both produce the same output.

Proposed Behavior

Each match gets a confidence field (0-100):

  • 90-100: Very high confidence this is a hallucination (e.g., "I think" + causal claim + no evidence)
  • 70-89: High confidence (single strong pattern)
  • 50-69: Moderate (pattern present but context suggests legitimate usage)
  • Below 50: Suppressed (not reported unless verbose mode is enabled)
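A minimal TypeScript sketch of the band mapping above, assuming scores arrive as numbers already in [0, 100]. The band names (`very_high`, `high`, `moderate`, `suppressed`) and the function `confidenceBand` are hypothetical and not part of the existing codebase; only the numeric boundaries come from this issue.

```typescript
// Hypothetical band labels; the issue defines only the numeric ranges.
type ConfidenceBand = "very_high" | "high" | "moderate" | "suppressed";

// Map a 0-100 confidence score onto the bands listed in the issue.
function confidenceBand(score: number): ConfidenceBand {
  // Reject NaN, Infinity, and anything outside the documented range.
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`confidence must be in [0, 100], got ${score}`);
  }
  if (score >= 90) return "very_high"; // 90-100
  if (score >= 70) return "high";      // 70-89
  if (score >= 50) return "moderate";  // 50-69
  return "suppressed";                 // below 50
}

// Example: confidenceBand(75) returns "high".
```

Checking from the top down keeps the boundaries unambiguous even if fractional scores are ever allowed.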

Scoring Factors

  • Pattern strength: "I think" (weak) vs "this is caused by" (strong)
  • Context density: Multiple patterns in the same paragraph → higher score
  • Evidence proximity: Nearby tool output or file references → lower score
  • Category stacking: Speculation + causality in same sentence → higher score
  • Text length: Short decisive statements with speculation → higher score than long analytical text
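One way to combine these factors is a clamped weighted sum. The sketch below is an illustration under stated assumptions, not the project's implementation. It assumes the caller has already normalized each factor to [0, 1], uses `brevity` as a stand-in for the text-length factor, and the names `FactorSignals`, `BASE_SCORE`, `DEFAULT_WEIGHTS`, and `scoreMatch`, along with all the numeric values, are hypothetical. Evidence proximity gets a negative weight so that nearby tool output or file references pull the score down.

```typescript
// Hypothetical per-factor signals, each normalized to [0, 1] by the caller.
interface FactorSignals {
  patternStrength: number;   // 0 = weak ("I think"), 1 = strong ("this is caused by")
  contextDensity: number;    // more patterns in the same paragraph -> closer to 1
  evidenceProximity: number; // nearby tool output / file references -> closer to 1
  categoryStacking: number;  // e.g. speculation + causality in one sentence -> 1
  brevity: number;           // short decisive statement -> 1, long analytical text -> 0
}

// Signed weights: positive factors raise the score, negative ones lower it.
type FactorWeights = Record<keyof FactorSignals, number>;

// Illustrative values only; real weights would come from configuration.
const DEFAULT_WEIGHTS: FactorWeights = {
  patternStrength: 40,
  contextDensity: 15,
  evidenceProximity: -30,
  categoryStacking: 20,
  brevity: 10,
};

const BASE_SCORE = 30; // hypothetical starting score for any match

function clamp(x: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, x));
}

// Weighted sum of the (clamped) signals, clamped and rounded to an integer in [0, 100].
function scoreMatch(signals: FactorSignals, weights: FactorWeights = DEFAULT_WEIGHTS): number {
  let score = BASE_SCORE;
  for (const key of Object.keys(weights) as (keyof FactorSignals)[]) {
    score += weights[key] * clamp(signals[key], 0, 1);
  }
  return Math.round(clamp(score, 0, 100));
}

// Single strong pattern, nothing else: 30 + 40 = 70 (bottom of the "high" band).
const strongOnly = scoreMatch({
  patternStrength: 1, contextDensity: 0, evidenceProximity: 0, categoryStacking: 0, brevity: 0,
});
```

With these illustrative weights, a weak pattern right next to evidence (`patternStrength: 0.2`, `evidenceProximity: 1`, `brevity: 0.5`) scores 13 and would be suppressed, while a short, stacked, unsupported causal claim saturates at 100. Keeping the weights in a plain record makes them straightforward to expose as configuration.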

Output Format Change

// Before
{ kind: 'speculation_language', evidence: 'I think', offset: 0 }

// After
{ kind: 'speculation_language', evidence: 'I think', offset: 0, confidence: 75 }
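The reason adding a field can stay backward-compatible is easiest to see in types. This sketch assumes matches are modeled as TypeScript interfaces and emitted as JSON. `LegacyTriggerMatch`, `ScoredTriggerMatch`, and `describe` are hypothetical names, and `kind` is typed loosely as `string`.

```typescript
// Shape of a match before the change (fields as listed in the issue).
interface LegacyTriggerMatch {
  kind: string;
  evidence: string;
  offset: number;
}

// Proposed shape: the same fields plus a 0-100 confidence score.
interface ScoredTriggerMatch extends LegacyTriggerMatch {
  confidence: number;
}

// An existing consumer written against the old shape...
function describe(match: LegacyTriggerMatch): string {
  return `${match.kind} at ${match.offset}: "${match.evidence}"`;
}

const scored: ScoredTriggerMatch = {
  kind: "speculation_language",
  evidence: "I think",
  offset: 0,
  confidence: 75,
};

// ...still accepts the new shape: TypeScript typing is structural, and the
// extra field is simply not read.
const line = describe(scored); // 'speculation_language at 0: "I think"'

// JSON consumers see one additional key and can ignore it the same way.
const json = JSON.stringify(scored);
```

The same reasoning holds for consumers that parse the JSON output and read only the fields they know.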

Acceptance Criteria

  • All trigger matches include a confidence field (0-100)
  • Scoring factors are configurable (weights per factor)
  • Default threshold for reporting is configurable (default: 50)
  • Tests validate scoring logic for representative examples
  • Backward-compatible — existing consumers can ignore the new field
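A possible shape for the configuration these criteria ask for, sketched under assumptions. `ScoringConfig`, `resolveConfig`, `reportable`, and the default weight values are hypothetical; only the default threshold of 50 and the verbose-mode behavior come from this issue. Overrides merge per factor, so a user can change one weight without restating the others.

```typescript
interface ScoringConfig {
  threshold: number;               // minimum confidence to report (issue default: 50)
  verbose: boolean;                // when true, also report matches below the threshold
  weights: Record<string, number>; // per-factor weights, keyed by factor name
}

// Hypothetical defaults; only `threshold: 50` is specified by the issue.
const DEFAULT_CONFIG: ScoringConfig = {
  threshold: 50,
  verbose: false,
  weights: {
    patternStrength: 40,
    contextDensity: 15,
    evidenceProximity: -30,
    categoryStacking: 20,
    brevity: 10,
  },
};

// Merge user overrides onto the defaults; weights are merged key by key
// into a fresh object, so DEFAULT_CONFIG is never mutated.
function resolveConfig(overrides: Partial<ScoringConfig> = {}): ScoringConfig {
  return {
    ...DEFAULT_CONFIG,
    ...overrides,
    weights: { ...DEFAULT_CONFIG.weights, ...(overrides.weights ?? {}) },
  };
}

interface ScoredMatch {
  kind: string;
  evidence: string;
  offset: number;
  confidence: number;
}

// Apply the reporting threshold unless verbose mode is on.
function reportable(matches: ScoredMatch[], config: ScoringConfig): ScoredMatch[] {
  return config.verbose ? matches : matches.filter((m) => m.confidence >= config.threshold);
}
```

Filtering at the reporting step rather than at scoring time means every match still carries its confidence, which keeps the suppressed matches available for verbose output and for tests.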

Related Issues

Metadata

Assignees: No one assigned

Labels:
  • enhancement (New feature or request)
  • impact: contract (Changes stdin/stdout contract with Claude Code. High risk.)
  • phase: 4-scoring-rebuild (Phase 4: Scoring architecture, changes match shape)
  • risk: medium (Medium risk: new subsystem or structural change)
  • topic: scoring (Confidence scoring, weighted aggregation)

Type: No type

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
