feat: confidence scoring (0-100) per trigger match #12

## Summary

Add a numeric confidence score (0-100) to each trigger match, quantifying how likely the flagged text is to be an actual hallucination rather than legitimate usage.

## Current Behavior

Trigger matches return `{ kind, evidence, offset }`: detection is binary, with no severity gradation. A hedged speculation ("I think") and a confident false claim ("this is definitely caused by X") both produce a match of the same shape, with nothing to tell them apart.

## Proposed Behavior

Each match gets a `confidence` field (0-100):
- **90-100**: Very high confidence this is a hallucination (e.g., "I think" + causal claim + no evidence)
- **70-89**: High confidence (single strong pattern)
- **50-69**: Moderate (pattern present but context suggests legitimate usage)
- **Below 50**: Suppressed (not reported unless verbose mode is enabled)
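
As a rough illustration of how these bands and the suppression rule could be applied, here is a minimal sketch. The names `confidenceBand`, `filterReportable`, and the band labels are hypothetical and not part of the existing API; only the numeric boundaries come from the list above.

```javascript
// Hypothetical helpers; only the band boundaries come from this issue.
const DEFAULT_REPORT_THRESHOLD = 50;

// Map a 0-100 score to one of the bands listed above.
function confidenceBand(confidence) {
  if (!Number.isFinite(confidence) || confidence < 0 || confidence > 100) {
    throw new RangeError(`confidence must be within 0-100, got ${confidence}`);
  }
  if (confidence >= 90) return 'very_high';
  if (confidence >= 70) return 'high';
  if (confidence >= 50) return 'moderate';
  return 'suppressed';
}

// Drop matches below the threshold unless verbose mode is enabled.
function filterReportable(matches, options = {}) {
  const { threshold = DEFAULT_REPORT_THRESHOLD, verbose = false } = options;
  if (verbose) return matches.slice();
  return matches.filter((match) => match.confidence >= threshold);
}
```

Returning a copy in verbose mode keeps the function free of aliasing surprises: callers can sort or mutate the result without touching the original match list.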

## Scoring Factors

- **Pattern strength**: "I think" (weak) vs "this is caused by" (strong)
- **Context density**: Multiple patterns in the same paragraph → higher score
- **Evidence proximity**: Nearby tool output or file references → lower score
- **Category stacking**: Speculation + causality in same sentence → higher score
- **Text length**: Short decisive statements with speculation → higher score than long analytical text
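
One way the factors above might combine is a weighted sum clamped to 0-100. The sketch below is only an assumption about the shape of the scorer: the `signals` fields, the weight values, the baseline of 20, and the 120-character cutoff for "short" text are all invented for illustration and would need tuning against real examples.

```javascript
// Hypothetical default weights; none of these values are specified in the issue.
const DEFAULT_WEIGHTS = {
  patternStrength: 60,    // points scaled by pattern strength (0 = weak, 1 = strong)
  contextDensity: 10,     // points per additional pattern in the same paragraph
  evidenceProximity: -25, // applied when tool output or file references are nearby
  categoryStacking: 20,   // applied when speculation and causality share a sentence
  shortText: 10,          // applied to short, decisive statements
};

const BASELINE = 20;            // assumed starting score
const MAX_EXTRA_PATTERNS = 2;   // assumed cap so dense paragraphs do not dominate
const SHORT_TEXT_CHARS = 120;   // assumed cutoff for "short" text

// signals: { patternStrength, patternsInParagraph, evidenceNearby,
//            stackedCategories, textLength } (hypothetical shape)
function scoreMatch(signals, weights = DEFAULT_WEIGHTS) {
  const extraPatterns = Math.min(
    Math.max(signals.patternsInParagraph - 1, 0),
    MAX_EXTRA_PATTERNS,
  );

  let score = BASELINE;
  score += weights.patternStrength * signals.patternStrength;
  score += weights.contextDensity * extraPatterns;
  if (signals.evidenceNearby) score += weights.evidenceProximity;
  if (signals.stackedCategories) score += weights.categoryStacking;
  if (signals.textLength < SHORT_TEXT_CHARS) score += weights.shortText;

  // Clamp to the documented 0-100 range and return an integer.
  return Math.round(Math.min(100, Math.max(0, score)));
}
```

Passing `weights` as a parameter keeps the scorer compatible with the configurable-weights criterion: a config loader can supply its own object without the scoring code changing.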

## Output Format Change

```javascript
// Before
{ kind: 'speculation_language', evidence: 'I think', offset: 0 }

// After
{ kind: 'speculation_language', evidence: 'I think', offset: 0, confidence: 75 }
```
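
Because the change only adds a field, consumers that read the original three fields should be unaffected. A minimal sketch of that property, using two hypothetical helpers (`withConfidence` and `legacyFormat`) that do not exist in the current codebase:

```javascript
// Hypothetical: attach a clamped, integer confidence without mutating the match.
function withConfidence(match, confidence) {
  const clamped = Math.round(Math.min(100, Math.max(0, confidence)));
  return { ...match, confidence: clamped };
}

// Hypothetical pre-existing consumer that only reads the original fields.
function legacyFormat({ kind, evidence, offset }) {
  return `${kind}@${offset}: ${evidence}`;
}

const before = { kind: 'speculation_language', evidence: 'I think', offset: 0 };
const after = withConfidence(before, 75);

// The legacy consumer renders both shapes identically.
console.log(legacyFormat(before) === legacyFormat(after));
```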

## Acceptance Criteria

- [ ] All trigger matches include a `confidence` field (0-100)
- [ ] Scoring factors are configurable (weights per factor)
- [ ] Default threshold for reporting is configurable (default: 50)
- [ ] Tests validate scoring logic for representative examples
- [ ] Backward-compatible: existing consumers can ignore the new field
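
The two configurability criteria could be met by merging user settings over defaults and validating the result. The following is a sketch under assumptions: the key names `reportThreshold` and `weights`, the factor names, and the default weight values are hypothetical, and the real config format is expected to be settled in #10. Only the default threshold of 50 comes from this issue.

```javascript
// Hypothetical config shape; only the default threshold (50) is from this issue.
const DEFAULT_CONFIG = Object.freeze({
  reportThreshold: 50,
  weights: Object.freeze({
    patternStrength: 60,
    contextDensity: 10,
    evidenceProximity: -25,
    categoryStacking: 20,
    shortText: 10,
  }),
});

// Merge user overrides over the defaults and reject invalid values early.
function resolveConfig(userConfig = {}) {
  const reportThreshold = userConfig.reportThreshold ?? DEFAULT_CONFIG.reportThreshold;
  if (!Number.isFinite(reportThreshold) || reportThreshold < 0 || reportThreshold > 100) {
    throw new RangeError(`reportThreshold must be within 0-100, got ${reportThreshold}`);
  }

  const overrides = userConfig.weights ?? {};
  for (const [name, value] of Object.entries(overrides)) {
    if (!Object.hasOwn(DEFAULT_CONFIG.weights, name)) {
      throw new Error(`unknown scoring factor: ${name}`);
    }
    if (!Number.isFinite(value)) {
      throw new TypeError(`weight for ${name} must be a finite number`);
    }
  }

  return { reportThreshold, weights: { ...DEFAULT_CONFIG.weights, ...overrides } };
}
```

Rejecting unknown factor names catches typos in a config file that would otherwise be silently ignored.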

## Related Issues

- #10 (config file handling: thresholds and weights)
