feat: sentence-level granularity and weighted multi-signal scoring #21

## Summary

Two related architectural improvements:

1. **Sentence-level granularity** — score each sentence independently instead of whole-response, enabling precise flagging ("sentence 3 of 7 has high hallucination risk")
2. **Weighted multi-signal aggregation** — combine scores from multiple detection categories into a single weighted score per sentence

## Current Behavior

- Detection operates at whole-response level
- Each trigger match is independent — no aggregation
- No sentence-level breakdown

## Proposed Behavior

### Sentence Splitting

```javascript
// Regex-based sentence splitter (zero dependencies)
const sentences = text.split(/(?<=[.!?])\s+/);
```
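
As a minimal sketch of how the splitter might be wrapped for use and testing, the helper below trims the input, applies the same lookbehind split, and drops empty pieces. The name `splitSentences` is hypothetical and not taken from the codebase. The last call shows a known limitation of a purely regex-based approach: abbreviations such as "e.g." are treated as sentence ends.

```javascript
// Hypothetical helper: wraps the regex splitter with trimming and empty-string filtering.
function splitSentences(text) {
  return text
    .trim()
    .split(/(?<=[.!?])\s+/) // split on whitespace that follows ., ! or ?
    .map((s) => s.trim())
    .filter((s) => s.length > 0); // empty input would otherwise yield [""]
}

const sentences = splitSentences(
  "The build passed. I think it is a race condition! Is it fixed?"
);
console.log(sentences.length); // 3

// Limitation: the period in "e.g." is followed by whitespace, so it also splits here.
console.log(splitSentences("See e.g. the logs. Done."));
// [ 'See e.g.', 'the logs.', 'Done.' ]
```

Edge cases like this one are good candidates for the splitting tests listed under the acceptance criteria.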

### Per-Sentence Scoring

Each sentence gets scored by all active detection categories:

```javascript
{
  sentence: "I think the issue is caused by a race condition.",
  index: 2,
  scores: {
    speculation_language: 0.8,  // "I think"
    causality_language: 0.7,    // "caused by" without evidence
    pseudo_quantification: 0.0,
    completeness_claim: 0.0,
  },
  aggregateScore: 0.46,  // weighted average with default weights: (0.8*0.25 + 0.7*0.30) / 0.90
  label: "UNCERTAIN",     // GROUNDED < 0.30 < UNCERTAIN < 0.60 < HALLUCINATED
}
```
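
One way the per-category scoring step could look is sketched below. The `DETECTORS` map, the `scoreSentence` name, the regex patterns, and the fixed score values are all placeholder assumptions chosen to reproduce the example above; the real detection categories would supply their own logic.

```javascript
// Placeholder detectors: patterns and score values are illustrative assumptions only.
const DETECTORS = {
  speculation_language: (s) => (/\bI think\b/i.test(s) ? 0.8 : 0.0),
  causality_language: (s) => (/\bcaused by\b/i.test(s) ? 0.7 : 0.0),
  pseudo_quantification: () => 0.0,
  completeness_claim: () => 0.0,
};

// Run every active detector on one sentence and collect one score per category.
function scoreSentence(sentence, index, detectors = DETECTORS) {
  const scores = {};
  for (const [category, detect] of Object.entries(detectors)) {
    scores[category] = detect(sentence);
  }
  return { sentence, index, scores };
}

const result = scoreSentence(
  "I think the issue is caused by a race condition.",
  2
);
console.log(result.scores);
```

Passing the detector map as a parameter keeps the set of active categories configurable without changing the scoring loop.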

### Configurable Weights

```javascript
// Default weights (configurable via .hallucination-detectorrc.cjs)
const DEFAULT_WEIGHTS = {
  speculation_language: 0.25,
  causality_language: 0.30,
  pseudo_quantification: 0.15,
  completeness_claim: 0.20,
  fabricated_source: 0.10,  // if #18 implemented
};
```
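
The aggregation itself could be sketched as a weighted average that normalizes by the weights of the categories actually scored, so a sentence is not penalized for a category that is not active. That normalization choice, the `aggregate` name, and the `userWeights` parameter are assumptions for illustration. With the default weights, scores of 0.8 (speculation) and 0.7 (causality) give (0.20 + 0.21) / 0.90, roughly 0.46.

```javascript
const DEFAULT_WEIGHTS = {
  speculation_language: 0.25,
  causality_language: 0.30,
  pseudo_quantification: 0.15,
  completeness_claim: 0.20,
  fabricated_source: 0.10,
};

// Weighted average over the categories present in `scores`.
// `userWeights` (hypothetical) overrides individual defaults, e.g. from the rc file.
function aggregate(scores, userWeights = {}) {
  const weights = { ...DEFAULT_WEIGHTS, ...userWeights };
  let weightedSum = 0;
  let weightTotal = 0;
  for (const [category, score] of Object.entries(scores)) {
    const w = weights[category] ?? 0; // categories without a weight contribute nothing
    weightedSum += w * score;
    weightTotal += w;
  }
  return weightTotal === 0 ? 0 : weightedSum / weightTotal;
}

const score = aggregate({
  speculation_language: 0.8,
  causality_language: 0.7,
  pseudo_quantification: 0.0,
  completeness_claim: 0.0,
});
console.log(score.toFixed(2)); // "0.46"
```

If the weights are instead expected to sum to 1.0 across all categories, including inactive ones, the division can be dropped and the same inputs yield 0.41.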

### Three-Tier Labels

| Label | Score Range | Meaning |
|---|---|---|
| GROUNDED | < 0.30 | Sentence is evidence-based |
| UNCERTAIN | 0.30 to 0.60 (inclusive) | Mixed signals; may need verification |
| HALLUCINATED | > 0.60 | High confidence of hallucination |
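
A small sketch of the label mapping implied by the table, with both boundary values (0.30 and 0.60) falling into UNCERTAIN. The `THRESHOLDS` object and the `labelFor` name are hypothetical; only the threshold values come from the table.

```javascript
// Threshold values from the table; the object shape and key names are assumptions.
const THRESHOLDS = { grounded: 0.30, hallucinated: 0.60 };

// Map an aggregate score in [0, 1] to one of the three labels.
function labelFor(score, thresholds = THRESHOLDS) {
  if (score < thresholds.grounded) return "GROUNDED";
  if (score > thresholds.hallucinated) return "HALLUCINATED";
  return "UNCERTAIN"; // 0.30 <= score <= 0.60
}

console.log(labelFor(0.10)); // GROUNDED
console.log(labelFor(0.46)); // UNCERTAIN
console.log(labelFor(0.75)); // HALLUCINATED
```

Taking the thresholds as a parameter leaves room for making them configurable alongside the weights.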

## Acceptance Criteria

- [ ] Sentence splitting works for standard English text
- [ ] Per-sentence scoring for all active categories
- [ ] Weighted aggregation with configurable weights
- [ ] Three-tier labeling (GROUNDED / UNCERTAIN / HALLUCINATED)
- [ ] Backward-compatible — existing whole-response mode still works
- [ ] Tests cover sentence splitting edge cases and aggregation math

## Related Issues

- #12 (confidence scoring — this subsumes it with richer architecture)
- #10 (config — weights and thresholds configurable)

## References

- ObvioSpectre/hallucination-detector `scoring/aggregator.py` (weighted aggregation)
- ObvioSpectre/hallucination-detector `core/sentence_splitter.py` (regex splitter)
