Labels
- enhancement: New feature or request
- impact: structural (changes pipeline architecture; medium-high risk; tests may break)
- phase: 4-scoring-rebuild (Phase 4: scoring architecture, changes match shape)
- risk: high (contract change, deadlock potential, or external deps)
- topic: scoring (confidence scoring, weighted aggregation)
Description
Summary
Two related architectural improvements:
- Sentence-level granularity: score each sentence independently instead of the whole response, enabling precise flagging ("sentence 3 of 7 has high hallucination risk")
- Weighted multi-signal aggregation: combine scores from multiple detection categories into a single weighted score per sentence
Current Behavior
- Detection operates at the whole-response level
- Each trigger match is independent — no aggregation
- No sentence-level breakdown
Proposed Behavior
Sentence Splitting
```js
// Regex-based sentence splitter (zero dependencies)
const sentences = text.split(/(?<=[.!?])\s+/);
```
Per-Sentence Scoring
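Per-sentence scoring needs each sentence paired with its position (the `index` field in the example result). A minimal sketch, assuming a hypothetical `splitSentences` helper, zero-based indices, and a small hand-picked abbreviation list (none of these come from this issue): it keeps the one-line regex as the boundary rule and re-joins pieces that were split right after a known abbreviation such as "e.g.".

```js
// Hypothetical helper (not from the issue): returns [{ index, sentence }, ...].
// ABBREVIATIONS is an assumed, deliberately small list.
const ABBREVIATIONS = new Set(["e.g.", "i.e.", "etc.", "vs.", "Dr.", "Mr.", "Mrs.", "Ms."]);

function splitSentences(text) {
  // Same boundary rule as the one-liner: split after . ! or ? followed by whitespace.
  const pieces = text.trim().split(/(?<=[.!?])\s+/);
  const sentences = [];
  for (const piece of pieces) {
    if (piece === "") continue; // only happens for empty or whitespace-only input
    const prev = sentences[sentences.length - 1];
    const lastWord = prev === undefined ? "" : prev.split(/\s+/).pop();
    if (ABBREVIATIONS.has(lastWord)) {
      // The previous split fell right after an abbreviation: re-join with one space.
      sentences[sentences.length - 1] = `${prev} ${piece}`;
    } else {
      sentences.push(piece);
    }
  }
  return sentences.map((sentence, index) => ({ index, sentence }));
}

console.log(splitSentences("I think it is a race. See e.g. the logs! Done?"));
```

With this list the example yields three sentences instead of four. The trade-off is that a sentence that genuinely ends in an abbreviation such as "etc." gets merged with the next one, which is exactly the kind of edge case the splitter tests should pin down.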
Each sentence gets scored by all active detection categories:
```js
{
  sentence: "I think the issue is caused by a race condition.",
  index: 2,
  scores: {
    speculation_language: 0.8,   // "I think"
    causality_language: 0.7,     // "caused by" without evidence
    pseudo_quantification: 0.0,
    completeness_claim: 0.0,
  },
  aggregateScore: 0.46, // weighted average over the four scored categories: (0.8*0.25 + 0.7*0.30) / 0.90
  label: "UNCERTAIN",   // GROUNDED < 0.30 <= UNCERTAIN <= 0.60 < HALLUCINATED
}
```
Configurable Weights
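The weights only take effect through the aggregation step. A minimal weighted-average sketch, assuming a hypothetical `aggregate` function that renormalizes over the categories actually scored for a sentence (the issue does not say how an inactive category such as `fabricated_source` should be handled, so this is an assumption). The weight values copy the defaults listed in this section.

```js
// Copy of the default weights; aggregate() accepts any other weight table too.
const WEIGHTS = {
  speculation_language: 0.25,
  causality_language: 0.30,
  pseudo_quantification: 0.15,
  completeness_claim: 0.20,
  fabricated_source: 0.10,
};

// Weighted average over the categories present in `scores` (assumed renormalization).
function aggregate(scores, weights = WEIGHTS) {
  let weighted = 0;
  let totalWeight = 0;
  for (const [category, score] of Object.entries(scores)) {
    const w = weights[category];
    if (w === undefined) continue; // category without a weight: ignored in this sketch
    weighted += w * score;
    totalWeight += w;
  }
  // No weighted categories at all: report 0, i.e. "no signal".
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}

const example = {
  speculation_language: 0.8,
  causality_language: 0.7,
  pseudo_quantification: 0.0,
  completeness_claim: 0.0,
};
console.log(aggregate(example).toFixed(2)); // prints 0.46
```

For the example scores the denominator is 0.90, giving about 0.46. If `fabricated_source` were also scored 0.0, the denominator would be 1.00 and the result 0.41, so whether inactive categories count as zero or drop out visibly changes the aggregate.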
```js
// Default weights (configurable via .hallucination-detectorrc.cjs); they sum to 1.00
const DEFAULT_WEIGHTS = {
  speculation_language: 0.25,
  causality_language: 0.30,
  pseudo_quantification: 0.15,
  completeness_claim: 0.20,
  fabricated_source: 0.10, // only if #18 is implemented
};
```
Three-Tier Labels
| Label | Score Range | Meaning |
|---|---|---|
| GROUNDED | < 0.30 | Sentence is evidence-based |
| UNCERTAIN | 0.30 to 0.60 (inclusive) | Mixed signals; may need verification |
| HALLUCINATED | > 0.60 | High confidence of hallucination |
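The table's thresholds reduce to two comparisons. A sketch, assuming a hypothetical `labelFor` helper that takes the thresholds as an object so they could come from configuration (#10); both boundary values fall into UNCERTAIN, following the "< 0.30" and "> 0.60" rows.

```js
// Thresholds from the table; passed as an object so config could override them.
const THRESHOLDS = { grounded: 0.30, hallucinated: 0.60 };

function labelFor(score, thresholds = THRESHOLDS) {
  if (score < thresholds.grounded) return "GROUNDED";
  if (score > thresholds.hallucinated) return "HALLUCINATED";
  return "UNCERTAIN"; // 0.30 <= score <= 0.60
}

// Prints one line per score: GROUNDED, UNCERTAIN, UNCERTAIN, UNCERTAIN, HALLUCINATED
for (const s of [0.1, 0.3, 0.46, 0.6, 0.68]) {
  console.log(s, labelFor(s));
}
```

Because aggregate scores are floating-point results, a value that "should" be exactly 0.60 can come out as 0.6000000000000001 and flip to HALLUCINATED; rounding the aggregate (for example to two decimals) before labeling is one way to keep boundary behavior predictable in tests.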
Acceptance Criteria
- Sentence splitting works for standard English text
- Per-sentence scoring for all active categories
- Weighted aggregation with configurable weights
- Three-tier labeling (GROUNDED / UNCERTAIN / HALLUCINATED)
- Backward-compatible — existing whole-response mode still works
- Tests cover sentence splitting edge cases and aggregation math
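One way to meet the backward-compatibility criterion is a single entry point whose default granularity is the existing whole-response mode. A minimal sketch under loudly stated assumptions: `analyze`, the `granularity` option, the two toy regex detectors, and their fixed scores are all hypothetical stand-ins for the real detection categories.

```js
// Toy detectors (hypothetical): each returns a score in [0, 1] for one text unit.
const DETECTORS = {
  speculation_language: (s) => (/\bI think\b|\bprobably\b/i.test(s) ? 0.8 : 0),
  causality_language: (s) => (/\bcaused by\b|\bbecause of\b/i.test(s) ? 0.7 : 0),
};
const WEIGHTS = { speculation_language: 0.25, causality_language: 0.30 };

// Scores one unit of text with every detector and takes the weighted average.
function scoreUnit(text) {
  const scores = {};
  let weighted = 0;
  let total = 0;
  for (const [category, detect] of Object.entries(DETECTORS)) {
    scores[category] = detect(text);
    weighted += WEIGHTS[category] * scores[category];
    total += WEIGHTS[category];
  }
  return { scores, aggregateScore: total === 0 ? 0 : weighted / total };
}

function analyze(text, { granularity = "response" } = {}) {
  if (granularity === "response") {
    // Existing behavior: the whole response is scored as a single unit.
    return scoreUnit(text);
  }
  // Sentence mode: same scoring, applied to each sentence from the regex splitter.
  const sentences = text.trim().split(/(?<=[.!?])\s+/).filter(Boolean);
  return sentences.map((sentence, index) => ({ sentence, index, ...scoreUnit(sentence) }));
}

const text = "The build passed. I think the bug is caused by a race.";
console.log(analyze(text));
console.log(analyze(text, { granularity: "sentence" }));
```

Existing callers that pass no options keep receiving a single result object, while sentence mode returns one record per sentence with the same `scores` / `aggregateScore` shape.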
Related Issues
- #12 feat: confidence scoring (0-100) per trigger match. This issue subsumes it with a richer architecture.
- #10 feat: config file handling (cascading settings from multiple sources). Weights and thresholds would be configured through it.
References
- ObvioSpectre/hallucination-detector: scoring/aggregator.py (weighted aggregation)
- ObvioSpectre/hallucination-detector: core/sentence_splitter.py (regex splitter)