
feat: sentence-level granularity and weighted multi-signal scoring #21

@Jamie-BitFlight

Summary

Two related architectural improvements:

  1. Sentence-level granularity: score each sentence independently instead of scoring the whole response, enabling precise flagging (e.g. "sentence 3 of 7 has high hallucination risk")
  2. Weighted multi-signal aggregation — combine scores from multiple detection categories into a single weighted score per sentence

Current Behavior

  • Detection operates at whole-response level
  • Each trigger match is independent — no aggregation
  • No sentence-level breakdown

Proposed Behavior

Sentence Splitting

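// Regex-based sentence splitter (zero dependencies).
// Splits on whitespace that follows ., !, or ?; abbreviations such as "e.g." will over-split.
const sentences = text.trim().split(/(?<=[.!?])\s+/).filter(Boolean);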
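The scoring output needs each sentence's position as well as its text. A minimal sketch of a splitter that records both, assuming a zero-based `index` and a hypothetical helper name `splitSentences` (neither is specified in this issue):

```javascript
// Hypothetical helper: splits text into sentences and records each one's position.
function splitSentences(text) {
  return text
    .trim()
    .split(/(?<=[.!?])\s+/) // break on whitespace that follows ., !, or ?
    .filter((s) => s.length > 0) // "".split(...) returns [""], so drop empties
    .map((sentence, index) => ({ sentence, index })); // index is zero-based (assumption)
}

const parts = splitSentences("It works. I think it is a race condition! Is it?");
console.log(parts.map((p) => `${p.index}: ${p.sentence}`));
```

The lookbehind keeps the terminal punctuation attached to its sentence. Abbreviations followed by a space ("Dr. Smith") are still split, so that case probably belongs in the edge-case tests.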

Per-Sentence Scoring

Each sentence gets scored by all active detection categories:

{
  sentence: "I think the issue is caused by a race condition.",
  index: 2,
  scores: {
    speculation_language: 0.8,  // "I think"
    causality_language: 0.7,    // "caused by" without evidence
    pseudo_quantification: 0.0,
    completeness_claim: 0.0,
  },
  aggregateScore: 0.41,  // weighted average with the default weights: 0.25 * 0.8 + 0.30 * 0.7
  label: "UNCERTAIN",     // GROUNDED < 0.30 < UNCERTAIN < 0.60 < HALLUCINATED
}
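One way to produce the `scores` map is to run every active category detector over the sentence. The sketch below assumes each detector is a function from a sentence to a score in [0, 1]. The `detectors` map, the regex patterns, the score values, and the name `scoreSentence` are all placeholders for illustration, not the project's real triggers:

```javascript
// Hypothetical detectors: each maps a sentence to a score in [0, 1].
// Patterns and score values are illustrative placeholders only.
const detectors = {
  speculation_language: (s) => (/\bI think\b/i.test(s) ? 0.8 : 0.0),
  causality_language: (s) => (/\bcaused by\b/i.test(s) ? 0.7 : 0.0),
  pseudo_quantification: (s) => (/\b\d+(\.\d+)?%/.test(s) ? 0.6 : 0.0),
  completeness_claim: (s) => (/\b(all|every|fully|completely)\b/i.test(s) ? 0.6 : 0.0),
};

// Runs every active detector against one sentence and collects the scores.
function scoreSentence(sentence, index, activeDetectors = detectors) {
  const scores = {};
  for (const [category, detect] of Object.entries(activeDetectors)) {
    scores[category] = detect(sentence);
  }
  return { sentence, index, scores };
}

console.log(scoreSentence("I think the issue is caused by a race condition.", 2));
```

Passing the detectors as a parameter keeps the "all active categories" part configurable: disabling a category is just omitting its key.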

Configurable Weights

// Default weights (configurable via .hallucination-detectorrc.cjs)
const DEFAULT_WEIGHTS = {
  speculation_language: 0.25,
  causality_language: 0.30,
  pseudo_quantification: 0.15,
  completeness_claim: 0.20,
  fabricated_source: 0.10,  // if #18 implemented
};
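The issue does not spell out the aggregation formula. A minimal sketch, assuming a weighted average over every configured category, where a category with no score counts as 0 and user overrides are merged over the defaults. The name `aggregate` and the `overrides` parameter are hypothetical:

```javascript
const DEFAULT_WEIGHTS = {
  speculation_language: 0.25,
  causality_language: 0.30,
  pseudo_quantification: 0.15,
  completeness_claim: 0.20,
  fabricated_source: 0.10,
};

// Hypothetical helper: weighted average over every configured category.
// Dividing by the total weight keeps the result in [0, 1] even when
// custom weights do not sum to 1.
function aggregate(scores, overrides = {}) {
  const weights = { ...DEFAULT_WEIGHTS, ...overrides };
  let weightedSum = 0;
  let totalWeight = 0;
  for (const [category, weight] of Object.entries(weights)) {
    weightedSum += weight * (scores[category] ?? 0); // missing score counts as 0
    totalWeight += weight;
  }
  return totalWeight > 0 ? weightedSum / totalWeight : 0;
}

const score = aggregate({ speculation_language: 0.8, causality_language: 0.7 });
console.log(score.toFixed(2)); // 0.25 * 0.8 + 0.30 * 0.7, divided by a total weight of 1.0
```

An alternative design would normalize only over the categories that actually produced a score, which raises the aggregate when some categories are disabled. Which behavior is wanted is worth settling before the aggregation tests are written.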

Three-Tier Labels

| Label | Score Range | Meaning |
| --- | --- | --- |
| GROUNDED | < 0.30 | Sentence is evidence-based |
| UNCERTAIN | 0.30 - 0.60 | Mixed signals; may need verification |
| HALLUCINATED | > 0.60 | High confidence of hallucination |
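The tiers map onto a small threshold function. This sketch treats both boundaries (0.30 and 0.60) as UNCERTAIN, which is how the ranges above read. The name `labelFor` and the `thresholds` parameter are assumptions for illustration:

```javascript
// Hypothetical helper: maps an aggregate score to one of the three tiers.
// Both boundary values fall into UNCERTAIN.
function labelFor(score, thresholds = { grounded: 0.30, hallucinated: 0.60 }) {
  if (score < thresholds.grounded) return "GROUNDED";
  if (score <= thresholds.hallucinated) return "UNCERTAIN";
  return "HALLUCINATED";
}

for (const s of [0.1, 0.3, 0.45, 0.6, 0.75]) {
  console.log(s, labelFor(s));
}
```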

Acceptance Criteria

  • Sentence splitting works for standard English text
  • Per-sentence scoring for all active categories
  • Weighted aggregation with configurable weights
  • Three-tier labeling (GROUNDED / UNCERTAIN / HALLUCINATED)
  • Backward-compatible — existing whole-response mode still works
  • Tests cover sentence splitting edge cases and aggregation math

Related Issues

References

  • ObvioSpectre/hallucination-detector scoring/aggregator.py (weighted aggregation)
  • ObvioSpectre/hallucination-detector core/sentence_splitter.py (regex splitter)

Labels

  • enhancement (New feature or request)
  • impact: structural (Changes pipeline architecture. Medium-high risk. Tests may break.)
  • phase: 4-scoring-rebuild (Phase 4: Scoring architecture, changes match shape)
  • risk: high (High risk: contract change, deadlock potential, or external deps)
  • topic: scoring (Confidence scoring, weighted aggregation)
