Labels
- enhancement (New feature or request)
- impact: structural (Changes pipeline architecture. Medium-high risk. Tests may break.)
- phase: 3-sentence-infra (Phase 3: Sentence splitting infrastructure)
- risk: medium (Medium risk: new subsystem or structural change)
- topic: sentence-processing (Sentence splitting, pairwise comparison)
Description
Summary
Detect self-contradictory statements within a single assistant response using negation polarity heuristics. When the assistant says "X is Y" in one sentence and "X is not Y" in another, flag it as a contradiction.
Technique (from ObvioSpectre/hallucination-detector)
Lightweight NLI approximation without requiring an NLI model:
- Split response into sentences
- For each sentence pair on the same topic:
  - Check if one contains negation words and the other does not
  - If so, flag as internal contradiction
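The steps above can be sketched roughly as follows. This is a minimal illustration, not the referenced project's implementation: the helper names (`splitSentences`, `isNegated`, `polarityMismatches`) are hypothetical, the negation list is abbreviated, and the "same topic" decision is left to a caller-supplied predicate.

```javascript
// Abbreviated negation list, for illustration only.
const NEGATIONS = new Set(['not', 'no', 'never', "isn't", "doesn't", "can't", "won't"]);

// Step 1: split on ., ! or ? followed by whitespace (punctuation stays attached).
function splitSentences(text) {
  return text.split(/(?<=[.!?])\s+/).map((s) => s.trim()).filter(Boolean);
}

// A sentence counts as "negated" if any of its tokens is a negation word.
function isNegated(sentence) {
  const tokens = sentence.toLowerCase().replace(/’/g, "'").match(/[a-z']+/g) || [];
  return tokens.some((t) => NEGATIONS.has(t));
}

// Steps 2-3: flag same-topic pairs where exactly one side is negated.
// `sameTopic` is a hypothetical predicate supplied by the caller.
function polarityMismatches(text, sameTopic) {
  const sentences = splitSentences(text);
  const flagged = [];
  for (let i = 0; i < sentences.length; i++) {
    for (let j = i + 1; j < sentences.length; j++) {
      const a = sentences[i];
      const b = sentences[j];
      if (sameTopic(a, b) && isNegated(a) !== isNegated(b)) {
        flagged.push([a, b]);
      }
    }
  }
  return flagged;
}
```

The pairwise loop is quadratic in the number of sentences, which is likely acceptable for a single response but worth keeping in mind for very long outputs.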
Negation Words
const NEGATION_WORDS = ['not', 'no', 'never', "didn't", "isn't", "wasn't", "aren't", "won't", "can't", "doesn't", 'none', 'neither', 'nor', 'unable', 'lacks', 'failed'];
Detection Logic (Regex-Adaptable)
Without embeddings, approximate "same topic" by checking for shared noun phrases or subjects:
// Pattern: "X is Y" ... "X is not Y"
// Pattern: "always X" ... "never X"
// Pattern: "X works" ... "X doesn't work"
Why This Works
Self-contradiction is a strong hallucination signal: the assistant is confabulating rather than reasoning from consistent evidence. This catches a failure mode that none of our current detection categories cover.
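One possible way to approximate "same topic" without embeddings, as described under Detection Logic, is to compare the content words two sentences share. The sketch below is an assumption-laden illustration: `STOPWORDS`, the 0.5 overlap threshold, the crude plural/verb "s" stripping, and the shape of the returned finding object are all hypothetical choices, not part of the referenced project.

```javascript
const NEGATION_WORDS = ['not', 'no', 'never', "didn't", "isn't", "wasn't", "aren't",
  "won't", "can't", "doesn't", 'none', 'neither', 'nor', 'unable', 'lacks', 'failed'];

// Tiny illustrative stopword list (hypothetical); a real one would be larger.
const STOPWORDS = new Set(['the', 'a', 'an', 'is', 'are', 'was', 'it', 'this', 'that',
  'and', 'or', 'of', 'to', 'in', 'does', 'do', 'will', 'can', 'always']);

function tokenize(sentence) {
  return sentence.toLowerCase().replace(/’/g, "'").match(/[a-z][a-z']*/g) || [];
}

// Content words: drop stopwords and negations, then strip a trailing "s"
// so that "works" and "work" compare equal (a very crude stemmer).
function contentWords(tokens) {
  const kept = tokens.filter((t) => !STOPWORDS.has(t) && !NEGATION_WORDS.includes(t));
  return new Set(kept.map((t) => (t.length > 3 ? t.replace(/s$/, '') : t)));
}

// Jaccard overlap of two word sets, used as a stand-in for "same topic".
function topicOverlap(a, b) {
  const shared = [...a].filter((w) => b.has(w)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : shared / union;
}

function findContradictions(sentences, minOverlap = 0.5) {
  const info = sentences.map((text) => {
    const tokens = tokenize(text);
    return {
      text,
      negated: tokens.some((t) => NEGATION_WORDS.includes(t)),
      words: contentWords(tokens),
    };
  });
  const findings = [];
  for (let i = 0; i < info.length; i++) {
    for (let j = i + 1; j < info.length; j++) {
      const a = info[i];
      const b = info[j];
      if (a.negated !== b.negated && topicOverlap(a.words, b.words) >= minOverlap) {
        findings.push({ category: 'internal_contradiction', sentences: [a.text, b.text] });
      }
    }
  }
  return findings;
}
```

Raising `minOverlap` trades recall for fewer false positives. Word overlap cannot tell "X is not slow" from "X is slow" used in different contexts, so the threshold would need tuning against real responses.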
New Category
internal_contradiction — a sixth detection category.
Acceptance Criteria
- Sentence splitting implemented (regex: split on .!? followed by whitespace)
- Negation polarity detection for sentence pairs
- Handles common negation patterns beyond simple "not"
- Low false positive rate — only flags when sentences are on the same topic
- Tests with clear contradiction examples and non-contradiction controls
- Suppression for quoted text and code blocks
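The splitting and suppression criteria could be combined by stripping code and quoted spans before splitting. The following is a rough sketch under stated assumptions: `stripSuppressed` is a hypothetical helper, responses are assumed to be Markdown-like, and only double-quoted spans are treated as quotations (single quotes are left alone because they collide with contractions such as "doesn't").

```javascript
// Remove spans that should not take part in contradiction detection.
function stripSuppressed(text) {
  return text
    .replace(/`{3}[\s\S]*?`{3}/g, ' ') // fenced code blocks
    .replace(/`[^`\n]*`/g, ' ')        // inline code
    .replace(/^>.*$/gm, ' ')           // Markdown block quotes
    .replace(/"[^"\n]*"/g, ' ')        // double-quoted spans
    .replace(/[ \t]+/g, ' ');          // collapse leftover spaces
}

// Split on ., ! or ? followed by whitespace, after suppression.
function splitSentences(text) {
  return stripSuppressed(text)
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter(Boolean);
}
```

Suppressing before splitting keeps punctuation inside code or quotes from producing spurious sentence boundaries. The split itself will still break on abbreviations such as "e.g. " and would need extra handling if that proves noisy.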
References
- ObvioSpectre/hallucination-detector: detectors/consistency.py
- Negation-polarity heuristic approximates NLI contradiction detection