feat: self-contradiction detection via negation polarity #19

@Jamie-BitFlight

Summary

Detect self-contradictory statements within a single assistant response using negation polarity heuristics. When the assistant says "X is Y" in one sentence and "X is not Y" in another, flag it as a contradiction.

Technique (from ObvioSpectre/hallucination-detector)

A lightweight approximation of natural language inference (NLI) that does not require an NLI model:

  1. Split response into sentences
  2. For each sentence pair on the same topic:
    • Check if one contains negation words and the other does not
    • If so, flag as internal contradiction
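The two steps above can be sketched roughly as follows. This is a minimal illustration, not the referenced detector's code: the function names, the shortened negation set, and the injected `sameTopic` callback are all assumptions made for the example.

```javascript
// Hypothetical sketch: every name here is illustrative, not taken from the referenced project.
const NEGATIONS = new Set(['not', 'no', 'never', "isn't", "doesn't", "can't"]);

// Step 1: split on ., ! or ? followed by whitespace.
function splitSentences(text) {
  return text.split(/(?<=[.!?])\s+/).map((s) => s.trim()).filter(Boolean);
}

// Lowercased word tokens; an apostrophe inside a word keeps contractions whole.
function tokenize(sentence) {
  return sentence.toLowerCase().match(/[a-z]+(?:'[a-z]+)?/g) || [];
}

function isNegated(sentence) {
  return tokenize(sentence).some((word) => NEGATIONS.has(word));
}

// Step 2: flag same-topic pairs where exactly one sentence is negated.
// `sameTopic` is supplied by the caller (assumed signature: (a, b) => boolean).
function findPolarityMismatches(text, sameTopic) {
  const sentences = splitSentences(text);
  const flagged = [];
  for (let i = 0; i < sentences.length; i++) {
    for (let j = i + 1; j < sentences.length; j++) {
      if (!sameTopic(sentences[i], sentences[j])) continue;
      if (isNegated(sentences[i]) !== isNegated(sentences[j])) {
        flagged.push([sentences[i], sentences[j]]);
      }
    }
  }
  return flagged;
}

// Usage with a deliberately crude topic check (both sentences mention "cache").
const mentionsCache = (a, b) => /cache/i.test(a) && /cache/i.test(b);
const pairs = findPolarityMismatches(
  'The cache is enabled. The weather is nice. The cache is not enabled.',
  mentionsCache
);
console.log(pairs.length); // 1
```

The pairwise loop is quadratic in the number of sentences, which is probably acceptable for a single response but worth keeping in mind for long outputs.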

Negation Words

const NEGATION_WORDS = ['not', 'no', 'never', "didn't", "isn't", "wasn't", "aren't", "won't", "can't", "doesn't", 'none', 'neither', 'nor', 'unable', 'lacks', 'failed'];
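One way to match this list is against whole tokens rather than substrings, so that "nothing" or "notes" do not count as "not" or "no". The sketch below assumes that approach; `negationWordsIn` is a hypothetical helper name, and normalizing curly apostrophes is an added assumption to cover typographic contractions.

```javascript
const NEGATION_WORDS = ['not', 'no', 'never', "didn't", "isn't", "wasn't", "aren't", "won't", "can't", "doesn't", 'none', 'neither', 'nor', 'unable', 'lacks', 'failed'];
const NEGATION_SET = new Set(NEGATION_WORDS);

// Hypothetical helper: returns the negation words found in a sentence, in order.
function negationWordsIn(sentence) {
  // Normalize curly apostrophes so "doesn’t" matches "doesn't".
  const normalized = sentence.toLowerCase().replace(/[\u2018\u2019]/g, "'");
  // Whole-word tokens; an inner apostrophe keeps contractions in one piece.
  const tokens = normalized.match(/[a-z]+(?:'[a-z]+)?/g) || [];
  return tokens.filter((token) => NEGATION_SET.has(token));
}

console.log(negationWordsIn('It doesn\u2019t work and never did.')); // [ "doesn't", 'never' ]
console.log(negationWordsIn('Nothing in the notes changed.'));       // []
```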

Detection Logic (Regex-Adaptable)

Without embeddings, approximate "same topic" by checking for shared noun phrases or subjects:

// Pattern: "X is Y" ... "X is not Y"
// Pattern: "always X" ... "never X"  
// Pattern: "X works" ... "X doesn't work"
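A possible stand-in for "shared noun phrases or subjects" is content-word overlap: drop function words and polarity words, then compare what remains. The sketch below assumes that substitution. The `IGNORED` list, the helper names, and the 0.5 threshold are illustrative choices, not values from the referenced project.

```javascript
// Hypothetical word list: function words plus polarity words, which should not
// count toward topic similarity.
const IGNORED = new Set([
  'the', 'a', 'an', 'is', 'are', 'was', 'were', 'it', 'this', 'that', 'and', 'or', 'to', 'of', 'in', 'on',
  'not', 'no', 'never', 'always', "isn't", "doesn't", "can't", "won't",
]);

function contentWords(sentence) {
  const tokens = sentence.toLowerCase().match(/[a-z]+(?:'[a-z]+)?/g) || [];
  return new Set(tokens.filter((token) => !IGNORED.has(token)));
}

// Jaccard overlap of the two content-word sets, in [0, 1].
function topicOverlap(a, b) {
  const wordsA = contentWords(a);
  const wordsB = contentWords(b);
  if (wordsA.size === 0 || wordsB.size === 0) return 0;
  let shared = 0;
  for (const word of wordsA) {
    if (wordsB.has(word)) shared++;
  }
  return shared / (wordsA.size + wordsB.size - shared);
}

// Assumed threshold; it would need tuning against real responses.
function sameTopic(a, b, threshold = 0.5) {
  return topicOverlap(a, b) >= threshold;
}

console.log(sameTopic('The cache is enabled.', 'The cache is not enabled.')); // true
console.log(sameTopic('The cache is enabled.', 'The weather is nice.'));      // false
```

Note that this sketch does no stemming, so "The parser works" and "The parser doesn't work" share only "parser" and fall below the threshold. Covering the third pattern above would need some inflection handling on top of this.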

Why This Works

Self-contradiction is a strong hallucination signal — the assistant is confabulating rather than reasoning from consistent evidence. This catches a failure mode that none of our current four categories detect.

New Category

internal_contradiction: a new detection category alongside the existing ones.

Acceptance Criteria

  • Sentence splitting implemented (regex: split on .!? followed by whitespace)
  • Negation polarity detection for sentence pairs
  • Handles common negation patterns beyond simple "not"
  • Low false positive rate — only flags when sentences are on the same topic
  • Tests with clear contradiction examples and non-contradiction controls
  • Suppression for quoted text and code blocks
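The suppression criterion could be handled as a preprocessing pass that blanks out code and quoted spans before sentence splitting, so negations inside them never reach the polarity check. The following is a rough sketch under that assumption; `stripSuppressedSpans` is a hypothetical name, and the patterns cover only Markdown-style code and single-line quotes.

```javascript
// Hypothetical preprocessing step: replace code and quoted spans with a space.
function stripSuppressedSpans(text) {
  return text
    .replace(/`{3}[\s\S]*?`{3}/g, ' ')            // fenced code blocks (three backticks)
    .replace(/`[^`\n]*`/g, ' ')                   // inline code spans
    .replace(/^>.*$/gm, ' ')                      // Markdown blockquote lines
    .replace(/"[^"\n]*"/g, ' ')                   // straight double-quoted spans
    .replace(/\u201C[^\u201D\n]*\u201D/g, ' ');   // curly double-quoted spans
}

const cleaned = stripSuppressedSpans('The flag is set. The docs say "the flag is not set". Use `!flag` here.');
console.log(cleaned.includes('not'));   // false
console.log(cleaned.includes('!flag')); // false
```

Fenced blocks are removed before inline spans so that the backticks of a fence are not mistaken for inline code. Single-quoted text is left alone in this sketch because apostrophes in contractions would make that pattern unreliable.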

References

  • ObvioSpectre/hallucination-detector detectors/consistency.py
  • Negation-polarity heuristic approximates NLI contradiction detection

Labels

  • enhancement: New feature or request
  • impact: structural (Changes pipeline architecture. Medium-high risk. Tests may break.)
  • phase: 3-sentence-infra (Phase 3: Sentence splitting infrastructure)
  • risk: medium (Medium risk: new subsystem or structural change)
  • topic: sentence-processing (Sentence splitting, pairwise comparison)
