Status: Open
Labels
- enhancement (New feature or request)
- impact: additive (Adds patterns to existing pipeline. Low risk.)
- phase: 3-sentence-infra (Phase 3: Sentence splitting infrastructure)
- risk: low (Low risk, additive, backward-compatible)
- topic: sentence-processing (Sentence splitting, pairwise comparison)
Description
Summary
Detect degenerate repetition (looping/collapse) in assistant output where near-identical sentences appear multiple times. This is a specific LLM failure mode indicating low-quality generation.
Technique (from ObvioSpectre/hallucination-detector)
Flag sentence pairs whose similarity score exceeds 0.95 (more than 95% similar) as an indicator of looping.
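A minimal sketch of the pairwise comparison follows. It assumes word-set Jaccard similarity as a stand-in score, because this issue does not say exactly how the reference detector computes similarity. `splitSentences`, `jaccard`, and `findSimilarPairs` are hypothetical names, not part of any existing pipeline.

```javascript
// Sketch: flag sentence pairs whose similarity exceeds a threshold (0.95 by default).
// ASSUMPTION: word-set Jaccard similarity stands in for the reference detector's
// similarity score; any (a, b) => number in [0, 1] can be passed instead.

// Split on sentence-ending punctuation followed by whitespace; drop empty pieces.
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Jaccard similarity of the lowercase ASCII word sets of two sentences.
function jaccard(a, b) {
  const wordsA = new Set(a.toLowerCase().match(/[a-z0-9']+/g) ?? []);
  const wordsB = new Set(b.toLowerCase().match(/[a-z0-9']+/g) ?? []);
  if (wordsA.size === 0 && wordsB.size === 0) return 1;
  let shared = 0;
  for (const w of wordsA) if (wordsB.has(w)) shared++;
  return shared / (wordsA.size + wordsB.size - shared);
}

// Compare every pair once (O(n^2) comparisons) and report pairs above the threshold.
function findSimilarPairs(text, threshold = 0.95, similarity = jaccard) {
  const sentences = splitSentences(text);
  const pairs = [];
  for (let i = 0; i < sentences.length; i++) {
    for (let j = i + 1; j < sentences.length; j++) {
      const score = similarity(sentences[i], sentences[j]);
      if (score > threshold) pairs.push({ i, j, score });
    }
  }
  return pairs;
}
```

For example, `findSimilarPairs("The answer is 42. The answer is 42. Something else.")` reports a single pair, sentences 0 and 1, with score 1. Word sets ignore order and counts, so "Dog bites man." and "Man bites dog." also score 1; an order-aware measure such as edit distance avoids that.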
Regex-Only Approximation
Without embeddings, detect repetition via:
- Split into sentences
- Normalize (lowercase, strip whitespace)
- Check for exact or near-exact duplicates (e.g., Levenshtein distance < 10% of sentence length, or longest common substring > 80%)
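The Levenshtein variant of the near-exact check can be sketched as below. It assumes the 10% budget is measured against the longer of the two normalized sentences (the issue does not specify which length), and `normalize`, `levenshtein`, and `findNearDuplicates` are hypothetical names. The longest-common-substring alternative is not shown.

```javascript
// Sketch: near-duplicate detection with Levenshtein edit distance.
// A pair is a near-duplicate when distance < maxRatio * (length of the longer sentence).
// ASSUMPTION: "sentence length" means the longer normalized sentence.

// Lowercase, trim, and collapse internal whitespace runs to a single space.
function normalize(s) {
  return s.toLowerCase().trim().replace(/\s+/g, " ");
}

// Dynamic-programming Levenshtein distance, keeping only one previous row.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

function findNearDuplicates(text, maxRatio = 0.1) {
  const sentences = text
    .split(/(?<=[.!?])\s+/)
    .map(normalize)
    .filter((s) => s.length > 0);
  const pairs = [];
  for (let i = 0; i < sentences.length; i++) {
    for (let j = i + 1; j < sentences.length; j++) {
      const a = sentences[i];
      const b = sentences[j];
      const budget = maxRatio * Math.max(a.length, b.length);
      // Cheap pre-filter: edit distance is never smaller than the length difference.
      if (Math.abs(a.length - b.length) >= budget) continue;
      const distance = levenshtein(a, b);
      if (distance < budget) pairs.push({ i, j, distance });
    }
  }
  return pairs;
}
```

With the default ratio, "I can help with that request today." and "I can help with this request today." differ by 2 edits on 35 characters, so they are reported as a near-duplicate pair. Each Levenshtein call costs O(len_a × len_b), so the length pre-filter matters on long outputs.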
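```javascript
// Simple exact-duplicate detection
function findExactDuplicates(text) {
  const sentences = text.split(/(?<=[.!?])\s+/);
  // Normalize: lowercase, trim, and drop empty fragments
  const normalized = sentences.map(s => s.toLowerCase().trim()).filter(s => s.length > 0);
  // Keep every sentence whose first occurrence is at an earlier index
  return normalized.filter((s, i) => normalized.indexOf(s) !== i);
}
```
Why This Works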
When an LLM loops or produces degenerate output, it typically repeats the same sentence or paragraph, either verbatim or with minor variations. This is distinct from intentional repetition (e.g., numbered lists with similar structure) and is a strong signal of generation failure.
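One way to keep intentional repetition out of the comparison is to drop structurally repetitive lines before splitting into sentences. This is a minimal sketch assuming Markdown-like input; the patterns and the name `stripIntentionalRepetition` are hypothetical heuristics, not an exhaustive definition of intentional repetition.

```javascript
// Sketch: remove list items, table rows, and fenced code before sentence comparison.
// ASSUMPTION: Markdown-like input. Simplification: any fence line toggles the
// in-code state, without checking that opening and closing fences match.

const LIST_ITEM = /^\s*(?:[-*+]|\d+[.)])\s+/; // "- item", "* item", "1. item", "2) item"
const TABLE_ROW = /^\s*\|.*\|\s*$/; // "| a | b |"
const FENCE = /^\s*(`{3,}|~{3,})/; // opening or closing code fence

function stripIntentionalRepetition(text) {
  const kept = [];
  let inFence = false;
  for (const line of text.split("\n")) {
    if (FENCE.test(line)) {
      inFence = !inFence; // the fence line itself is dropped
      continue;
    }
    // Skip code, list items, and table rows; keep ordinary prose lines.
    if (inFence || LIST_ITEM.test(line) || TABLE_ROW.test(line)) continue;
    kept.push(line);
  }
  return kept.join("\n");
}
```

Dropping these lines outright is the bluntest option. A gentler variant would compare list items only against each other with a stricter threshold, so that a list that genuinely loops can still be caught.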
New Category
degenerate_repetition — detects looping/collapse in assistant output.
Acceptance Criteria
- Sentence-level duplicate detection
- Near-duplicate detection (not just exact matches)
- Suppression for intentional repetition (numbered lists, table rows, code)
- Configurable similarity threshold
- Tests with looping examples and legitimate repetition controls
References
- ObvioSpectre/hallucination-detector
  - detectors/consistency.py (similarity > 0.95 check)