feat: fabricated source/citation detection (regex-based) #18

@Jamie-BitFlight

Description

Summary

Add regex-based detection of fabricated URLs, DOIs, and citation markers in assistant output. LLMs frequently hallucinate academic citations, URLs, and DOIs.

Technique (from ObvioSpectre/hallucination-detector)

Three regex patterns that detect common fabrication signals:

// URLs in output (as written, this captures only the scheme and host, not the path or query)
const URL_PATTERN = /https?:\/\/(?:[-\w.]|(?:%[\da-fA-F]{2}))+/g;

// DOI patterns (academic citations)
const DOI_PATTERN = /\b10\.\d{4,9}\/[-._;()/:A-Z0-9]+\b/gi;

// Citation markers like [1], [Smith, 2024]
const CITATION_PATTERN = /\[(?:\d+|[A-Za-z]+(?:, \d{4})?)\]/g;
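
A quick way to sanity-check the three patterns is to run them over a sample string. The sketch below repeats the patterns from this issue; the `collectMatches` helper and the sample text are hypothetical and exist only for illustration.

```javascript
// Patterns copied from the issue text above.
const URL_PATTERN = /https?:\/\/(?:[-\w.]|(?:%[\da-fA-F]{2}))+/g;
const DOI_PATTERN = /\b10\.\d{4,9}\/[-._;()/:A-Z0-9]+\b/gi;
const CITATION_PATTERN = /\[(?:\d+|[A-Za-z]+(?:, \d{4})?)\]/g;

// Hypothetical helper: returns every match of a global regex as plain strings.
function collectMatches(pattern, text) {
  return Array.from(text.matchAll(pattern), (m) => m[0]);
}

// Made-up sample text containing one of each signal.
const sample =
  "See [1] and [Smith, 2024], doi 10.1234/abc.5678, at https://example.com/paper.";

const urls = collectMatches(URL_PATTERN, sample);
const dois = collectMatches(DOI_PATTERN, sample);
const citations = collectMatches(CITATION_PATTERN, sample);

console.log({ urls, dois, citations });
```

Because the URL pattern's character class does not include `/`, the match for the sample URL stops at the host, so any later check that looks for markers in the path would need a wider pattern.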

Detection Logic

  • If citations, DOIs, or URLs are present in assistant output → elevated risk (LLMs frequently fabricate these)
  • If a URL contains example.com, placeholder, or ... → treat as fabricated (score 1.0)
  • If DOIs are present but cannot be verified → flag as ungrounded
  • If no citations are present → no flag (not all responses need citations)
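
The rules above could be sketched roughly as follows. This is an illustration under stated assumptions, not the detector's actual implementation: the function name `scoreFabricatedSources`, the flag names, and the 0.5 value standing in for "elevated risk" are all hypothetical, and the sketch assumes no DOI resolver is available, so every DOI found is treated as unverifiable.

```javascript
// Patterns copied from the issue text.
const URL_PATTERN = /https?:\/\/(?:[-\w.]|(?:%[\da-fA-F]{2}))+/g;
const DOI_PATTERN = /\b10\.\d{4,9}\/[-._;()/:A-Z0-9]+\b/gi;
const CITATION_PATTERN = /\[(?:\d+|[A-Za-z]+(?:, \d{4})?)\]/g;

// Markers named in the "score 1.0" rule.
const PLACEHOLDER_MARKERS = ["example.com", "placeholder", "..."];

// ASSUMPTION: the issue gives no number for "elevated risk"; 0.5 is a stand-in.
const ELEVATED_RISK = 0.5;

// Hypothetical scorer returning a score in [0, 1] plus a list of flag names.
function scoreFabricatedSources(text) {
  const find = (pattern) => Array.from(text.matchAll(pattern), (m) => m[0]);
  const urls = find(URL_PATTERN);
  const dois = find(DOI_PATTERN);
  const citations = find(CITATION_PATTERN);

  // No sources of any kind: nothing to flag.
  if (urls.length + dois.length + citations.length === 0) {
    return { score: 0, flags: [] };
  }

  const flags = [];
  let score = ELEVATED_RISK;

  // A URL containing a known placeholder marker gets the maximum score.
  const hasPlaceholder = urls.some((url) =>
    PLACEHOLDER_MARKERS.some((marker) => url.toLowerCase().includes(marker))
  );
  if (hasPlaceholder) {
    score = 1.0;
    flags.push("placeholder_url");
  }

  // ASSUMPTION: without a resolver, every DOI counts as unverifiable.
  if (dois.length > 0) {
    flags.push("ungrounded_doi");
  }

  return { score, flags };
}

console.log(scoreFabricatedSources("See https://example.com/docs for details."));
```

Returning the flags separately from the score keeps the "ungrounded" signal visible even when the numeric score is only the baseline.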

Why This Works

LLMs are notorious for inventing plausible-looking academic citations, DOIs, and URLs. The check is pure regex matching, so it adds no dependencies and negligible runtime cost, and it targets a specific, well-documented hallucination failure mode.

New Category

fabricated_source — a fifth detection category alongside speculation, causality, pseudo-quantification, and completeness.

Acceptance Criteria

  • URL, DOI, and citation marker regex patterns implemented
  • Known-fake URL detection (example.com, placeholder patterns)
  • Suppression for URLs in code blocks and inline code
  • Tests cover positive and negative cases
  • Integrated into existing findTriggerMatches pipeline
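
One possible approach to the code-suppression criterion is to mask code spans before matching. The sketch below assumes markdown-style delimiters (triple-backtick fences and single-backtick inline code); `maskCode` and `findUrlsOutsideCode` are hypothetical names, and how the result would plug into `findTriggerMatches` is not shown because its signature is not given in this issue.

```javascript
// URL pattern copied from the issue text.
const URL_PATTERN = /https?:\/\/(?:[-\w.]|(?:%[\da-fA-F]{2}))+/g;

// ASSUMPTION: code is delimited markdown-style.
const FENCED_CODE = /`{3}[\s\S]*?`{3}/g; // triple-backtick blocks
const INLINE_CODE = /`[^`\n]+`/g; // single-backtick spans on one line

// Replace code with same-length whitespace (newlines kept) so that match
// offsets still line up with the original text.
function maskCode(text) {
  const blank = (match) => match.replace(/[^\n]/g, " ");
  return text.replace(FENCED_CODE, blank).replace(INLINE_CODE, blank);
}

// Hypothetical helper: URLs that appear outside fenced and inline code.
function findUrlsOutsideCode(text) {
  return Array.from(maskCode(text).matchAll(URL_PATTERN), (m) => ({
    url: m[0],
    index: m.index,
  }));
}

console.log(findUrlsOutsideCode("Run `curl https://example.com` or read https://journal.org/a"));
```

Masking rather than deleting the code keeps each reported `index` valid against the unmodified text, which helps if findings need to point back at a position in the original output.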

References

  • ObvioSpectre/hallucination-detector detectors/source_check.py

Labels

  • enhancement (New feature or request)
  • impact: additive (Adds patterns to existing pipeline. Low risk.)
  • phase: 1-additive-patterns (Phase 1: Zero-risk additive regex patterns)
  • risk: low (Low risk - additive, backward-compatible)
  • topic: detection-patterns (New regex-based detection categories)
  • topic: source-credibility (Source/citation/attribution detection)
