feat: cognitive bias detection from LLM financial reasoning research #15

@Jamie-BitFlight

Description

Summary

Add detection patterns for cognitive biases observed in LLM reasoning, based on the research documented in the exa-hallucination-detector repository (the "Artificially Biased Intelligence" paper on LLM cognitive biases in financial reasoning).

Background

The research tested 48 LLMs across 11 bias families using a prompt-pair experimental design. Several of these biases manifest as detectable linguistic patterns in assistant output.

Proposed Bias Detection Categories

1. Anchoring Bias

Pattern: Over-reliance on the first piece of information encountered.
Detection: When an estimate or recommendation closely mirrors an initial value mentioned in the conversation without independent analysis.
Example: User says "the timeout is around 30s" → assistant recommends "set timeout to 30 seconds" without checking actual requirements.
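
A minimal sketch of the anchoring check, assuming it is limited to time durations like the timeout example above. The names `extract_durations` and `echoed_anchors` are hypothetical and not part of the existing pipeline. The sketch only reports values the assistant repeats from the user; deciding whether independent analysis took place would still need a separate rule.

```python
import re
from typing import Set

# Hypothetical pattern: a number followed by a time unit ("30s", "30 seconds").
# Caveat: a bare "m" is read as minutes, so "30 m" (meters) would be misread.
_QUANTITY = re.compile(
    r"(\d+(?:\.\d+)?)\s*(milliseconds?|ms|seconds?|secs?|s|minutes?|mins?|m)\b",
    re.IGNORECASE,
)


def _to_seconds(value: str, unit: str) -> float:
    """Normalize a matched quantity to seconds so '30s' equals '30 seconds'."""
    unit = unit.lower()
    if unit.startswith("ms") or unit.startswith("milli"):
        scale = 0.001
    elif unit.startswith("m"):
        scale = 60.0
    else:
        scale = 1.0
    return float(value) * scale


def extract_durations(text: str) -> Set[float]:
    """Return every duration mentioned in the text, in seconds."""
    return {_to_seconds(v, u) for v, u in _QUANTITY.findall(text)}


def echoed_anchors(user_text: str, assistant_text: str) -> Set[float]:
    """Durations the assistant repeats from the user's message."""
    return extract_durations(user_text) & extract_durations(assistant_text)


print(echoed_anchors("the timeout is around 30s", "set timeout to 30 seconds"))
```

An empty result means the assistant did not reuse a user-supplied duration, so the anchoring rule would not fire.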

2. Framing Effect

Pattern: Different conclusions from the same data depending on how it's presented.
Detection: Conclusions that change based on whether metrics are framed as success vs failure rate.
Example: "95% of tests pass" → "looking good" vs. "5% of tests fail" → "needs attention". Same data, different reaction.
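
One way to recognize that two framings describe the same data is to normalize both to a success rate before comparing. The sketch below assumes statements of the form "N% of X pass/fail"; `success_rate` and `same_data` are hypothetical names used only for illustration.

```python
import math
import re
from typing import Optional, Tuple

# Hypothetical pattern for "<number>% of <subject> pass/fail" style statements.
_RATE = re.compile(
    r"(\d+(?:\.\d+)?)\s*%\s+of\s+(\w+)\s+(pass(?:ed)?|succeed(?:ed)?|fail(?:ed)?)\b",
    re.IGNORECASE,
)


def success_rate(text: str) -> Optional[Tuple[str, float]]:
    """Return (subject, success percentage), converting failure rates."""
    match = _RATE.search(text)
    if match is None:
        return None
    value = float(match.group(1))
    subject = match.group(2).lower()
    verb = match.group(3).lower()
    if verb.startswith("fail"):
        value = 100.0 - value  # a 5% failure rate is a 95% success rate
    return subject, value


def same_data(a: str, b: str) -> bool:
    """True when both statements reduce to the same subject and success rate."""
    ra, rb = success_rate(a), success_rate(b)
    if ra is None or rb is None:
        return False
    return ra[0] == rb[0] and math.isclose(ra[1], rb[1])


print(same_data("95% of tests pass", "5% of tests fail"))
```

A detector could then flag cases where `same_data` holds but the surrounding conclusions differ in tone; how to compare the conclusions is left open here.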

3. Confirmation Bias

Pattern: Selectively citing evidence that supports a preexisting conclusion.
Detection: When the assistant searches for and presents only supporting evidence while ignoring contradicting results from the same search.

4. Sunk Cost Reasoning

Pattern: Recommending continuation of a failing approach because of work already invested.
Detection: Phrases like "since we've already...", "given the effort put into...", "it would be wasteful to..."
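
The phrases above map fairly directly onto regular expressions. The following is a sketch only: the pattern list and the `find_sunk_cost_phrases` name are assumptions, and the exact wording variants to cover would need tuning against real transcripts.

```python
import re
from typing import List

# Hypothetical patterns derived from the three example phrases in this issue.
SUNK_COST_PATTERNS = [
    re.compile(r"\bsince we(?:'ve| have) already\b", re.IGNORECASE),
    re.compile(r"\bgiven the (?:effort|time|work) (?:put|invested|spent)\b", re.IGNORECASE),
    re.compile(r"\bit would be (?:wasteful|a waste) to\b", re.IGNORECASE),
]


def find_sunk_cost_phrases(text: str) -> List[str]:
    """Return matched phrases, grouped by pattern in list order."""
    return [m.group(0) for p in SUNK_COST_PATTERNS for m in p.finditer(text)]


print(find_sunk_cost_phrases(
    "Since we've already built the parser, it would be wasteful to switch."
))
```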

5. Authority Bias

Pattern: Accepting claims because of the source rather than the evidence.
Detection: Phrases like "According to [authority]..." with no verification, or "the official docs say..." when the docs may be outdated.
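
A rough sketch of how the "without verification" condition might be approximated: flag an authority phrase only when no verification cue appears in the same text. Both patterns and the `has_unverified_authority_claim` name are hypothetical. The cue list is deliberately naive: a negated cue such as "not checked" would still suppress the flag.

```python
import re

# Hypothetical trigger phrases taken from the examples in this issue.
AUTHORITY_CLAIM = re.compile(
    r"\b(?:according to|the official docs? says?)\b", re.IGNORECASE
)

# Hypothetical cues that the claim was independently checked.
VERIFICATION_CUE = re.compile(
    r"\b(?:verified|confirmed|checked|tested)\b", re.IGNORECASE
)


def has_unverified_authority_claim(text: str) -> bool:
    """True when an authority phrase appears with no verification cue."""
    if AUTHORITY_CLAIM.search(text) is None:
        return False
    return VERIFICATION_CUE.search(text) is None


print(has_unverified_authority_claim("According to the vendor, this flag is safe."))
```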

Implementation Approach

Acceptance Criteria

  • At least 3 bias categories implemented with regex patterns
  • Each category has suppression rules to minimize false positives
  • Tests cover positive and negative cases for each bias
  • All disabled by default
  • Documentation describes each bias with examples
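
To make the criteria above concrete, here is one possible shape for a category definition that carries its own suppression rules and is disabled by default. `BiasCategory`, `detect`, and the example patterns are all hypothetical; the existing pipeline's actual configuration format is not shown in this issue.

```python
import re
from dataclasses import dataclass, replace
from typing import Iterable, List, Pattern, Tuple


@dataclass(frozen=True)
class BiasCategory:
    """Hypothetical container for one opt-in bias detection category."""
    name: str
    patterns: Tuple[Pattern[str], ...]
    suppressions: Tuple[Pattern[str], ...] = ()
    enabled: bool = False  # disabled by default, per the acceptance criteria


def detect(categories: Iterable[BiasCategory], text: str) -> List[str]:
    """Return names of enabled categories that match and are not suppressed."""
    hits = []
    for category in categories:
        if not category.enabled:
            continue
        if any(s.search(text) for s in category.suppressions):
            continue
        if any(p.search(text) for p in category.patterns):
            hits.append(category.name)
    return hits


SUNK_COST = BiasCategory(
    name="sunk_cost",
    patterns=(re.compile(r"\bit would be wasteful to\b", re.IGNORECASE),),
    # Assumed suppression: text that is discussing the bias itself.
    suppressions=(re.compile(r"\bsunk cost\b", re.IGNORECASE),),
)

text = "It would be wasteful to rewrite this now."
print(detect([SUNK_COST], text))                         # category is off
print(detect([replace(SUNK_COST, enabled=True)], text))  # category opted in
```

Positive and negative tests for each category can then be written as plain assertions on `detect`, with one case for the trigger phrase, one for the suppression rule, and one for the default-off state.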

References

  • "Artificially Biased Intelligence" paper (in exa-hallucination-detector repo)
  • Research on LLM cognitive biases across 48 models and 11 bias families

Metadata

Assignees

No one assigned

Labels

  • enhancement (New feature or request)
  • impact: additive (Adds patterns to existing pipeline. Low risk.)
  • phase: 8-opt-in-advanced (Phase 8: Opt-in features requiring config)
  • risk: low (Low risk: additive, backward-compatible)
  • topic: detection-patterns (New regex-based detection categories)
