Summary
Incorporate detection patterns for cognitive biases identified in LLM reasoning, based on research documented in the exa-hallucination-detector repository ("Artificially Biased Intelligence" paper on LLM cognitive biases in financial reasoning).
Background
The research tested 48 LLMs across 11 bias families using a prompt-pair experimental design. Several of these biases manifest as detectable linguistic patterns in assistant output.
Proposed Bias Detection Categories
1. Anchoring Bias
Pattern: Over-reliance on the first piece of information encountered.
Detection: When an estimate or recommendation closely mirrors an initial value mentioned in the conversation without independent analysis.
Example: User says "the timeout is around 30s" → assistant recommends "set timeout to 30 seconds" without checking actual requirements.
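A minimal heuristic for this pattern could compare numbers the assistant proposes against numbers the user already supplied. This is only a sketch; the function name and the naive number-matching signal are assumptions, not the tool's actual API, and real detection would need context analysis:

```python
import re

def detect_anchoring(user_text: str, assistant_text: str) -> bool:
    """Flag when the assistant's recommended value simply echoes a
    number the user mentioned. Heuristic sketch only."""
    user_numbers = set(re.findall(r"\d+(?:\.\d+)?", user_text))
    assistant_numbers = re.findall(r"\d+(?:\.\d+)?", assistant_text)
    # Naive signal: every number the assistant proposes was already
    # supplied by the user, suggesting no independent analysis.
    return bool(assistant_numbers) and all(
        n in user_numbers for n in assistant_numbers
    )
```

For the example above, `detect_anchoring("the timeout is around 30s", "set timeout to 30 seconds")` would fire, while a recommendation of a different value would not.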
2. Framing Effect
Pattern: Different conclusions from the same data depending on how it's presented.
Detection: Conclusions that change depending on whether the same metric is framed as a success rate or a failure rate.
Example: "95% of tests pass" → "looking good" vs "5% of tests fail" → "needs attention" — same data, different reaction.
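One way to catch this is to normalize both framings to a single canonical metric before comparing reactions. A sketch, assuming a hypothetical helper and a deliberately narrow phrase pattern:

```python
import re

def canonical_pass_rate(text: str):
    """Normalize 'N% of tests pass' / 'N% of tests fail' phrasings to one
    pass-rate number so downstream checks compare the data, not the
    framing. Sketch only; real matching would cover more phrasings."""
    m = re.search(r"(\d+(?:\.\d+)?)%\s+of\s+tests\s+(pass|fail)", text)
    if m is None:
        return None
    rate = float(m.group(1))
    return rate if m.group(2) == "pass" else 100.0 - rate
```

Both "95% of tests pass" and "5% of tests fail" normalize to the same pass rate, so a detector can flag inconsistent conclusions drawn from equivalent data.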
3. Confirmation Bias
Pattern: Selectively citing evidence that supports a preexisting conclusion.
Detection: When assistant searches for and presents only supporting evidence while ignoring contradicting results from the same search.
4. Sunk Cost Reasoning
Pattern: Recommending continuation of a failing approach because of work already invested.
Detection: Phrases like "since we've already...", "given the effort put into...", "it would be wasteful to..."
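These phrases translate directly into a regex pattern set. A sketch using the phrases listed above; the names are hypothetical, not the tool's existing conventions:

```python
import re

# Phrase list taken from the detection notes above; hypothetical name.
SUNK_COST_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"\bsince we'?ve already\b",
        r"\bgiven the effort (?:we'?ve )?put into\b",
        r"\bit would be wasteful to\b",
    )
]

def detect_sunk_cost(text: str) -> bool:
    return any(p.search(text) for p in SUNK_COST_PATTERNS)
```

A suppression rule might, for instance, skip matches inside quoted user text so the detector only flags the assistant's own reasoning.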
5. Authority Bias
Pattern: Accepting claims because of the source rather than the evidence.
Detection: "According to [authority]..." without verification, "the official docs say..." when docs may be outdated.
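The same pattern-set shape works here. A sketch with assumed names and a small, illustrative phrase list:

```python
import re

# Hypothetical pattern set: appeals to a source in place of evidence.
AUTHORITY_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"\baccording to\b",
        r"\bthe official docs say\b",
        r"\bas .{1,40} (?:says|states|recommends)\b",
    )
]

def detect_authority(text: str) -> bool:
    return any(p.search(text) for p in AUTHORITY_PATTERNS)
```

Because "according to" is also used in legitimate sourcing, a suppression rule could exempt matches that are followed by verification language or a concrete citation.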
Implementation Approach
- Add as new detection categories alongside the existing four
- Each bias gets its own regex pattern set + suppression rules
- All disabled by default (opt-in via config, see feat: config file handling — cascading settings from multiple sources #10)
- Lower default severity than core categories (info vs warning)
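The category shape implied by the list above might look like the following. This is a sketch only; the dataclass, field names, and defaults are assumptions, and the actual config schema depends on #10:

```python
from dataclasses import dataclass, field

@dataclass
class BiasCategory:
    # Hypothetical shape; the real schema is decided in #10.
    name: str
    patterns: list                                  # compiled regexes
    suppressions: list = field(default_factory=list)
    enabled: bool = False      # opt-in: all bias categories off by default
    severity: str = "info"     # below the core categories' "warning"

CATEGORIES = [
    BiasCategory("sunk_cost", patterns=[]),
    BiasCategory("authority", patterns=[]),
]
```

Keeping `enabled=False` and `severity="info"` as dataclass defaults means every new bias category satisfies the opt-in and lower-severity requirements unless a config explicitly overrides them.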
Acceptance Criteria
- At least 3 bias categories implemented with regex patterns
- Each category has suppression rules to minimize false positives
- Tests cover positive and negative cases for each bias
- All disabled by default
- Documentation describes each bias with examples
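The positive/negative test requirement could be structured like this. A sketch with a stub detector standing in for the real per-category detectors (names are hypothetical):

```python
import re

# Stub standing in for the real sunk-cost detector.
def detect_sunk_cost(text: str) -> bool:
    return bool(re.search(r"\bsince we'?ve already\b", text, re.IGNORECASE))

# Positive cases should fire; negative cases guard against false positives.
POSITIVE = ["Since we've already invested a week, let's continue."]
NEGATIVE = ["A fresh approach would be faster here."]

def test_positive_and_negative_cases():
    assert all(detect_sunk_cost(t) for t in POSITIVE)
    assert not any(detect_sunk_cost(t) for t in NEGATIVE)
```

Each bias category would get its own positive/negative fixture lists, plus cases exercising its suppression rules.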
References
- "Artificially Biased Intelligence" paper (in exa-hallucination-detector repo)
- Research on LLM cognitive biases across 48 models and 11 bias families