
Add anomaly detection — baseline comparison for acute deviations (#589) #606

Merged
erikdarlingdata merged 1 commit into dev from feature/anomaly-detection-589 on Mar 17, 2026

Conversation

erikdarlingdata (Owner) commented on Mar 17, 2026

Summary

  • New AnomalyDetector compares analysis window against 24-hour baseline
  • Detects CPU spikes (σ-based), wait spikes (ratio-based), blocking/deadlock spikes, I/O latency anomalies
  • Wired into AnalysisService pipeline between fact collection and scoring
  • Skips all detection when no baseline data exists (prevents false positives on new servers)

Detection types:

| Type | Signal | Scoring |
|---|---|---|
| ANOMALY_CPU_SPIKE | Peak CPU deviates from baseline mean | 2σ=0.5, 4σ=1.0 |
| ANOMALY_WAIT_{type} | Wait type 5x+ increase or new | 5x=0.5, 20x=1.0 |
| ANOMALY_BLOCKING_SPIKE | Blocking events 3x+ baseline | 3x=0.5, 10x=1.0 |
| ANOMALY_DEADLOCK_SPIKE | Deadlocks 3x+ baseline | 3x=0.5, 10x=1.0 |
| ANOMALY_READ_LATENCY | Read latency deviation | 2σ=0.5, 4σ=1.0 |
| ANOMALY_WRITE_LATENCY | Write latency deviation | 2σ=0.5, 4σ=1.0 |
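
To make the σ-based rows concrete, here is a minimal sketch of how a peak value could be turned into a deviation and then a score. It is not the code from this PR: the helper names (`BaselineStats`, `DeviationInSigmas`, `ScoreFromSigma`) are hypothetical, and only the two anchor points (2σ=0.5, 4σ=1.0) come from the table. Scoring anything below 2σ as 0, interpolating linearly between the anchors, and using the population standard deviation are all assumptions.

```csharp
using System;
using System.Linq;

// Hypothetical helper: mean and population standard deviation of baseline samples.
static (double Mean, double StdDev) BaselineStats(double[] samples)
{
    double mean = samples.Average();
    double variance = samples.Sum(x => (x - mean) * (x - mean)) / samples.Length;
    return (mean, Math.Sqrt(variance));
}

// How many baseline standard deviations the peak sits above the baseline mean.
// Assumption: a flat baseline (sigma = 0) counts as an infinite deviation
// for any peak above the mean, and as no deviation otherwise.
static double DeviationInSigmas(double peak, double mean, double stdDev)
{
    if (stdDev > 0) return (peak - mean) / stdDev;
    return peak > mean ? double.PositiveInfinity : 0.0;
}

// Maps a deviation onto a 0..1 score. Only the anchors 2 sigma = 0.5 and
// 4 sigma = 1.0 come from the PR; the rest of the curve is assumed.
static double ScoreFromSigma(double sigmas)
{
    if (sigmas < 2.0) return 0.0;
    if (sigmas >= 4.0) return 1.0;
    return 0.5 + (sigmas - 2.0) / 2.0 * 0.5;
}

// Illustrative data only: a baseline hovering around 10% CPU and a 95% peak.
double[] baselineCpu = { 8, 10, 12, 10, 9, 11, 10, 10 };
var (baselineMean, baselineSigma) = BaselineStats(baselineCpu);
double peakSigmas = DeviationInSigmas(95, baselineMean, baselineSigma);
Console.WriteLine($"mean={baselineMean:F1} sigma={baselineSigma:F2} score={ScoreFromSigma(peakSigmas):F2}");
```

With a baseline this tight, a 95% peak lands far beyond 4σ, so the score saturates at 1.0; the same shape would apply to read and write latency.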

Test scenarios added:

  • CPU spike anomaly (10% baseline → 95% spike)
  • Blocking spike anomaly (0 baseline → 50 events + 10 deadlocks)
  • Wait spike anomaly (minimal PAGEIOLATCH → 8M ms flood)

Test plan

  • dotnet build — 0 errors
  • dotnet test — 138 tests pass (131 existing + 7 new)
  • Live test against sql2022 with HammerDB data

Closes #589

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Integrated anomaly detection into the analysis pipeline to automatically identify CPU spikes, wait-time anomalies, blocking and deadlock situations, and I/O latency issues by comparing current performance against a 24-hour baseline.
  • Tests

    • Added comprehensive test coverage for anomaly detection scenarios including CPU spike, blocking spike, and wait spike anomalies with validation of detection accuracy and anomaly metadata.

New AnomalyDetector compares the analysis window against a 24-hour
baseline period to detect acute deviations. Detects:

- CPU spikes: peak CPU deviation from baseline mean (σ-based scoring)
- Wait spikes: wait types with 5x+ increase or new in analysis window
- Blocking spikes: blocking/deadlock counts 3x+ above baseline
- I/O latency anomalies: read/write latency deviation from baseline

Scoring: CPU/IO use standard deviation (2σ=0.5, 4σ=1.0). Waits use
ratio (5x=0.5, 20x=1.0). Blocking uses ratio (3x=0.5, 10x=1.0).
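
The ratio-based half of that scoring can be sketched as follows. This is an illustration under stated assumptions rather than the PR's `FactScorer` code: the function names are hypothetical, the curve between the two anchors is assumed to be linear, ratios below the lower anchor are assumed to score 0, and a zero baseline with non-zero current activity is treated as an unbounded ratio so that "new" wait types and first-time blocking score 1.0.

```csharp
using System;

// Linear interpolation between the two anchors quoted in the commit message:
// lowRatio scores 0.5 and highRatio scores 1.0. Everything else is assumed.
static double ScoreFromRatio(double ratio, double lowRatio, double highRatio)
{
    if (ratio < lowRatio) return 0.0;
    if (ratio >= highRatio) return 1.0;
    return 0.5 + (ratio - lowRatio) / (highRatio - lowRatio) * 0.5;
}

// Current-to-baseline ratio. Assumption: activity with no baseline at all
// counts as an infinite ratio; no activity on either side counts as 0.
static double Ratio(double current, double baseline)
{
    if (baseline > 0) return current / baseline;
    return current > 0 ? double.PositiveInfinity : 0.0;
}

// Waits: 5x = 0.5, 20x = 1.0.
static double ScoreWaitSpike(double currentMs, double baselineMs)
    => ScoreFromRatio(Ratio(currentMs, baselineMs), 5.0, 20.0);

// Blocking and deadlocks: 3x = 0.5, 10x = 1.0.
static double ScoreBlockingSpike(double currentCount, double baselineCount)
    => ScoreFromRatio(Ratio(currentCount, baselineCount), 3.0, 10.0);

// Illustrative inputs: the 1,000 ms baseline is made up for the example.
Console.WriteLine(ScoreWaitSpike(8_000_000, 1_000));
Console.WriteLine(ScoreBlockingSpike(50, 0));
```

Both calls saturate at the top of the scale, which matches the intent of the blocking and wait test scenarios: a flood from a near-zero baseline should score as a maximal anomaly.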

Global safety: skips all anomaly detection when no baseline data exists
(prevents everything looking anomalous on new servers). Uses strict
boundary exclusion to prevent analysis window data leaking into baseline.
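
One way to read the two safety rules above is sketched below. The commit message states only that the baseline is 24 hours long, that the boundary is strictly excluded, and that detection is skipped without baseline data; placing the baseline immediately before the analysis window, making its start inclusive, and the helper names `BaselineWindow` and `BaselineSamples` are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed layout: the baseline is the 24 hours immediately before the
// analysis window, as the half-open interval [start, end).
static (DateTime Start, DateTime End) BaselineWindow(DateTime analysisStart)
    => (analysisStart.AddHours(-24), analysisStart);

// Keeps only samples inside the baseline. The strict "<" on the upper bound
// is the boundary exclusion: a sample stamped exactly at the start of the
// analysis window belongs to the analysis window, not to the baseline.
static List<double> BaselineSamples(
    IEnumerable<(DateTime Time, double Value)> samples, DateTime analysisStart)
{
    var (start, end) = BaselineWindow(analysisStart);
    return samples
        .Where(s => s.Time >= start && s.Time < end)
        .Select(s => s.Value)
        .ToList();
}

// Illustrative data only.
var windowStart = new DateTime(2026, 3, 17, 12, 0, 0, DateTimeKind.Utc);
var observed = new List<(DateTime Time, double Value)>
{
    (windowStart.AddHours(-25), 50), // older than the baseline: ignored
    (windowStart.AddHours(-1), 10),  // inside the baseline
    (windowStart, 95),               // on the boundary: analysis window
};

var baselineValues = BaselineSamples(observed, windowStart);
Console.WriteLine(baselineValues.Count == 0
    ? "no baseline data: skip all anomaly detection"
    : $"baseline samples: {baselineValues.Count}");
```

Without the strict upper bound, the 95% sample would be averaged into the baseline and dampen the very spike the detector is trying to find.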

Wired into AnalysisService pipeline between fact collection and scoring.
Tool recommendations added for all anomaly fact types.

Test scenarios: CPU spike anomaly (10% baseline → 95% spike), blocking
spike anomaly (0 baseline → 50 events), wait spike anomaly (minimal
PAGEIOLATCH baseline → 8M ms flood). 7 new tests, 138 total passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
coderabbitai (Bot, Contributor) commented on Mar 17, 2026

Walkthrough

This change implements anomaly detection by comparing runtime metrics against a 24-hour baseline, detecting CPU spikes, wait-time anomalies, blocking/deadlock events, and I/O latency deviations. The AnomalyDetector integrates into the analysis pipeline, produces anomaly facts with scored severity, and includes comprehensive test coverage with data seeding utilities.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Core Anomaly Detection | Lite/Analysis/AnomalyDetector.cs | New class implementing baseline-comparison anomaly detection for CPU, wait-time, blocking/deadlock, and I/O latency metrics. Uses 24-hour baseline with statistical thresholds (2σ for deviation, 3–5x for ratio) and returns list of Fact objects with detailed metadata. |
| Analysis Pipeline Integration | Lite/Analysis/AnalysisService.cs | Integrates AnomalyDetector into the main AnalyzeAsync flow; calls DetectAnomaliesAsync after fact collection and before scoring to enrich facts with anomaly detections. |
| Anomaly Scoring | Lite/Analysis/FactScorer.cs | Adds ScoreAnomalyFact private method to score CPU/latency anomalies via deviation-based scaling and wait/blocking anomalies via ratio-based scaling; includes "anomaly" in context sources for amplification. |
| Test Scenarios & Data Seeding | Lite.Tests/ScenarioTests.cs, Lite/Analysis/TestDataSeeder.cs | Adds seven scenario tests (CPU spike, blocking/deadlock spikes, wait floods) and supporting infrastructure: RunFullPipelineWithAnomaliesAsync helper, baseline window properties (BaselineStart, BaselineEnd), and range-seeding utilities (SeedCpuUtilizationInRangeAsync, SeedWaitStatsInRangeAsync). |
| Tool Mapping for Anomalies | Lite/Mcp/McpAnalysisTools.cs | Expands ToolRecommendations with new anomaly groups (ANOMALY_CPU, ANOMALY_WAIT, ANOMALY_BLOCKING, ANOMALY_IO); enhances GetForStoryPath to map dynamic anomaly-prefixed keys to corresponding tool groups. |
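
The dynamic key mapping described for GetForStoryPath could look roughly like the sketch below. The group names and fact keys come from this PR, but the function `TryGetAnomalyToolGroup`, the prefix table, and in particular the choice to route deadlock spikes to ANOMALY_BLOCKING and both latency facts to ANOMALY_IO are assumptions, not a copy of McpAnalysisTools.cs.

```csharp
using System;

// Hypothetical lookup: resolves an anomaly fact key, including dynamic
// ANOMALY_WAIT_{type} keys, to one of the four tool groups named in the PR.
static bool TryGetAnomalyToolGroup(string factKey, out string group)
{
    (string Prefix, string Group)[] map =
    {
        ("ANOMALY_CPU_", "ANOMALY_CPU"),
        ("ANOMALY_WAIT_", "ANOMALY_WAIT"),
        ("ANOMALY_BLOCKING_", "ANOMALY_BLOCKING"),
        ("ANOMALY_DEADLOCK_", "ANOMALY_BLOCKING"), // assumed grouping
        ("ANOMALY_READ_LATENCY", "ANOMALY_IO"),    // assumed grouping
        ("ANOMALY_WRITE_LATENCY", "ANOMALY_IO"),   // assumed grouping
    };

    foreach (var (prefix, candidate) in map)
    {
        if (factKey.StartsWith(prefix, StringComparison.Ordinal))
        {
            group = candidate;
            return true;
        }
    }

    // Not an anomaly key this sketch knows about.
    group = string.Empty;
    return false;
}

if (TryGetAnomalyToolGroup("ANOMALY_WAIT_PAGEIOLATCH_SH", out var toolGroup))
{
    Console.WriteLine(toolGroup);
}
```

Matching on a prefix is what lets a single ANOMALY_WAIT entry serve every wait type without registering each one ahead of time.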

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main change: introducing anomaly detection with baseline comparison to identify acute deviations. |
| Linked Issues check | ✅ Passed | The implementation addresses all core requirements from #589: rolling baselines, spike detection via standard deviation thresholds, anomaly detection integrated into analysis pipeline, and comprehensive test coverage. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to implementing anomaly detection: detector class, pipeline integration, test additions, fact scoring, test data seeding, and MCP tool recommendations. |


coderabbitai (Bot, Contributor) left a comment

🧹 Nitpick comments (1)
Lite.Tests/ScenarioTests.cs (1)

434-464: Consider extracting shared logic between pipeline helpers.

RunFullPipelineWithAnomaliesAsync largely duplicates RunFullPipelineAsync (lines 335-360), differing only in the anomaly detection step (lines 447-450). Consider refactoring to reduce duplication:

♻️ Optional refactor to consolidate helpers
 private async Task<(List<AnalysisStory> Stories, Dictionary<string, Fact> Facts)> RunFullPipelineAsync(
-    Func<TestDataSeeder, Task> seedAction)
+    Func<TestDataSeeder, Task> seedAction, bool includeAnomalies = false)
 {
     await _duckDb.InitializeAsync();
     await _duckDb.InitializeAnalysisSchemaAsync();

     var seeder = new TestDataSeeder(_duckDb);
     await seedAction(seeder);

     var collector = new DuckDbFactCollector(_duckDb);
     var context = TestDataSeeder.CreateTestContext();
     var facts = await collector.CollectFactsAsync(context);

+    if (includeAnomalies)
+    {
+        var anomalyDetector = new AnomalyDetector(_duckDb);
+        var anomalies = await anomalyDetector.DetectAnomaliesAsync(context);
+        facts.AddRange(anomalies);
+    }
+
     var scorer = new FactScorer();
     scorer.ScoreAll(facts);

     var graph = new RelationshipGraph();
     var engine = new InferenceEngine(graph);
     var stories = engine.BuildStories(facts);

     var factsByKey = facts
         .Where(f => f.Severity > 0)
         .ToDictionary(f => f.Key, f => f);

     return (stories, factsByKey);
 }

-private async Task<(List<AnalysisStory> Stories, Dictionary<string, Fact> Facts)> RunFullPipelineWithAnomaliesAsync(
-    Func<TestDataSeeder, Task> seedAction)
-{
-    // ... duplicated code ...
-}
+private Task<(List<AnalysisStory> Stories, Dictionary<string, Fact> Facts)> RunFullPipelineWithAnomaliesAsync(
+    Func<TestDataSeeder, Task> seedAction)
+    => RunFullPipelineAsync(seedAction, includeAnomalies: true);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Lite.Tests/ScenarioTests.cs` around lines 434 - 464,
RunFullPipelineWithAnomaliesAsync duplicates most of RunFullPipelineAsync;
extract the shared pipeline steps (DuckDb.InitializeAsync /
InitializeAnalysisSchemaAsync, seeding via TestDataSeeder,
DuckDbFactCollector.CollectFactsAsync, FactScorer.ScoreAll,
RelationshipGraph/InferenceEngine.BuildStories and the final severity filter)
into a single helper method (e.g., RunFullPipelineCore or RunFullPipelineAsync
with an optional parameter or Func to inject anomaly detection), then have
RunFullPipelineWithAnomaliesAsync call that helper and only perform the
AnomalyDetector.DetectAnomaliesAsync step (and merging anomalies into facts) via
the injected behavior; update callers to use the consolidated helper to remove
the duplicated code in RunFullPipelineWithAnomaliesAsync and
RunFullPipelineAsync.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 5c08471 and 3b43644.

📒 Files selected for processing (6)
  • Lite.Tests/ScenarioTests.cs
  • Lite/Analysis/AnalysisService.cs
  • Lite/Analysis/AnomalyDetector.cs
  • Lite/Analysis/FactScorer.cs
  • Lite/Analysis/TestDataSeeder.cs
  • Lite/Mcp/McpAnalysisTools.cs

erikdarlingdata merged commit f1f3fe9 into dev on Mar 17, 2026
5 checks passed
erikdarlingdata deleted the feature/anomaly-detection-589 branch on April 9, 2026 at 00:32