Add anomaly detection — baseline comparison for acute deviations (#589)#606
New AnomalyDetector compares the analysis window against a 24-hour baseline period to detect acute deviations.

Detects:
- CPU spikes: peak CPU deviation from baseline mean (σ-based scoring)
- Wait spikes: wait types with 5x+ increase or new in analysis window
- Blocking spikes: blocking/deadlock counts 3x+ above baseline
- I/O latency anomalies: read/write latency deviation from baseline

Scoring: CPU/IO use standard deviation (2σ=0.5, 4σ=1.0). Waits use ratio (5x=0.5, 20x=1.0). Blocking uses ratio (3x=0.5, 10x=1.0).

Global safety: skips all anomaly detection when no baseline data exists (prevents everything looking anomalous on new servers). Uses strict boundary exclusion to prevent analysis window data leaking into baseline.

Wired into AnalysisService pipeline between fact collection and scoring. Tool recommendations added for all anomaly fact types.

Test scenarios: CPU spike anomaly (10% baseline → 95% spike), blocking spike anomaly (0 baseline → 50 events), wait spike anomaly (minimal PAGEIOLATCH baseline → 8M ms flood).

7 new tests, 138 total passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
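The scoring thresholds in the description above can be read as two anchor points per metric. The following is a minimal sketch of one way to map a deviation onto a 0 to 1 score, not the PR's actual `FactScorer` code. The class name `AnomalyScoreSketch`, the linear interpolation between the anchors, the zero score below the lower anchor, and the handling of a zero standard deviation are all assumptions made for illustration.

```csharp
using System;

// Hypothetical helper for illustration only; not taken from the PR.
public static class AnomalyScoreSketch
{
    // Piecewise-linear score through two anchors: `half` maps to 0.5, `full` maps to 1.0.
    // Assumption: anything below `half` scores 0, anything at or above `full` is capped at 1.0.
    public static double Score(double value, double half, double full)
    {
        if (double.IsNaN(value) || value < half) return 0.0;
        if (value >= full) return 1.0;
        return 0.5 + 0.5 * (value - half) / (full - half);
    }

    // CPU and I/O: deviation expressed in baseline standard deviations (2σ = 0.5, 4σ = 1.0).
    public static double SigmaScore(double observed, double baselineMean, double baselineStdDev)
    {
        // Assumption: a baseline with no spread yields no sigma-based score.
        if (baselineStdDev <= 0) return 0.0;
        double sigmas = (observed - baselineMean) / baselineStdDev;
        return Score(sigmas, 2.0, 4.0);
    }

    // Waits: ratio of analysis-window value to baseline value (5x = 0.5, 20x = 1.0).
    public static double WaitRatioScore(double ratio) => Score(ratio, 5.0, 20.0);

    // Blocking/deadlocks: ratio of analysis-window count to baseline count (3x = 0.5, 10x = 1.0).
    public static double BlockingRatioScore(double ratio) => Score(ratio, 3.0, 10.0);
}
```

Under these assumptions, a CPU peak of 70% against a baseline mean of 10% with a standard deviation of 20 is 3σ out and scores 0.75, halfway between the two anchors. How the real scorer treats values between or below the anchors is not stated in the description.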
📝 Walkthrough

This change implements anomaly detection by comparing runtime metrics against a 24-hour baseline, detecting CPU spikes, wait-time anomalies, blocking/deadlock events, and I/O latency deviations. The AnomalyDetector integrates into the analysis pipeline, produces anomaly facts with scored severity, and includes comprehensive test coverage with data seeding utilities.
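To illustrate the baseline comparison described here, the sketch below shows one way a baseline window could be derived so that it never overlaps the analysis window, along with the "no baseline data" guard. It is an assumption-laden illustration, not the PR's `AnomalyDetector`: the types `TimeWindow` and `BaselineWindowSketch`, the half-open interval convention, and the choice to place the baseline in the 24 hours immediately before the analysis window are all hypothetical.

```csharp
using System;

// Hypothetical types for illustration only; not taken from the PR.
public readonly record struct TimeWindow(DateTime Start, DateTime End)
{
    // Half-open interval [Start, End): a sample stamped exactly at End is outside the window.
    public bool Contains(DateTime t) => t >= Start && t < End;
}

public static class BaselineWindowSketch
{
    // Assumption: the baseline is the period of `length` (24 hours by default)
    // ending exactly where the analysis window starts.
    public static TimeWindow BaselineFor(TimeWindow analysis, TimeSpan? length = null)
    {
        TimeSpan span = length ?? TimeSpan.FromHours(24);
        return new TimeWindow(analysis.Start - span, analysis.Start);
    }

    // Guard corresponding to "skip all anomaly detection when no baseline data exists".
    public static bool HasBaseline(int baselineSampleCount) => baselineSampleCount > 0;
}
```

With a half-open baseline, a sample stamped exactly at the start of the analysis window belongs to the analysis window only, which is one way to get strict boundary exclusion without double counting.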
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
🧹 Nitpick comments (1)
Lite.Tests/ScenarioTests.cs (1)
434-464: Consider extracting shared logic between pipeline helpers.
`RunFullPipelineWithAnomaliesAsync` largely duplicates `RunFullPipelineAsync` (lines 335-360), differing only in the anomaly detection step (lines 447-450). Consider refactoring to reduce duplication:

♻️ Optional refactor to consolidate helpers

```diff
 private async Task<(List<AnalysisStory> Stories, Dictionary<string, Fact> Facts)> RunFullPipelineAsync(
-    Func<TestDataSeeder, Task> seedAction)
+    Func<TestDataSeeder, Task> seedAction, bool includeAnomalies = false)
 {
     await _duckDb.InitializeAsync();
     await _duckDb.InitializeAnalysisSchemaAsync();

     var seeder = new TestDataSeeder(_duckDb);
     await seedAction(seeder);

     var collector = new DuckDbFactCollector(_duckDb);
     var context = TestDataSeeder.CreateTestContext();
     var facts = await collector.CollectFactsAsync(context);

+    if (includeAnomalies)
+    {
+        var anomalyDetector = new AnomalyDetector(_duckDb);
+        var anomalies = await anomalyDetector.DetectAnomaliesAsync(context);
+        facts.AddRange(anomalies);
+    }
+
     var scorer = new FactScorer();
     scorer.ScoreAll(facts);

     var graph = new RelationshipGraph();
     var engine = new InferenceEngine(graph);
     var stories = engine.BuildStories(facts);

     var factsByKey = facts
         .Where(f => f.Severity > 0)
         .ToDictionary(f => f.Key, f => f);

     return (stories, factsByKey);
 }

-private async Task<(List<AnalysisStory> Stories, Dictionary<string, Fact> Facts)> RunFullPipelineWithAnomaliesAsync(
-    Func<TestDataSeeder, Task> seedAction)
-{
-    // ... duplicated code ...
-}
+private Task<(List<AnalysisStory> Stories, Dictionary<string, Fact> Facts)> RunFullPipelineWithAnomaliesAsync(
+    Func<TestDataSeeder, Task> seedAction)
+    => RunFullPipelineAsync(seedAction, includeAnomalies: true);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Lite.Tests/ScenarioTests.cs` around lines 434 - 464, RunFullPipelineWithAnomaliesAsync duplicates most of RunFullPipelineAsync; extract the shared pipeline steps (DuckDb.InitializeAsync / InitializeAnalysisSchemaAsync, seeding via TestDataSeeder, DuckDbFactCollector.CollectFactsAsync, FactScorer.ScoreAll, RelationshipGraph/InferenceEngine.BuildStories and the final severity filter) into a single helper method (e.g., RunFullPipelineCore or RunFullPipelineAsync with an optional parameter or Func to inject anomaly detection), then have RunFullPipelineWithAnomaliesAsync call that helper and only perform the AnomalyDetector.DetectAnomaliesAsync step (and merging anomalies into facts) via the injected behavior; update callers to use the consolidated helper to remove the duplicated code in RunFullPipelineWithAnomaliesAsync and RunFullPipelineAsync.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d7607e8d-11da-4e41-9e7d-83c44a955116
📒 Files selected for processing (6)
- Lite.Tests/ScenarioTests.cs
- Lite/Analysis/AnalysisService.cs
- Lite/Analysis/AnomalyDetector.cs
- Lite/Analysis/FactScorer.cs
- Lite/Analysis/TestDataSeeder.cs
- Lite/Mcp/McpAnalysisTools.cs
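Summary

- `AnomalyDetector` compares analysis window against 24-hour baseline
- Wired into `AnalysisService` pipeline between fact collection and scoring

Detection types:

- `ANOMALY_CPU_SPIKE`
- `ANOMALY_WAIT_{type}`
- `ANOMALY_BLOCKING_SPIKE`
- `ANOMALY_DEADLOCK_SPIKE`
- `ANOMALY_READ_LATENCY`
- `ANOMALY_WRITE_LATENCY`

Test scenarios added:

- CPU spike anomaly (10% baseline → 95% spike)
- Blocking spike anomaly (0 baseline → 50 events)
- Wait spike anomaly (minimal PAGEIOLATCH baseline → 8M ms flood)

Test plan

- `dotnet build`: 0 errors
- `dotnet test`: 138 tests pass (131 existing + 7 new)

Closes #589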
🤖 Generated with Claude Code