Skip ring-buffer records missing SystemHealth values in CPU collector (#989) #990

Merged
erikdarlingdata merged 1 commit into dev from feature/989-cpu-collector-null on May 22, 2026
@erikdarlingdata (Owner)

Summary

Fixes #989. Some RING_BUFFER_SCHEDULER_MONITOR records lack a complete SystemHealth block, so the ProcessUtilization / SystemIdle XML values extract as NULL.

The Dashboard collector inserts into NOT NULL columns, so a single malformed record fails the whole INSERT atomically. Nothing is ever inserted → @max_sample_time stays NULL → every run rescans the full 7-day window and re-hits the same bad records → the collector never recovers.
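A minimal sketch of that failure loop, using Python's built-in sqlite3 as a stand-in for SQL Server (an assumption made so the example is self-contained; the table names, column names, and the `collect` helper are hypothetical and not taken from the actual procedure). It only illustrates how one NULL row rolls back the whole statement and leaves the high-water mark stuck:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for extracted ring-buffer values.
    CREATE TABLE ring_buffer (
        sample_time INTEGER,
        process_utilization INTEGER,
        system_idle INTEGER
    );
    -- Hypothetical stand-in for the collector's NOT NULL target table.
    CREATE TABLE cpu_stats (
        sample_time INTEGER NOT NULL,
        process_utilization INTEGER NOT NULL,
        system_idle INTEGER NOT NULL
    );
    INSERT INTO ring_buffer VALUES (1, 42, 50), (2, NULL, NULL), (3, 10, 85);
""")


def collect(conn):
    """One collector run: insert everything newer than the last stored sample."""
    max_sample_time = conn.execute(
        "SELECT MAX(sample_time) FROM cpu_stats"
    ).fetchone()[0]
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                """
                INSERT INTO cpu_stats
                SELECT sample_time, process_utilization, system_idle
                FROM ring_buffer
                WHERE sample_time > COALESCE(?, 0)
                """,
                (max_sample_time,),
            )
    except sqlite3.IntegrityError:
        pass  # the whole statement fails, so the valid rows are lost too
    return conn.execute(
        "SELECT COUNT(*), MAX(sample_time) FROM cpu_stats"
    ).fetchone()


first_run = collect(conn)
second_run = collect(conn)
print(first_run, second_run)
```

Both runs leave the target table empty, so the second run starts from the same NULL high-water mark and hits the same bad row again.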

Changes

  • install/18_collect_cpu_utilization_stats.sql — extract ProcessUtilization/SystemIdle once via CROSS APPLY, then filter out records where either is NULL. Valid rows now insert, @max_sample_time advances, recovery is immediate (not a 7-day wait).
  • Lite/Services/RemoteCollectorService.Cpu.cs — same CROSS APPLY + NULL filter. Lite's DuckDB columns are nullable so it never hard-failed, but it stored NULL samples that skew the CPU chart.
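The extract-once-then-filter logic described in the two changes above can be sketched in Python (not the T-SQL or C# from the PR). The XML shape, the sample records, and the `extract` / `valid_samples` helpers are assumptions for illustration; the real code does this with CROSS APPLY inside the query:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down ring-buffer records: one complete,
# one with an empty SystemHealth block.
RECORDS = [
    "<Record><SchedulerMonitorEvent><SystemHealth>"
    "<ProcessUtilization>42</ProcessUtilization><SystemIdle>50</SystemIdle>"
    "</SystemHealth></SchedulerMonitorEvent></Record>",
    "<Record><SchedulerMonitorEvent><SystemHealth/></SchedulerMonitorEvent></Record>",
]


def extract(record_xml):
    """Return (process_utilization, system_idle); None where a value is missing."""
    root = ET.fromstring(record_xml)
    base = "SchedulerMonitorEvent/SystemHealth/"
    process = root.findtext(base + "ProcessUtilization")
    idle = root.findtext(base + "SystemIdle")
    return (
        int(process) if process is not None else None,
        int(idle) if idle is not None else None,
    )


def valid_samples(records):
    """Extract each record once, then keep only records with both values."""
    samples = []
    for record_xml in records:
        process, idle = extract(record_xml)
        if process is None or idle is None:
            continue  # not a CPU reading at all, so skip it
        samples.append((process, idle))
    return samples


samples = valid_samples(RECORDS)
print(samples)
```

Only the complete record survives the filter, which mirrors the synthetic test in the test plan.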

Malformed records are dropped rather than patched with ISNULL(..., 0) (the reporter's suggestion): a fabricated 0 reads as a real "0% CPU" sample and misleads the charts; a record with no SystemHealth block is not a CPU reading at all.
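A small arithmetic sketch of why a substituted 0 is misleading. The sample values are made up for illustration, and a plain average stands in for whatever aggregation the charts actually use:

```python
from statistics import mean

# Hypothetical extracted ProcessUtilization values; None marks a record
# whose SystemHealth block was missing.
extracted = [42, None, 38]

# Zero-fill approach: the missing record becomes a "0% CPU" sample.
zero_filled = [value if value is not None else 0 for value in extracted]

# Drop approach: the missing record is simply not a sample.
dropped = [value for value in extracted if value is not None]

zero_filled_average = mean(zero_filled)
dropped_average = mean(dropped)
print(round(zero_filled_average, 2), dropped_average)
```

With these made-up values the zero-filled average falls to roughly 26.67 while the dropped average stays at 40, even though no record ever reported low CPU.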

Test plan

  • Deployed fixed proc to SQL2022 — EXEC collect.cpu_utilization_stats_collector @debug=1 runs clean.
  • Ring-buffer audit on SQL2022: 256 records, 0 NULL — filter drops nothing valid on a healthy server.
  • Synthetic test: a record with an empty <SystemHealth/> is filtered out while a valid 42/50 record passes.
  • Lite builds clean.

🤖 Generated with Claude Code

…#989)

Some RING_BUFFER_SCHEDULER_MONITOR records lack a complete SystemHealth
block, so the ProcessUtilization / SystemIdle XML values extract as NULL.
The Dashboard collector inserts into NOT NULL columns, so a single bad
record fails the whole INSERT atomically. Nothing is ever inserted, so
@max_sample_time stays NULL, every run rescans the full 7-day window and
re-hits the same bad records — the collector never recovers.

- install/18: extract ProcessUtilization/SystemIdle once via CROSS APPLY,
  filter out records where either is NULL. Valid rows now insert,
  @max_sample_time advances, recovery is immediate.
- Lite RemoteCollectorService.Cpu.cs: same CROSS APPLY + NULL filter.
  Lite's DuckDB columns are nullable so it never hard-failed, but it
  stored NULL samples that skew the CPU chart.

Chose to drop malformed records rather than ISNULL(...,0): a fabricated
0 reads as a real "0% CPU" sample and misleads the charts; a record
with no SystemHealth block is not a CPU reading at all.

Verified: installed against SQL2022, collector runs clean; synthetic
test confirms a record with an empty <SystemHealth/> is filtered out
while a valid record passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erikdarlingdata erikdarlingdata merged commit 56a5e74 into dev May 22, 2026
6 checks passed
@erikdarlingdata erikdarlingdata deleted the feature/989-cpu-collector-null branch May 22, 2026 15:45