Skip to content

[BUG] cpu_utilization_stats_collector fails and does not recover #989

@ghauan

Description

@ghauan

Component

Full Dashboard

Performance Monitor Version

2.10.0

SQL Server Version

sql server 2025

Windows Version

windows server 2019

Describe the Bug

On one of my servers, the cpu_utilization_stats_collector keeps failing and does not recover.

Steps to Reproduce

Not sure what the root cause is, so this will probably not happen on all servers, just a special case on my server. The problem comes when running EXEC [collect].[cpu_utilization_stats_collector]. It fails because sqlserver_cpu_utilization and other_process_cpu_utilization returns NULL and cannot be inserted to the table

Expected Behavior

OK data should be put into the table.

Actual Behavior

No data after the rows with NULL data is inserted to table.

Error Messages / Log Output

Running: EXEC [collect].[cpu_utilization_stats_collector] gives this error:
Msg 50000, Level 16, State 1, Procedure collect.cpu_utilization_stats_collector, Line 187 [Batch Start Line 2]
Error in CPU utilization stats collector: Cannot insert the value NULL into column 'sqlserver_cpu_utilization', table 'PerformanceMonitor.collect.cpu_utilization_stats'; column does not allow nulls. INSERT fails.

Screenshots

Using the following code from the procedure (slightly edited) I got the rows from the screenshot below, showing that two rows have NULL-values:

DECLARE  @max_sample_time datetime2(7) = NULL,
        @current_ms_ticks bigint =
        (
            SELECT
                osi.ms_ticks
            FROM sys.dm_os_sys_info AS osi
        ),
        @start_time datetime2(7) = SYSDATETIME();
SELECT
            @max_sample_time = MAX(cus.sample_time)
        FROM collect.cpu_utilization_stats AS cus;

SELECT @max_sample_time AS MaxSampleTime, @current_ms_ticks AS CurrentMsTicks, @start_time AS StartTime;

        /*
        Collect CPU utilization data from ring buffers
        Only collects samples newer than the most recent sample we have
        On first run (NULL max_sample_time), looks back 7 days to populate initial data
        Avoids duplicate collection of same ring buffer events
        */

        SELECT
            sample_time =
                DATEADD
                (
                    SECOND,
                    -((@current_ms_ticks - t.timestamp) / 1000),
                    @start_time
                ),
            sqlserver_cpu_utilization =
                t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'integer'),
            other_process_cpu_utilization =
                CASE
                    WHEN (100 -
                          t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'integer') -
                          t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'integer')) < 0
                    THEN 0
                    ELSE 100 -
                         t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'integer') -
                         t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'integer')
                    
                END
        FROM
        (
            SELECT
                dorb.timestamp,
                record =
                    CONVERT(xml, dorb.record)
            FROM sys.dm_os_ring_buffers AS dorb
            WHERE dorb.ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
        ) AS t
        WHERE DATEADD
        (
            SECOND,
            -((@current_ms_ticks - t.timestamp) / 1000),
            @start_time
        ) > ISNULL(@max_sample_time, DATEADD(DAY, -7, @start_time))
        ORDER BY
            t.timestamp /*DESC*/
        OPTION(RECOMPILE);
Image Image

Additional Context

I am not sure how to investigate why the two rows got NULL-values from the dm_os_ring_buffers-query, but using ISNULL around the query, I was able replace the NULLs with 0 . That might be a possible fix?

sqlserver_cpu_utilization =
    ISNULL(t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'integer'),0),
other_process_cpu_utilization =
    CASE
        WHEN ISNULL((100 -
              t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'integer') -
              t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'integer')), -1) < 0
        THEN 0
        ELSE 100 -
             t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'integer') -
             t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'integer')
        
    END

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions